
How we built a GDPR-compliant website analytics platform without using cookies - pauljarvis
https://usefathom.com/anonymization/index.html
======
pauljarvis
We are incredibly open to any ideas, comments or concerns on how we're doing
this. This is a big step up from what we had previously, but there’s always
room for improvement. Happy to hear thoughts in the comments.

~~~
lmkg
Hi Paul, thanks for being open about this. I have a big, important question.

The ICO, the agency in charge of enforcing GDPR and related legislation in the
UK, released guidance earlier this month on the topic of cookies. One of
the most notable parts of this guidance is that "device fingerprinting" is
treated the same as a cookie[1]. And also that website analytics requires
consent to use cookies or similar technologies[2] ("similar technologies"
including device fingerprinting).

Now, the above guidance is related to PECR rather than GDPR, which is what
your post is about. But, given the above, do you think that your software is
compliant/exempt from PECR or do you think that organizations will still have
to take extra steps to be compliant with privacy legislation?

[1] [https://ico.org.uk/for-organisations/guide-to-pecr/guidance-...](https://ico.org.uk/for-organisations/guide-to-pecr/guidance-on-the-use-of-cookies-and-similar-technologies/what-are-cookies-and-similar-technologies/#cookies5)

[2] [https://ico.org.uk/for-organisations/guide-to-pecr/guidance-...](https://ico.org.uk/for-organisations/guide-to-pecr/guidance-on-the-use-of-cookies-and-similar-technologies/how-do-we-comply-with-the-cookie-rules/#comply15)

~~~
orra
I think that's a fair question.

AFAICT, v1 of PECR awkwardly applies whenever the cookie is not strictly
necessary for the service that the user is using. PECR applies even if, like
here, the cookie is just for counting the number of unique visitors, and is
not used for fingerprinting individuals.

The draft v2 of PECR contains an exemption for first party analytics. I think
this maybe strikes a nice balance: explicit consent would still be required
for the more-harmful third party analytics.

Not sure when v2 of PECR will happen. It is years overdue. Perhaps it is a
priority for the newly elected European Parliament and the new Commission?

~~~
krageon
What makes analytics first-party? When the first party serves them, or only if
the data never leaves machines under the direct and exclusive control of the
first party?

~~~
orra
I don't know the answer to that.

FWIW, my guess would be that the definition is fairly strict.

Now, I don't think that would prevent the first party using data processors,
but I suspect the first party would have to exercise a lot of control over the
processor. This would be in contrast to a service like Google Analytics, where
the company's control and choice are limited to 'take it or leave it'.

~~~
krageon
A data processor agreement is not usually negotiated all that hard, and with
truly large companies it isn't really possible at all (and let's face it,
that's where most companies get their enterprise software). So expecting a lot
of control to be exercised feels like a bit of a pipe dream.

------
unilynx
Why all the trouble with hashes - can't you just do it on the client and not
have to store any data at all?

"For tracking unique page views"

    
    
      // Count a unique page view at most once per tab session.
      if (!sessionStorage.getItem(location.href)) {
        sessionStorage.setItem(location.href, "1");
        navigator.sendBeacon("/unique-pagehit?" + encodeURIComponent(location.href));
      }
    

"For tracking unique site views"

    
    
      if(!sessionStorage["Hi!"]) {
        sessionStorage["Hi!"]=1;    
        navigator.sendBeacon("/unique-sitehit");
      }
    

"For tracking previous requests"

I'm not sure I fully understand what is being measured (is it session-only?).
For how long someone watched a page, you can use sendBeacon in a beforeunload
handler. To detect a bounce, store a Math.random() value in a session
variable, send it at the start of the page, and have every subsequent page
load send the previously stored value again. Then, on the server, the random
keys you only ever received once are the bounces.
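
Something like this rough sketch (the /bounce-ping endpoint is made up):

      // One random key per tab session; every page load re-sends it.
      // On the server, keys that only ever arrive once are the bounces.
      var bounceKey = sessionStorage.getItem("bounceKey");
      if (!bounceKey) {
        bounceKey = Math.random().toString(36).slice(2);
        sessionStorage.setItem("bounceKey", bounceKey);
      }
      navigator.sendBeacon("/bounce-ping?key=" + encodeURIComponent(bounceKey));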

I know, in practice you'll need to trim sessionStorage, sanitize URLs, use
something less collision-prone than Math.random, deal with new tabs, add some
polyfills and other robustness, etc... but I don't yet see why the tracking
mentioned needs any user IDs or hashing at all.

~~~
JackWritesCode
Great idea but we can't use sessionStorage under PECR, which is why we made
this move. Plus we got rid of anything being stored on the user's machine.

------
moose333
As a user of the open source version of Fathom, I'm a little concerned by the
lag in publishing this update to the community edition. I assumed development
work was happening in the open on Github, but I guess that's not the case?

~~~
JackWritesCode
Whole new language & codebase. Old developer left, we don't write Go.

~~~
pauljarvis
So it's not as easy as just pushing the update to the repo. We are still
committed to open-source, but we also have a business to run and need to make
a living here (we're two dudes who care about privacy, not a huge company with
deep pockets) :)

The community version is getting a full update soon. We just have to focus on
profit a bit (this keeps us in business and able to update the repo).

~~~
bluesmoon
Speaking as someone who built an analytics business while still maintaining an
open source version of the code, I can confirm that it's hard. Sometimes it
starts off as one small patch that only makes sense for the hosted service,
but you can't push to the OS branch until the code is refactored to protect
secrets.

I cannot give you any tips on managing your time between the two, but you may
want to consider raising your prices. Back in 2011, we ran our basic account
at $50/mo, Business at $150/mo and Enterprise at $CallMe/mo. We could probably
have upped that after a year with all the new features we'd put in, but we
were acquired before the 2y mark and dropped the first 2 plans.

We still maintain the open source version of the code, but neither of the
original founders work on it (for me it's gone back to being a hobby because
there are now people paid full time to work on it). We still get questions
about the lag in updates. We typically do quarterly bulk pushes to the open
source version now.

~~~
JackWritesCode
Thanks for sharing your story. It is hard and we have to fight to not resent
OSS because of 0.001% of the OS community. Open-source software has
contributed significantly to our lives, and we love it, so we're going to be
pursuing OS Version 2 regardless of a few angry people.

~~~
gigatexal
Armchair CEO here, but charge a fair rate, and if that means doubling it, do
it. People will respect it and pay it if the product or service helps them
make money. I'd rather pay a higher amount knowing I'm helping fund a tool
that keeps me productive than get something cheap.

------
ares2012
This is a common solution to the problem of PII, but without any information
on returning users I would argue that its value as an analytics platform is
limited. Few businesses can grow without knowing the difference between a
first-time and a returning user, which is the reason cookies were invented in
the first place.

However, since such businesses already need to collect personal info as part
of account creation, it shouldn't be hard to build analytics on top of that
existing PII. If they are already collecting PII, it doesn't seem to save much
to have their analytics tool avoid it?

~~~
JackWritesCode
Fathom Analytics is intentionally limited, and the limitations you point out
are 100% intentional. There are many businesses who can't use our product, but
millions that can :)

------
AndrewStephens
Most schemes of this kind are just more complicated cookies that people hope
will avoid the GDPR provisions by dint of being obfuscated.

What the article is discussing looks (at first blush) to be a sensible way of
aggregating users up-front before it hits the database, rather than later. So
no personal data is stored.

Does this meet the requirements for a site to avoid notifying users under the
GDPR? I have no idea.

Even with the best of intentions, if you use a service like this then you are
relying on them a) doing what they claim, and b) not screwing up (by leaving
logs around, etc).

If I use this service and data from my users gets leaked by Fathom, who gets
blamed? The users were on my site, so I guess it is I that gets fined. Maybe
the risk is worth it, maybe it isn't.

~~~
robgough
In response to your final question, this[1] document from the UK's ICO has
some interesting info. Essentially you're either a Data Controller (that would
be your site in this example) or a Data Processor (Fathom, in this case --
probably?!).

"64\. The ICO cannot even take action directly against a processor who is
entirely responsible for a data breach, for example by failing to deliver the
security standards the controller has required it to put into place. However,
in these cases the ICO may decide not to take any enforcement action against
the controller if it believes it has done all it can to protect the personal
data it is responsible for and to ensure the reliability of its processor, for
example through a written contract. However, whilst the ICO cannot take action
against the processor, the data controller could take its own civil action
against its data processor, for example for breach of contract."

Though it goes on to say that in some circumstances, the processor can
_become_ a controller, in which case the ICO can go after it.

[1]: [https://ico.org.uk/media/for-organisations/documents/1546/da...](https://ico.org.uk/media/for-organisations/documents/1546/data-controllers-and-data-processors-dp-guidance.pdf)

~~~
JackWritesCode
And even if there were a data breach where Paul and I were held at gunpoint
and told to export the database, there's no personal data to do anything with.
Not even in our Redis queue! Our database is very boring, anonymous and
simple, and we like it that way.

------
labawi
If visits expire after 30 min, why not rotate the salt every 30 min? Keep
current and previous salt, update as needed.

I would have more faith in privacy if you didn't store the salt in the DB or
permanent storage. If you manage to statically load-balance the users (e.g.
hash site, IP, user-agent; don't forget the site), the salt could live in
memory only. Sessions would break on server restart, but that's more of a
feature.

To move things further, you might not even need to store the hashes in the DB.
Keep them in server memory only and update aggregate data in the DB in real
time.
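
A minimal sketch of the rotation (Node-style; the function and variable names
are just illustrative):

      const crypto = require("crypto");

      let currentSalt = crypto.randomBytes(32);
      let previousSalt = currentSalt;

      // Rotate every 30 minutes; keep the previous salt so a visit hashed
      // just before a rotation can still be matched.
      setInterval(() => {
        previousSalt = currentSalt;
        currentSalt = crypto.randomBytes(32);
      }, 30 * 60 * 1000);

      function visitorSignatures(ip, userAgent, siteId) {
        const sign = (salt) =>
          crypto.createHash("sha256")
            .update(salt)
            .update(`${ip}|${userAgent}|${siteId}`)
            .digest("hex");
        // Match incoming requests against both; only the current salt is
        // used for brand-new visits.
        return [sign(currentSalt), sign(previousSalt)];
      }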

~~~
JackWritesCode
The visit expires 30 minutes after the visitor lands, so the expiration is per
visit rather than one fixed, global window.

It's an interesting idea. We have multiple servers under the load balancers,
so we'd be able to store them in Redis, but that is no better than permanent
storage, since Redis could still be breached and you'd see it with ease.

------
i_anon
Hi Jack and Paul - love what you're doing! This solution is so needed.

I wondered whether you could explain what makes your hashing different from
the hashing used by Facebook for their custom audiences tool which was deemed
unsuitable for anonymisation as per
[https://www.spiritlegal.com/en/news/details/e-commerce-retai...](https://www.spiritlegal.com/en/news/details/e-commerce-retail-facebook-custom-audience-not-allowed-without-consent.html)

------
mrweasel
Couldn’t people just parse the log files from their webservers?

~~~
bashy
Definitely. Helps with people who disable JS as well. I've used this before
[https://goaccess.io](https://goaccess.io)

~~~
JackWritesCode
bashy from laracasts? Hey there! And I've heard the disabled JS piece before.
How many people are disabling javascript?

~~~
bashy
Yes that's me. Hi there.

I use fathom on one of my sites and I like it. I was just replying to someone
who said people could just use access logs instead. Some don't like adding JS
to their site I guess.

Ad blockers will also block requests to the tracker.js if you serve it from
something like analytics.yourdomain.com, for example.

------
SCLeo
Looking at their live demo
([https://stats.usefathom.com/#!p=1w&g=hour](https://stats.usefathom.com/#!p=1w&g=hour)),
I can see a lot of traffic is coming from ycombinator. So...

(I mean, I don't have a point here but I find it pretty interesting. xD)

~~~
hedora
Duck Duck Go traffic is 10% of the Google traffic.

Is that typical for privacy-focused tech sites? I would have expected a lower
percentage.

(DDG fan here, but people look at me funny when they see me using it...)

------
billabul
sorry but, isn't that an (unnecessarily complex) cookie?

~~~
Spivak
Look, entire industries exist for being compliant with the letter but not the
spirit of the law, so I'm sure that this in no way meets the definition of a
cookie as far as the GDPR is concerned.

_However_, this is absolutely a cookie. Scraping just enough information
from the browser to create a unique but stable hash and then having the
browser compute it every request isn't at all different from that browser
information acting as the cookie.

------
vmlpvf
The data is not anonymous. Anonymity is actually very hard to claim (see
k-anonymity, differential privacy, etc.).

Nevertheless, the chances of identifying someone are probably pretty low, and
it's a good effort to make analytics more privacy-friendly.

~~~
JackWritesCode
So it's practically anonymous. Nobody has enough computing power to
brute-force it, and the data is typically deleted within 30 minutes.

~~~
nybble41
To de-anonymize the data you don't actually need to brute force the 256-bit
hash. If the other pieces of data are known (salt, site, page, day of year)
and you can make a shrewd guess at the user agent then you'd only need to
brute force the 32-bit IP address (*).

(*) Assuming IPv4. Obviously IPv6 addresses would be much harder to brute
force, but still easier than a 256-bit hash.
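
To make the search space concrete, here's a rough sketch of that attack (I'm
guessing at the exact concatenation; the real construction may differ):

      const crypto = require("crypto");

      // Hypothetical: leaked salt and target hash, plus guessed user agent,
      // site ID and day of year. Only the 32-bit IPv4 address is unknown.
      function findIp(salt, targetHash, userAgent, siteId, dayOfYear) {
        for (let n = 0; n < 2 ** 32; n++) {
          const ip = [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
          const candidate = crypto.createHash("sha256")
            .update(`${salt}${ip}${userAgent}${siteId}${dayOfYear}`)
            .digest("hex");
          if (candidate === targetHash) return ip; // ~2^32 hashes: feasible offline
        }
        return null;
      }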

~~~
JackWritesCode
I do take your point but when we get into this area, it becomes a big question
of trust. Because if I gave you a hash, you would need to guess the 256 bit
salt in addition to all the other possibilities.

I mean, hey, here's a hash:
26246226167b9f190d3a1ce726efe07ae18bbf0480a78d19390b9aaf13f25cb0

Imagine you just got hold of it through a data breach. I'll give you $1,000 if
you can get it de-hashed before midnight Chicago time ;)

~~~
nybble41
It's true that you'd need to know the salt first, but the server _does_ know
the salt, and if you obtained the hash through a data breach then you probably
obtained the current day's salt as well.

~~~
JackWritesCode
So if someone had unlimited access to our servers, that would be a problem.
One piece to note is that page views get deleted after around 30 minutes.

So one of the reasons we posted to HN was for conversations like this. Reading
what you put makes me think we need to do more when creating the
PageRequestSignature and the SiteRequestSignature, because if someone had
access to our server, got the hash, stopped all cron jobs from processing
data, then they could work it out given a huge amount of time / computing
power. But to be honest, at this point, they could also add log($_SERVER) and
get the entire request body of a user, so we would have much bigger problems
in that scenario.

Anyway, you've given me a few new thoughts on hardening from data breaches.
Because, yes, if they know the hash & have full access to our server then it
becomes easier. So we almost need to move it to the point where they won't
know the salt that was used for a particular user. So we'd not recycle a
single salt, we'd recycle multiple salts that are based on, perhaps, the first
2-3 digits of a user's IP address combined with the last 2-3 digits.... Then
it would be much harder to break without first knowing the $_SERVER dump (the
user's IP etc.). Obviously this wouldn't stop a "complete control of server"
attack where they could just start logging everything, but it would really
ruin a brute forcer's day because they wouldn't know which salt to start with.

What do you think of that idea? I'm running on little sleep so be nice ;) Also
thanks so much for all your feedback so far, it's so appreciated!

~~~
nybble41
The case I would be (slightly) concerned about would be where an attacker has
obtained limited, read-only access to your database, so they have the current
salt and the hashes related to the last 30 minutes of activity but not the
ability to simply log whatever data they want. At that point brute-forcing the
32-bit address space and a small set of common user agent strings would allow
them to determine the IP addresses for specific page views.

Of course, if an attacker has access to the salt and is interested in the
pages viewed by a specific _known_ IP address then this all becomes much
simpler. Then the only unknown is the user agent, which is relatively low-
entropy.

> So we'd not recycle a single salt, we'd recycle multiple salts that are
> based on, perhaps, the first 2-3 digits of a users IP address combined with
> the last 2-3 digits....

If an attacker is already brute forcing the IP address and has the ability to
obtain the salt then they would just use the correct salt for each IP address.
Making the salt depend on other data already included in the hash doesn't
change the size of the search space.

~~~
JackWritesCode
I think what’s been helpful for us with posting this here is to hear of
different ideas for how someone might hack Fathom. When we came up with it,
our starting point wasn’t “you have the salt and IP address”, go break a hash.
It was on the assumption that you don’t have the salt or IP. I think we can
improve what we’ve built. 720 salts improves resilience in a few areas but not
in the scenario you are painting here. The scenario you’re painting here has
made me think of additional ideas though.

If they had the salt, IP and user agent, they’d have to also brute force every
possible hostname and pathname, which would be insane. But I suppose they’d
only have to do a few million based on the data we have on hostnames /
pathnames...

Lots of ideas for improvement are popping into my head and I love how this
community keeps challenging you to improve things. We had feedback on Reddit
but it was much angrier!

The next step is to take your feedback and look into how we would defend
against the scenario you’ve provided. Thank you!

------
tomp
Maybe I'm missing something but (1) I don't think this is GDPR compliant, and
(2) why so complicated?

Regarding (1),

> Brute forcing a 256 bit hash would cost 10^44 times the Gross World Product
> (GWP). [...]

> We have rendered the data anonymous to the point where we could not identify
> a natural person from the hash.

> It's possible that GDPR does not apply to Fathom since data is made
> completely anonymous. Even if GDPR did still apply, we reiterate the stance
> that there is legitimate business interest to understand how your website is
> performing.

This seems to imply a profound confusion between hashing and anonymity. Just
because it's hashed doesn't mean it's anonymous! You don't
need to "brute-force" the hash, you just need to find a user that matches your
hash... which is 1 in 7 billion (or so), much more tractable. This is also the
principle e.g. MD5 rainbow tables are based on...

They claim to change the hash every 24 hours, so it's equivalent to having a
session cookie with 24-hour expiration (session cookies are "anonymous" by
their definition, they don't have any user information and they're impossible
to "brute force", they "just" _enable tracking_ ). I've no idea if 24-hour
session cookies are GDPR-compliant...

Regarding (2), given that this seems (again, I might be misunderstanding)
equivalent to a 24-hour session cookie, why not just do that? However, then
you're ... drumroll ... giving control to the user. Why not just _give control
to the user, period?!_ For example, by storing the list of pages visited in
Local Storage, and only pinging the server once for each page(view) every 24
hours?
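
Rough sketch of that last idea (the /unique-pageview endpoint is invented);
all the state stays in the browser:

      // Ping the server at most once per page per 24 hours; the list of
      // visited pages never leaves the client.
      var key = "viewed:" + location.pathname;
      var last = Number(localStorage.getItem(key) || 0);
      if (Date.now() - last > 24 * 60 * 60 * 1000) {
        localStorage.setItem(key, String(Date.now()));
        navigator.sendBeacon("/unique-pageview?" + encodeURIComponent(location.pathname));
      }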

~~~
JackWritesCode
For GDPR compliance notes please see:
[https://usefathom.com/data/](https://usefathom.com/data/)

> You don't need to "brute-force" the hash, you just need to find a user that
> matches your hash... which is 1 in 7 billion (or so), much more tractable.
> This is also the principle e.g. MD5 rainbow tables are based on...

Not quite. We use a SHA256 hash as our salt, and that changes each day, so
you'd need to brute force that.

In terms of how many possible combinations there are for this salt, please
see:
[https://stackoverflow.com/a/49520766](https://stackoverflow.com/a/49520766)
- you would need to brute force it and try each possible combination with
every single possible IP / User Agent / Site combination to break a hash. This
is why it's not theoretically impossible but it's practically impossible.

We would love to approach things in an easier way but PECR doesn't want
cookies, even anonymous ones.

Now, one thing that we have uncovered thanks to someone on here is that we
need to increase our resistance to data breaches. If someone had complete,
unlimited access to all our data / servers, including the daily salt, then
they could de-hash page views from the last 30 minutes. I have no idea how
long that would take. There are 4,294,967,296 (?) possible IP addresses, and
then over 3M (?) user agents, so it'd be an absurd, pointless exercise....
Anyway, we're going to be bringing in multiple salts that depend on the user
IP address, meaning that, in the event of a data breach, a hacker won't know
which salt has been used for a hash :) Perhaps we base the salts on the first
3 digits of an IP address? That would mean we'd have 720 possible SHA256
salts!
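
Very rough sketch of what I mean (nothing here is implemented; the prefix
choice, here the first octet, is just for illustration):

      const crypto = require("crypto");

      // One random daily salt per IP prefix, so a leaked hash alone doesn't
      // tell an attacker which salt was used without also knowing the IP.
      const saltsByPrefix = {};
      for (let octet = 0; octet <= 255; octet++) {
        saltsByPrefix[octet] = crypto.randomBytes(32).toString("hex");
      }

      function saltForIp(ip) {
        return saltsByPrefix[Number(ip.split(".")[0])];
      }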

~~~
hedora
You can get an order of magnitude on hash collision resistance by rolling
every two hours. Maintain “two” backend databases to gracefully track sessions
between rollovers.

Also, for non-GDPR IP blocks, maybe just store a per-client salt in a
cookie(!) and then XOR it with the rotating server salt.

~~~
JackWritesCode
I’m not worried about hash collision with sha256. Any reason why I should be?
:)

And we can’t use cookies because of PECR!

------
saagarjha
> Tracking page views alone, without visits, is completely useless and means
> that you won’t have insight into how many people visit your site / pages
> each day.

What's the difference between a page view and a visit?

~~~
JackWritesCode
If I came onto your website and refreshed one of your pages 500 times, that
would count as 500 page views but only 1 visit.

------
jacquesm
At first glance this appears to be a well thought out solution. It will never
be able to give you some of the stuff that GA can give you but that is by
design. The problem that I see is that as long as GA is able to claim they are
GDPR compliant there will be very few websites that will see this as a
necessity and so adoption will be relatively low. But, and this is just an
idea, one of the things the company could do is proudly offer a 'zero
retention' button or logo to sites that do not have other trackers on their
pages. That way it might become a distinguishing factor for the adopters, and
that might drive further adoption.

Thanks for building this, I will promote it.

~~~
JackWritesCode
A fantastic idea. We've recently designed a button for websites to show users
that they care but I really love this idea of a 'zero retention' button :)

------
st3ve445678
Looks decent, but pricing is insanely high for the extremely limited set of
stats.

~~~
pauljarvis
Totally fair, as that's an opinion :) Luckily our customers are happy with the
price, and other folks use the open-source (100% free) version. Cheers!

~~~
paulgb
I actually think your prices are pretty reasonable for business use, but I
wonder if you've considered a "personal" plan of, say, $6/month for up to 10k
hits, non-commercial use only.

I've been looking for a privacy-first tracker for some personal sites for a
while, but nobody is offering pricing that makes sense at the lower end.

------
CHsurfer
I think the GDPR was enacted into law not to prevent cookies, but to prevent
collecting data on regular people. This seems to satisfy the technicalities of
the law but not its spirit. The risk is that they enact a new law that puts
even further restrictions on website operators.

I'm not sure this is a good idea.

~~~
JackWritesCode
Thanks for the concern here. We are GDPR compliant (and may be exempt from
it). See here: [https://usefathom.com/data/](https://usefathom.com/data/)

~~~
jfk13
You might like to edit the line on that policy page that refers to "the most
privacy-focused manor"... while a privacy-focused manor is an interesting
idea, I suspect you meant "manner". :)

~~~
i_anon
Equally, I'm not sure what it means to be GDPR "complaint" but I'm thinking
it's probably supposed to be "compliant" ;)

------
felixfbecker
It's so exciting that thanks to GDPR we are now seeing innovative analytics
solutions that respect privacy.

~~~
pauljarvis
Agreed!

------
EGreg
I am not sure what exactly they did here. How do they persist the hash between
requests?

My guess is they use localStorage and send the hash to their servers with each
request.

So we are talking about a mechanism that’s just like a cookie.

As long as they don’t have any PII and can’t figure out who the user was, then
I think the GDPR gives them an exception.

But the “without cookies” claim is dubious!

~~~
JackWritesCode
We have been GDPR compliant for many months but our aim here was to meet
E-Privacy demands.

We don't use localStorage... Read the blog post, we don't use cookies.

~~~
EGreg
Oh, apologies, you are right! You don't use any kind of cookie mechanism. I
just read it again:

    
    
      Random SHA256 String (daily regenerated)
      IP Address
      User Agent
      Site ID
      Day of the year
    

My only question is about this "Random SHA256 String": where is it stored
between requests?

~~~
JackWritesCode
Redis Cache. It's a Fathom-wide random string that is used to prevent rainbow
table attacks. The salt is refreshed at midnight every day.
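
Roughly, the construction looks like this (a simplified sketch, not our
production code; the separator and field order here are just illustrative):

      const crypto = require("crypto");

      // Fathom-wide daily salt; in production it lives in Redis and is
      // regenerated at midnight. Shown as an in-memory value for illustration.
      const dailySalt = crypto.randomBytes(32).toString("hex");

      function siteRequestSignature(ip, userAgent, siteId) {
        const now = new Date();
        const dayOfYear = Math.floor(
          (now - Date.UTC(now.getUTCFullYear(), 0, 0)) / 86400000
        );
        return crypto.createHash("sha256")
          .update([dailySalt, ip, userAgent, siteId, dayOfYear].join("|"))
          .digest("hex");
      }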

~~~
EGreg
Thanks, makes total sense.

So basically the only drawback I see is that all employees behind one
corporate NAT using the same browser will count as one user.

But I don't see a way around that if you can only use IP and User Agent
strings.

------
itronitron
why are you calling it an analytics platform when it isn't one?

------
gcbw2
This is still logging everything the GDPR says you can't log without asking
for consent, but you made your search convoluted (though no less efficient if
you have all the pieces) to (suggest|lie?) that you'd need to break the hash
and that's why you don't need consent.

All of the information you use in the hash would be in the search query
itself: IP, user agent, path, date, etc. So there is no need to reverse the
hash at all. You just hash your search query and compare in O(1) time.

The _only_ piece of information that realistically makes the hash slightly
difficult to recreate is the random number refreshed every day. But either you
store it (and I have no reason to believe you do not), or the brute-force
effort is still trivial, as I only need to enumerate that one variable to
generate the hash.

~~~
JackWritesCode
You're focused mostly on Recital 26, which was only a theory of mine; outside
of that we are GDPR compliant anyway. I likely shouldn't have included it,
since it isn't our primary ground for processing. Please see:
[https://usefathom.com/data/](https://usefathom.com/data/)

And yes the daily hash gets stored until midnight. But what are you talking
about with 'search query' containing IP, user agent etc.?

~~~
gcbw2
If a search query on your data would contain all the components of the
original hash, I don't have to walk backwards and break the hash. I just have
to hash my query terms in the same way.

Also, I suggested that you store the daily hash forever. But even if you
really do erase it every day, as you say, if you or an attacker makes the same
request every day at a predetermined time, then whenever you/they get your
logs, you/they can use that predictable request to recover the daily secret
too.

I consider the information to effectively be stored in plain text, and that
you would have had to request permission just the same. You pretty much have
an identifiable user (via IP/UA/access time) stored in your logs.

Anonymization is removal of information, not encoding it in a convoluted hash.

~~~
JackWritesCode
So that needs to be our next target (access logs). We want to move to a
position where we keep no access logs.

And a hacker could indeed "win" if they broke into our system, got the salt
and exported the DB. We didn't focus on this in our article, as it's
unbelievably unrealistic, but it's still possible. Our next step is to address
that.

Without the salt, it's practically impossible to brute force.

~~~
gcbw2
Not talking about a hacker. I am stating that the described hash dance offers
no more exclusion from GDPR than saying "we promise we won't look" would.

My point about brute forcing is that it is barely needed: you hold all the
information required to re-create the hash except one tiny piece, the random
number, so a brute force is a very effective O(<tiny piece size>). And since
that piece is stored in your locally available data, there are no rate
constraints.

~~~
JackWritesCode
> I am stating that the described hash dance offers no exclusion from GDPR as
> saying "we promise we won't look" would do.

Under your logic, you would never trust us because we could just add
$log->write(UserIp, UserAgent, Hostname, Path) in plain text. Trust is very
important and what you do with the data is important under GDPR.

And we don't hold all the information to re-create the hash, that's the thing.

I thought a lot about "Oh but you could just do this, this and this" but, no,
that argument doesn't hold. Our obligation under GDPR is what we actually do
with data.

------
kitchenkarma
This is very weak reasoning, because you cannot identify an individual by IP
either. This project looks like an attempt to exploit loopholes. The idea
behind GDPR is to make sure companies log only the data they need. This
project logs the data without explaining why it is even necessary. Therefore I
don't think it is compliant with GDPR.

~~~
vonmoltke
> because you cannot identify an individual by IP either

Yes you can, particularly if you correlate across different websites.

~~~
kitchenkarma
You are conflating identification of a person by behaviour analysis with
matching an ID. What the ID is is irrelevant here - it may as well be a hash.
That just proves my point that this project is not compliant.

