Show HN: Trackless - A GDPR-Friendly Google Analytics Opt-In Button (github.com/ascorbic)
49 points by ascorbic on May 27, 2018 | 59 comments



> Consent requires a positive opt-in. Don’t use pre-ticked boxes or any other method of default consent.

[...]

> You must ask people to actively opt in. Don’t use pre-ticked boxes, opt-out boxes or other default settings.

Source:

https://ico.org.uk/for-organisations/guide-to-the-general-da...


The GDPR sets a high standard for consent. But you often won’t need consent. If consent is difficult, look for a different lawful basis. (ibid.)

Anonymous data is specifically excluded from GDPR. Google Analytics provides an IP anonymization feature. If you're absolutely confident that your users can't be personally identified based on the data being sent to Google Analytics, then you don't need consent.

https://gdpr-info.eu/recitals/no-26/

https://support.google.com/analytics/answer/2763052?hl=en

https://support.google.com/analytics/answer/6366371?hl=en&re...


There is a broader issue with web sites that incorporate external content that I haven't yet seen addressed.

The moment you load a resource on your page from an external source, you lose almost all control of what the operator of that external source does with any personal data that your visitor's browser sends to them, any cookies it sends with its reply, or what it does more generally in the case of executable resources.

Given that modern web sites routinely incorporate external assets for a multitude of reasons, has anyone ever found any official, authoritative guidance on who is the data controller or data processor in such cases, how they are expected to meet any obligations they have in terms of transparency and obtaining consent, or the related question of who is responsible for giving notifications or obtaining consent if required under the "cookie law"?


The way I understand it, it's extremely simple. Say you have a site. You are the contact point with the data subject, and therefore you are the controller. Anyone you subcontract spying on your users to is a data processor acting on your behalf, and therefore you are responsible for their behavior. You should have agreements in place with all your external resource providers that touch personal data. If any personal data leaks to them, you are responsible for notifying users of this and obtaining their consent if required. In most cases for external assets where no personal data is expected to flow to the asset provider (say, loading fonts from a CDN) it's sufficient for the asset provider to give you their assurance that they don't collect or store data from visitors you send their way. If you have an adverterrorist-operated spyware embed like most ad networks on your site, then it's your responsibility to ensure that the adverterrorists are handling the data in a compliant way, and you need to notify your users of your relationship and obtain their consent to pass their data to a third party. Just because you are using a third party to do the spying does not remove your responsibility.


> Say you have a site. You are the contact point with the data subject, and therefore you are the controller.

But that's not how the controller is defined in the regulations. To be the controller, you must be "the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data". If you don't even have any way to know what personal data a third party is collecting or how it's being used, and you're linking to content that is freely available but over which you have no control, you're not even close to fitting that definition.

> You should have agreements in place with all your external resource providers that touch personal data.

But that fundamentally breaks most of the modern WWW, which is not a reasonable thing to do. You can't even have a personal blog linking to a jQuery CDN to expand or contract your sidebar or Google Fonts to make things look pretty at that point.

> Just because you are using a third party to do the spying does not remove your responsibility.

If they're spying at your request and on your behalf, that's one thing.

But it is inherent in the technologies of the web that third parties may be doing all kinds of things without your knowledge, consent or control. Moreover, even if you have somehow satisfied yourself that there is nothing inappropriate going on when you first incorporate external content in your page by reference, there is in general no technical mechanism to guarantee that the situation will not change later. In some limited cases tools like subresource integrity can help, but they only address specific parts of the general issue.
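As a rough illustration of what subresource integrity does and does not cover: the page author pins a hash of the exact file contents, and the browser refuses a file that no longer matches. Below is a minimal Python sketch of computing such a value. The function name and the sample file contents are hypothetical; only the `sha384-<base64 digest>` format of the integrity value is standard.

```python
import base64
import hashlib


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Hypothetical library contents, standing in for a file served by a CDN.
original = b"console.log('sidebar toggled');"
tampered = original + b" fetch('https://tracker.example/collect');"

pinned = sri_value(original)
print(pinned)
print(sri_value(tampered) == pinned)  # False: a modified file no longer matches
```

This only detects changes to the bytes of that one file. It says nothing about what the third party logs when the browser requests it (IP address, headers, cookies), which is the part of the issue it does not address.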


GDPR is explicitly agnostic to technical details (recital 15). The why is at least as important as the what. See the definition of controller in Article 4:

"‘controller’ means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data"

If you're embedding a JS library from a CDN, then you have a lawful basis for passing the IP address of your user to a third party under Art. 6(1)(f). As long as you've performed a reasonable risk assessment about this activity and have records to prove it, you should satisfy your obligations as a controller under Chapter 4. If they go rogue and add a bunch of tracking scripts to the library, they're liable. You'd still need to notify about the breach.

If you're embedding a JavaScript ad unit that does a bunch of tracking, you probably don't have a legitimate interest under Art. 6(1)(f), so you'll need consent. You're intentionally passing a bunch of personal data to a third party, so your responsibilities with regard to risk assessment are far greater. You and the ad provider probably constitute joint controllers under Art. 26.

https://gdpr-info.eu/recitals/no-15/

https://gdpr-info.eu/art-6-gdpr/

https://gdpr-info.eu/chapter-4/

https://gdpr-info.eu/art-26-gdpr/

IANAL etc.


> If you're embedding a JS library from a CDN, then you have a lawful basis for passing the IP address of your user to a third party under Art. 6(1)(f).

But if you're embedding a JS library from a CDN, then as a matter of fact, you aren't passing any data about your user to the third party at all. The user's browser is doing that as part of its normal operation.

Moreover, as another matter of fact, you cannot have either any knowledge or any control over what happens next regarding any personal data the third party is collecting or how it is being processed, unless you have some separate arrangement with the third party that goes well beyond mere linking or embedding.

Logically, it doesn't seem to make much sense for you to be either the controller or the processor in that instance. However, if the third party plays either role, they may have no mechanism to communicate with your site visitor to fulfil their obligations either.


The law doesn't care how a browser works. The law doesn't care that it's the browser making the HTTP request. The law doesn't care if the JS library is delivered by pixies travelling through the ether on rainbows. You put the embed in that page, so you're responsible for what happens next. You acted (intentionally or negligently) to cause the user's browser to make a HTTP request to a third party, which might cause their personal data to be unlawfully processed.

You're not absolutely and totally responsible for anything that might possibly happen under any circumstances, but you're required to implement appropriate technical and organisational measures to ensure and to be able to demonstrate that processing is performed in accordance with this Regulation (Art. 24).

If you embed a JS library from Google Cloud CDN and compile a written risk assessment with copies of Google's privacy policy for that product and their EU-US Privacy Shield and ISO 27018 certifications, you're probably fine no matter what happens.

If you embed a JS library from SuspiciousCDN.ru because someone on 4chan gave you the link and your user data ends up on WikiLeaks, you're going to have some serious explaining to do.


But as I've already noted in other comments, this heavy-handed interpretation breaks a large part of the modern web, including many common practices that are in visitors' interests and not doing anything shady at all. Expecting every instance of embedding on the entire web, even among freely available pages using freely available services, to go full lawyer not just initially but on an ongoing basis, simply isn't a reasonable approach to take. An interpretation that puts significant extra liability on those freely offered sites and services regardless of whether they are intentionally doing anything shady is how you get extreme reactions like people geoblocking the EU.


If you're fine with running untrusted third-party code on your website, that's your prerogative. The European Union will shed no tears if you choose geoblocking over responsible business practices.


There is a difference between untrusted and unverified or informal.

Do I expect visitors to my personal blog to have any security or privacy problems because I use Google Fonts to make it look nice? No.

Do I have any sort of formal agreement that is binding on Google to guarantee that, or that obliges them to notify me if they change their privacy policy in this respect? Also no.

And exactly the same applies to, for example, numerous popular JS libraries that are hosted for free on reputable CDNs.


You're still the controller b/c you control the website they landed on. Presumably the user was trying to interact with you (as the website) and you get to say what does or does not go on to it.

It really all does revolve around control: as the website owner you can't control their ISP, browser, VPN usage, etc., but you could trivially change your site from using Google Analytics to using a different system (or disable it altogether).


> You're still the controller b/c you control the website they landed on.

But the regulations don't say anything about websites. Being a controller is about whether you determine the purposes and means of processing (even if someone else is then doing that processing). Merely embedding third party content on your site doesn't even give you knowledge of that processing, never mind any control over it.

Now, if you have a formal agreement with some other service that they will process personal data for some purpose and you will embed something in your page that gives them access to that data, obviously in that situation you're acting as controller. But huge amounts of the embedding that happens in the real world don't have those formal arrangements, and if you say no-one can ever embed anything any more without formal legal agreements to protect themselves, you break a large part of the modern web both technically and culturally. I don't think that was the intent of the new regulations, nor do I think it is a reasonable thing to do.


You don't understand it. It is your site, your users. If you enable illegal 3rd party tracking of your users by ANY means, it is your responsibility too. To cover your back, you need to sign a legally valid contract (or they need to send you confirmation) stating that they respect GDPR, and you need to assess how they do so to be absolutely sure about them (at least in these early stages, when they are very often just trying to work around it, which puts you in danger). Analytics, ad providers, CDNs, SaaS... all of them.

Think of it as: "I control the door to a bank vault; if I let robbers in, I am an accomplice to the crime, as it couldn't have been committed without my help." Whether through negligence or direct intent, it can be costly. Assess your 3rd party sources very carefully. I have already removed GA and replaced it with local analytics (https://matomo.org/) as I can't trust them: they are trying to downplay GDPR, and there is already a complaint written against them (https://noyb.eu, not for GA though). I have read the PDFs: the complaint is right and, quite objectively, they are guilty. I don't want to be in the same boat as them.


That is one possible interpretation, but like many things around the GDPR, it is not what the regulation literally says nor how the technology actually works in practice, so other reasonable interpretations are also possible. I am asking whether there is any official, authoritative guidance on this.


Look, GDPR is not about technical means, it is about a concept. If the ICO shows that you are conceptually violating the GDPR by enabling a 3rd party to violate it, and you don't have your back covered, you won't have much to defend yourself with. You need proof that you have done everything in your power to defend your users' right to privacy and that you were cheated by the 3rd party. This is why there has been all the fuss about GDPR in the last 6 months. You can't downplay the concept, as it doesn't say anything about which "script" or "service" you can or can't use (or cookies, the ultimate abuse of the "concept of law" and an example of why GDPR was written this way); it just talks about the user's right to privacy, and as data controller it is your duty to defend it.

Yes, there is guidance: it is called GDPR, and it is THE only guidance. Just take the concepts. I can give you this link, the best I was able to find, which will help you understand the GDPR, but each site owner needs to decide for themselves: https://www.youtube.com/watch?v=-stjktAu-7k


Sorry, but it's not that simple. A lot of the fuss about the GDPR is because it introduces significant uncertainty combined with the potential for severe penalties if your interpretation differs from the regulators'. It is not unreasonable to look for concrete, actionable guidance to reduce that uncertainty.

The modern web depends on embedding third party content for many reasons, most of which have nothing to do with invading anyone's privacy and many of which are directly in the visitor's interests. It is not helpful to undermine that whole ecosystem and expect everyone to start having formal contracts in place before they can take advantage of any of those services. Nor is it reasonable to expect services offered for free that aren't doing anything shady to take on significant liability and/or other commitments anyway through formal agreements with their users. Why would they do that, instead of just (as obviously quite a few places already have) geoblocking the EU to remove themselves from the scope of the onerous rules?


Silhuette, I am sorry, I have tried to help you. Thank the others; maybe you and others will believe a lawyer in the following months, but they won't be free. (And special thanks to HN for preventing me from answering with its policy on "answering too fast"; I had an explanation for you, but I was unable to post it.)

To the morons (no, it is not an insult, it is an empirical fact) downvoting me: it is not me, it is GDPR. Face reality. It is not my fault that you are too reluctant to understand it, and biting people who are trying to help you out won't help. Downvoting me won't change GDPR or anything else; you will just lose a valuable source of information, as you did just now. Go to the nearest psychiatrist and they will tell you that reality will be as it is even if you close your eyes (or shoot the messenger =/).

Don't forget to upvote me, when you figure out I was right and you get a warning/fine.


We've banned this account for breaking the site guidelines.

If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future.


> Google Analytics provides an IP anonymization feature.

The &aip=1 feature, in spite of its name, does not provide any useful anonymity! As you can see in Google's own documentation (your 2nd link), when aip=1 GA claims that "the last octet of the user IP address is set to zero".

At best this can only group your IP with the neighboring 255 addresses. Google still logs the upper 24 bits of the address, which is probably enough to discover e.g. your ASN and geolocation. In practice, IP address usage is not perfectly uniform, so your actual "anonymity" is less than the theoretical maximum of 1-in-256. In general, the HTTP headers, cookies, etc. will have at least 8 bits of unique entropy, which more than makes up for losing the least interesting 8 bits of your IPv4 address.

This feature isn't designed to provide actual anonymity. The documentation even suggests the feature was designed to minimally satisfy certain legal or contractual obligations:

>> This feature is designed to help site owners comply with their own privacy policies or, in some countries, recommendations from local data protection authorities, which may prevent the storage of full IP address information.

Notice that this mentions pre-GDPR "recommendations" and that compliance is the goal, not user anonymity.

(side note: that documentation doesn't even acknowledge IPv6. Does the aip=1 feature even exist for IPv6?)


None of that is particularly relevant.

Recital 26:

The principles of data protection should apply to any information concerning an identified or identifiable natural person... To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.

If I give you an IP with the last octet redacted, how would you use that information to identify a natural person? If you can think of a method, how long does it take? How much does it cost? Is it reasonably likely to be used?


> how would you use that information to identify a natural person?

That depends a lot on 1) the other data that is submitted in the same set of analytics events, and 2) the data found in other databases that might correlate with the data in #1.

> how long does it take?

How long does it take to run a SELECT statement that JOINs a handful of large tables? This could be any amount of time, but I suspect anybody with a lot of resources like Google can probably run this kind of query (e.g. map all analytics records to personal gmail accounts) ad-hoc in minutes. A better idea would be to integrate the correlation into the handling of analytics events.

> How much does it cost?

How much does it cost to run a large query on your DB? The only real expenses would derive from the volume of analytics events you want to process per second. Mapping a single analytics event to existing databases would be approximately free.

> Is it reasonably likely to be used?

I have very little doubt that at least Google and FB do this kind of re-correlation in some situations. I have no idea how common the practice would be.

--

These questions suggest you might be missing just how trivial this problem is to solve. Google already has massive databases that identify a "natural person" (like a gmail account associated with a mobile telephone number for 2FA). Unrelated to GA, the databases handling regular gmail activity can store [IP addr, other TCP/IP headers, HTTP headers, accurate (~1s) timestamps] simply because your browser made a HTTP request over a TCP socket to fetch the text of your email.

With all those resources available, Google receives a GA event, notices aip=1, and dutifully sets the least significant 8 bits to 0. At that point they simply use the other 24 bits to search the recent logs for matching HTTP requests. This may already select a unique account, but in general it probably selects about 200 to 500 (256 from the ambiguity of not using 8 bits of address, multiplied by the average number of gmail users behind the same NATed address).

That was the easy part, which defines the real problem as finding the real account out of a selection of a few hundred. So start trying to correlate the rest of the available data. Did the GA event contain a UserAgent string that is unique with respect the few hundred in our search space? If that wasn't unique, repeat with every other HTTP header. If still not unique, try longer tuples where the entire tuple must match. Repeat for any other available data.
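The narrowing procedure described above can be sketched in a few lines of Python. Everything here is hypothetical: the `LogEntry` record, the account names, the header values and the sample addresses are invented for illustration, and this is not a description of how any real provider processes its logs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogEntry:
    account: str                       # hypothetical account identifier
    ip: str                            # full IPv4 address seen in some other log
    headers: dict[str, str] = field(default_factory=dict)


def prefix24(ip: str) -> str:
    """Keep the first three octets, mirroring a zeroed last octet."""
    return ".".join(ip.split(".")[:3])


def narrow(event_ip: str, event_headers: dict[str, str], logs: list[LogEntry]) -> list[str]:
    """Filter by /24 prefix, then by one header at a time until one candidate is left."""
    remaining = [e for e in logs if prefix24(e.ip) == prefix24(event_ip)]
    for name, value in event_headers.items():
        if len(remaining) <= 1:
            break
        matching = [e for e in remaining if e.headers.get(name) == value]
        if matching:  # ignore a header that would eliminate every candidate
            remaining = matching
    return [e.account for e in remaining]


logs = [
    LogEntry("alice", "198.51.100.7", {"User-Agent": "Firefox/60", "Accept-Language": "en-GB"}),
    LogEntry("bob", "198.51.100.42", {"User-Agent": "Firefox/60", "Accept-Language": "de-DE"}),
    LogEntry("carol", "198.51.100.99", {"User-Agent": "Chrome/66", "Accept-Language": "en-GB"}),
    LogEntry("dave", "203.0.113.5", {"User-Agent": "Firefox/60", "Accept-Language": "en-GB"}),
]

event_headers = {"User-Agent": "Firefox/60", "Accept-Language": "en-GB"}
print(narrow("198.51.100.0", event_headers, logs))  # ['alice']
```

In this toy data set the truncated address leaves three candidates, the User-Agent leaves two, and one more header leaves a single account.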

I could get into the interesting ways you could exploit non-random IP numbers (how does your router rewrite TCP Source Port? Do your TCP Initial Sequence Numbers reveal your OS?[1]), but that level of analysis probably isn't necessary. An important question at this point is how much error is acceptable? Even if the previous searches did not result in a unique match, they probably reduced the search space down to only a handful of candidates. Start applying Bayes' theorem[3] or other statistical analysis methods; is there a match with an acceptable confidence? What about a larger network[4] of inferences?

There are many ways to approach the problem of finding the correct record out of a few hundred; I'm only sketching a fairly straightforward method. I'm sure Google and FB can do fancier things with better techniques such as machine learning. The point is that 24 bits of identifying entropy is a lot. It's already so close to being a unique identifier that constructing an actual unique ID only requires adding a few bits of entropy, which is probably available in the surrounding metadata and/or session data.
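To put a number on "a few bits": singling out one record among N equally likely candidates takes log2(N) bits of additional information. A small Python sketch, assuming the 200 to 500 candidate range estimated above and assuming the candidates are equally likely (which real traffic will not be):

```python
import math


def bits_to_single_out(candidates: int) -> float:
    """Bits of extra information needed to pick one of `candidates` equally likely records."""
    return math.log2(candidates)


for n in (200, 256, 500):
    print(f"{n} candidates -> {bits_to_single_out(n):.2f} bits")
```

So under these assumptions roughly 8 to 9 bits on top of the 24-bit prefix are enough.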

[1] The ISN shouldn't reveal much in a modern OS. However, reading this[2] paper about how they used to be broken was really enlightening when it was originally published. The visuals demonstrate clearly how easily your difficult/random searches can collapse into a trivial search space.

[2] http://lcamtuf.coredump.cx/oldtcp/tcpseq/print.html

[3] https://en.wikipedia.org/wiki/Bayesian_inference

[4] https://en.wikipedia.org/wiki/Bayesian_network


That is exactly what I was afraid of; Google will have a hard time defending this.

Check my post below. I would be glad if you have some ideas, but as far as I am concerned, anonymising IPs while still getting a uniform result is technically impossible.


May I ask how GA anonymizes ip address? Which algorithm do they have in place? Running SHA-x over 4 numbers (0-255, with some skipped) separated by dots is reversible in seconds on an average PC, and I wouldn't call that anonymization, rather obfuscation.
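The reversibility claim is easy to demonstrate, because the input space is small enough to enumerate. Here is a minimal Python sketch that hashes an address with unsalted SHA-256 and recovers it by trying every address in a range. The function names and sample address are hypothetical, and the search is limited to a /16 so that it finishes quickly in pure Python; covering all of IPv4 is the same loop over about 4.3 billion inputs.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of the dotted-quad string (the scheme being questioned)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_hash(target: str, network: str) -> Optional[str]:
    """Try every address in `network` until one hashes to `target`."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr)) == target:
            return str(addr)
    return None


leaked = hash_ip("192.0.2.123")
print(reverse_hash(leaked, "192.0.0.0/16"))  # 192.0.2.123
```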

I am asking because a friend of mine is having a hard time accomplishing exactly that, and it really is a hard nut to crack: anonymization is by definition irreversible, and designing such an algorithm for 4 numbers (actually even less, due to the known IP address ranges for EU users plus reserved ranges) is not simple. You can seed it, but then the key must remain unknown to Google, which is again very hard to do with JavaScript. The only way I see is to send all the data to a local proxy script, anonymize it on your side, and then send it to GA.

I think that if GA is doing just some hashing, this exposes all the sites using it, including HN, to GDPR responsibility as data controllers. And this can't be swept under the carpet (imho) with "I can't offer the service without it" (legitimate interest).
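The proxy idea could look roughly like the sketch below: a keyed hash (HMAC) computed on your own server, with a key that never reaches the browser or the analytics provider. The key handling and function name are assumptions made for illustration. Whoever holds the key can still recompute the mapping, so this is closer to pseudonymisation than to anonymisation in the irreversible sense.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the site's own proxy.
# It must never be shipped in client-side JavaScript or sent to the analytics provider.
PROXY_KEY = secrets.token_bytes(32)


def pseudonymise_ip(ip: str, key: bytes = PROXY_KEY) -> str:
    """Keyed SHA-256 of the address: stable for one key, not enumerable without it."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


token = pseudonymise_ip("192.0.2.123")
print(token == pseudonymise_ip("192.0.2.123"))                             # True: same visitor, same token
print(token == pseudonymise_ip("192.0.2.123", secrets.token_bytes(32)))   # False under a different key
```

Keeping one key gives the "uniform result" across visits; rotating or discarding the key breaks the link between old and new tokens.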


> May I ask how GA anonymizes ip address?

If you enable the Anonymize setting, the last octet (IPv4) or last 80 bits (IPv6) is set to zero by the analytics collector. The full IP is never stored or processed.
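For concreteness, zeroing the last octet of an IPv4 address keeps a /24 prefix, and zeroing the last 80 bits of an IPv6 address keeps a /48 prefix. Below is a minimal Python sketch of that masking using the standard `ipaddress` module. The function name is hypothetical, and this only illustrates the behaviour described above; it is not Google's implementation.

```python
import ipaddress


def anonymise_ip(addr: str) -> str:
    """Zero the last 8 bits of an IPv4 address or the last 80 bits of an IPv6 address."""
    ip = ipaddress.ip_address(addr)
    keep_bits = 24 if ip.version == 4 else 48  # 32 - 8, or 128 - 80
    network = ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)
    return str(network.network_address)


print(anonymise_ip("203.0.113.77"))            # 203.0.113.0
print(anonymise_ip("2001:db8:abcd:12:34::1"))  # 2001:db8:abcd::
```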

https://support.google.com/analytics/answer/2763052


They zero-out the last octet of the IP address and only process/store the first three.


Yes. While I applaud all the dialogues I have seen recently on websites allowing me to opt out of all the tracking and personalisation and stuff that they used to conduct covertly, it's a bit surprising that they have put this up specifically because of GDPR, but at the same time (for a very large majority of them) leave it opt-out, which is clearly not compliant.


I address this in the FAQ. You can easily choose whether it's opt in or opt out.


So, a „GDPR-hostile Google Analytics Opt-Out button, unless you change the defaults“

Yes, I see why your marketing sense told you to choose the untruthful version.


I'm using it in opt-in mode myself, and I'd hope most people would too. You'll see I recommend that in the docs too. The first step is getting site owners to give any opt-in or out facility at all, which is why I didn't make that the default. I'm sorry if you think that's untruthful. I'll probably change the default based on the feedback here.


If opt-in is the only legal option, why do you default to opt-out?


The feedback here is pretty clear that I chose the wrong option, so I'll probably change it. The default is obviously to not install it, which gives no opt in or out at all.


Please consider also I have "Do not track" option turned on in my browser preferences. This should signal that I will not consent and am willing to take the consequences: even if it means you serve a "text only" version of the site.
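Honouring that signal on the server side could be as simple as the sketch below. The `DNT: 1` request header is the standard way a browser expresses this preference; the function name and the policy that DNT overrides a stored opt-in are assumptions made for illustration.

```python
def tracking_allowed(headers: dict[str, str], has_opted_in: bool) -> bool:
    """Allow analytics only with an explicit opt-in and no Do Not Track signal."""
    # HTTP header names are case-insensitive, so compare in lower case.
    dnt = next((v for k, v in headers.items() if k.lower() == "dnt"), None)
    if dnt is not None and dnt.strip() == "1":
        return False
    return has_opted_in


print(tracking_allowed({"DNT": "1"}, has_opted_in=True))   # False
print(tracking_allowed({}, has_opted_in=False))            # False
print(tracking_allowed({"dnt": "0"}, has_opted_in=True))   # True
```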


Yes, you are right, opt-out violates GDPR (unless it is about changing your mind after opting in, which is also required and must be as easy as giving consent). The box has to default to "not giving consent", and the user must actively click to give consent. You are also missing an explanation of what giving consent means for the user, including what data is used for what purpose.

Watch out with GDPR: this is not the cookie law. On top of that, you can't force consent on the user as a condition for entering the site (as Forbes is doing; they will get a complaint, which is already being finalized by some privacy organisation).


Matomo (formerly Piwik) offers an iframe you can insert in your privacy policy to handle the opt-out and opt-in [1].

Furthermore, if the ePrivacy Regulation (ePR) [2], which was supposed to enter into force alongside the GDPR in May 2018 but was delayed, is adopted in its current form, first-party analytics like Matomo will not require consent. See [3]:

> The proposal also clarifies that no consent is needed for non-privacy intrusive cookies improving internet experience (e.g. to remember shopping cart history) or cookies used by a website to count the number of visitors.

[1] https://matomo.org/faq/general/faq_20000/

[2] https://en.wikipedia.org/wiki/EPrivacy_Regulation_(European_...

[3] https://ec.europa.eu/digital-single-market/en/proposal-epriv...


Don’t think you need this if you’re just using Google Analytics and have set it to only log anonymous IPs. You do need to have a cookie policy on your website such as this one on the ECB’s website: https://www.ecb.europa.eu/home/data-protection/html/index.en...


This is a good idea (even though it's actually not GDPR compliant by default), but I think it would be much more useful if it were a "click here to install AdBlock Origin" button.

Also, who in their right mind would click "enable Google analytics" in opt-in mode?


I'm planning on adding it to one of my sites in opt-in mode. I can write a post on what that does to the analytics after it's been up for a month or so.


I would be extremely interested to read the results.


.. or any kind of privacy-related opt-in, if the opt out is readily available. Currently websites try to work around this by placing a "manage" link with countless options which makes it work to opt out, but that doesn't seem like something that will be allowed for long by the regulators.

Privacy opt ins are effectively donation buttons, and we know how well these work.


DNT is already opt-out functionality. Why force users to press extra buttons and provide localstorage? Does this even work in private browsing mode?


This is actually what the (hopefully) upcoming ePrivacy Regulation (not to be confused with the existing ePrivacy Directive) suggests [1]:

> Simpler rules on cookies: the cookie provision, which has resulted in an overload of consent requests for internet users, will be streamlined. The new rule will be more user-friendly as browser settings will provide for an easy way to accept or refuse tracking cookies and other identifiers.

[1] https://ec.europa.eu/digital-single-market/en/proposal-epriv...


Ug, legislation requesting/suggesting browser features now? Or am I misreading? I wish they'd just leave people alone.


Most people have never heard of DNT. This is an easy way to opt out in one click. If you're using private browsing you're probably also using an ad blocker, so you're fine anyway.


> GDPR-Friendly

> Opt-Out

I don't think that's how it works


It is also missing "Delete my data" and "Download my data" options.


A good point!

How do you access data that's held about you by/in Google Analytics, anyway?


Not sure about accessing collected data, but you can delete user data from Google Analytics using the User Deletion API (https://developers.google.com/analytics/devguides/config/use...)


Great, another button I've to click (apart from the annoying cookie notices) before getting to the website content. If this becomes popular, I'd think about creating a browser extension that automatically opts out of this.


Instead, consider simply removing Google Analytics from your website entirely.


There are legitimate reasons to need to know how your website is being used.


You can find them out without Google Analytics. Careful study of your HTTP logs or tools like Piwik are also suitable.


Why not just use what Google already provides? https://developers.google.com/analytics/devguides/collection...


Why would you want to opt out of Google Analytics, seriously?

They just record what you click on, and have no idea (and never will) who you are.


Google Analytics provides only aggregate data to the site owners. But I think that Google themselves use it to improve their ad targeting algorithms, because why wouldn’t they?


Yes, exactly this. Even if a site has enabled IP anonymity, Google still has the tracking cookies.


Yeah IPs are only used for determining location I believe.


If Advertising features in Google Analytics are enabled, then your data might be used for remarketing; collected data might also be shared with Google for benchmarking, tech support and account specialists. You can disable this, though: https://support.google.com/analytics/answer/1011397


Not necessarily true, they have a User ID feature that can tie your site's user accounts to their Google Analytics behaviour.



