No Cookie for You

heroprotagonist · on Dec 17, 2020

This is fantastic. Thank you, GitHub.

I hope this is a good demonstration of a hands-off approach at Microsoft in regard to company culture.

I realize you likely still collect some analytics for yourself and that this change does nothing to alleviate that. EG, first party javascript. But it's great that it's divorced from 3rd parties.

Presumably Microsoft has access to those metrics, though? I wonder how deeply that gets parsed in conjunction with everything else they collect.

If only you could export some of that culture back to your corporate overlord. I'd love it if MS Teams stopped exploding its RAM usage until it eventually has to be killed whenever it's unable to get an OK response from its analytics endpoint.

And I'd love to turn off analytics in Windows altogether. Even getting to the minimal analytic configuration is an exercise in futility spread out across a million different settings, some of which decide to reset themselves in obfuscated ways sometimes. eg, some think updates reset them, either directly or by doing things like changing default programs to ones which require analytics (eg Office). Or a change to one setting requires additional changes elsewhere to be effective.

dessant · on Dec 17, 2020

GitHub still sends the same personal data to their own analytics endpoint, and the privacy policy which lists third-party data subprocessors [1] has not been updated.

See my comment below for details: https://news.ycombinator.com/item?id=25458635

Tracking cookies have little value for GitHub when they can collect data about users that have already been authenticated, and they send the username and user ID as part of their tracking request.

Inspect the request sent to collector.githubapp.com on every page load to see the type of personal data that is being collected on the client-side.

We have no visibility into how they associate this data with analytics data collected on the server-side, and that's where an updated privacy policy would also help.

[1] https://docs.github.com/en/free-pro-team@latest/github/site-...

dessant · on Dec 17, 2020

A GitHub spokesperson has issued this statement [1] about a request to api.github.com: "That endpoint tracks aggregate performance metrics, and does not rely on cookies or other unique identifiers".

GitHub is still sending our usernames and other unique IDs, our device data, and the pages we visit to the collector.githubapp.com endpoint.

GitHub's claims about not tracking users are false; they do identify users in tracking requests. See this tracking URL: it's full of unique identifiers and personal data, and it is currently sent after every page load, without user consent:

  https://collector.githubapp.com/github/page_view?dimensions[page]=https://github.com/&dimensions[title]
  =GitHub&dimensions[referrer]=https://github.com/sessions/two-factor&dimensions[user_agent]=Mozilla/5.0 
  (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0&dimensions[screen_resolution]=1000x518&dimensions[pixel_ratio]
  =1&dimensions[browser_resolution]=1000x518&dimensions[tz_seconds]=0&dimensions[timestamp]
  =1608247177900&dimensions[referrer]=https://github.com/sessions/two-factor&dimensions[request_id]=
  9CF8:4938:4516EA:5FD134:5FDBE77E&dimensions[visitor_id]=6475638196559144773&
  dimensions[region_edge]=fra&dimensions[region_render]=iad&&measures[performance_timing]=
  1---600-600-600-400-400-----600-0----1608247177200--1608247176900--1608247176900--400- 
  400&&dimensions[actor_id]=47727044&dimensions[actor_login]=dessantbot&dimensions[actor_hash]=
  a274a9ae03a3b361483e273a53aba70534c609670c058fe667d8bce4d6f33bad&dimensions[cid]=1507727009.1608247109

[1] https://www.theregister.com/2020/12/17/github_will_no_longer...
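To make it easier to see which parts of a request like the one above identify a user, here is a minimal sketch using only the Python standard library. The URL is a shortened copy of the quoted request with a few of its parameters; the set of "identifying" keys is an assumption chosen for illustration, not an official classification, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the request quoted above; only a few parameters are kept.
TRACKING_URL = (
    "https://collector.githubapp.com/github/page_view"
    "?dimensions[page]=https://github.com/"
    "&dimensions[screen_resolution]=1000x518"
    "&dimensions[visitor_id]=6475638196559144773"
    "&dimensions[actor_id]=47727044"
    "&dimensions[actor_login]=dessantbot"
    "&dimensions[cid]=1507727009.1608247109"
)

# Assumption: parameter names we choose to flag as user identifiers.
IDENTIFYING_KEYS = {
    "visitor_id", "actor_id", "actor_login", "actor_hash", "cid", "request_id",
}


def extract_dimensions(url):
    """Return {name: value} for every dimensions[name]=value query parameter."""
    params = parse_qs(urlsplit(url).query)
    dimensions = {}
    for key, values in params.items():
        if key.startswith("dimensions[") and key.endswith("]"):
            name = key[len("dimensions["):-1]
            dimensions[name] = values[0]  # keep the first value if repeated
    return dimensions


def identifying_fields(url):
    """Return only the dimensions whose names are in IDENTIFYING_KEYS."""
    dims = extract_dimensions(url)
    return {name: value for name, value in dims.items() if name in IDENTIFYING_KEYS}


if __name__ == "__main__":
    for name, value in identifying_fields(TRACKING_URL).items():
        print(f"{name} = {value}")
```

Running the same function against a request copied from the browser's network tab would show which identifiers are present for your own account.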

867-5309 · on Dec 18, 2020

this isn't about tracking users, it's about cookies. no cookies doesn't mean no tracking. it's just a workaround to improve UX. "visiting our website does not send any information to third-party analytics services" - but presumably third parties are still able to access this data on request. their privacy policy probably reflects this. if you visit a website and don't want to be tracked, make it as hard as possible for the host to do so, don't rely on what the host says. they can do anything they like with visitors' data. anyone who hosts websites will confirm this

WhyNotHugo · on Dec 18, 2020

Don't confuse "we don't set cookies" with "we don't set non-essential cookies".

They say no "non-essential" cookies, but an anonymous user just landing on the homepage gets a cookie with some unique-looking tokens.

I've seen many companies just hire the right lawyers that would sign off on all sorts of tracking cookies as "yeah, this is essential, since we can't track users without it, and tracking users is essential to our business model".

tripzilch · on Dec 18, 2020

> this isn't about tracking users, it's about cookies. no cookies doesn't mean no tracking. it's just a workaround to improve UX.

Except the GDPR and cookie directive, obviously, undeniably, unmistakably, weren't intended to give websites a "bad UX" obstacle to work around.

It's not even about cookies. It's about letting users AGREE to being tracked and then track them, OR (with the same amount of effort and without denying them service vs tracked people) DISAGREE and then not track them.

If they're still tracking me and keeping data about me that they can match to the PI that is my github account, then this "no cookie" thing is just more "letter of the law" bullshit.

I think it's pretty damn clear to Github and MS what the intention of these EU laws are. They can't just say "oh it's worded in a way that gives us wiggle room, so fuck your intentions". Well they can but they'll find out whose faces they told "fuck your intentions" to.

We're trying to protect consumers from tracking bullshit, here. Not throwing up obstacles for large corporations to work around.

dspillett · on Dec 18, 2020

> no cookies doesn't mean no tracking

It does though make tracking by third parties so they can sell things to me (or sell information about me to other parties for that use) more difficult. Not impossible though, of course.

dessant · on Dec 18, 2020

Yes, they can do anything with the user's data, if the user has consented, or if they are willing to break the law.

The tracking request you see above requires informed consent under GDPR, and GitHub does not ask for consent before collecting browsing and device data that is tied to GitHub usernames.

867-5309 · on Dec 18, 2020

consent is simple to gain, who reads the entire ToS and privacy policy?

the law is simple to break and appear as if you're not. they're a big company and will have this covered if needed

the bottom line is, do you place more trust in your local lawmakers and the website you are visiting than you do in yourself

dessant · on Dec 18, 2020

> consent is simple to gain, who reads the entire ToS and privacy policy?

That's not how informed consent works, you can't just mention the collection of personal data in a privacy policy. Consent must be explicitly requested for this type of tracking, and you must be able to reject it, and continue using the service.

> the bottom line is, do you place more trust in your local lawmakers and the website you are visiting than you do in yourself

The request can be blocked with uBlock Origin, but it's still important to draw attention to tracking that may be illegal, since not everyone has a content blocker installed.

867-5309 · on Dec 18, 2020

if you agree to terms which request consent, you are giving consent. how they are displayed to you and whether or not they are explicit enough or too hidden is subjective

you'll need a stronger arsenal than a content blocker to avoid modern fingerprinting, legal or otherwise

dessant · on Dec 18, 2020

Mentioning user tracking in a TOS or privacy policy that is mandatory to accept in order to use the service is no longer legal.

This article may help you understand what consent means under GDPR: https://www.privacypolicies.com/blog/gdpr-consent-examples/#...

eitland · on Dec 18, 2020

To add to this:

from my understanding of the rules even a lot of the informed consent popups today aren't compliant.

If I understand it correctly (and I think I do) the standard is that it should be equally easy to opt out as to opt in, and the default should be opt out.

IMO this means I should just be able to dismiss any GDPR compliant box and the result should be no tracking.
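The "default is opt out" reading can be sketched as a tiny decision function. This is only an illustration of the rule as described in this comment, not legal advice or anyone's real implementation; the `BannerAction` names are hypothetical.

```python
from enum import Enum


class BannerAction(Enum):
    """Hypothetical outcomes of showing a consent box to a visitor."""
    ACCEPT = "accept"    # visitor actively agreed
    REJECT = "reject"    # visitor actively declined
    DISMISS = "dismiss"  # visitor closed the box without choosing
    NONE = "none"        # visitor never interacted with it


def tracking_consented(action):
    """Only an explicit accept counts; everything else means no tracking."""
    return action is BannerAction.ACCEPT


if __name__ == "__main__":
    for action in BannerAction:
        print(action.value, "->", tracking_consented(action))
```

The point of the design is that there is no code path where silence or dismissal turns into consent.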

merijnv · on Dec 18, 2020

Correct. Also, you cannot withhold access from users who do not consent, so there's literally zero incentive for users to ever consent for compliant providers. Which is kinda obvious given the GDPR's overall goal of making it impossible to use privacy as currency.

manigandham · on Dec 18, 2020

GDPR has lots of issues and this is one of the major ones. It can be easily argued that companies cannot be forced to service users and there has been no real precedent or enforcement around this.

FeepingCreature · on Dec 18, 2020

A company cannot be forced to service users. It can also decide to stop operating entirely, and die. A company can be forced to not use particular criteria to decide to service specific users, an idea with a long history - a common example is skin color.

manigandham · on Dec 20, 2020

This has nothing to do with immutable physical characteristics and such comparisons only highlight how silly the argument is.

Consent is a voluntary action. Usage itself is a form of consent. However a user disagreeing with what the company requires to provide that service but still being entitled to and actively using that service is not workable. User can decide to stop using a service entirely though, if they don't agree with the requirements.

merijnv · on Dec 18, 2020

You aren't forced to service users. You just cannot make consent the currency for your service. Either don't require consent or don't operate in the EU.

manigandham · on Dec 20, 2020

> "don't require consent "

That's meaningless. Usage is already a form of consent. The discrepancy is between the user and the company in what is consented. Forcing the company to provide service to the user even if the user disagrees with an upfront description of what the company requires to provide that service is a completely valid objection.

Also GDPR applies to any organization providing to citizens of the EU, not companies operating there, but that's yet another example of poor design which results in GDPR having little enforcement.

867-5309 · on Dec 18, 2020

it will appear legal if it is worded correctly, just the right side of ambiguity, proofread by a dozen lawyers and backed by a multi-million dollar body

also, to contradict your own tangential claim (from your non-authoritative link): "You _should_ ask for consent where you are offering a genuine choice over a non-essential service. Typical examples include:

-Using tracking/advertising cookies"

this document may help you understand the difference between should and must: https://www.ietf.org/rfc/rfc2119.txt

stupidcar · on Dec 18, 2020

Did you seriously just link an IETF document as the basis for an argument about the law? Never mind the difference between "should" and "must", do you understand the difference between an RFC and the law?

And there is no room for ambiguity in the actual law:

https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv%...

> Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject's agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject's acceptance of the proposed processing of his or her personal data. Silence, pre-ticked boxes or inactivity should not therefore constitute consent. Consent should cover all processing activities carried out for the same purpose or purposes. When the processing has multiple purposes, consent should be given for all of them. If the data subject's consent is to be given following a request by electronic means, the request must be clear, concise and not unnecessarily disruptive to the use of the service for which it is provided.

867-5309 · on Dec 18, 2020

> Did you seriously just link an IETF document as the basis for an argument about the law?

of course not, it was an example to demonstrate the difference and easier to include one link for both definitions than e.g. two for each from a dictionary

> Never mind the difference between "should" and "must"

given the context I believe the difference is of paramount importance

> do you understand the difference between an RFC and the law?

slightly reworded first question but yes, I do, thanks

> And there is no room for ambiguity in the actual law: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv%...

that seems a good example for a better source which actually bolsters my point on bad sources, but alas, it's irrelevant. note that it refers to personal data and not (third time lucky) the original argument concerning tracking consent. in fact, I cannot even find any personal data in the OP's URL, probably because no personal data is required to create a GitHub account. let's just ignore that one for now

account42 · on Dec 18, 2020

Terminology in guidelines for following a new law != terminology in technical documents.

Not being able to get implicit consent by hiding some terms in a long legal document is the entire fucking point of the GDPR.

867-5309 · on Dec 18, 2020

as above

tripzilch · on Dec 18, 2020

> consent is simple to gain, who reads the entire ToS and privacy policy?

nobody because they are pretty much meaningless in the EU

we got laws to protect consumers, not laws for businesses to trick users into making some meaningless gesture

> the bottom line is, do you place more trust in your local lawmakers and the website you are visiting than you do in yourself

what do you mean by "local lawmakers"? these laws are EU-wide. or did you mean "local" to mean, "non-US"

anyway, these lawmakers are fighting the shitty corporations that pull this tracking stuff

and your bottom line is not really a choice one way or the other. I can use blockers and other plugins to protect myself, AND cheer on the people fighting the fuckfaces that think it's in any way honourable to make a profit by merely following the letter of our laws

but we got some really good consumer protections in the EU. and we try to keep it that way. we're not going to simply roll over because some US corporations are used to being able to track the hell out of US customers

nicky0 · on Dec 18, 2020

You're talking about GitHub monitoring what signed-in GitHub users do on the GitHub website, right?

dang · on Dec 18, 2020

I had to put some newlines in that monstrous link because it was breaking the page layout (sorry; it's our bug).

irrational · on Dec 17, 2020

What is the real value in a privacy policy? I assumed they were similar to EULAs - totally unenforceable. Are there actually any legal repercussions if they lie in their privacy policy? Or is it just ill will that might be accrued (and probably quickly forgotten) if they are found out to have violated their own privacy policy?

mumblemumble · on Dec 17, 2020

It sounds like you have misunderstood the purpose of a privacy policy. It is very rare that I encounter one that is designed to protect the user's privacy. Far more often, it's there to protect the company. "I have read and agree to the privacy policy," is a coded way of saying, "I have read and agree to waive my claims to privacy, as outlined in the privacy policy."

nobody9999 · on Dec 18, 2020

>Far more often, it's there to protect the company.

That's pretty much true. And why shouldn't a group try to limit their liability?

>"I have read and agree to the privacy policy," is a coded way of saying, "I have read and agree to waive my claims to privacy, as outlined in the privacy policy."

That's often, but not always true. For example, here's a [sanitized] privacy policy I wrote for a website I set up for a specific (noncommercial) purpose:

"[Site] Privacy Policy

No personal information^ will be stored on the https://www.[site] web server (except as specifically authorized), and every effort will be made to protect the integrity and privacy of such information.

[Site], its management or assignees will never sell personal information collected on this site, nor will they use such information for purposes other than specifically related to the operation of the [Site] website and/or to facilitate the dissemination of information regarding [purpose of site] and other group activities related to [potential users] and other [user purpose] related group activities.

Under no circumstances will street address or telephone number information be stored on the www.[site] by [Site], its management or assignees.

[Site], its management and assignees will never, under any circumstances reveal email addresses, street addresses and/or telephone numbers to anyone without explicit authorization. From time to time, [site] may offer services to allow [potential users] to contact each other. For these services, [Site], its management and assignees make no warranty of fitness for any purpose, including maintaining the privacy of users' personal information.

All personal information will be held in confidence and will only be used for the purposes of the [potential users] [purpose of site] and official [membership organization] business.

This business includes (but is not limited to) providing personal information for inclusion (by the [membership organization]) in a printed work to be published at a later date. If this published work is then used for illegal and/or nuisance purposes, [Site], its management and assignees disavow any responsibility or liability for the use of that information by third parties for any purpose.

If a subscriber (limited to members of the [potential users]) chooses to share their personal information with other subscribers via any mechanism made available through the [Site] web site, mailing list or other conveyance provided by [Site], its management and assignees disavow any responsibility or liability for the use of that information by third parties for any purpose.

Under no circumstances will [Site], its management or assignees be liable or otherwise legally responsible for the theft, misuse or other unauthorized use of personal information.

Any person or entity registering on, providing contact information, or subscribing to the [Site] web site explicitly agrees to all the terms of this privacy policy.

This policy applies to the www.[Site] web site and the [Purpose of site]@[Site] mailing list.

If any portion of this policy is found, by any competent jurisdiction, to be invalid or unlawful, the remainder of this policy will continue to be in force.

The terms of this policy may be modified at any time at the discretion of [Site]. It is the responsibility of the subscriber to review the terms of this policy on a regular basis. Current versions of this policy can be found at https://www.[site]/privacy.html.

^Personal Information: Data such as street address, email address and telephone number which would enable direct contact with the subject of that information."

It does two specific things:

1. Informs users how their PII will (and will not) be used;

2. Clarifies the liability of those who own/run the site.

Unlike most "privacy" policies, there's nothing underhanded or privacy invading/data stealing involved.

I wish more privacy policies were like that.

tripzilch · on Dec 18, 2020

> And why shouldn't a group try to limit their liability?

When it's unethical to do so :)

... unrelated to your privacy policy btw, which I think is pretty good.

1vuio0pswjnm7 · on Dec 18, 2020

Violation of privacy policies alone does not give rise to a cause of action. However, such violations could be useful as evidence in the context of suing on some other basis. Of course, there is no satisfactory basis to sue tech companies for violations of privacy. That is why privacy is being decimated by tech companies. There are no adequate laws to protect it. Privacy policies seem to be an effective way to placate the public. Users seem to take tech companies at their word.

tsimionescu · on Dec 17, 2020

GDPR should have given some legal teeth to privacy policies.

hetspookjee · on Dec 17, 2020

I haven't seen a lot of repercussions, if any actually. I would've figured something big would've happened by now with the antics that are up, but here we are - necessary to download an _opt-out_ extension for Google Analytics. I can't imagine a more blatant disregard for the EU laws. And at the same time here in the Netherlands we have the party responsible for enforcing laws handing out one, almost disproportionately large fine, to a small organisation each year. Like an 800k fine to a tennis unity because they were too aggressive in their data grievance, while all the big guys are still going at it and then some. Sorry for the rant but it's hard to stay optimistic, so seeing something like Github making a good move in the right direction, and seeing the post of plausible.io on the front page, this seems like a good day on the front of privacy.

gcblkjaidfj · on Dec 17, 2020

> Tracking cookies have little value for GitHub when they can collect data about users that have already been authenticated

This is true of every advertiser or data seller, including obvious ones like Google, FB, Amazon... and less obvious ones like your ISP, Apple, etc.

The industry calls it a persistent ID (as opposed to a cookie, which is a transient ID): https://digiday.com/marketing/wtf-persistent-id/ (random result, I do not endorse it)

The trick is: the publisher/intermediary have even more information about you, but they call you User-A instead of your name, so they can sell your history, zip, DNA, etc... just pretend not labeling the data with your name or some other personal identifiable information already listed in a Law somewhere makes everything fine.

cedilla · on Dec 18, 2020

History, ZIP and DNA already are personally identifiable information (PII). Pseudonymisation is in general not enough to avoid the GDPR and similar laws. And pseudonymisation would require the removal or obfuscation of all PII to the point that it is impossible to reconstruct the identity of the user.

There's no specific list of information regarded as PII, it's PII if it can be used to identify the user, even if only in combination of the other PII.

The GDPR is really quite broad there, other laws may be more lenient. However, the GDPR is not yet very strictly enforced or tested in court.
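A small sketch of why swapping a name for a stable label is not anonymisation: the label still ties every record for the same person together. The salt, the sample usernames and pages, and the function names are all made up for illustration.

```python
import hashlib


def pseudonym(username, salt="example-salt"):
    """Replace a username with a stable label (hypothetical scheme)."""
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return digest[:12]


# Made-up browsing events: (username, page visited).
events = [
    ("alice", "/repo/a"),
    ("bob", "/repo/b"),
    ("alice", "/settings"),
]

# "Pseudonymised" copy: the name is gone, but the label is the same every time.
pseudonymised = [(pseudonym(user), page) for user, page in events]


def history_for(pseudo_id, records):
    """Rebuild one person's full history from the pseudonymised records."""
    return [page for pid, page in records if pid == pseudo_id]


if __name__ == "__main__":
    print(history_for(pseudonym("alice"), pseudonymised))
```

Anyone who can compute or look up the label for one known user gets that user's whole history back, which is the combination effect described above.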

tripzilch · on Dec 18, 2020

> Pseudonymisation is in general not enough to avoid the GDPR and similar laws.

fortunately, "undermining the spirit of the law in order to continue to make a profit" is generally frowned upon in the EU, and lawmakers don't take too kindly to it. sometimes I get the feeling that in the US it's almost acceptable to publicly brag about doing this, like it's even more "socially" acceptable.

deepersprout · on Dec 18, 2020

> GitHub still sends the same personal data to their own analytics endpoint

I see nothing wrong with that. Analysing your users on your own site is no problem for me. I should know what users do on my property.

What's the problem you have with that?

chopin · on Dec 18, 2020

It's not GDPR compliant without consent. It doesn't matter whether you are using cookies or something else.

merijnv · on Dec 18, 2020

Why is it not GDPR compliant? You do not need consent under the GDPR. You need a (documented) "lawful basis for processing" personal information. Consent is just one of several lawful bases and honestly it's the most useless one; if you need consent, your business model is screwed.

It's perfectly possible for GitHub to process personal information without explicit consent while not violating the GDPR. Several options come to mind:

1) consider analytics part of the "contract legal" basis, arguing that analytics to improve the usability of the website is a fundamental part of running a website.

2) The "legitimate interest" lawful basis, which states:

> processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child.

Arguing that improving the accessibility/usability is in the legitimate interest of both company and user.

I'm fairly confident that, depending on which and what detail of personal information, both of these justifications will be accepted by EU courts.

tripzilch · on Dec 18, 2020

> I'm fairly confident that, depending on which and what detail of personal information, both of these justifications will be accepted by EU courts.

I believe they must also show that they don't store this data strictly longer than necessary.

Which, in the case of analytics/usability would mean aggregating (and thus depersonalising) the data almost immediately.

And if they do that, it will indeed be fine. Both with the letter and the spirit of the law.
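As a rough sketch of "aggregating almost immediately": per-user events are collapsed into per-page counts and the user IDs are never stored. The event shape and function name are assumptions for illustration, and aggregation alone does not guarantee anonymity when counts are very small.

```python
from collections import Counter


def aggregate_page_views(events):
    """Collapse (user_id, page) events into per-page counts, dropping user IDs."""
    return Counter(page for _user_id, page in events)


# Made-up raw events as they might arrive: (user_id, page).
raw_events = [
    (47727044, "/"),
    (47727044, "/settings"),
    (123, "/"),
]

# Only the aggregate is kept; raw_events would be discarded right after this step.
counts = aggregate_page_views(raw_events)

if __name__ == "__main__":
    for page, views in counts.items():
        print(page, views)
```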

buckmar · on Dec 17, 2020

That's a good point. Microsoft has been much less heavy handed than I expected. But your point about how the data is used, I am very curious too. I wonder if they'd be willing to make the privacy policy readable?..

einpoklum · on Dec 17, 2020

Microsoft would like for you to adopt an image of GitHub as an upright corporate endeavor - but remember they block/censor developers from world states that the US doesn't like.

belval · on Dec 17, 2020

Microsoft/GitHub as businesses incorporated in the United States are bound by the law of the United States. I am not sure of what you are insinuating here.

einpoklum · on Dec 17, 2020

Oh, great, they have an excuse.

Just like they have an excuse for allowing the NSA direct access to all of your data, right?

belval · on Dec 18, 2020

Yes, I don't understand this weird movement where businesses are expected to go against the government.

If you disagree with your own government, that's perfectly fine, and for the record I agree with you on the issues themselves, but if you want embargoes against Iran to be lifted or for the NSA to stop hoarding Americans' data you have to do the boring work of convincing the people to vote for people who share those ideas.

Real change will not come from corporations; it simply cannot, because their mission is profitability. They support movements only if there is no financial risk in doing so.

suyash · on Dec 17, 2020

Good job, can more companies follow the lead now? Btw when I see that banner - I always reject the option and still have not had any bad experience from a website.

an_opabinia · on Dec 17, 2020

If you don’t have to actually make money though, there isn’t really a point to the analytics third parties enable - eliminating bots and click fraud. Microsoft managers are incentivized to not identify bot or noise traffic, since their performance metrics do not separate those.

6510 · on Dec 17, 2020

It is harder than that. I would for example like to use google maps api - but I can't.

boogies · on Dec 18, 2020

> I hope this is a good demonstration of a hands-off approach at Microsoft in regard to company culture.

I might 1‰ buy that if they restored the Widevine repos they snuck down for Google under cover of the controversy caused by complying with the MS-funded RIAA’s quasi-legal youtube-dl takedown request.

natfriedman · on Dec 18, 2020

(GitHub CEO)

Hi everyone, thanks for all the enthusiasm about this change. We are happy to have removed cookie banners from GitHub, and not to participate in third-party tracking of user behavior.

Our privacy policies and subprocessor list will be updated next week following our customary 30 day user notice period. We do this in the open in a pull request, so you can see the changes now:

https://github.com/github/site-policy/pull/336

tom_mellior · on Dec 18, 2020

> We are happy to have removed cookie banners from GitHub

I'm a regular visitor to GitHub from the EU, most of the time not logged in and in private browsing mode, so I usually appear like a completely new entity that hasn't consented to anything. I only started noticing cookie banners on GitHub in the last month or two.

So... in the past, did you not have cookie banners because you didn't have tracking cookies until recently, and all this is a big publicity stunt? Or were you breaking the law up until a month or two ago by having tracking cookies but not asking for my consent?

dessant · on Dec 18, 2020

Hi! Please also look into the collector.githubapp.com analytics endpoint, the request does not seem to be compatible with GDPR in its current form. Either unique IDs tied to the user will have to be removed, or express consent will have to be requested.

https://news.ycombinator.com/item?id=25461825

merijnv · on Dec 18, 2020

This is just not true. See my comment elsewhere in this thread:

Why is it not GDPR compliant? You do not need consent under the GDPR. You need a (documented) "lawful basis for processing" personal information. Consent is just one of several lawful bases and honestly it's the most useless one; if you need consent, your business model is screwed. It's perfectly possible for GitHub to process personal information without explicit consent while not violating the GDPR. Several options come to mind:

1) consider analytics part of the "contract legal" basis, arguing that analytics to improve the usability of the website is a fundamental part of running a website.

2) The "legitimate interest" lawful basis, which states:

> processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child.

Arguing that improving the accessibility/usability is in the legitimate interest of both company and user.

I'm fairly confident that, depending on which and what detail of personal information, both of these justifications will be accepted by EU courts.

tjoff · on Dec 18, 2020

1) consider analytics part of the "contract legal" basis, arguing that analytics to improve the usability of the website is a fundamental part of running a website.

Sure, you can argue that, but it has no merit.

The only reason that you can write that sentence with a straight face is due to the current affairs of the web. You know the thing that GDPR tries to rectify.

And analytics do not need personally identifiable information.

Try the three-part test suggested here: https://ico.org.uk/for-organisations/guide-to-data-protectio...

Purpose test: You can argue it has legitimate interest. And with a big enough loop-hole it might even pass despite it having no merit.

Necessity test: Absolutely not.

Balancing test: No chance.

pvtmert · on Dec 18, 2020

This should be pinned :)

graposaymaname · on Dec 18, 2020

Kudos! This is a wonderful step. Hope more companies follow suit!

dessant · on Dec 17, 2020

Until now GitHub has sent client-side requests to Google Analytics with a client ID that was also sent in a second client-side request to an in-house analytics API at GitHub for augmenting and cross-referencing user data.

The client-side Google Analytics request no longer appears to be sent, but a request containing personal data is still sent to collector.githubapp.com.

The privacy policy page which lists third party data subprocessors and cookies used on GitHub [1] seems to be outdated. Does the announced change also mean that Google Analytics and other subprocessors have been eliminated, or has some of the tracking merely moved server-side?

[1] https://docs.github.com/en/free-pro-team@latest/github/site-...

spinningslate · on Dec 17, 2020

Came here to say this. Eliminating Google analytics is unequivocally a good thing. A strong B+ assessment. But the blog doesn't say anything about eliminating _tracking_. Personally, I can live with analysis that's used solely for product improvement. If that's all github is doing, then the score goes up to an A.

But if they're siphoning off data for any other purposes - whether passing to the mother ship or otherwise - then the score is down to D-. The intent of EU cookie regulation is to make tracking and data collection transparent and opt in. Changing the implementation to server side might comply with the letter of the law but it violates the spirit.

It would be good if GitHub would clarify their position.

natfriedman · on Dec 18, 2020

This is all detailed in our updated privacy policy: https://github.com/github/site-policy/pull/336

dessant · on Dec 18, 2020

Thanks for the update. It appears the data will continue to be sent to Google Analytics from the backend.

iliasbartolini · on Dec 19, 2020

Hi natfriedman, thanks for the transparency.

Removing Google Analytics is a good thing, thanks for that. I also appreciate that you use DoNotTrack to give users a choice (even if this is not available on Safari any more).

As explained in the privacy policy, you now basically use the same cookie "_octo" for both session management and first-party tracking: https://github.com/github/site-policy/pull/336/files#diff-8b...

EU guidelines require that you offer granularity of choice for different “processing purposes”. See the "Guidelines on consent under Regulation 2016/679": https://edpb.europa.eu/sites/edpb/files/files/file1/edpb_gui...

In section "3.1.3 Granularity" paragraph #44. "If the controller has conflated several purposes for processing and has not attempted to seek separate consent for each purpose, there is a lack of freedom. [...] When data processing is done in pursuit of several purposes, the solution to comply with the conditions for valid consent lies in granularity, i.e. the separation of these purposes and obtaining consent for each purpose"

You grouped cookies together and removed granularity of choice. I think this is against the spirit of the regulation.

Overall I think the change is positive, but grouping cookies to avoid a banner is still against the regulation.

Jnr · on Dec 18, 2020

I know it is not a site wide issue and is probably a minor thing, but have you also looked into tracking done by embedded media, like embedded Youtube videos?

In case of Youtube there is a no-cookie domain that can be used for embedding and then, while client still sends the request to them, no additional cookie is set.
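To make the no-cookie embed idea concrete, here is a minimal sketch that rewrites a standard YouTube embed URL to the `www.youtube-nocookie.com` host. The function name `to_nocookie_embed` is made up for illustration, and the sketch only handles `/embed/` URLs on the two hosts listed in the code; anything else is returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts we treat as "regular" YouTube embed hosts (assumption: only these two).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com"}


def to_nocookie_embed(url: str) -> str:
    """Return the same embed URL on the no-cookie domain, if applicable."""
    parts = urlsplit(url)
    if parts.netloc.lower() in YOUTUBE_HOSTS and parts.path.startswith("/embed/"):
        # Swap only the host; path and query string are kept as they were.
        parts = parts._replace(netloc="www.youtube-nocookie.com")
    return urlunsplit(parts)
```

The browser still makes a request to the embed host, so this only addresses the cookie part of the concern.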

spinningslate · on Dec 18, 2020

Thanks for responding Nat. My interpretation from the PR:

You've stopped using cookies as a mechanism for marketing/tracking. But you're still doing it by other means.

Rationale:

1. You are still tracking and may share the data with 3rd parties. Justification: privacy statement [0] line 147. It states that data are "aggregated, non-personally identified" which might mean it's GDPR compliant. OTOH: you're presumably holding the non-aggregated data for aggregation purposes in the first place. IANAL but I think that needs consent. I don't know CCPA well enough to comment.

2. The sub-processors statement [2] says this includes Google (and Google Analytics specifically), LinkedIn and Eloqua (marketing analytics firm) among others.

3. Cookies and - possibly? - other client-side technologies are only used for running and improving the service. Justification: Line 238 in the privacy statement. (There's a typo in there btw: "complie" should be "compile" I think).

4. You respect DNT - line 244. Which, presumably, means you do not track user behaviour on any sites other than github? i.e. those in which you are a 3rd party.

5. However, the privacy statement line 31 [1] states: "GitHub may also collect User Personal Information from third parties." Interpretation: whilst you're not collecting 3rd party personal information using cookies, you are doing so (or "may" do so) through other means.

[0]: https://github.com/github/site-policy/pull/336/commits/fe1b6...

--

EDIT: clarified GDPR point.

[1]: https://github.com/github/site-policy/pull/336/commits/79a99...

[2]: https://github.com/github/site-policy/pull/336/commits/e98e3...

tripzilch · on Dec 18, 2020

> OTOH: you're presumably holding the non-aggregated data for aggregation purposes in the first place. IANAL but I think that needs consent.

I believe this is actually fine if they can show they don't hold this data longer than necessary and have a process for destroying it in a timely fashion.

But IANAL either

spinningslate · on Dec 18, 2020

thanks, didn't know that.

spinningslate · on Dec 18, 2020

to downvoters: can you explain why? That's not passive-aggressive, just desire to understand. The intent was an objective analysis of the revised privacy wording from the perspective of tracking. Do you disagree with the conclusions? Something else? Thanks.

feanaro · on Dec 17, 2020

Why do you think changing the implementation to the server side will comply with the letter of the law? Or perhaps I should ask which law.

GDPR doesn't differentiate between the client side and the server side; you're simply not allowed to keep information on users unless they've consented to it being kept or it is required for a legitimate functionality to which they have consented.

adrr · on Dec 17, 2020

So I am not allowed to keep server logs without consent?

iamacyborg · on Dec 18, 2020

If you're processing server logs for marketing purposes, then no, you need consent to do that.

You also should be trying to scrub IP addresses from those logs as that counts as PII.

M2Ys4U · on Dec 18, 2020

>You also should be trying to scrub IP addresses from those logs as that counts as PII.

That counts as personal data.

The GDPR doesn't care about "PII", as that is a US legal term and not something defined or referenced in EU law.

iamacyborg · on Dec 18, 2020

Fair point.

feanaro · on Dec 17, 2020

That depends entirely on the nature of the information that is contained in your server logs.

rcxdude · on Dec 18, 2020

And the purposes for which those logs are being used.

TedDoesntTalk · on Dec 18, 2020

The "cookie law" that created the cookie banner mess predates GDPR by some years.

bcrosby95 · on Dec 17, 2020

Isn't using the data for product improvement still tracking? I don't personally care, but I'm not sure if the GDPR does.

dessant · on Dec 17, 2020

I had been planning to post about this issue a couple of months ago, when I noticed that GitHub was sending personal data collected on the client-side to Google Analytics without user consent.

Then recently they introduced a consent popup, which was actually one of the most refreshing cookie consent popups I have ever seen: it contained two buttons, Accept and Reject. This popup has now been removed.

I think a lot of developers may have clicked on Reject, though removing tracking cookies will not absolve GitHub of the requirement to continue asking for informed and free consent under GDPR, if they will continue to process personal data, and share it with third-party services from their servers.

The current website code points to the same data still being sent to GitHub, only the implementation has changed.

nicky0 · on Dec 18, 2020

When you say GitHub is collecting personal data in a JS request, what data do you mean exactly? IP address, browser, screen resolution type stuff?

franga2000 · on Dec 17, 2020

I love it! As a web developer, cookie warnings infuriate me probably more than they should, as at least half of the time they aren't actually either required (only essential) or effective (doesn't actually comply, just annoys).

I've had clients straight up demand I should add an ugly cookie warning to the beautiful site I spent a month designing "because it's the law". Then, when I asked them to provide a full privacy policy to go with it, I've often gotten the response to "just leave it empty, nobody actually reads that". Thankfully, I'm stubborn enough to have always been successful in convincing them that maaaybe they should listen to the person who does this stuff for a living and not a sensationalist Medium article...

Yaggo · on Dec 18, 2020

I wish browsers had a built-in mechanism for showing the cookie banners. After all, cookies are just an HTTP header sent from the server and it's up to the user-agent to handle it.

There could be a standard header such as cookie-privacy-policy which would point to a URL containing the policy in a standard format (html?) and the browser could show it in a standard way (by user's settings). Personally I would be happy with just a little "privacy policy" icon in the url bar, similar to the https lock icon and reader view icon (in Safari).

niutech · on Dec 18, 2020

Back in the days there was the P3P protocol (https://en.wikipedia.org/wiki/P3P) supported by IE and Edge, but it didn't work out and was abandoned.

There is also the `Do Not Track` header but it is not respected by most websites.

You can also reject all cookies in any web browser, but then the majority of web pages will not work properly.
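For the `Do Not Track` header mentioned above, honoring it on the server is not much work. A minimal sketch, assuming the request headers arrive as a plain dict (the function name `tracking_allowed` is made up for illustration): a browser with the setting enabled sends `DNT: 1`.

```python
def tracking_allowed(headers: dict) -> bool:
    """Return False when the request carries `DNT: 1`, True otherwise."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"
```

Whether a site calls something like this before recording analytics is entirely up to the site, which is the weakness of the header.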

yason · on Dec 18, 2020

I accept but don't save any cookies except certain whitelisted ones.

So I get a lot of cookie policy banners and I always click the full 'accept all' option because at best it'll just eat into their database storage and I'll arrive with no stored cookies the next time I visit the site.

The browser allows me to accept all cookies or non-third-party cookies automatically but I still get these stupid cookie policy banners that cover half the screen at the worst.

I'd really like a standardized way to accept all cookie policies with no questions asked.

(And, for the matter, something that automatically says 'no' each and every time a site decides that the best first thing to do is to ask me to give some feedback of the site before I've even used the said site.)

Yaggo · on Dec 18, 2020

> I accept but don't save any cookies except certain whitelisted ones.

That's basically what happens in private mode (incognito), I guess. Would be nice if browsers used private mode by default, and you could "whitelist" certain sites you trust / want to remember your login.

niutech · on Dec 18, 2020

This is not what most people would like. But you can tell your browser not to save any cookies except some whitelisted sites, e.g. in Chrome: https://support.google.com/chrome/answer/95647?co=GENIE.Plat...

niutech · on Dec 18, 2020

Just install this extension: https://www.i-dont-care-about-cookies.eu/

Yaggo · on Dec 18, 2020

Sure, there exists an extension for pretty much everything, but it's not an ideal situation that you need to install an extension for stuff like this.

Also, having too many extensions slows down the browser (because they need to parse/manipulate DOM) and extensions themselves are also a security/privacy risk and finding the good ones for every browser can be tedious.

Besides, my mom has no idea what's "a browser extension".

niutech · on Dec 18, 2020

So tell her what it is :)

Most people just need one extension: uBlock Origin (or built-in Opera/Brave adblock) with a filter list from prebake.eu. No more ads and cookie banners. Easy as that.

pstch · on Dec 19, 2020

I'm using this setup, and I still get cookie/ToS banners all the time, especially using Google (I think I'm accepting their new terms of service 4/5 times each day).

niutech · on Dec 19, 2020

Just add these filters to uBlock Origin:

    www.google.com###lb
    www.google.com##html:style(overflow-y: visible !important;)

gnud · on Dec 18, 2020

Quite a lot of "cookie banners" are really banners to allow third parties to track you.

Under GDPR, this requires a clear, unambiguous consent, freely given. How can you understand what you consent to if you blanket-accept everything? And thus the consent is invalid. And they need a new banner.

cedilla · on Dec 18, 2020

Some news sites literally ask for consent to over a thousand purposes in dozens of categories. It's really wild to assume that that's consent, informed or otherwise.

tripzilch · on Dec 18, 2020

Oh but that behaviour is actually pretty clearly not compliant with the EU cookie law. It just hasn't been enforced (which isn't great).

They're not allowed to make it harder to withdraw consent than to give it.

I've also found, the few times I humoured their "consent" system, that each of these "tracking providers" (?) needed to make a request to a different domain to withdraw consent, and some of them simply wouldn't load.

will4274 · on Dec 19, 2020

To be clear, P3P didn't work because Mozilla and Google poured gasoline on it, and then Facebook lit a match. Had competing browsers not been desperate to brand it as some sort of weird proprietary Micro$oft thing, we might have a better version of it today (as happened to most features of that era).

Yaggo · on Dec 18, 2020

> There is also the `Do Not Track` header but it is not respected by most websites.

The naivety of this approach almost makes me laugh. I mean, it's a good intention, but really we cannot just trust the "bad" party. Active client-side measures are needed (e.g. as Safari does).

niutech · on Dec 18, 2020

You can install uBlock Origin and disable all third-party cookies in almost every web browser.

FriedrichN · on Dec 17, 2020

This is great. My experience is that many people claim to want analytics for their website but end up looking at it a couple of times and then never using it again. Meanwhile they're sponsoring and bolstering the position of internet tracking giants who - despite their claims - have no regard for user privacy.

Just sell your product instead of wasting time and money on bike shedding your website with whatever you believe is going to "skyrocket your sales".

vidarh · on Dec 17, 2020

And when people want analytics they often just really want headline numbers that do not require user tracking.

E.g. I've started using Fathom [1]. It's very basic, but for the sites I use it on all I really want to know is if traffic is going up or down or if any specific pages are suddenly getting lots of traffic.

[1] https://usefathom.com

markdown · on Dec 17, 2020

No pricing page, and jesus, not even a menu. They want us to sign in before they'll show us pricing? fuck right off with that bullshit.

vidarh · on Dec 17, 2020

The pricing page [1] is linked further down, and from the sitemap at the bottom of the page. You have a point they should have made it more obvious though.

[1] https://usefathom.com/pricing

markdown · on Dec 18, 2020

Thanks.

Looks like they're only targeting large websites. I've never built a 100k/month traffic website, just lots of (maybe 300+) mom and pop websites.

Wish there was a $12/yr plan for websites that are lucky to get 500 visits a month.

vidarh · on Dec 18, 2020

You can use one subscription for however many sites you want (well, they use a drop-down to select site, so might not scale to huge numbers), and the pageview count applies to the total, so if you're the one who needs the stats for all the sites, it can still work quite well.

But I agree, it'd be nice with a smaller option.

Seirdy · on Dec 17, 2020

I'd go a step further, and say that most sites don't need cookies either. I wrote about it elsewhere a few days ago:

> Avoid having to put annoying EU cookie consent dialogs on your website with one weird trick:

> Don't use cookies on your website.

> If you want users to be able to sign in to access your premium paywalled/onlyfans content, put that on a subdomain that has cookies and requires login.

> (yes this isn't applicable to all websites with cookies; it's just a nice idea worth considering)

=> https://pleroma.envs.net/notice/A1sZxGnSQ2Oi0oWMy0 originally written here

nicky0 · on Dec 18, 2020

You don't even need cookie consent warnings for login cookies. Just for "drive by" ones.

Slikey · on Dec 18, 2020

Sadly this is often unfeasible if your site relies on reCAPTCHA. Are there any cookie-free GDPR compliant captcha options?

3pt14159 · on Dec 17, 2020

This is great! GitHub continues to, somehow, surprise me.

One question I do have, however, is whether or not the new homepage[0] which shows where people are when they open a PR actually reveals their present location. In the few samples I checked it did not seem that the presence of the person indicated matched their bio's location settings. If it is truly unmasking people's location I think it should be opt-in only, since it is private information. An employer or state may have issues with someone opening a PR from a specific country at a specific time, for example.

[0] It may be required to open this in an incognito browser.

sjs382 · on Dec 17, 2020

I spot-tested three of them and for each one I tested, it matched the information from the user's GitHub profile.

The users might not know it's being used for marketing on the home page, but it seems to be (again, just spot-tested) information that they provided for their /public/ profile.

Edit: NEVERMIND! Just checked a fourth and code from someone with "United States" in their profile showed as coming from Minneapolis.

TheDong · on Dec 17, 2020

The data is available unauthenticated here: https://github.com/webgl-globe/data/data.json

If anyone wants to do any further analytics on it, it's easy enough to pull PRs and lat/lon from that.

It does look like lat/lon might be a fixed value for each city (from spot checking a couple). If it's not, that would be surprising and a pretty egregious leak of user info.

natfriedman · on Dec 18, 2020

Those locations come entirely from public profile bios, as provided by the user.

3pt14159 · on Dec 18, 2020

Then why do they sometimes not match?

renewiltord · on Dec 17, 2020

I thought that used the bio. Interesting that it didn't match.

jorams · on Dec 17, 2020

If they're not using Google Analytics anymore, it's probably time to remove the request to 'gascrolldepth.js' from the blog as well.

finishtime · on Dec 18, 2020

They are probably still leaving that option for themselves though

dry_soup · on Dec 17, 2020

A lot of people have the misconception that the EU cookie law applies to all cookies, but as the blog post correctly points out, that just isn't the case.

geek_at · on Dec 17, 2020

True. Also even if you do track your visitors you can use privacy friendly (and ideally selfhostable) Analytics like Plausible https://plausible.io/ so you won't need the banners either.

Just don't include facebook like buttons or any of these widgets

Wowfunhappy · on Dec 17, 2020

Does anyone happen to know of a service like this that is free (not self hosted) for non-commercial, low-traffic sites? Or which costs less than ~$10 per year.

I have a basic Github Pages site, and I currently don't know whether anyone is looking at it, beyond the very few who take the time to email me. I don't need (or want) to know anything about my visitors, but it would be nice to know that I'm not simply tossing stuff into the ether.

m1245 · on Dec 17, 2020

> Does anyone happen to know of a service like this that is free (not self hosted) for non-commercial, low-traffic sites?

Panelbear is privacy-friendly, and has a free plan with 5,000 page views per month. Commercial use is allowed.

https://panelbear.com

Full-disclosure: I’m running this service. Feel free to ask me anything :)

trungdq88 · on Dec 18, 2020

Panelbear is great, I'm using it. I have a small website with < 1000 page views a month and the free plan of Panelbear is perfect!

Thank you for providing this service.

wussboy · on Dec 18, 2020

Exactly what I was looking for. Thank you.

bbellini · on Dec 17, 2020

I would recommend GoatCounter.

https://www.goatcounter.com/

Jaruzel · on Dec 18, 2020

And there was me hoping for one of those old-school page-counter.gifs, but with numbers made out of goats...

alexchamberlain · on Dec 17, 2020

I thought that tracking cookies needed permissions regardless of whether they were first party or 3rd party?

cuu508 · on Dec 17, 2020

Parent commenter says:

> I currently don't know whether anyone is looking at it

You don't need tracking cookies to track simple metrics like pageview numbers.
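As a rough illustration of that point, pageview totals can be derived from ordinary web server access logs with no cookie or identifier involved. This is only a sketch, assuming log lines in Common Log Format with the request line in the first quoted field; the helper name `count_pageviews` is made up.

```python
from collections import Counter


def count_pageviews(log_lines):
    """Count GET requests per path, ignoring who made them."""
    counts = Counter()
    for line in log_lines:
        try:
            # The request line is the first double-quoted field,
            # e.g. "GET /index.html HTTP/1.1".
            request = line.split('"')[1]
            method, path, _protocol = request.split(" ", 2)
        except (IndexError, ValueError):
            continue  # skip lines that do not look like access log entries
        if method == "GET":
            # Drop the query string so /page?a=1 and /page count together.
            counts[path.split("?", 1)[0]] += 1
    return counts
```

This counts raw hits rather than unique visitors, and it does nothing to filter bots or someone reloading the page repeatedly.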

Jaruzel · on Dec 18, 2020

EU law states that you have to disclose you are using ANY cookies that are NOT REQUIRED for the correct functioning of the site (for the end user).

So yes, use of tracking cookies, first or third party, would require a Cookie Consent Banner.

oauea · on Dec 17, 2020

> not self hosted

you'll need a cookie banner then

ildon · on Dec 17, 2020

Not necessarily. Only if personal data is collected by the third party.

cameronh90 · on Dec 18, 2020

I thought that with the recent changes to PECR they clarified that any non-essential cookie-like technology needs permission, irrespective of whether it's first party or pseudonymous. And additionally that analytics does not count as strictly necessary.

That seems to be the advice of the UK ICO: https://ico.org.uk/for-organisations/guide-to-pecr/cookies-a...

M2Ys4U · on Dec 18, 2020

You need to notify users, and give them an opt-out, if the cookies are not strictly necessary for the provision of the service.

Analytics cookies are not strictly necessary.

Wowfunhappy · on Dec 18, 2020

All I want to see are pageviews. That shouldn’t require cookies/fingerprints.

oauea · on Dec 19, 2020

It shouldn't, but nowadays it always does.

Alternatively your pagecount will shoot to the millions if you have someone holding f5.

ljm · on Dec 17, 2020

To be fair, most of them probably do. It's not like the introduction of GDPR in Europe 2 years ago suddenly made all of the shit a marketing dept shoves into Google Tag Manager completely legit and above board.

These third parties will take what you give them and _also_ take what they can get from your browser if you're embedding their script. Are you going to proxy those scripts as well to stop them getting the user's IP address and then geolocating it to grab even more info?

The cookie warning banner is bullshit only in the sense that it achieves nothing. Accept it or deny it, it won't change a thing. Same with the tracking consent popups: despite the law saying they should be opt-in by default, they're still treated as opt-out by default, meaning that all of these sites _still_ collect your data because you're blacklisting individual sites from tracking, as opposed to whitelisting them. You need to set a cookie to say that you don't want tracking and not thousands of cookies to say you do want it?

That's being tracked... it's all wrong. Literally everything you offer as information, or don't offer, is another node in their graph.

nullsense · on Dec 17, 2020

It really shits me that a lot of them you can't even deny it. They just have a button like "I understand".

WTF is that...

ryukafalz · on Dec 17, 2020

Or they treat continuing to use the site as consent. Some of them are really passive-aggressive about it too. I've seen cookie banners with wording like "We use cookies, because duh, who doesn't in 2020? Click here or keep using the site to accept."

Completely at odds with the whole "informed consent" thing.

ljm · on Dec 18, 2020

And then they wonder why we use things like uBlock, which are pretty much the only tools we can rely on to genuinely revoke consent. Or revoke as much of it as possible.

ignoramous · on Dec 17, 2020

https://www.cloudflare.com/web-analytics/

blaisio · on Dec 17, 2020

I have nothing directly against cloudflare but I think it would be better to try to support one of the smaller analytics companies if possible. They are the ones who made products that got big companies like cloudflare interested in the space.

mixmastamyk · on Dec 17, 2020

An analytics service designed to add value to another product and does not need to be profitable in itself sounds like the best kind to me.

fs111 · on Dec 17, 2020

"We are democratizing web-analytics" wow, really? Well the people have voted and they want no analytics at all. Thank you very much.

Wowfunhappy · on Dec 17, 2020

Oh, this is perfect, thank you!

krlx · on Dec 17, 2020

I developed for myself krlx.fr/feu-analytics/ for exactly the same scenario.

It is self-hosted but on firebase and taking advantage of the free tier. Of course there is no personal data collected at any point.

There are still improvements to make, but as it works perfectly for me I have not been able to gather enough motivation to do that.

dfreire · on Dec 17, 2020

You can probably find some at https://github.com/onurakpolat/awesome-analytics

MarekKnapek · on Dec 17, 2020

A few years back I created a HelloWorld application on Google's AppEngine (requires Java, Python or Go) and was positively surprised by the statistics on their dashboard.

type0 · on Dec 17, 2020

I was also surprised by the number of different dumb bots that had tried to brute-force our app engine site on /wp-login.php

and it wasn't even running on wordpress

neilparikh · on Dec 17, 2020

I get requests to /wp-login.php (and the like) on my simple Haskell web app hosted on my university's servers. They're quite persistent and I'm not even sure how they found the URL to my app in the first place (the format is something like universityname.com/~userid/projectname, and I haven't linked it anywhere).

camkego · on Dec 17, 2020

https://simpleanalytics.com/ Says this on their homepage: We don't use cookies or collect any personal data. So no cookie banners, GDPR, CCPA, or PECR to worry about.

Seems like a cool company/project to me.

But, it's not free :( $19/mo Still thought it's worth pointing out.

hydroxideOH- · on Dec 17, 2020

Netlify Analytics is $9/mo

Wowfunhappy · on Dec 18, 2020

My approximate budget was $10 per year, not month! :)

hydroxideOH- · on Dec 21, 2020

Fair enough! Probably not going to happen without self hosting.

elliekelly · on Dec 17, 2020

Make the visitor counter great again!

tuna-piano · on Dec 17, 2020

Looked for a few minutes and couldn't find the full answer. How does Plausible calculate unique users if it can't store some type of identifier on the page?

I see this... "We do not generate any persistent identifiers either. We generate a random string of letters and numbers that is used to calculate unique visitors on a website and we reset this string once per day."

But where is that ID stored?

marvinblum · on Dec 17, 2020

Probably like we do it for pirsch.io, by calculating a hashed fingerprint and throwing away the individual page hits once per day: https://github.com/pirsch-analytics/pirsch

YetAnotherNick · on Dec 17, 2020

What's the privacy benefit over storing a tracking cookie with an expiry of a day? If anything, a random cookie seems better for privacy: in your case, if someone really wants to, they can recover the IP (as long as the user agent is not rare) by searching over all IPs (4 billion for IPv4), User-Agents (about 100 for popular browsers), the date (1 day, as the date is stored separately), and the salt (known to the server), which is easily within reach of anyone.

marvinblum · on Dec 17, 2020

It doesn't use cookies. Fingerprints are calculated on each page hit.

The salt must be treated like a password to make sure it's not that easy to brute force it and no one should get access to your database of course ;) It's not the strongest anonymization, but good enough considering that the hits will be deleted once a day by batch processing.
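To make the discussion concrete, here is a minimal sketch of a per-hit fingerprint built only from the inputs named in this thread (IP, User-Agent, date, and a secret salt). It is an illustration, not Pirsch's actual code; the function name, the choice of SHA-256, and the field separator are all assumptions.

```python
import hashlib
from datetime import date


def fingerprint(ip: str, user_agent: str, day: date, salt: str) -> str:
    """Hash the request attributes into an opaque per-day visitor key."""
    h = hashlib.sha256()
    for part in (ip, user_agent, day.isoformat(), salt):
        h.update(part.encode("utf-8"))
        # Separator so ("ab", "c") and ("a", "bc") do not hash identically.
        h.update(b"\x00")
    return h.hexdigest()
```

The same visitor yields the same key within a day, so hits can be grouped without a cookie, and a different key the next day. As noted above, the protection depends on the salt staying secret and on the per-hit rows being deleted.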

tuna-piano · on Dec 17, 2020

Seems like a good method and actually more accurate than they do... seems like they just do a hash of IP.

marvinblum · on Dec 17, 2020

Hmm I think I've read something about it elsewhere and they also use more parameters than just the IP. Not sure.

abdullahkhalids · on Dec 17, 2020

> How can Plausible Analytics count unique visitors without cookies?

> So if you don’t use cookies how do you count the number of website visitors and report on metrics such as the number of unique users?

> Instead of tagging users with cookies, we count the number of unique IP addresses that accessed your website. Counting IP addresses is an old-school method that was used before the modern age of JavaScript snippets and tracking cookies.

> Since IP addresses are considered personal data under GDPR, we anonymize them using a one-way cryptographic hash function. This generates a random string of letters and numbers that is used to calculate unique visitor numbers for the day. Old salts are deleted to avoid the possibility of linking visitor information from one day to the next. We never store IP addresses in our database or logs.

...

> In our testing, using IP addresses to count visitors is remarkably accurate when compared to using a cookie. Total unique visitor counts were within 10% error range with IP-based counting usually showing lower numbers.

From here: https://plausible.io/blog/google-analytics-cookies#can-you-g...
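A rough sketch of the scheme described in that quote, assuming SHA-256 and a 16-byte random salt (the quote names neither, and the class below is hypothetical rather than Plausible's code): hash each IP with the current salt, keep only the hashes, and throw the salt away at the daily reset.

```python
import hashlib
import secrets


class DailyVisitorCounter:
    """Count unique visitors per day without storing raw IP addresses."""

    def __init__(self) -> None:
        self._salt = secrets.token_bytes(16)
        self._seen = set()

    def record(self, ip: str) -> None:
        # Only the salted hash is kept; the IP itself is never stored.
        digest = hashlib.sha256(self._salt + ip.encode("utf-8")).hexdigest()
        self._seen.add(digest)

    def unique_visitors(self) -> int:
        return len(self._seen)

    def rotate(self) -> None:
        # New salt and empty set: yesterday's hashes cannot be linked to today's.
        self._salt = secrets.token_bytes(16)
        self._seen.clear()
```

Calling `rotate()` once a day corresponds to the "old salts are deleted" step in the quote.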

jedberg · on Dec 17, 2020

A one way hash of an IPv4 address is no more private than the address itself. If you know the hash algorithm, you can build a rainbow table of all the hashes in under a second. Even with a random salt it doesn't take long to build a rainbow table with all possible salts.
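A small sketch of why this works: the IPv4 space has only 2^32 (about 4.3 billion) candidates, so an attacker who knows the algorithm (and the salt, if any) can simply try them all. The demo below assumes SHA-256 and searches one small network so it finishes instantly; the function names are made up for illustration.

```python
import hashlib
import ipaddress


def hash_ip(ip: str, salt: bytes = b"") -> str:
    """Hash an IP address the way a naive 'anonymizer' might."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


def recover_ip(target_hash: str, network: str, salt: bytes = b""):
    """Enumerate every address in `network` until one matches the hash."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr), salt) == target_hash:
            return str(addr)
    return None  # not in the searched range
```

Widening `network` to `0.0.0.0/0` covers the whole IPv4 space; how long that takes depends on the hash function and hardware.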

wpietri · on Dec 17, 2020

Doesn't that depend on the size of the salt?

jedberg · on Dec 17, 2020

To an extent, but there are easy ways to cut the search space. For example, you could make a unique request with garbage on it from a known IP every day, and then all you have to do is build a rainbow table for that one IP to find out what the salt is for each day, and then you can fully reconstruct the logs.

mattlondon · on Dec 17, 2020

If the salt is a random 64bit number (for example) then "finding out" the salt is not trivial.

wpietri · on Dec 17, 2020

And unless I'm missing something, it seems easy to add plenty of bits to the salt until it's no longer practical to reverse.

YetAnotherNick · on Dec 17, 2020

@mattlondon: The salt is known to plausible, that is the only way someone can hash it.

mh- · on Dec 17, 2020

This would be woefully inaccurate for websites with a large amount of mobile traffic (because of CGNAT), or university traffic, or etc.

icefo · on Dec 17, 2020

Don't universities have a huge number of IPs because they were the first to use the internet?

Mine gives one public IPv4 address per device that accesses the internet on the network (with some exceptions). Strategies vary, but if you have a lot of addresses why not use them.

jannes · on Dec 18, 2020

That might be true for some US universities, but it's definitely not true for the rest of the world.

BlueTemplar · on Dec 17, 2020

According to Google, IPv6 traffic is up to 30% these days.

markosaric · on Dec 17, 2020

you can see the exact method on our data policy: https://plausible.io/data-policy

colejohnson66 · on Dec 17, 2020

I’m guessing a cookie with an expiration of 24 hours, but I could be wrong

marvinblum · on Dec 17, 2020

I would like to add https://pirsch.io/ :)

bobm_kite9 · on Dec 17, 2020

This looks really nice, but what’s to stop it getting blocked like all the other trackers once apple/uBlock/etc. add it to their database?

marvinblum · on Dec 17, 2020

You cannot stop that. You can get around it for a while by serving the script yourself and setting a CNAME record for your domain to point to us. That's why we recommend integrating Pirsch into your backend so that it can't be blocked: https://docs.pirsch.io/get-started/backend-integration/

madjam002 · on Dec 17, 2020

IANAL, but my understanding is that you might still need a consent box even if you use Plausible.

I've only skimmed over the docs, but it looks like they derive a unique identifier from the IP address and user agent which changes every day. IP addresses still count as Personally Identifiable Information under GDPR, so deriving an identifier from this for a use case such as analytics would likely require consent. This is speculation though so I'd be interested to hear what others think.

If it is critical to the operation of the website (functionality like storing saved items in a shopping cart, or security), then you wouldn't need consent.

In reality though, Plausible looks great and using it is a huge improvement over Google Analytics for privacy.
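
For readers who want to see what such a scheme looks like, a minimal sketch follows, assuming a random salt that is regenerated every day and thrown away afterwards. The function names, the 16-byte salt, and the choice of SHA-256 are assumptions for illustration; Plausible's actual method is described in its data policy.

```python
import hashlib
import secrets


def new_daily_salt() -> bytes:
    """Generate a fresh random salt; the caller replaces it once per day."""
    return secrets.token_bytes(16)


def visitor_id(salt: bytes, ip: str, user_agent: str) -> str:
    """Derive a pseudonymous identifier from the salt, IP and user agent.

    Each field is length-prefixed so that different (ip, user_agent)
    pairs cannot collapse into the same input bytes.
    """
    h = hashlib.sha256()
    for part in (salt, ip.encode(), user_agent.encode()):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()
```

Within one day the same visitor maps to the same identifier, so page views can be grouped. Once the salt is rotated and the old one deleted, yesterday's identifiers can no longer be linked to today's.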

M2Ys4U · on Dec 18, 2020

>IP addresses still count as Personally Identifiable Information under GDPR

The GDPR does not count anything as "Personally Identifiable Information", which isn't surprising as that's a US legal term.

What you mean is "Personal Data", and yes IP addresses are considered personal data under the GDPR.

>so deriving an identifier from this for a use case such as analytics would likely require consent.

Consent isn't the only legal basis for processing personal data, though; there are five others available.

feanaro · on Dec 17, 2020

> IP addresses still count as Personally Identifiable Information under GDPR, so deriving an identifier from this for a use case such as analytics would likely require consent.

Only if there is a bijection between the identifier and the IP address, so that you could re-derive the IP address from the identifier. Otherwise, I do not see how the identifier itself would count as PII.

This way of divorcing data from PII by replacing it with pseudonymous identifiers which cannot be linked back is a relatively standard technique for this.

slowwriter · on Dec 17, 2020

My understanding is that this kind of active consent that we see as popups everywhere on the web nowadays applies to cookies only. So I would assume that if you can track user activity without a cookie you wouldn't need it. It should probably be stated in the privacy policy though.

I'm not an expert in this even though I'm a webdev from the EU, so I'm also interested in other people's input.

feanaro · on Dec 17, 2020

GDPR doesn't care if you're accomplishing the tracking with a cookie or using a different mechanism. You're not allowed to do it either way, unless the user has consented.

slowwriter · on Dec 17, 2020

Since I’m being downvoted: The EU directive that specifically obligates websites to collect informed and active consent for the use of cookies is not GDPR, it’s the ePrivacy Directive.

I don’t believe that one should automatically conclude that just because a cookie requires active consent, any kind of ‘logging’ (local and temporary storage of IPs in order to track website usage) requires active consent. Those are two fundamentally different things.

I’m not saying you should hide the fact that you’re doing it. I’m saying it should be stated in the privacy policy.

Also remember that there is a big difference between ‘personally identifiable information’ and ‘sensitive information’ which are clearly separated concepts in GDPR. Not all collection of data requires active consent.

I did read my EU state’s guideline on GDPR in full, but I’m not an expert. I would suggest reading up on the ePrivacy Directive though, which is still in effect.

madjam002 · on Dec 18, 2020

Not sure why you're being downvoted, yeah cookies are handled by legislation other than GDPR (ePrivacy as you mentioned).

However, regardless of whether you're using cookies, I still think you need to collect explicit consent, as GDPR requires a lawful basis for processing, and I don't see how analytics would fall under any of the lawful bases other than consent (_maybe_ legitimate interests?).

If you are using cookies, then my understanding is you need to collect consent where necessary under _both_ ePrivacy and GDPR.

louhike · on Dec 17, 2020

Another solution is to do all the tracking in the backend. I'm not saying it's a good solution.

coldpie · on Dec 17, 2020

Or, don't do any tracking. I'm convinced that 99% of all analytics is discarded without ever being reviewed, analyzed, or acted upon.

chipotle_coyote · on Dec 17, 2020

This is probably correct.

On my personal web sites I'm using GoAccess, which is basically a new spin on a very old idea -- just analyzing the server's web logs.

https://goaccess.io

That's not as accurate as throwing around cookies and JavaScript, but I rarely check the log pages anyway, and when I do I'm less interested in raw numbers than I am in the relative performance of various pages. (And that's mostly just idle curiosity, e.g., are there some old articles that keep getting steady traffic from somewhere?)
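
As a rough illustration of the "just analyze the web logs" idea (this is not how GoAccess itself is implemented), here is a minimal sketch that counts successful GET requests per path. It assumes access logs in the common/combined log format; the regular expression and the `hits_per_path` name are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the start of a common/combined log format line, e.g.
# 203.0.113.5 - - [17/Dec/2020:10:00:00 +0000] "GET /page HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def hits_per_path(lines: Iterable[str]) -> Counter:
    """Count successful GET requests per path; unparseable lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            counts[match.group("path")] += 1
    return counts
```

Calling `hits_per_path(open("access.log"))` and then `.most_common(10)` on the result gives the relative ranking of pages, which is the kind of question described above. No cookies or client-side JavaScript are involved.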

scrollaway · on Dec 17, 2020

Much like logging though, it's the 1 percent that isn't discarded that's important.

I agree with you by the way, but ...

shuckles · on Dec 17, 2020

Wouldn't that still violate the law but just be harder to detect from the client? If so, I don't think GitHub (i.e. Microsoft) would find it a compelling approach.

owaislone · on Dec 17, 2020

The backend already stores all the information about the users. Why would it violate any laws if it stored a bit more or a bit less info? Things can get tricky if GitHub exported the collected data to a third party for analytics.

MaurizioPz · on Dec 17, 2020

Part of the GDPR is about the intent behind the information you are storing, not the method. A cookie is just a technology. If you track your users using a DB, it still applies, and you need consent if the tracking is not necessary.

drcongo · on Dec 17, 2020

I can't reply to the reply to this for some reason, but it's worth noting that GDPR and the cookie law are different, though related.

OJFord · on Dec 17, 2020

For very recent comments, click on the timestamp to get the single comment view and be able to reply.

(I had to do it here too.)

edoceo · on Dec 17, 2020

You folks might just be time limited on the reply - HN puts some brakes on "too fast" commenting

OJFord · on Dec 17, 2020

See my profile if you want; that was my first comment in hours. I think it is that sort of brake (it prevents heated discussions being quite so quick-fire), but it's not on the user, it's on everyone for <'5 [or something] minutes ago' comments; drcongo's happened to be '0 minutes ago' when I loaded the page, so I clicked on it to reply.

drcongo · on Dec 17, 2020

Oh! Useful tip, thanks!

munchbunny · on Dec 17, 2020

It would still be a violation because of how you're using it. The law isn't purely about what data you track, it's primarily about what you do with the data.

jarofgreen · on Dec 17, 2020

IANAL!!!! But I think, yes, there are still implications. GDPR makes no distinction about back end and front end AFAIK, it's just about what data you collect and why/purpose.

But note there are other reasons you can have for collecting data other than consent (something often overlooked) - for example I would guess GitHub would log IP addresses in the back end for a limited time for spam fighting reasons, and I think that would be fine.

asimops · on Dec 17, 2020

To my understanding of the GDPR, as soon as you track any identifier that makes those data non-anonymous you still need consent for that. It is not about the cookies per se.

M2Ys4U · on Dec 18, 2020

There are six legal bases for processing personal data, consent is only one of those.

BlueTemplar · on Dec 17, 2020

Would that mean that you need consent for storing IP addresses in logs?

kybernetikos · on Dec 17, 2020

It depends on what you use them for, but I think you would need it documented that you do it and why.

swiley · on Dec 17, 2020

I really hate the lies you see on a lot of news sites that they will send cookies "necessary for basic functionality."

You're serving articles, there's no reason for session tracking!

canofbars · on Dec 18, 2020

Without cookies they can't check if you closed the cookie nag.

amgutier · on Dec 17, 2020

I can see a need for cookies to mitigate against things like DDoS attacks, session management for paywalled content or just to leave comments on articles, favoriting certain sections. There are several reasons why as a reader you would want the site to be stateful.

ancarda · on Dec 17, 2020

How would cookies help mitigate against DDoS attacks?

morei · on Dec 18, 2020

Helps separate real traffic from DDoS traffic. e.g. traffic from someone that also visited the site prior to the start of the DDoS is vastly more likely to be real traffic.
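
One way this could work, sketched under assumptions: the site hands out a cookie containing a first-seen timestamp signed with a server-side secret, and during an attack it prioritizes requests whose valid cookie predates the start of the attack. The cookie format, the function names, and the use of HMAC-SHA256 are all hypothetical choices for illustration.

```python
import hashlib
import hmac
from typing import Optional


def make_cookie(secret: bytes, first_seen: int) -> str:
    """Build a cookie value of the form '<unix timestamp>.<hex HMAC>'."""
    payload = str(first_seen)
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def first_seen_from_cookie(secret: bytes, cookie: str) -> Optional[int]:
    """Return the signed timestamp, or None if the cookie is malformed or forged."""
    payload, sep, signature = cookie.partition(".")
    if not sep:
        return None
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking the signature.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    try:
        return int(payload)
    except ValueError:
        return None


def likely_real_visitor(secret: bytes, cookie: Optional[str], attack_start: int) -> bool:
    """True if the request carries a valid cookie issued before the attack began."""
    if cookie is None:
        return False
    first_seen = first_seen_from_cookie(secret, cookie)
    return first_seen is not None and first_seen < attack_start
```

The signature matters because without it attacking clients could simply send a cookie claiming an old timestamp. This is a heuristic for prioritizing traffic, not a complete defense.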