The internet is split roughly into 3. The top 100 websites get a third of the page views, the remaining top 10k get another third and millions of websites get the last third.
The top 100 have dedicated engineering and policy teams that will disable FLoC because they're either not interested in ads (Wikipedia) or have their own first-party implementation that doesn't need FLoC (Facebook). They'll ditch FLoC.
The next 10k might have engineering teams that can make the change, but might be more interested in finding out about their audience so they can monetize more easily. They'll keep FLoC.
As for the remaining millions, only a tiny minority of them will even know this is a thing, let alone care enough to make the change or contact a developer who can do it. These are the folks who have hosted their wordpress site with GoDaddy because it was cheap and quick when they needed a site. They'll keep FLoC.
So the upshot is that github.com, instagram.com and amazon.com might opt out, but the vast majority of the web will not. My prediction is that at least half of all web pages loaded by users won't have this header.
Cloudflare, Akamai, Fastly and other CDNs should disable FLoC by default for all customers, and provide a toggle to those customers who explicitly wish to enable it.
But until they do[1]:
Apache:

    Header always set Permissions-Policy "interest-cohort=()"

Caddy:

    header Permissions-Policy "interest-cohort=()"
Cloudflare Workers (not entirely free; usage beyond the plan's limits is billed):

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    // Fetch the origin response, copy its headers (they arrive immutable),
    // add the opt-out header, and return a new Response with the same body and status.
    async function handleRequest(request) {
      const response = await fetch(request)
      const newHeaders = new Headers(response.headers)
      newHeaders.set('Permissions-Policy', 'interest-cohort=()')
      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers: newHeaders
      })
    }
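And for nginx (an untested sketch; the always flag is there so error responses get the header too), inside the relevant http or server block:

    add_header Permissions-Policy "interest-cohort=()" always;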
This kind of post that provides no context will lead to cargo-cult, with people blindly copying and pasting these directives, and believing they have increased the privacy of their site...
If your web site does not include ads, FLoC is already disabled. Here, "ads" means ads that EasyList can detect. This HTTP header will just make your config more complex and your responses slightly bigger, with no change in behaviour.
If you include external ads on your pages, then I doubt disabling FLoC will increase your visitors' privacy, but at least this header will have a real effect.
From https://web.dev/floc/ in section "Do websites have to participate and share information?"
> For pages that haven't been excluded, a page visit will be included in the browser's FLoC calculation if document.interestCohort() is used on the page.
> During the current FLoC origin trial, a page will also be included in the calculation if Chrome detects that the page loads ads or ads-related resources.
Excluding some portion of sites from a user's cohort calculation doesn't necessarily make a user less unique if a nontrivial number of sites don't opt out.
What's unclear to me is what these headers actually do to the browser.
I mean... the docs say that they are a "site" header that you should apply to a "page". Does that mean that you must apply it to all pages to exclude a site? Is absence on one page taken as opting back in to FLoC?
If the scope is the site, then it would be better as a DNS entry. I've a feeling the scope is truly the page though, and I also have a feeling that most people who choose to add this header will add it on all assets, which is a bit of a waste of bytes (even with header compression in place) but would be the only way to guarantee that all pages have it.
Google already has shown bad faith in opt-out headers like this when they immediately started ignoring Do-Not-Track as soon as non-Chrome browsers made it a default. The fact that the spec for this awful project uses an opt-out instead of an opt-in header seems a pretty clear signal to me that Google may not have any intention of following it in the long run.
But at this point in time I think it'd be unfair to call Cloudflare "just a CDN" so not really equivalent.
From what I've heard through the technical operations jungle, Google has been pushing their CDN product hard for a long time, which isn't a shock since they've been trying to push GCP hard for a long time. But it's a little like AWS's CloudFront CDN: it's very, very rare to see someone using AWS CloudFront or GCP's CDN who isn't on said cloud platform already.
Cloudflare, etc. should do what their customers want and not make these types of decisions for them. They are CDNs, not the owners of their customers' websites.
Cloudflare's mission is to "help build a better internet", and to that end have made a lot of opinionated decisions to increase security and performance. Where possible options are given to customers, but the opinionated way wins by default.
Examples: Turned on HTTPS for all customers, gave image compression and optimisation to all customers, moved customers to the latest TLS as soon as possible (help drive adoption), provide tools to obscure email addresses on web pages to minimise harvesting, 1.1.1.1 privacy focused DNS, etc.
FLoC is something that an opinion can easily be formed on. Google has told each site operator "you must opt out", and Cloudflare can hold the opinion that opt-out by default is bad for the internet and that opt-in is better. If they default to adding this header while giving customers a means to toggle it off, then all Cloudflare will have done is what Google should have done: made this opt-in instead of opt-out.
According to their own numbers, WordPress accounts for 41% of the total number of websites, and they're considering switching off FLoC by default (https://core.trac.wordpress.org/ticket/53069 - discussion not settled yet though).
It's kinda sad that you can't just run a web server and host your own homepage anymore. You need to mess with your webserver config to make it spam the client with a dozen HTTP headers to disable FLoC, enable HSTS, set this weird same site origin policy thing, disallow iframe embedding...
Luckily enough someone had the idea to make it so HTTP headers will be compressed too, so we can add some more before the response headers completely fill up the connection's initial window.
IMHO it's far more unbalanced than "The top 100 websites get a third of the traffic, the remaining top 10k get another third and millions of websites get the last third." Purely in terms of data traffic, YouTube and Netflix already get a third of the traffic (and that's just two sites, not 100); and purely from a pageview perspective, the top social media sites plus major media sites (again, a subset of the top 100) get more than half, IIRC.
I wouldn't be surprised if the top 100 websites get 80% of the traffic, the remaining top 10k get 10% and all the millions of other sites get the last 10%.
I meant pageviews, not bandwidth consumed. Streaming websites are always going to dominate the latter.
This 100-10k-millions split statistic was pulled from a talk by Ilya Grigorik, who had worked on Web Performance at Google. I'm guessing they based it on data from Chrome.
Searches and time spent browsing are very different. Google likely doesn’t have visibility over how much time people spend on TikTok. Or perhaps Android collects that data.
So, every page with ads (as determined by an opaque and ever-changing method in a closed-source browser) will be included while computing the cohort. Got it. It seems safest to assume that "it happens on every page", since so much of the internet is monetized with ads.
Please forgive us for not trusting Google's "we pinky swear it will change". We have no real reason to trust that Google will keep their word.
No, Chromium's ad detection code is open source. Chrome's is closed source. It may very well be the same code, but there is no (practical) way to verify that, other than trusting Google.
But as I already indicated, I have trust issues with Google.
Reverse engineering of binaries is a well-understood field. Ensuring a binary and source code align is not a fully automated task at this time as far as I know, but is well within the capabilities of our industry.
Capability and practicality are distinct concepts. Especially since it's fairly well known that Chrome will not align 100% with Chromium thanks to closed source additions and modifications, so it becomes a question of "what is different" instead of "are they different".
It's certainly not practical for me, when I can just avoid chrome for personal usage (and I'm thankful for the capability). Of course, I can't avoid it entirely, thanks to my company deciding that Chrome is the only supported browser for our product. So even though FLoC is a non-issue for me personally, it is still something I need to worry about professionally.
EasyList is extremely broad, changes frequently, has for example counted community-event banners as ads, and Google appears to make zero promises about the stability of any of this.
So relying on this requires continuously monitoring that Chrome doesn't randomly decide to tag something on some page as an ad, which is doing even more work just to cater to Google's whims. Just blocking it is the sane choice here.
Since you say it is about a "chicken-and-egg problem" during the trial: is there a clear commitment somewhere that Google plans to not include pages that do not use the FLoC API in the future?
> is there a clear commitment somewhere that Google plans to not include pages that do not use the FLoC API in the future?
Not something as clear as I'd like. The closest I see is:
A page visit will be included in the browser's FLoC calculation if document.interestCohort() is used on the page. During the current FLoC origin trial, a page will also be included in the calculation if Chrome detects that the page loads ads or ads-related resources. -- https://web.dev/floc/#do-websites-have-to-participate-and-sh...
I think advertising is positive [1] and the role of ads in funding freely-available sites is very important. My current work is primarily on how browsers can allow more private and secure advertising [2][3][4] which I think most people will agree is valuable even if they are less in favor of advertising in general.
At a lower level, I do this job because I'm paid, which allows me to donate. [5] But I wouldn't do this work if I thought it was harmful; there are lots of different kinds of jobs I could take.
That link only works if we buy into the premise: "One way to think about this is, what would the world be like if we didn't allow advertising? No internet ads, TV ads, magazine ads, affiliate links, sponsored posts, product placement, everything."
However, no. I don't buy that premise at all. The state of ads as it is now is actively harmful, with very little to show for it in terms of "new non-stickier products" etc.
Coincidentally, my current project involves this Chrome proposal for supporting self-contained remarketing ads without individual tracking: https://github.com/WICG/turtledove
> supporting self-contained remarketing ads without individual tracking:
Yeah, but what about no tracking at all? What about getting rid of "remarketing"?
---
Since you are here:
> The question is, what is the alternative? I see two main funding models: Paywalls. You pay with your money. Ads. You pay with your attention.
How about:
- people pay what they feel like?
- the Patreon model, where some enthusiasts pay the content producer, but everyone gets access to the product of the work?
- (for social media): "influencers", power-users and media companies pay for the service and get a quota of "free accounts" they can bring along?
> producing most of what there is to read requires more money.
Wikipedia does not pay its editors and it seems to work without being only a "hobby".
Also: is most of the content "that is there to read" worthy of the money and resources it receives? Content farms, celeb gossip and listicle "journalism" only get to exist because they are playing for quantity, not quality. Remove ads from the equation and they will certainly die a well-deserved death.
> Micropayments (...) many proposals and startups, but nothing has really worked out.
Brave is growing (over 30M MAU already, projected to reach 50M by year's end, with over 1M registered content creators) and it is showing that a tipping economy can work. Would you go work on it?
---
People defending an "ad-funded internet" think they are enabling democratized access to good quality content, but in fact what happens is that they are enabling a whole lot of people to make a living off of essentially producing sewage. It's as if McDonald's were able to actually offer their crappy food for free and people were applauding it for fighting hunger. It artificially creates externalities and makes it next to impossible for people to pick winners that "should" win in the market.
I know you believe that you have the best intentions at heart, but at this scale there simply is no way to make anything more efficient than a direct, transaction-based market that could match producers and consumers.
I'm simply blown away by that donations link. Here was me feeling happy about the little I give, but it really puts into perspective how much I keep for myself.
Google, and its method of advertising, basically destroyed the news industry. If you don't think your work is harmful it simply means you haven't looked into the repercussions enough.
Internet advertising and the internet in general has made newspapers less profitable. But this was happening regardless of what Google did. 92% of the decline came from loss of classified revenue (https://mumbrella.com.au/de-classified-what-really-happened-...). Obviously it makes no sense to vilify Craigslist, because someone else would have provided free, searchable classifieds if Craigslist hadn't. That's the nature of the internet, which has reduced the cost of publishing to nearly nothing.
A parallel to the demise of the newspaper classifieds is the once-thriving industry of people who would copy books by hand in the 14th century. Then Gutenberg created a printing press that could make copies of books in a fraction of the time. Life didn't get better for those folks whose skills were no longer needed, but maybe it did for society as a whole. But for sure it didn't and doesn't make sense to vilify people who work at printing presses.
You're looking for a "bad guy" when maybe none exists.
> Wikipedia does not need to take any action to disable FLoC [...] If you call document.interestCohort() to get a FLoC id for a user
It is still a problem for Wikipedia, because the global JavaScript for each language is editable by a subset of the editors for that language (https://en.wikipedia.org/wiki/MediaWiki:Common.js for instance), unlike the HTTP headers, which can only be changed by the Wikimedia sysadmins.
If someone can execute arbitrary JS they can already exfiltrate any information they want with the user ID attached, in addition to impersonating users, etc. Isn't that a far bigger risk than the possibility that they might add a call to document.interestCohort() and opt that page into FLoC?
On the other hand, I could imagine them or other analytics services adding it since sites installing analytics services generally want to be able to slice their traffic by as many dimensions as possible. I would hope that any service considering reading FLoC implements it as opt-in, however, since sites probably didn't consider whether their pages are sensitive in choosing whether to add analytics (though perhaps they should have!)
>As for the remaining millions, only a tiny minority of them will even know this is a thing, let alone care enough to make the change or contact a developer who can do it. These are the folks who have hosted their wordpress site with GoDaddy because it was cheap and quick when they needed a site.
One company decides to do something stupid and you expect millions of website owners to scurry and add junk to their headers to create a "mitigation"? This is nuts.
That's probably accurate if you assume every website is equal, but if you measure by traffic the top 100 websites account for 95% of measurable tracking events. Won't that make FLoC rather ineffective if 95% of the data is missing?
Alright, 41 comments and everyone seems to know what FLoC is, so I'll be that guy.
What is FLoC and what's the big deal?
I googled https://www.theverge.com/2021/3/30/22358287/privacy-ads-goog... and it seems like an attempt by Google to add proprietary cohort-based tracking to replace third-party cookies. Well intentioned, but it could have flaws. Anything else I should know?
It's an attempt at a replacement for third-party cookies. Your browser looks at your history and computes a "cohort"; if other people's browsers do the same, people with similar histories will end up with the same "cohort".
The upshot of this is advertisers only see the cohort_id, and not the history (which stays local on your browser). I think Google thinks it needs to give the advertisers something if third party cookies are going away, and this is attempting to preserve privacy.
Of course, just not sending third party cookies and not sending FLoC is the "ideal" solution if you don't have advertisers paying you, and some people were excited that third party cookies are going away, and hoped that nothing would replace them.
(Disclaimer, work on Gmail, this is my own opinion, I only really know what I read on HN)
This is not attempting to preserve privacy. This is attempting to give a pretense of preserving privacy, while completely deanonymizing the web.
This is browser fingerprinting on steroids. In addition to things like screen resolution and OS, you get a FLoC ID. Browser fingerprinting already works very well. FLoC supercharges it, and adds profiling information.
FLoC also gathers information from web sites which otherwise Google could not track. Since your browser is tracking you, they don't even need Google Analytics installed.
Except that now you get control over your fingerprint. You can choose what to send to a website; you are the one who decides which websites get it.
Sure, you'll still get the other fingerprinting, which still allows them to track you, but before FLoC Google couldn't contemplate reducing Chrome's own fingerprint surface; now that they are moving toward FLoC, they can do that without cannibalizing their revenue stream.
In real life, most people won't deactivate FLoC, and that's where they are still going to make money. Everyone else most probably already uses an ad blocker or has already refused ad targeting from Google Ads.
A FLoC ID is shared amongst millions of users, and can be reset at any time by the user.
Google owns Chrome and has always had the ability to track any website whether or not it had Google scripts on it. If you signed in to your browser, this was already happening.
> A FLoC ID is shared amongst millions of users, and can be reset at any time by the user.
Sure, but are you sharing your IP with millions of users? That's only a single extra piece of information about you; there are a bunch of others given by your browser.
Hold up, doesn't something central still need to decide what cohort you are and store your individual data? How does it decide what cohort you are locally without pulling data about other users down locally or sending your individual data up? Is it comparing your behavior tensor with the cohorts or something like that?
Yes, although the central service provides data that a browser-side algorithm can use to put the user into a cohort. The browser history itself isn't directly sent to the service.
Each browser developer would have to decide which central service to use, whether their own or somebody else's.
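Roughly how the browser-side step can work without shipping raw history anywhere, going by public descriptions of the origin trial: the browser computes a locality-sensitive hash (SimHash) over the visited domains, and a server-supplied table of hash ranges, filtered so every cohort meets a minimum size, maps that value to a cohort ID. A toy illustration of the SimHash part (not Chrome's actual code; the hash function and bit width here are made up):

    // Toy SimHash over visited domains: similar histories tend to produce similar bit patterns.
    function hashDomain(domain) {
      let h = 0; // tiny non-cryptographic hash, purely for illustration
      for (const c of domain) h = (h * 31 + c.charCodeAt(0)) >>> 0;
      return h;
    }

    function simHash(domains, bits = 16) {
      const counts = new Array(bits).fill(0);
      for (const d of domains) {
        const h = hashDomain(d);
        for (let i = 0; i < bits; i++) {
          counts[i] += ((h >> i) & 1) ? 1 : -1; // each domain votes +1/-1 per bit position
        }
      }
      // Majority vote per bit gives the fingerprint that a server-side table buckets into cohorts.
      return counts.reduce((acc, c, i) => acc | ((c > 0 ? 1 : 0) << i), 0);
    }

    console.log(simHash(['news.example', 'sports.example', 'weather.example']));
    console.log(simHash(['news.example', 'sports.example', 'cooking.example'])); // overlapping histories land nearer each other than unrelated ones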
> "FLoC is designed to help advertisers perform behavioral targeting without third-party cookies. A browser with FLoC enabled would collect information about its user’s browsing habits, then use that information to assign its user to a “cohort” or group. Users with similar browsing habits—for some definition of “similar”—would be grouped into the same cohort. Each user’s browser will share a cohort ID, indicating which group they belong to, with websites and advertisers. According to the proposal, at least a few thousand users should belong to each cohort (though that’s not a guarantee).
If that sounds dense, think of it this way: your FLoC ID will be like a succinct summary of your recent activity on the Web."
I didn't know much about it, and wow, it sounds really terrible. I can even see it as an idea that started with good intentions, but the use cases explained (like linking FLoC IDs to user IDs on websites you've signed in to, potentially exposing browsing habits) make this thing really invasive; the whole idea is broken.
If you want to learn what FLoC is for the first time, you probably shouldn't start with an article titled "Google's FLoC Is a Terrible Idea". I also don't know what it is but will just wait for a more neutral source, hopefully.
I don't care if I'm tracked by advertisers, and I'm also not advertising, so I guess I'm pretty neutral?
FLoC is Google's new way to target "cohorts" of users with advertising. The idea is that Google will classify each user into a cohort (and that classification will change over time as the user visits more web pages, and perhaps other data is acquired) and only that cohort is reported to the advertiser, which can then use it to serve appropriate ads.
On the flip side, this is obviously still targeted advertising, and some people have a strong negative outlook towards that idea in general. Also, it's been said that if you can manage to track a person across just a few cohort changes, you can personally identify them, which is contrary to the entire idea about FLoC protecting a person's privacy.
In short, proponents think it's better than tracking cookies, and opponents think it's still privacy invasion, and not much better than tracking cookies.
Or illegal. IANAL, of course, but there are certain cohorts (age, disability, etc.) which are illegal to discriminate against in the US. But since Google doesn't have insight into what a cohort describes, they can't ensure that cohorts are being handled properly according to the law.
I'm sure Google has a "get out of government oversight" card from their lawyers, but like biased AI, this seems like it's on the wrong side of "grey".
Without going deeply into the topic (even though I'm familiar with what FLoC is): what do I expect from a browser? Well, to browse internet pages and display them correctly and fast.
Tracking, advertising and user cohorts do not fit into the "browsing" part. That might be enough to feel why "Google's FLoC is a Terrible Idea".
FLoC is Google's loophole after disabling third-party cookies in Chrome and promising they "will not build alternate identifiers to track individuals as they browse across the web"[0]. FLoC is a new tracker to replace third-party cookies, but it works by putting users in groups and tracking those groups, so they technically aren't tracking "individuals" this time. Except you can use FLoC to track individuals by identifying them as a unique intersection of many disparate groups[1].
The GH issue is interesting. Suppose each user belongs to two flocks, and each website gets randomly shown one of the two. Will this solve the problem? I’d imagine the number of possibilities after a certain while will make it impossible that cohort histories collected by two different websites will match for the same user.
FLoC is basically something implemented in the browser. Why should a website owner be bothered by it? If a client decides to use a browser with FLoC, that's their decision. The only interesting thing might be to inform users that they are using a shitty browser that doesn't respect their privacy, and to make sure the website works in other browsers.
What if Google decides that they will ignore that header? Is there anything preventing them from doing that? Do we know why they decided to even implement this "workaround" with header?
> Most normal users use Chrome because "everyone else does", and won't even have a clue what FLoC is.
“Because it was installed on my machine bundled with some third-party software” is also a big factor (but obviously, nobody will give you this answer, because they don't even know where Chrome came from)
It's definitely not an informed decision. That's why I mentioned that you can inform your users about this issue. I just think that this decision should not belong to webmasters/site-owners.
I certainly do see your point here. But the reality is that Google doesn't have users' best interest at heart, and is not going to be the one to responsibly inform user so they can make an informed decision on their own.
Even if we only consider the direct self-interest of website owners, website owners may want to disable FLoC in order to prevent competing websites and adversaries from being able to target that website's visitors with ads. The more effective those ads are, the more they are taking away business from the websites that have not disabled FLoC.
> Is there anything preventing them from doing that?
Potential privacy laws, competition, bad press,... but technically, nothing. Same as DoNotTrack.
In fact that's the whole idea behind FLoC. It is supposed to be a privacy improving feature! For now, the usual tracking methods based partly on third party cookies work for them, certainly better than FLoC would, and they are definitely more privacy invading.
But with things like GDPR, and with privacy being a bigger and bigger selling point, Google feels like it had to find something else and FLoC is their answer.
I don't know how the story will end but most likely in the same way as DoNotTrack, which started out badly, and turned into a joke when browsers started enabling it by default, disregarding the recommendation.
> Pages sites using a custom domain will not be impacted.
Not sure what % of Github Pages use custom domains but this appears to leave no mechanism for custom domains to optionally enable this header either.
I don't really understand the motivation here; if it was for the benefit of GH users, why wouldn't that apply to custom domain users? Is it purely to hamper Google (as Microsoft's competitor)?
There could be a technical reason for it not to be available for custom domains (yet?).
The page mentions that the header is set for all pages served by github.io, which leads me to believe they add the header on the reverse-proxy/loadbalancer side for github.io pages.
Custom domains most likely use separate proxy/loadbalancing infrastructure where the same change could take longer to implement, or they might be exploring options to make it configurable.
github.com is itself a social network and a tracker. They should know a lot more about the status and activities of the software projects hosted there and the users than the users themselves do. Enabling third parties to track users across their site would be the equivalent of opening the lid of this treasure chest.
Oh no, I do understand that. It's just the excluding custom domains part I don't get. Why not keep Google out of all their data: why give them a portion?
I've never thought of GitHub like that, but it makes so much sense. With all of the features (like the commit heat-map), GitHub is at this point 2 parts social network, and 2 parts social network for software.
Sounds like a machine learning term that escaped from researchers into the wild. Machine learning people like to make up fun names for otherwise complicated and hard-to-summarize methods. See eg "BERT".
The name is actually descriptive. It is an algorithm for constructing semantically interesting cohorts of similar users, Locally on each user's machine.
It's actually a really good idea and certainly a lot more "privacy preserving" than anything that relies on sending fine-grained user data back to a central server for processing.
Of course there are problems with it, and I'm mixed as to whether it's something that non-Chrome browsers should even try to support.
The fact that all websites are included by default, and it's up to the individual website to opt out of inclusion, makes me squirm.
But the name makes sense and I think the core idea is a step in the right direction.
> The fact that all websites are included by default, and it's up to the individual website to opt out of inclusion, makes me squirm.
That's not the case, contrary to what Hacker News wants you to think with this massive opt-out campaign. FLoC cohort computation is only planned to include websites that themselves request cohort information. Unless your page calls document.interestCohort, it is not included in cohort computation [1]. The opt-out header does nothing unless you use FLoC.
There is an exception to this made for the pilot phase (aka. right now), where in order to bootstrap the system Google is extending cohort computation to include "all websites that show ads" [2]. My guess is that this is necessary so that early testers get useful data. This is not something that seems to be planned past the pilot. The standard also restricts this to only "while 3rd party cookies are still a thing".
Disclaimer: I work for Google, but not on advertising or Chrome. This is all from public information I researched in my own time.
[2] https://wicg.github.io/floc/#adoption-phase §7.1.4 "at the adoption phase, the page can be eligible to be included in the interest cohort computation if there are ads resources in the page, OR if the API is used."
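Concretely, "the API is used" means a page calling document.interestCohort(). A minimal sketch of what that looks like from the page's side (the return shape, an object with id and version fields, is as described for the origin trial and may change):

    // Sketch: a page opts itself into cohort computation simply by calling the API.
    // The promise rejects if FLoC is disabled, the page is excluded, or there is too little history.
    if ('interestCohort' in document) {
      document.interestCohort()
        .then(cohort => console.log('cohort id:', cohort.id, 'version:', cohort.version))
        .catch(() => console.log('FLoC unavailable on this page'));
    }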
Whether ads are being loaded is being determined by an opaque, ever-changing algorithm implemented in a closed source browser. We have no way to verify that this is how it's actually working, or when it will change. That doesn't even include how a good majority of the internet is monetized by ads, often Google's ads.
It's simply safest to assume that every page will be included.
Chrome is a closed-source fork of Chromium that applies numerous proprietary patches. There's no way to tell what has been modified in that process (short of decompilation and the like).
Pretty much the same process that Microsoft takes with Edge, really.
I don't think calling Chrome a closed source browser is accurate unless you have a citation showing that Chromium is missing this code
That's completely backwards. You would need some evidence showing that Chrome does not include proprietary patches, otherwise you pretty much have to conclude that it's closed-source, even if it includes a large % of code from an open-source product.
This seems to indicate the authoritative source of truth is EasyList. On my current machine, the list seems to be stored in "~/.config/google-chrome/Subresource Filter/Unindexed\ Rules/9.22.0" and should be easily inspectable.
I don't know if I've missed some documentation pointers related to this.
I haven't heard about a global opt-out in the browser, but I haven't really looked for that info either. I think I've heard Chrome allows extensions to "easily" hook document.interestCohort and return any value the user wants (including random values). The standard also mentions "The user agent should offer a dedicated permission setting for the user to disallow sites from being included for interest cohort calculations." but that's only for blocking specific sites from contributing to cohort computations, not for disabling globally.
"Federated" is the only part of that I can see as not being simple. But even if all 3 words were generally unknown, I don't know if it's really a problem. You need to understand what FLoC is instead of what the individual words mean to know what the issues with it are.
That also causes curl to change the underlying request to a HEAD request. Though according to the spec they should return the same headers, it’s not uncommon for sites to fail to do so (some web frameworks leave this responsibility to the user) or to cache these responses differently.
Personally, I reflexively use the verbose version they used for these kinds of investigations of server behavior, after being bitten a few times.
Also consider changing the user-agent from the default. I set mine to a typical browser string in ~/.curlrc, but you can also use -A/--user-agent on the command line.
I use -i instead of -v -o /dev/null; is there any reason to prefer the latter? Is Curl smart enough to skip fetching the response body with the latter?
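For reference, roughly how the variants being discussed compare (only -I avoids transferring the body, and it does so by switching to a HEAD request, with the caveats above):

    # GET; print response headers followed by the body
    curl -i https://example.com/

    # GET; print verbose request/response headers to stderr, discard the body
    curl -sv -o /dev/null https://example.com/

    # HEAD; headers only, but servers may answer differently than for GET
    curl -sI https://example.com/

    # one way to check specifically for the FLoC opt-out header
    curl -sv -o /dev/null https://example.com/ 2>&1 | grep -i permissions-policy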
It's not really a blog post, even though it's pushed out over their "blog" endpoint. This post is part of https://github.blog/changelog/ which tends to lean closer to the "git commit message" length than blog post length. Just a statement of changes they've made users may notice or be affected by.
FWIW the duckduckgo extension already shows the github.com website as tracker free, so they take a pretty strong stance on privacy.
I think the why (ie. why does github take this stance on user pages) is pretty self explanatory in this situation.
This is a bit confusing. That post seems to suggest (1) that adding the header is not necessary to prevent one's site from "leveraging" FLoC, i.e., identifying users, unless one already runs ads, and hence (2) that the header isn't necessary in most cases.
But it also says:
What adding this header does is exclude your website from being used when calculating a user's cohort. A cohort is an identifier shared with a few thousand other users, calculated locally from browsing history; sites that send this header will be excluded from this calculation. The EFF estimates that a cohort ID can add up to 8 bits of entropy to a user's fingerprint.
Being excluded from cohort calculation has a chance to place a user in a different cohort, altering a user’s fingerprint. This new fingerprint may or may not have more entropy than the one derived without being excluded.
But is individual fingerprinting really the concern? What if I don't want Google clustering people who visit my page with people who visit similar pages? In that case, the header still helps protect their privacy, right? By making Google's interest-based clustering of visits less substantively accurate? Or am I misunderstanding how FLoC works?
(Am author) Google's FLoC cohorts are determined by browsing history. If your page is excluded, thereby giving other pages a higher weight, that doesn't necessarily reduce the bits of entropy in a user's fingerprint. Cohorts will still have roughly the same number of people and thus make it about as easy to identify users.
If you add the header to your site, do it for the right reason. It could mess with unsophisticated ad targeting, but it won't necessarily make a difference wrt. privacy. Energy is better spent getting users off of any browser that supports FLoC (Chrome, probably Chromium too).
I guess the question here is what you mean by "privacy." It seems to me that privacy goes beyond merely avoiding the risk of fingerprinting, or individualized identification. Collective identification is also a privacy problem: if I get advertisements targeted at people with political beliefs similar to mine because I've been labelled a member of a cohort that has visited a cluster of X-leaning news sites, that seems objectionable independent of whether the owner of some website can also distinguish me as an individual from every other member of the cohort.
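To make the "up to 8 bits of entropy" figure concrete, a back-of-the-envelope illustration with purely hypothetical numbers:

    // 8 extra bits of entropy can split the users who otherwise share your fingerprint
    // into up to 2^8 = 256 smaller buckets.
    const extraBits = 8;
    const shrinkFactor = 2 ** extraBits;                 // 256
    const usersSharingOtherFingerprintBits = 1_000_000;  // hypothetical anonymity set
    console.log(usersSharingOtherFingerprintBits / shrinkFactor); // ~3906 users left to hide among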
If we have GA, we're getting some information and Google is getting some information, but are they sharing this information about users directly with advertisers?
The premise of FLoC is that they are explicitly tagging you in a group specifically for advertisers.
Has anyone done the calculation of the amount of energy (and therefore co2) used and extra bandwidth cost for adding the opt-out header to most of the internet's traffic?
As I understand it, every response in a page has to have the header, not just the containing HTML document or an initial OPTIONS response.
I don't think it will be that much (spoiler alert: I was wrong), because only a negligible amount of web traffic will be the headers themselves vs. web pages and streamed content. And the FLoC header itself will be a very small part of that, maybe 40 bytes. Those 40 bytes could fit in a single packet.
So, at most, FLoC will add one packet per response. I don't know how many response headers are sent in total each day, but I remember reading that the average person visits 100 websites per day (including reloads). Out of 4 billion people who use the internet, we're talking about 400 billion response headers per day.
Assuming each of those carries the opt-out header (a portion of this is Google, so that's unlikely), that means an extra 400 billion * 40 bytes needs to be sent. This is about 16 trillion extra bytes (1.6E13). I've just checked, and it seems that the average Google search is about 125 KB, and I found that each releases approx. 7 g of CO2. Dividing this out, each kilobyte of traffic releases 0.056 grams of CO2. For each byte, that would be 0.000056 (5.6E-5) grams.
Multiplying that out by the 16 trillion extra bytes, you get about 9E8 grams of CO2, or roughly 900 metric tons of CO2 per day. So, I was totally wrong. Jeez, that's a lot of CO2.
But my calculations were a badly-estimated, worst-case scenario. Also, since fewer websites will have third-party cookies as a result of this, we would have to subtract those now-gone emissions. But this is still a lot more CO2 than I expected, even with that counteracting it.
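The whole back-of-the-envelope calculation above fits in a few lines, if anyone wants to tweak the inputs (all of which are rough assumptions, not measured figures):

    // Back-of-the-envelope CO2 cost of adding ~40 bytes of header to most responses.
    const responsesPerDay = 4e9 * 100;        // ~4 billion users x ~100 page loads/day (guess)
    const headerBytes = 40;                   // approximate size of the Permissions-Policy header
    const extraBytesPerDay = responsesPerDay * headerBytes;   // 1.6e13 bytes
    const gramsCO2PerByte = 7 / 125e3;        // ~7 g CO2 per ~125 KB Google search (rough figures)
    const gramsPerDay = extraBytesPerDay * gramsCO2PerByte;   // ~8.96e8 g
    console.log((gramsPerDay / 1e6).toFixed(0), 'metric tons of CO2 per day'); // ~896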
I've read recently about a fair few sites and browsers and whatnot that are not going to play along with FLoC.
Out of curiosity, what would be the kind of figure that would make Google stop using it? I mean, at what point does the data from a smaller pool become useless?
I don't think that any decent figure would make Google stop using it. FLoC is their attempt at locking more and more vendors into their ad ecosystem; it makes Google the superior ad provider because they now have an even bigger (and more unfair) advantage over other providers.
My hypothesis: if you are blocking FLoC, you are not really dependent on Google's ad system, neither as a website hosting ads nor as one found through ads. Unfortunately, Google owns too much of the ad market and too many vendors are already dependent on Google.
It's exciting when the megacorps make these kinds of plays against each other. I feel like I'm watching my abusive partner get smacked in the mouth by my abusive ex.
I have just switched to Brave (this browser blocks FLoC across all of the web). I regret not trying this browser earlier. Also, this IPFS stuff seems very interesting (kind of a BitTorrent for the web).
This isn't meant as a dunk on MSFT but it's worth keeping in mind that MSFT owns GitHub before celebrating this as GitHub taking a stance. MSFT, FB and Google all heavily employ "analytics", although to slightly different degrees and in different forms. Them not cooperating is a good thing, but not surprising enough to warrant celebration.
How much monopolistic behavior does Google have to engage in before antitrust laws have enough teeth? It also seems to me that Google has been more aggressive in its monopolistic behavior in different areas the more there is talk of regulation raining down on it. Maybe they know the end is near and are trying to get away with as much as possible before that happens.
Implement document.interestCohort() and return some useless junk, or better, fake data (e.g. this user looks at cat pictures and nothing else). However, I ran into the fact that there is no documentation of how the cohort ID is specified. This leads to another question: how are ad companies supposed to actually target their audience with it if there is no translation between cohorts and target groups? (I assume Google already has some translation.)
> how are ad companies supposed to actually target their audience with it if there is no translation between cohorts and target groups?
Even as an opaque identifier it's still useful. Imagine you run a store and you log the FLoC cohorts of your customers. You could then target ads at the most common cohorts you've seen as a way to say "show my ads to more people similar to my existing customers".
(Disclosure: I work on ads at Google, speaking only for myself)
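On the earlier idea of returning junk from document.interestCohort(): a hypothetical sketch of what that could look like, assuming the origin-trial API shape (an object with id and version) and that the code runs in the page's JS context before anything reads the cohort:

    // Hypothetical: replace document.interestCohort() with one that returns fixed junk.
    // Field names follow the origin-trial shape; the values are made up.
    Object.defineProperty(Document.prototype, 'interestCohort', {
      value: async function interestCohort() {
        return { id: '12345', version: 'chrome.2.1' }; // pretend cohort: "only looks at cat pictures"
      },
      configurable: true,
    });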
But it's not a problem, right? Because we've had countless conversations about google and chrome, discussed them ad nauseam, and we're so sick of this tedious, incessant topic that we've all stopped using chrome, right? Except for, yes, yes, the people who have to use it at work, but who don't use it at home, right? Floc is only a problem if you deserve it.
tl;dr: third-party cookies are dying, so Google has come up with this way to replace them. The EFF says third-party cookies suck, but the choice shouldn't be those or FLoC. How about neither, where the user decides what to share, with whom, and when?
This just made me realize something: Once a pattern becomes a meme (or close), it becomes possible to notice a higher level meta pattern which itself becomes memetized ad infinitum(?).
I predict the parent observation will itself eventually become a meme to be "complained" about. Maybe this one too.
I think that's part of the point; AMP became a necessity for news when Google pulled their monopolistic search ranking levers, but few other types of sites implemented AMP since there was no real motive.
AMP was specifically opt-in, and only websites with enough manpower and interest implemented it; if you wanted it, you had to build an entirely different page using Google's JS framework and a restricted subset of HTML. FLoC is opt-out and requires zero intervention from web devs; if your website shows ads it's already part of FLoC, so it can catch on if people do nothing about it.