Studying how Firefox can collect additional data in a privacy-preserving way (groups.google.com)
278 points by GrayShade 6 months ago | 428 comments

I can do a quick summary of what's being proposed and why. I work in the JS team at Mozilla and deal directly with the problems caused by insufficient data. Please note that I'm speaking for myself here, and not on behalf of Mozilla as a whole.

Tracking down regressions, crashes, and perf issues is very hard without good telemetry about how often they're happening and in what context. Issues that might have otherwise taken a few days to resolve with good info become multi-week efforts at reproducing the issue with little information.

It simply boils down to the fact that we can't build a better browser without good information on how it's behaving in the wild.

That's the pain point anyway. Mozilla's general mission, however, makes it very difficult to collect detailed data - user privacy is paramount. So we have two major issues that conflict: the need to get better information about how the product is serving users, and the need for users to be secure in their browsing habits.

We also know from history that benevolent intent is not that significant. Organizations change, and intents change, and data that's collected now with good intent can be used with bad intent in the future. So we need to be careful about whatever compromise we choose, to ensure that a change of intent in the future doesn't compromise our original guarantees to the user.

This is a proposed compromise that is being floated. Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that. That lets us know broadly what sites we are seeing problems on, hopefully without compromising the user's privacy too much. Also, the information associated with the site is performance data: the time spent by the longest garbage-collection, paint janks.

This is a difficult compromise to make, which is why I assume it took so long for Mozilla to come around to proposing this. These public outreaches are almost always the last stage of a lengthy internal discussion on whether proposals fit within our mission or not.

I'm not directly involved in this proposal, but I personally think it's necessary, and strikes a reasonable balance between the privacy-for-users and actionable-information-for-developers requirements.

> Tracking down regressions, crashes, and perf issues without good telemetry about how often it's happening and in what context.

If that's what you're aiming at. Collect the data but keep it local. Install some sort of responsiveness/"problem" monitoring. Ask the user to send data relevant to the problem if a problem occurs. IMHO there is no need to systematically collect user data for that.

Or get the data from a random sample of users. You don't need data from everyone.

This. Firefox prompts for feedback semi-regularly. I seem to recall it even bends over backwards to make it end-user friendly by having "Firefox made me happy / Firefox made me sad" options. It seems like it would not be difficult to tie that screen to a secondary prompt that says, "Can you briefly turn on some additional telemetry for us so that we can try to fix the problem?" Let the users make that choice to temporarily lower their shields so that you can get some useful data out of their machines, in exchange for the implicit pledge that you will use this data for troubleshooting this one explicit issue (i.e. whatever prompted them to click "Firefox made me sad").

That seems like a reasonable compromise to me. I'm happy to send logs if my browser crashes whenever I visit a certain page, and if I know I'm gonna be monitored for that period, I'll isolate my browsing habits to only visit that page. I do not consent, however, to sending everything--even anonymized--on the off chance that Mozilla will see the crash events and use them to flag that domain and maybe fix the issue on that particular page.

Awesome idea... why not introduce a "reproduce bug" mode which basically monitors all things in detail. If people are annoyed enough to send bug reports there is a good chance they will use it if properly presented. If people are not filing bug reports you don't really have a business there... If you generally need more of this telemetry advertise it and use it yourself...

That sounds way more reasonable to me.

> Or get the data from a random sample of users. You don't need data from everyone.

To my amateur ear, that actually sounds like a good compromise to lessen the blow somewhat more. You should suggest it to Mozilla :)

I'm not sure how that would help. If I opt-out of data collection, I don't think I'd be particularly pleased if I get randomly selected to be one of the users in this "random sample" and the stats get sent anyway.

And if I opt-in to data collection, why would it matter to me whether the stats I'm sending are a result of me being selected as part of a random sample or not? Might as well just _always_ send those stats; it doesn't matter to me.

That's what's proposed here. I guess no one actually read the post...?

There is no mention that, once validated, the fully deployed RAPPOR-based metrics would be taken from a random population; only that the initial study of the system will be done on a random population.


"What we plan to do now is run an opt-out SHIELD study [6] to validate our implementation of RAPPOR. This study will collect the value for users’ home page (eTLD+1) for a randomly selected group of our release population. We are hoping to launch this in mid-September."


"this study will collect ... for a randomly selected group"

[6] - https://wiki.mozilla.org/Firefox/Shield/Shield_Studies

I added the second part about the random sample to the comment later, after I had already moved the proposal out of my short-term memory. I hope they use the data from their initial study to test whether the opt-out group actually differs from the group they already get data from.

"This is a difficult compromise to make"

Then don't make the compromise.

As others have expressed here the reason few people opt in to data collection may be because they have chosen to use a Web browser that does not mandate the collection of data.

I'm assuming there will always be an opt out which I shall add to my list of things I have to do when installing Firefox.

> I'm assuming there will always be an opt out which I shall add to my list of things I have to do when installing Firefox.

There will be. Sorry for the hassle :(

How can I recommend Firefox to my friends when I know they won't remember to opt out?

On Linux it may be that the various distributions decide to repackage Firefox with the default setting flipped. Not sure about the various policies on that one.

The ESR track presumably will have the default flipped, because corporations get funny about data transfers to remote servers; mind you, Microsoft seems to be getting away with it for businesses that don't have a full-on Enterprise setup.

The way I see it is that if Firefox's userbase dwindles because of this, either we get our Firefox with opt-out telemetry or... Firefox dies. And now we have a Chrome monopoly.

I'm not sure I like that gamble.

> I'm not directly involved in this proposal, but I personally think it's necessary, and strikes a reasonable balance between the privacy-for-users and actionable-information-for-developers requirements.

I use Firefox and always opt into any telemetry that sends data back to Mozilla. You could say I am a fanboy. I think it is a HORRIBLE idea and Mozilla should scrap it yesterday and never bring it up again. If people bring it up again, send them to the roof team (if it doesn't exist, create one). If they come downstairs, fire them. You already have people like me who are willing to opt-in to every single thing you can try. For example, Firefox nightly on Android has consistently crashed for me about every five minutes or so since the last weekend and yet I keep using it. Don't throw away this goodwill.

The problem here is that, for certain types of data, statistics obtained exclusively from users who opt-in to data collection aren't very useful because they're heavily biased in favor of the type of user likely to opt-in (which often isn't very well representative of the average user).

Bias can be corrected statistically.

Statistics are not a substitute for the long-tail effect.

Lack of reporting from non-technical people who aren't aware they can opt-in cannot be corrected statistically, as the two categories of people (technical, non-technical) use the browser very differently.

For a made-up example: if you type "Yahoo" into the search bar, then type "Search" into the field, and then type your search into the third page, you'll be acting as many normal users do, and you may uncover crashes on page #2 at Yahoo that a technical user would never encounter, simply because they wouldn't type the word "Search" into the search field at Yahoo and trigger a JS bug where "Search" or "Yahoo" gets used one place too many and ends up crashing the CSS parser because it race-conditions with repaint.

If that problem affects 0.01% of the Firefox population, that's a lot of people who don't think technically, and do feel regret when we crash and can't help them because we can't see where it crashed.

(Yes, employed. No, I didn't talk to anyone else before I posted here. My own thoughts, I am not a number^Wcorporation, etc.)

This is a horrible development. If Mozilla starts collecting this sort of data on an opt-out basis, it will put many users at risk. Seriously, WTF?

> This is a proposed compromise that is being floated. Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that. That lets us know broadly what sites we are seeing problems on, hopefully without compromising the user's privacy too much.

Sure, there's no problem with images.google.com because it's generically innocuous. But what about pornhub.com for users in Saudi Arabia? Or some Japanese site that's essentially child porn for users in the US? The top-level+1 domain in many cases is totally incriminating.

> Also, the information associated with the site is performance data: the time spent by the longest garbage-collection, paint janks.

Maybe so. But it's collection of the top-level+1 domain that's the problem.

> I'm not directly involved in this proposal, but I personally think it's necessary, and strikes a reasonable balance between the privacy-for-users and actionable-information-for-developers requirements.

Fine. But then, make it opt-in, to protect users.

If it's so harmless, let users opt-in. Adding data collection via an opt-out is shameful, it shows that you know people would not want this and yet you'd prefer to get more data anyway.

Many problems here:

1. You're proposing a mechanism for collecting data, and a strategy for extracting more data than you currently do. You have not figured out the type of data that you will finally need, only a set of things that you currently envision. Naturally, the data that you will collect in the future will be more than what you currently envision. There is built-in mission creep that is dangerous.

2. What you currently envision is not fleshed out as especially useful; you only believe it is useful. The pain point of biased data is a red herring; your concern is more about not having enough data.

3. You have found a technology which you believe will allow you to collect a lot of data anonymously. But none of you seem to understand the technology very well. It seems like a shiny toy that you are eager to go to town with. I am not sure this is the right attitude.

4. You're proposing to use your users in lieu of proper testers, or to save time. There are many ways to properly test software and to save time. Have they been explored? There used to be a time when beta software was a thing. Prompt the users to become testers for your beta software. If users don't want to be testers then don't collect data from them. How much data do you actually need anyway? Have you fully utilized your existing data?

Over all, I see this as a nice-to-have luxury, not some life-and-death situation, and subverting the goodwill of users is not worth it, IMHO.

> There used to be a time when beta software was a thing. Prompt the users to become testers for your beta software.

Firefox already has opt-in telemetry, and Firefox already has a beta channel. It's unclear to me how it would help to tie telemetry to the beta channel; that would just make the existing problems (not enough data, and biased data) even worse, since there are probably far more users willing to share telemetry data than to use beta software.

In context, that might mean if there has to be some opt-out situation then opt-out for the beta channel might be slightly more acceptable.

Differential privacy is relatively battle-tested. I wouldn't be too worried about it standing up to scrutiny.

The problem with differential privacy is that I have to trust the person aggregating the data to actually do it.

This is incorrect, at least in theory. RAPPOR is designed to protect the user's data even if an attacker can see all of their individual responses over time. Of course, there could be implementation issues...

Do you? Excuse my ignorance, but I thought there was a way to locally mangle the data before submitting. Is that not what apple is doing?

For the case of RAPPOR (and for what Apple is doing), you do not need to trust the aggregator with your data. These algorithms operate in the "local" model of differential privacy, where all privatization occurs on the users' local machines before being sent to the aggregator.
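For readers unfamiliar with the "local" model being described, here is a minimal sketch of classic randomized response, the idea RAPPOR builds on. This is not Mozilla's or Apple's actual implementation; the noise parameters and names are purely illustrative. The point is that the noise is injected on the user's machine, so the aggregator never sees a trustworthy individual answer, yet can still recover the population-level statistic:

```python
import random

def privatize(truth: bool) -> bool:
    """Runs on the user's machine: with probability 1/2 report the true
    bit, otherwise report a fair coin flip. No one seeing this single
    report can tell whether it reflects the truth or the coin."""
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

def estimate(reports: list) -> float:
    """Aggregator side: E[observed] = 0.5*p + 0.25, so an unbiased
    estimate of the true proportion p is 2*(observed - 0.25)."""
    observed = sum(reports) / len(reports)
    return 2 * (observed - 0.25)

random.seed(0)
true_rate = 0.30  # illustrative: 30% of users truly have the bit set
reports = [privatize(random.random() < true_rate) for _ in range(100_000)]
print(round(estimate(reports), 2))  # close to 0.30, recovered without trusting anyone
```

RAPPOR layers more machinery on top (Bloom filters, permanent and instantaneous randomization) to survive repeated reports from the same user, but the trust model is the same: privatize locally, aggregate noisily.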

>Is that not what apple is doing?

I don't know, is it? How would I check, if I consider apple an untrusted actor?

Thanks for your input. Glad to hear someone from the Mozilla team on this thread.

It's an interesting compromise... because without improved performance and features, we'll lose Firefox entirely, and all of the relative privacy / security gains that entails. This is a good example where "perfect" privacy that reaches only a few is the enemy of "good" privacy that reaches more people.

Firefox must continue to exist if we are to have any browser without an economic incentive to be user-hostile. If they need performance traces from websites, and they have an open, clear discussion of how to preserve as much user privacy as possible, they should collect them.

The data collection MAKES it user-hostile. If they start collecting data, then there's no point for Firefox to exist - they're just a crappier version of Chrome.

If user privacy is paramount, then there are multiple ways to lower the privacy incursion that is caused by the data collection.

Only collect top-level domains of Alexa rank 1k. That users are using a highway is less sensitive than a specific street where there only exists 5 homes, and it reassures users that private domain names won't be leaked.

Send the data through Tor. That way you only get the data about the browser <-> site interaction, not user<->browser<->site interaction.

And make it opt-in and notify users of the purpose of the data collection. A good model to follow here is Debian installer and popcon. Follow the good practices of data collection in the free software world and do not use dark patterns.
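The Alexa-1k idea above could be sketched roughly like this (the function name and the stand-in domain set are invented for illustration; a real deployment would ship the full popularity list with the browser). The key property is that the filtering happens client-side, so niche or private domains never leave the user's machine:

```python
# Hypothetical client-side filter: only domains on a shipped popularity
# allowlist are ever reported; everything else collapses into a single
# "other" bucket before any network request is made.
POPULAR_DOMAINS = {"google.com", "wikipedia.org", "youtube.com"}  # stand-in for a top-1k list

def reportable_domain(domain: str) -> str:
    """Return the domain if it is popular enough to report, else 'other'."""
    return domain if domain in POPULAR_DOMAINS else "other"

print(reportable_domain("wikipedia.org"))                # wikipedia.org
print(reportable_domain("my-private-intranet.example"))  # other
```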

This is a reasonable compromise, but it does bias the sample towards popular sites. Granted, many of these sites are the sites that Firefox struggles with, but browsing habits are a heavy-tailed distribution. That said, smaller sites do open the door for problems, so it's probably a workable compromise.

EDIT: It should also be completely disabled in Private Browsing mode -- otherwise the optics are even worse than they are now.

> there are multiple ways to lower the privacy incursion that is caused by the data collection

The OP actually discusses a very interesting method for doing exactly that using differential privacy techniques. I personally think that's a very good compromise for this use-case.

From the OP, one suggestion is to collect "top-level+1 domains". This doesn't solve the issue of a person going to "starting_a_union_inside_company_x.com", which is itself a top-level+1 domain. Niche domains don't have a large number of users, so their users can be trivially deanonymized. It is also rather common for domain name servers to have a private and a public side. Firefox could easily become a vector for leaks from the private side, possibly revealing sensitive information such as unannounced products.

From the OP we can also see that they don't intend to store IP addresses, but storing them will always be possible. By using an anonymity network they could reassure the user and at the same time eliminate the risk that a malicious actor in the future silently starts tracking which websites users go to. An additional benefit is that Mozilla won't become a target for governments, a risk that no organization can ever be safe from once it starts gathering information about users.

It is not enough to strike a reasonable balance between the privacy-for-users and actionable-information-for-developers. You also need to find a balance between risk management and time spent on reducing risks. What I propose primarily is that they spend a bit more time on reducing risks, as that would benefit everyone.

Even Alexa 1k could be quite sensitive, for example there are many porn sites in that list.

As an organization, we are very aware that some of the sites people visit using our browser would humiliate them if someone could draw a link between who they are and where they visited. This isn't restricted to porn, but that's certainly the most widely known category of site that falls under this heading. We consider this carefully every time we do anything with any user data ever, whether a crash report or the TLD+1 proposal described above.

EDIT: Don't forget that the DNS resolution for porn sites can be deanonymized and resold by your internet provider - there's nothing we can do to protect you from DNS being a cleartext, sniffable, mitm'able protocol.

Mozilla's crash reporter already has the option of submitting the URL.

There are a couple different reasons crash reports aren't sufficient:

1. Crash reports only report crashes. We also need to see perf issues like GC and paint jank, etc.

2. Crash reports don't sample the general population, so statistically the information is less useful. If we get a perf issue, it's very important to know whether that issue is suffered by 10% of the users in general pop, or 0.5% of users in general pop. You want to prioritize the stuff that has the greatest impact on the general user population.

Lastly, crash reports are sort of a boolean filter - you only get the people that crash. The things I'd like to know to help in my development are things like "what is the histogram of max GC pause times on docs.google.com". Getting that info requires a good random sampling of the population, not just those who exhibit problems.
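A client-side sketch of what "a good random sampling" plus a GC-pause histogram might look like. The sample rate and bucket boundaries here are invented for illustration; this is not Firefox's actual telemetry code. Enrollment is one coin flip per client, so the enrolled set is a cross-section of the whole population rather than only users who hit problems, and only a coarse bucket index would ever be reported:

```python
import random

SAMPLE_RATE = 0.01  # hypothetical: enroll 1% of clients
BUCKETS = [1, 2, 4, 8, 16, 32, 64, 128, 256]  # upper bounds in ms, exponential

def in_sample(rng: random.Random) -> bool:
    """One coin flip per client decides enrollment, giving a random
    sample of the population, not just users who exhibit problems."""
    return rng.random() < SAMPLE_RATE

def bucket_index(pause_ms: float) -> int:
    """Map a max GC pause time onto an exponential histogram bucket;
    only the bucket index would be reported, never the raw timing."""
    for i, upper in enumerate(BUCKETS):
        if pause_ms <= upper:
            return i
    return len(BUCKETS)  # overflow bucket for very long pauses

print(bucket_index(3.5))   # falls in the <=4 ms bucket (index 2)
print(bucket_index(1000))  # overflow bucket
```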

1. Then why not add a "perf reporter" and a "paint jank" reporter?

"Hi! It seems that this page is loading unusually slowly, would you mind submitting more details to help Mozilla diagnose the issue?

Click `More Details` to see exactly what information is being reported."

You even already have a good entry point for one of these - the "unresponsive script" dialog.

Personally, I'm far more likely to send you this data (after having looked over it) than in even the opt-out case. If I have to opt out of all data collection to be sure I don't accidentally report www.really-illegal-pornography.com to Mozilla, I'll opt out and you'll never see any information from me at all. If I can avoid sending reports for www.really-illegal-pornography.com but still report lots-of-annoying-javascript.google.com, then you'd get more out of me.

2. If the issue is reported 10x more often on docs.google.com than on obscure.yahoo.com only because docs.google.com is far more common (even though the problem happens only on 0.00001% of visits to docs.google.com but on 10% of visits to obscure.yahoo.com) it does indicate that the issue in docs.google.com is more important. Sure it is rarer per visit, but a user is still 10x more likely to encounter it.

>You even already have a good entry point for one of these - the "unresponsive script" dialog.

Thanks for bringing that up.

> Lastly, crash reports are sort of a boolean filter - you only get the people that crash. The things I'd like to know to help in my development are things like "what is the histogram of max GC pause times on docs.google.com". Getting that info requires a good random sampling of the population, not just those who exhibit problems.

PLEASE do not go down this road. Look where "optimizing" video card drivers has led the video game industry. Game engine developers and game developers are lazier than ever. It is not up to you to make sure docs.google.com runs well on your browser. It is up to you to provide a browser that adheres to (and defines, if it must) standards. It is up to the web developers at docs dot google dot com to make their application work on Mozilla Firefox.

This is getting off-topic, but it's interesting. I think I have the exact opposite take on things from you :)

A program written by a developer and used by a user is a relationship between that developer and the user. I just work on the platform that allows that relationship to exist. I feel it's overstepping our boundaries as platform providers to say "we're not going to make this platform faster for you because we think developers are writing bad code using that performance as a crutch".

It feels like I'd be setting myself up as a self-appointed clergy over moral matters in software development. It's not a hat I'm comfortable with.

Why not think about the program you are working on as one built to support the open standards that enable people to communicate, and concentrate on performance within those standards? If someone writes badly performing, non-standard-compliant code, the program should throw an error.

Making bad code run faster is overstepping the boundaries.

But we're not making "bad code" run faster. We're making code run faster. The original counterpoint was that we shouldn't be, because improving the performance just gives leeway for bad programmers to use it as a crutch.

We don't prioritize bad code for optimization. See usage of 'with' in Javascript. We don't actively try to make it worse, but whenever a decision is presented which regresses 'with' performance for gains somewhere else, it'll probably be taken because we don't care about 'with' running fast.

But consider the examples I mentioned: histograms of max GC pause times on a particular website, or particularly bad janks, or long amounts of time spent in JS which might be the result of poor JS execution.

None of these optimize "bad code". They're just standard platform performance optimizations that help all programs. That will include "bad" programs as well.

THAT is your use case? And this just CAN NOT be done from opt-in? Makes no sense.

If mozilla can't see how utterly insane this is then there is no hope left.

>A program written by a developer and used by a user is a relationship between that developer and the user.

What about the relationship between you/Mozilla and the firefox users? This thread is evidence that at least some of the users are not happy that you are (in their eyes) sacrificing their privacy for future performance gains.

Making optimizations based on telemetry from real world sites doesn't mean you're optimizing for that one site only, like a video driver including hacks for a specific game. For example, shifting an array in Firefox used to be O(n) vs. O(1) in the competition [1]. Improving these sort of code paths benefits the entire web, even if the performance issue is discovered and profiled on docs.google.com.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1348772
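As a language-agnostic illustration of the kind of issue in [1]: removing an element from the front of a contiguous array is O(n) per operation unless the engine special-cases it, so draining an array from the front is O(n^2) overall. A Python analogue (not the SpiderMonkey code) makes the same asymptotic gap visible by comparing a list against a structure built for front removal:

```python
from collections import deque
import timeit

def drain_list(n: int) -> None:
    xs = list(range(n))
    while xs:
        xs.pop(0)      # front removal from an array: O(n) each time -> O(n^2) total

def drain_deque(n: int) -> None:
    xs = deque(range(n))
    while xs:
        xs.popleft()   # a structure designed for front removal: O(1) each time -> O(n) total

for n in (10_000, 20_000):
    t_list = timeit.timeit(lambda: drain_list(n), number=1)
    t_deque = timeit.timeit(lambda: drain_deque(n), number=1)
    print(f"n={n}: list {t_list:.3f}s  deque {t_deque:.3f}s")
```

Doubling n roughly quadruples the list time but only doubles the deque time, which is exactly the class of pathology real-world telemetry (or profiling on a real site) surfaces, and the fix then benefits every page that shifts arrays.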

> Making optimizations based on telemetry from real world sites doesn't mean you're optimizing for that one site only, like a video driver including hacks for a specific game. For example, shifting an array in Firefox used to be O(n) vs. O(1) in the competition [1]. Improving these sort of code paths benefits the entire web, even if the performance issue is discovered and profiled on docs.google.com.

Again, the question should be if something "benefits the entire web", how can we discover it without an opt-out anti-feature? If the answer is we can't, then we don't want it. It is as simple as that.

>You only get the people who crash

Uh - these are the most important people. The. Most. Important. The people you just pissed off by taking a header in the middle of whatever it was they are doing. Your performance noodling is irrelevant if you aren't addressing those issues.

I'm sorry, but you make the team sound incredibly out of touch with statements like this. To offset the other platforms advantages in marketing visibility, Mozilla has to be better across the board to survive, so unless you guys aren't crashing at all now, I'd say that this should be job #1.

Yes, Firefox developers can only work on addressing crashes or performance issues, but not both.

Sounds good. I'll just work on making things crash faster then.

Top-level domains are still betraying the user's privacy. Does it bug me that PornoTube is significantly laggier on Firefox than YouTube? Sure. Do I want Mozilla to know that I'm visiting it? Hell no.

They wouldn't know that you are visiting it, just that someone is visiting it.

How can they not know that I'm visiting it? I mean, the data is coming from my IP address. Sure, they may be dropping that data before storage. But what if it's intercepted?

Connections to the Mozilla Telemetry server are done over HTTPS, so all an interceptor would know is that you are sending Telemetry and not what that Telemetry is.

OK, fair enough. Then an adversary inside Mozilla that can intercept the data. I mean, the NSA is inside Mozilla, right? It'd be foolish, in my opinion, to assume that they're not. Such a juicy target, and all.

No compromise, I switched to FF on Android to avoid this crap from Chrome and now you'll do it as well.

I look forwards to the fork.

Palemoon on android works wonderfully.

They already said that anything they would do would have an opt-out available.

Regardless, being opted in should not be the default.

They say they have to make collection opt-out because otherwise users will not enable this type of data collection. That, in my opinion, should be the #1 indicator that users DO NOT want this collection happening in the first place. They collect by default because they know most people won't opt out, either because they forget, aren't aware that it is happening, or for various other reasons.

I'm okay with them collecting any data they want to so long as it is opt-in (because I never will). Mozilla is slowly eroding their original, core values.

Take a list of sites, for example the Alexa top 10,000, and make an automatic script that browses these sites and collects whatever information you need. Have a bunch of devices (phones, laptops, PCs from different brands) doing this. This will not cost much and you don't have to spy on your users.

".. we can't build a better browser without good information on how it's behaving in the wild."

Who decides what is a "better" browser?

1. Is it the authors? Do they write the software for themselves and agree to share it for free with anyone who may want to use it?

2. Is it the users? Do the authors solicit feedback from users to determine what users want? If users demanded a browser with no default telemetry, would the authors comply?

3. Is it third parties who have an interest in the behavior of users? For example, domain name industry, ad-supported businesses, their employees or advertisers themselves. Are the authors on salary, compensated indirectly from advertising revenue? Or does it come from somewhere else?

4. Is it all of the above? If we follow the money where does it lead? Whose decision of what is "better" is the most important?

Mozilla is descended from a defunct 1990's company that aimed to license a web browser to corporations for a fee. It would have been very clear in that case who the browser was being written for. But today, it is not so clear who Mozilla is serving. It resembles some sort of "multi-stakeholder" project.

It would be nice to have a browser that fits description 1 or 2. I believe there are plenty of folks, including some developers, who would appreciate a browser with no default telemetry. By virtue of the total absence of data collection, they might consider it "better" than alternative browsers that "need telemetry" for whatever reason.

> Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that.

Using "images.google.com" as an example is too convenient.

It would be great if you could also add, right after "images.google.com", whatever TLD+1 most people would rather keep private as another example.

>This is a proposed compromise that is being floated. Don't collect URLs, but only top-level+1 domains (e.g. images.google.com), and associate information with that.

Until sites start programmatically generating a unique subdomain for each [Firefox] user.

> Don't collect URLs, but only top-level+1 domains (e.g. images.google.com)

Do you consider images.google.com to be eTLD+1? The eTLD would be .com; so, eTLD+1 would be google.com; and hence, images.google.com would be eTLD+2?

eTLD: https://en.wikipedia.org/wiki/Public_Suffix_List

Yeah, you're right. Thanks for the correction. It's eTLD+1, I just erroneously used images.google.com as an example.

This is clearly an example of the infamous off-by-one error.
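For anyone following the eTLD+1 vs eTLD+2 distinction in the exchange above, here is a minimal sketch of eTLD+1 extraction. Real implementations consult the full Public Suffix List; the tiny hardcoded suffix set here is illustrative only. Note that entries like "github.io" are why "effective" TLDs matter: a plain TLD split would wrongly merge all GitHub Pages users into one domain.

```python
# Minimal eTLD+1 sketch. Stand-in for the full Public Suffix List.
PUBLIC_SUFFIXES = {"com", "co.uk", "github.io"}

def etld_plus_one(host: str) -> str:
    """Return the effective TLD plus one label, e.g. google.com for
    images.google.com. Scans from the longest candidate suffix down."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                raise ValueError(f"{host} is itself a public suffix")
            return ".".join(labels[i - 1:])
    return host  # unknown suffix: fall back to the host as given

print(etld_plus_one("images.google.com"))  # google.com (images.google.com is eTLD+2)
print(etld_plus_one("www.example.co.uk"))  # example.co.uk
print(etld_plus_one("alice.github.io"))    # alice.github.io
```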

>>This is a difficult compromise to make,

Sorry, I do not accept this compromise. Mozilla seems to have lost its way of late. It is sad to see a company that was at the forefront of privacy and security abandon that in the name of market share and performance.

I would rather sacrifice performance for privacy, not the other way around.

From EME, to the adoption of Browser Extensions as the only customization option, and now this... Mozilla and FF are changing in ways that are harmful to the open, secure, and private web. Following the trends and policies of MS and Google is not the correct path.

I think the core disagreement here is not ideological per se, but on premises. I agree with the motivation of not collecting any data.

That said, I don't feel that we have a choice but to compromise. If we don't build a better browser, then the other browsers will win by default, which means you lose all those privacy and security motivations anyway.

This is not some gleeful romp down the yellow brick road of data collection. It's a hard-searched, difficult compromise to a question that there are no good answers to, and LOTS of disagreement about.

What are we attempting to "win"? Again I go back to my statement about compromising principles in the name of market share.

I have used FF since version 1.0 for a few reasons, the top ones being that it is open source, that it has always been the most privacy- and security-focused browser, and that it was a strong advocate of open standards that were interoperable on ALL platforms without vendor lock-in.

FF is still open source.... the rest that seems to be in flux now.

> What are we attempting to "win"? Again I go back to my statement about compromising principles in the name of market share.

I don't see it as an either-or, but rather a balance to strike. A perfectly private browser with no marketshare doesn't help users. A completely compromised browser with 100% marketshare doesn't help users either.

It would not be no market share. It would be the market share you have today: people who respect the principles FF once stood for.

Mozilla is not happy with us current users, though; they would much rather trade us for Edge and Chrome users...

Mozilla has made it clear it does not value the users who desire Privacy, Customization, and Power in the hands of the user. Mozilla has dreams of "beating Chrome", a pursuit I have no interest in and place no value in.

But the marketshare "today" is tanking. It's at 10% and falling. There's no reason it won't get so small that it can't support development any longer.

Then it might be more appropriate to say "has attempted to trade us for Edge and Chrome users." I think it's inevitable that if they continue on the path they've been on since they started losing market share, they will disappear, and sadly, with so many years of audience-alienating development behind them, no one will pick it up.

The only hope is that one of the forks of earlier versions manages to get enough developers and an institution behind it that they can bring it back to popularity, but before that happens we might be calling the internet "Chromenet", and Google won't allow you to visit their sites unless they have been signed with a valid Chrome developer key.

Nah, we'll just switch. If you are trying to become Chrome, we'll just use Chrome. (Firefox+data collection < Chrome not logged in).

Edit: I've been with you guys since the beginning, but the line is drawn here.

Why is the choice between opt-in vs opt-out of automatic behavior?

If Mozilla wants perf data, collect it and then prompt the user "crash reporting" style.

I would totally opt-in to prompts. Give it a threshold and ask, "This page seems to frequently perform less well on your computer, would you like to send us a report?"

Random sampling, basically. The value of random sampling is hard to overstate - it gives you a real picture of what's going on. A non-random sampling gives you a picture, but you have no way of confirming that the picture is a reflection of reality.
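To illustrate the difference (a toy simulation with made-up rates; nothing here reflects real Firefox numbers): if users who hit an issue are more likely to opt in, the self-selected sample badly overestimates how common the issue is, while a uniform random sample does not.

```python
import random

random.seed(0)

# Hypothetical population: 10% of users hit a perf issue.
population = [random.random() < 0.10 for _ in range(100_000)]

# Uniform random sample: every user has the same 1% chance of reporting.
random_sample = [x for x in population if random.random() < 0.01]

# Self-selected sample: users with issues are 5x as likely to report.
biased_sample = [x for x in population
                 if random.random() < (0.05 if x else 0.01)]

rate = lambda sample: sum(sample) / len(sample)
print(f"true rate:      {rate(population):.3f}")     # ~0.10
print(f"random sample:  {rate(random_sample):.3f}")  # close to 0.10
print(f"self-selected:  {rate(biased_sample):.3f}")  # far too high
```

The self-selected estimate lands well above the truth even though every individual report is accurate; no amount of data volume fixes that bias.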

Random sampling and privacy run into conflicts not just in the browser space, but everywhere else. For example, recently the Canadian government went through a period where it allowed census respondents to optionally answer some questions that were previously mandatory (using privacy arguments). The result was several years of poor census information. The recent government reinstated the mandatory census questions.

The browser is just one arena where this ever-present conflict between knowledge and privacy plays out.

I used Netscape, then switched to Firefox when Netscape became way too bloated, then enjoyed years and years of Firefox getting better, supporting new JS and HTML5 features, all WITHOUT telemetry and with the Crash Reporting window where I could see the data being submitted and choose whether to submit it.

What has changed so much in the last 5 years or so that now you have to get all this data? What is wrong with just building a standards-compliant browser that runs JS fast and has easy-to-understand settings (where I don't have to go to about:config to disable WebRTC/telemetry/Pocket etc.)?

> What has changed so much in the last 5 years or so that now you have to get all this data?

To be honest, a lot. Once again, this is my personal take on the matter, not Mozilla's view.

First off, browsers were a LOT simpler back then. The sophistication and complexity in a browser has grown significantly in the last decade or so.

Secondly, browsers have matured. Remember that this software category has only been around for 20 years or so. Compared to the code quality in browsers today, browsers of 10 years ago were crude and simple. As a software category matures, the low-hanging fruit gets picked, so it's harder and harder to improve your product.

Lastly, competition. Firefox had the luxury of being released when its biggest competitor (Microsoft) wasn't putting real effort into its browser product. Google will not make that same mistake with Chrome.

Basically, we needed less information back then because the problems were much more obvious, because the whole industry was still pretty young. Now browsers are much more mature, the ecosystem is much more complex and has a much wider user base, and the problems are becoming harder and harder to pin down.

I think it's good to annotate this with your other comment as to why Firefox has to join this competition, rather than just do its own thing and disregard its market share:

> A perfectly private browser with no marketshare doesn't help users. A completely compromised browser with 100% marketshare doesn't help users either.

That makes sense for top sites and Flash.

But for things like perf and regression? Really?

You might miss out on issues if users don't submit, but each submission is an indication of a problem (because it's Firefox that decides a problem is bad enough). And you can still prioritize based on how common that problem is.

Under the "collect and prompt" scheme, you are still sampling randomly.

A random sample of users experiences perf issues, a random sample of users opts-in to the collection, you get a random-sample of data. (If you suggest they opt-in to continued collection, you might even get a continuous stream of samples from the same user.)

Yes, that data won't cover the people who don't have issues, but do you need to optimize for them? It also won't cover people who have issues but still don't opt-in, but do you think that is somehow correlated to the severity of the issue? Otherwise the data will be mostly unbiased. The variance will be higher than if you made it opt-out, but if you are doing sound statistics, you will have to handle that anyway.

The thing is, I think, that the users that opt-in to the collection aren't "a random sample", but rather "a sample of users biased towards certain profiles".

And how did they find that out?

In part, AFAIA (though I've only limited statistical training), this is a well-known feature of voluntary datasets. But IIRC they've also done user studies and compared those.

If you become everything people dislike about the other browser, nobody is going to care what happens to you.

Here's what's going to happen.

You people are going ahead with this idiotic plan - because that is what Mozilla does, asks for feedback and then proceeds to ignore feedback - and you will lose another 2% market share.

The reason is painfully obvious: You betrayed one of the core principles of Firefox, which is privacy. You pissed off a lot of people which will NEVER come back because you stabbed them in the face and spat in the wound.

You also gave Microsoft and Google a freebie. Now they have something else to throw back at you: your supposed "more private" Firefox phones home with your users' browsing history (not strictly true, but people don't dig that deep into the minutiae).

How's that stopping them from winning by default? You basically just disqualified yourself...

Make this thing optional, otherwise you are dead meat. If you can't "win" without betraying your principles, it's time to either throw in the towel and give up, or just be upfront and admit that you are going to go all in, users be damned.

That last option would actually probably gain you a few users.

Edit: spelling...

I guess this is offtopic, but what do browser extensions have to do with openness, security, or privacy?

It is a factor in openness: the Browser Extension API, as being developed by FF, MS, and the W3C, is very limiting, far more limiting than the old XUL-based model.

It can be a factor in security, both positively and negatively: XUL was very powerful and could be abused, but it was also used by some projects to enhance the security of FF or provide other security-related functionality that is now no longer possible unless FF allows it or builds it into the browser directly. Same for privacy.

It's limiting at the moment because it's not finished yet. And Firefox in particular has far outpaced any third-party or industry-wide standards in adding new APIs. They've been proactive and responsive in getting feedback from addon developers while designing the APIs. I would say security addons are one of the top priorities. For example here's a blog post by the NoScript developer: https://blog.mozilla.org/addons/2017/08/01/noscripts-migrati... "I feel that Mozilla has the most flexible and dynamically growing browser extensions platform"

I will openly admit that I am skeptical of any initiative that has the backing of the W3C, Microsoft, or Google, all of which have proven they are more than willing to sacrifice user privacy and security.

So since Web Extensions / Browser Extensions were started by all three of those entities, with FF adopting them, I am very cautious of them.

In the Google Analytics issue last month, "legacy" uBlock could block the connection while the WebExtension version couldn't. It can't because it is not allowed to (openness); as a result, your privacy is compromised.

There are many very, very political people inside Mozilla. Some of them may even want to commit political violence. Political violence seems to be a problem that just grows and grows, so how can we be sure that it isn't supported inside Mozilla? These would be a very small minority of Mozilla, of course, but the problem is that you don't know who it is. And it only takes a single extremist to betray your users, to get your users injured or even killed.

The same concern will of course apply to any other data harvesters, but that's for another thread

Ok, I get your point. You need the extra debugging information.

Now, here's my concern. I DO NOT want compromises. I DO NOT want to balance anything. I DO NOT want this telemetry crud on my browser spewing out my browsing history to anyone, no matter how anonymous you people claim it will be.

I just want a decent web browser.

What are my options? "Mozilla's way or the highway"? Redirect evil.telemetry.things.mozilla.org to /dev/null? Go back to elinks?

Or will there be a "disable this piece of crap utterly and completely" button somewhere not hidden under an URL? Or even better, a compile flag?

Edit: spelling...

There'll be an opt-out, just as there always has been. That's not what's being discussed here. The question is whether to allow these stats to be collected as an opt-out vs. opt-in.

Before you send any telemetry, get informed consent by presenting the user a dialog enumerating everything that you are sending, and I'd be fine with opt-out. Otherwise this is a dark pattern and you are getting on my shit list.

If you present the user a dialog, that is effectively opt-in, which is what already exists.

You wouldn't say anything else, so your statements don't change anything: Any company which wants to collect more data would justify it in the same way.

The main reason to collect data is monetization. People don't like to think they're being sold, so it's justified on other grounds. That's a universal. Since the way data is monetized is to track and segregate users, claims that it can be done in a privacy-respecting fashion are, therefore, specious.

There is one conclusion to be drawn here, and it isn't that Mozilla is going to respect my privacy.

Are you honestly suggesting that the only possible use for aggregate user statistics is for ads? Not for A/B testing or tracking performance regressions?

I'm saying that the behavior of someone who was collecting aggregate statistics for ads and the behavior of someone who was collecting aggregate statistics for another reason would be identical in this forum, so we must assume the worst.

Note: "planning" means "reaching out for feedback about".

Also interesting: the method they plan on using for anonymising this: https://en.wikipedia.org/wiki/Differential_privacy#Principle...

If that is not sufficiently anonymous, then please submit the reasoning why to Mozilla.

I think the burden here is backwards? URLs may contain Protected Health and other identifying information. If this data leaks despite SSL and could be seen by a 3rd party, then it makes Firefox an unsuitable client for a great many applications.

EDIT: OK. It's boolean flags (like use of flash) plus an eTLD+1 (example.org; not myname.example.org?). Even so, I believe this tracking should be opt-in with a disclosure screen that explains exactly what Mozilla is recording. Informed consent is a practice we should be promoting, even if it seems unnecessary.

They don't plan on collecting URLs, just (eTLD+1). The only real issue I can see here are users who have registered their own domain under an eTLD, and have it set to their home page.

eTLD: https://en.wikipedia.org/wiki/Public_Suffix_List
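For illustration, a minimal sketch of eTLD+1 extraction; the tiny hard-coded suffix set here stands in for the real Public Suffix List that browsers actually use.

```python
# Toy public-suffix set; the real list has thousands of entries.
SUFFIXES = {"com", "org", "co.uk", "github.io"}

def etld_plus_one(host: str) -> str:
    """Return the registrable domain: longest public suffix plus one label."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES and i > 0:
            return ".".join(labels[i - 1:])
    return host

print(etld_plus_one("images.google.com"))  # google.com
print(etld_plus_one("www.example.co.uk"))  # example.co.uk
print(etld_plus_one("alice.github.io"))    # alice.github.io
```

The last case shows the concern above: a privately registered domain directly under an eTLD comes through as-is.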

> The only real issue I can see here are users who have registered their own domain under an eTLD

Doesn't the differential privacy system described above prevent even that from being an issue?

I'm not sure if they're planning on collecting every homepage domain here, or just asking something like "Is your homepage domain in the list of top 1000 domains?". In the former case, just having the domain listed could leak information. In the latter case, I can't see any issues.

It's more like: "Is your homepage google.com? Flip a coin, if it's heads tell me the truth, otherwise flip another coin and answer yes if it's heads or no if it's tails." (A bit simplified, but that's the general idea behind differential privacy.)
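That coin-flip scheme is known as randomized response. A minimal sketch (the rates and names are illustrative, not Mozilla's actual mechanism): each individual answer is deniable, yet the aggregate rate is still recoverable, because P(yes) = 0.5·p + 0.25.

```python
import random

def randomized_response(truth: bool) -> bool:
    """First coin: heads -> answer honestly. Tails -> second coin answers."""
    if random.random() < 0.5:
        return truth
    return random.random() < 0.5

random.seed(1)
true_rate = 0.30  # fraction of users whose homepage really is google.com

answers = [randomized_response(random.random() < true_rate)
           for _ in range(200_000)]

# Invert the noise: P(yes) = 0.5 * p + 0.25  =>  p = 2 * (P(yes) - 0.25)
p_yes = sum(answers) / len(answers)
estimate = 2 * (p_yes - 0.25)
print(f"estimated rate: {estimate:.3f}")  # close to the true 0.30
```

Any single "yes" is consistent with an unlucky coin, but the population-level rate can be estimated to within fractions of a percent.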

They're not planning to send full URLs, only domains. Also, the system described is resistant to attacks even if the data is captured (SSL leaks). I don't have enough statistical knowledge to understand how that works, though.

My concern is that it relies on differential privacy, or privacy through deniability, which seems like a poor fit when it comes down to submitting URLs visited, unless they plan on submitting fake URLs when the "coin flip" comes up as tails twice in a row?

Not to mention, people will tend to visit the same websites repeatedly. The entire premise of DP is that the real data will stand out from the noise, creating a compelling picture of what an individual visits on the web. How will that aggregate data be anonymized, when it is reported with (a minimum of) an IP?

In short, this still requires a lot of trust in Mozilla, even with the DP algorithm, to not do the wrong thing with the dataset. And, in my eyes, making this opt-out and not opt-in already compromises that trust.

That’s still personalized data.

Mozilla has been violating even the minimal legal standards in the EU for years, and no one cares.

It’s insanity that an organization promoting its products with privacy doesn’t even meet the minimal legal standards. We’re seeing Google Analytics tracking in parts of the browser ("Get new Addons" page, for example), without even the legally required cookie warning.

EU law is clear on this: as soon as you store any data, do any tracking, connect to any third party, or transmit anything for analytics, you have to get opt-in.


Wouldn't that still leak health information? Less overall, but if any leaks at all, this still isn't acceptable.

Sure, except that with differential privacy, say 5% of telemetry reports would be marked as visiting do_I_have_very_bad_medical_condition.com anyway – regardless of whether they actually did.

What if a person visits 10 domains all indicating the same thing?

5%^10. Very very unlikely. Sounds very similar to "guilty beyond all reasonable doubt".
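Assuming (per the comment above) a 5% chance of a spurious flag per domain and independent coin flips, the arithmetic:

```python
# Probability that a user who visited NONE of the 10 domains nevertheless
# gets flagged on all 10 purely by chance:
false_positive_per_domain = 0.05
all_ten_by_chance = false_positive_per_domain ** 10
print(all_ten_by_chance)  # ~9.8e-14, i.e. roughly 1 in 10 trillion
```

Which is exactly the worry: ten correlated flags are no longer plausibly deniable.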

> URLs may contain Protected Health and other Identifying Information

A URL must not contain PHI. If it does, a breach has already occurred.

And Firefox is only collecting the domain names, it looks like.

What do you mean, a URL must not contain PHI? You can't prevent a non-tech minded person from submitting questions about their health to any text field linked to a form with a GET method.

I'd argue that domains are the same- there are tons of domains that clearly indicate what they're about (e.g. stop-drinking.example)

You can argue it all you want. Whoever stores that is criminally responsible under Canadian law. It's probably the same, if not worse, in other countries (Germany, etc.).

@clarkevans But submitting search input via the GET method (e.g. https://www.google.com/search?q=how+to+stop+drinking ) is a common practice in leading search engines.

> What do you mean, a URL must not contain PHI? You can't prevent a non-tech minded person from submitting questions about their health to any text field linked to a form with a GET method.

You can't, but that can't be part of Mozilla's threat model, and it's not relevant here anyway because Mozilla isn't collecting it.

And even if they were, that's not considered PHI legally. You are free to type any information about your own health that you want anywhere; that doesn't make it legally PHI, unless you are providing it to a Covered Entity.

> I'd argue that domains are the same- there are tons of domains that clearly indicate what they're about (e.g. stop-drinking.example)

This information is not legally considered PHI. As for privacy, SNI means that all domains you visit are already visible in transit, even if you are using SSL. Domain names are not considered private.

> This information is not legally considered PHI.

Do you have any sources that go into more detail?

When I've worked on PII in analytics, even TLDs were treated carefully. (obviously not the same from a legal perspective...)

> Do you have any sources that go into more detail? When I've worked on PII in analytics...

PHI is an incredibly well-defined term legally and is not equivalent to PII. Some things that constitute PHI actually wouldn't qualify as PII.

There are a lot of resources that explain HIPAA in great detail; if you want to know the specifics like here, you have to read the bill and the case law itself.

>>You can't, but that can't be part of Mozilla's threat model,

How normal, everyday people actually use their product can't be a part of the threat model... Really?

That is scary...

I don't care what the legal definition of PHI is; I am concerned that Mozilla is collecting actual personal health information (if not through URLs, the domain name concern is still valid). And I know that DNS resolution is not necessarily secure from snooping, but having one extra organization explicitly collecting this data is more dangerous than not having it.

As a practical matter, there are lots of applications that use GET for user submitted search data. Since GET requests encode user entered information into the URL and since the URL is typically found in web server logs and other tracking/history mechanisms, it is unwise to use GET for user-submitted data in applications that are concerned with privacy. However, it was not always considered unwise: REST advocates recommend GET when the underlying information representation doesn't change as a side effect of the request. Therefore, one might just as well say that logging of URL query parameters is the technical problem.

That said, when a "breach" has occurred is a legal distinction involving the control of information -- when protected data moves beyond those who have a duty to protect it. Saying that a particular technical approach creates breach is inaccurate.
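To make the GET issue concrete (hypothetical domain): form input submitted via GET is embedded verbatim in the URL, so anything that records URLs records the input.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# A form with method=GET turns the user's input into a query string.
query = urlencode({"q": "do I have condition X"})
url = f"https://search.example/results?{query}"
print(url)  # https://search.example/results?q=do+I+have+condition+X

# Server logs, browser history, and Referer headers all see the full URL,
# and the original text is trivially recoverable from it:
recovered = parse_qs(urlparse(url).query)["q"][0]
print(recovered)  # do I have condition X
```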

I think logging is irrelevant here because it can be set up to log POST data too, or not to log the query string. But the problem of leaking data via the referrer exists. Google encrypts (or obfuscates) the search query in the referrer, for example.

A group of URLs tied with the IP accessing them may leak such data. Maybe not in exact violation of the law, but it would allow for a reasonable estimate of the chance of someone at that IP address having some given condition.

Any submission of data requires the transmission of an IP address, which is personal data and necessitates appropriate protection.

I very much hope that the Debian maintainers (and hopefully also the guys preparing Fennec in F-Droid) will disable such data collection mechanisms, either completely or hidden behind an explicit opt-in instead of the opt-out suggested in the e-mail.

> Any submission of data requires the transmission of an IP address, which is personal data and necessitates appropriate protection.

Do you have a citation for that broad assertion? My understanding is that this is highly variable across legal jurisdictions and even in Europe, which typically leads the way in privacy, it's not that simple. See e.g. https://www.whitecase.com/publications/alert/court-confirms-... discussing an EU Court of Justice ruling that had two requirements: the ISP can link that IP address to an individual AND the website operator can get that information from the ISP.

Within the new European GDPR framework, IP addresses are to be considered as personally identifiable information, so the concern is warranted. What's decisive when characterizing an information as identifiable or not is not the fact of being actually able to perform the de-anonymization of the information (e.g. via the ISP in case of an IP address), but the mere possibility of it.

Legally, though, Firefox would be allowed to collect this anonymous data from the user by having him/her send the data, e.g., to an API endpoint they provide via IP-based communication; they would just not be allowed to associate the data with the IP address of the user submitting it. In the end, it comes down to trusting the party that collects the data, at least if they don't anonymize the IP address via other means, e.g. by passing the information through a third-party proxy server.

BTW, GDPR does forbid to turn on such data collection by default (privacy by default), so they would be required to get the explicit opt-in from the user for that.

>Within the new European GDPR framework, IP addresses are to be considered as personally identifiable information,...

My understanding is that many of these details are yet to be settled with the GDPR. The case referenced above was not interpreted under the GDPR, which has yet to take effect. The definitions of personally identifiable data are rather vague, and precedent has not been set. A quick search showed conflicting opinions, but one perspective to consider is quoted below:

> In addition, businesses should note that Recital 26 to the recently adopted EU General Data Protection Regulation ("GDPR") states that the test for whether a person is "identifiable" (considered in detail above) depends upon "all the means reasonably likely to be used" to identify that person. The CJEU in Breyer did not directly consider the issue of likelihood of identification. If the BRD was not reasonably likely attempt to identify Mr Breyer from his IP address, this could potentially give rise to a different analysis under the GDPR. Consequently, it may be necessary for the CJEU to revisit this issue after enforcement of the GDPR begins on 25 May 2018.

This is a few years old, so if you know of some new decision or regulation that clarifies it would be great to know!


The GDPR does not provide a list of data types that are considered personal or not personal, instead it uses a definition which states what criteria need to be met for data to be personal and gives a list of relevant categories, which explicitly includes "online identifiers":


Now, you could of course argue that often it's not possible to infer the identity of a person given an IP address (e.g. because it is dynamically allocated by an ISP, or is the address of a proxy server through which many users connect to the Internet) and therefore store it. It would be very hard to impossible, though (IMHO), to ascertain that none of the IP addresses you store could be used to identify a specific person (what if, e.g., 5% of the IPs in your data are static?). This in turn would make treating all of your IPs as non-personal data a risky business, to say the least, as there will almost certainly be a way to identify at least some of your users from their IP addresses. The fact that you don't know about a particular way of doing this identification is not relevant here.

My advice: if you do not use a very robust method for making sure that all the IPs you store are non-identifiable, I would recommend not storing them at all (or at least truncating them to the first 24 bits, which also does not always eliminate the deanonymization risk).
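A sketch of that truncation using Python's standard ipaddress module (illustrative only; keeping just the /24 prefix weakens but does not eliminate the identification risk):

```python
import ipaddress

def truncate_ip(addr: str) -> str:
    """Zero the host byte, keeping only the first 24 bits (the /24 prefix)."""
    net = ipaddress.ip_network(f"{addr}/24", strict=False)
    return str(net.network_address)

print(truncate_ip("203.0.113.55"))  # 203.0.113.0
```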

The GDPR was approved on the 25th of May 2016, with the IP address defined as the poster above you specified, is my understanding.

It might not be legally protected, but that doesn't change how sensitive it is.

What's up with calling obvious stuff a "broad assertion"?

Are you saying that people can not be identified by their IP address?

Because it's neither obvious nor globally true. Legal status varies around the world, and from a technical perspective an IP address on its own doesn't usually identify a person unless you have other information — account data, correlated data from other sites, etc. — and things like NAT and public Wi-Fi make that extra information necessary to reliably link activity.

I think it's important to talk about this issue – especially the importance of not storing it long-term — but from my perspective the real concern is the industry dedicated to linking and sharing your online activity. Without that an IP has little value and with it they can deanonymize most people without using IPs.

> Any submission of data requires the transmission of an IP address, which is personal data and necessitates appropriate protection.

And requires an opt-in under EU law, which makes this entire thing even more ridiculous.

Then don't send the correct source IP address; with simple statistics gathering like this, I hardly expect they require a response. It would mean there would be no personal data whatsoever.

Most ISPs filter spoofed IP addresses nowadays[1]. Even if your ISP doesn't prevent it, NAT might. You wouldn't get a whole lot of responses this way, and there'd be a strong bias because of regional differences w.r.t. filtering.

[1]: https://spoofer.caida.org/summary.php

How would you do that, though? The browser has to open a socket to something to do this, after all. And that already is a violation.

You'd just send a UDP packet with a spoofed source address and forget about it. There's no need to open any sort of 2-way connection for data that's only being transmitted in one direction (from browser to metrics server).
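For the curious, a sketch of what that involves (illustrative only: this builds an IPv4 header with an arbitrary source address but does not send anything, since sending requires a raw socket with root privileges, and as noted elsewhere in the thread most ISPs filter spoofed packets anyway):

```python
import struct

def ip_checksum(data: bytes) -> int:
    """Standard ones'-complement checksum over 16-bit words (RFC 791)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """IPv4 header whose source address can be set to anything."""
    src_b = bytes(map(int, src.split(".")))
    dst_b = bytes(map(int, dst.split(".")))
    fields = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0,           # version/IHL, DSCP/ECN
                         20 + payload_len,  # total length
                         0, 0,              # identification, flags/fragment
                         64, 17,            # TTL, protocol = UDP
                         0,                 # checksum placeholder
                         src_b, dst_b)
    csum = ip_checksum(fields)
    return fields[:10] + struct.pack("!H", csum) + fields[12:]

hdr = build_ipv4_header("198.51.100.7", "203.0.113.1", payload_len=8)
assert ip_checksum(hdr) == 0  # a valid IPv4 header checksums to zero
```

Nothing in the header authenticates the source field, which is why spoofing works at all where ingress filtering is absent.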

You could transmit the telemetry through Tor.

Tor is blocked in some places and viewed as very suspicious in others. If you're already in a place where you're trying not to draw attention to yourself, using Tor might not be a good option.

This isn't how TCP/IP works.

With UDP/IP it would work, if none of the routers on the way to the destination filter spoofed IPs.

Wouldn't spoofing IP addresses risk the data packets being filtered out by ISPs or other upstream network providers? IMO, if keeping IP addresses hidden is a concern, it'd be better to use something like Tor.

Does an IP address actually require an opt-in? And if it does, does it only apply if it is being stored?

Yes[1], no[2]. An IP address is "personal relationships" data and collecting, processing or using such data is prohibited unless allowed by law or the concerned person gives consent.

[1]: https://en.wikipedia.org/wiki/Bundesdatenschutzgesetz#Types_...

[2]: https://en.wikipedia.org/wiki/Bundesdatenschutzgesetz#Overvi...

The way I interpret this, if you don't collect, process or use the IP address beyond it being incidentally involved in the transmission of anonymized data, it shouldn't require explicit consent.

Otherwise literally everything that connects to the internet in some way would have to be treated that way, and that's not how the law is currently enforced.

> Any submission of data requires the transmission of an IP address

Not true. Tor has demonstrated that it's entirely possible to transmit data over the internet without revealing your IP address to the party you're transmitting to.

At a heavy latency cost, and under the dubious assumption that regular people control all exit nodes.

Actually no, it doesn't matter who controls the exit nodes as long as your only concern is keeping your IP private. (Exit nodes can indeed do bad things to unencrypted traffic, but that's irrelevant for this use case.)

Latency also doesn't matter here; this telemetry could take 5 minutes to reach its destination and it wouldn't matter, so long as the data is eventually received.

Hmm, I've never thought of that. I like your idea.

The point is not whether this method is "sufficiently anonymous", the point is:

I don't want my browser to collect any kind of data.


No, seriously; why? I don't get this mentality at all.

Let's ignore the exact implementation here for a moment, and assume that Firefox is somehow magically doing this data collection in such a way that it is guaranteed the data collected cannot be traced back to you as an individual. (E.g. "sufficiently anonymous".)

What problem do you have with that, specifically? How does this harm you in any way?

If it's done through "secure" code, you can't always guarantee that in the future, nor what Mozilla will do with this data in the future. Also, it doesn't set a good precedent to proceed in this direction of opt-out behavior in Firefox.

In this case, it's not through secure code though, it's through the nature of the data being sent itself. Differential privacy is meant to ensure you _can't_ use the data to make any sort of inferences about individual users; only about users as a whole.

That's kinda beside the point here though, as the GP seemed to be against collecting this data _regardless_ of whether or not it's anonymized or not. I'm interested in hearing why.

And I don't get the mentality where I should justify why I don't want my tools to report what I am doing.

I'm ok with testing things and sending feedback, but when I switch to a production environment, I just want my tools to behave like my tools, not the testing farm for somebody else.

Why should I prove to you that it would harm me? I just do not want it, it should be enough.

I don't know, to me that's a bit like running a torrent client and expecting the default setting to be "no seeding, download only". After all, the torrent client is _your_ tool, right? Why should it do anything except the bare minimum required to download the files you want? Why waste upload bandwidth on something that doesn't benefit the user?

Obviously that's ridiculous, right? If the default setting was to not seed, torrent clients would be much less useful for everyone involved. Browsers sending usage stats are much the same way. While no individual user benefits from _their machine_ sending those statistics, it's better for the user population as a whole if the default setting is to send them, since those stats help the browser vendor build a better browser. (And before you cry "privacy", remember that in this context we're talking about a situation where the statistics are being sent in a way that is "sufficiently anonymous" such that privacy isn't an issue. See the GP.)

So while I agree you certainly should have the right to disable sending usage statistics if you wish (just as many Torrent clients let you disable seeding), expecting that to be the default setting is a bit strange.

It’s a question of consent.

Imagine I came to your house, and photocopied all your documents.

Don’t worry, I blanked out the name, so it’s completely anonymous, and everything is where it used to be.

Would you be okay with that?

I certainly wouldn’t.

Making this opt-in or opt-out is a question of consent, and choosing opt-out shows that you don’t give a flying fuck about me, and only want your own benefit.

If you continue to post uncivil comments to HN we are going to ban you. You've had many more warnings than usual.


What is uncivil in this comment? I’m sorry, but I don’t see anything problematic in there, and I’d say the same to anyone’s face IRL, given the same circumstances.

There is a swearword used, but in context it’s not in any way uncivil, as the plural "you" that it refers to is an abstract person, a hypothetical entity – not any actually involved person. (In this case, the potential future group of people at Mozilla who might decide to override an explicit choice I made for their own convenience.)

Profanity isn't an issue on HN but "you don’t give a flying fuck about me, and only want your own benefit" is not, to my ear, the sort of thing one says to an abstract entity.

Have you also read the "and choosing opt-out" before that?

The topic is a choice that Mozilla plans to make, and about which it is currently asking users for feedback.

The decision has not been made.

My argument is that, if Mozilla (and whatever users Mozilla asks to give an opinion), choose to override the current decisions of users who do not want telemetry, and require a new hidden opt-out, then that would be proof that they (as group) don’t really care about the users choices.

The user I was talking to has no power in making that choice, nor do I. Nor is all of Mozilla making that choice.

I was using "you" with the meaning of the German word "man", (I’m natively German): one; you; they; people (indefinite pronoun; construed as a third-person singular).

Ok, probably a linguistic misunderstanding in this case. But you've straddled the civility/incivility line so often in your comments here that I still wish you would take a few more steps in the right (for HN) direction. I bet translation issues would cease to be a problem if you did.

Then why promote the browser as such?

Remove all "we respect privacy" from the advertisement.

It's misleading.

Because data collection that is "sufficiently anonymous" _does_ respect privacy. If it's completely impossible to tie the data collected to any one particular user, how does the existence of that data compromise privacy in any way?

We can go into an amazing yet pointless semantical argument of what "privacy" means. But let's look at this from a different perspective:

I like your faith. However, if this change goes in, and the capability is there, it will get misused. Because, statistically that's how these things go on this planet+capitalism.

Actually, the way they're implementing this, even if Mozilla decided to try to misuse the data in the future, they still wouldn't be able to, since the data itself is "sufficiently anonymous" (unless of course you want to argue otherwise, like the root comment was suggesting you do).

Or are you saying you're worried that they could _start_ collecting non-anonymized data in the future? If so, I don't really get that argument either. People always have the ability to change what they're going to do in the future, Mozilla deciding now not to collect this data wouldn't change that.

Ok, we changed the title from "Firefox planning to anonymously collect browsing data" to (hopefully more representative) language from the first paragraph of the article.

Can you tell me the URL of the page containing the form where I can submit my feedback?

See the OP.

This is like saying "Diffie-Hellman key exchange has no flaws, therefore HTTPS implementations are perfectly secure".

I'm not saying anything has no flaws, just that if it has, it would be nice if you would inform Mozilla about them.

As someone familiar with differential privacy, and (somewhat less) with privacy generally, here are some suggestions for Mozilla:

1. Run an opt-out SHIELD study to answer the question: "how many people can find an 'opt-out' button?". That's all. You launch this at people with as much notice as you would plan on doing for RAPPOR, and see if you get a 100% response rate. If you do not, then 100% minus whatever you get is the fraction who will be collateral damage should you launch DP as opt-out, and you need to own up to saying "well !@#$ them".

2. Implement RAPPOR and then do it OPT-IN. Run three levels of telemetry: (i) default: none, (ii) opt-in: RAPPOR, (iii) opt-in: full reports. Make people want to contribute, rather than trying to yank what they (quite clearly) feel is theirs to keep. Explain how their contribution helps, and that opting-in could be a great non-financial way to contribute. If you give a shit about privacy, work the carrot rather than the stick.

3. Name some technical experts you have consulted. Like, on anything about DP. The tweet stream your intern sent out had several historical and technical errors, and it would scare the shit out of me if they were the one doing this.

4. Name the lifetime epsilon you are considering. If it is 0.1, put in plain language that failing to opt out could disadvantage anyone by 10% on any future transaction in their life.
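To make point 4 concrete: the "10%" reading comes from the standard differential-privacy guarantee, which bounds how much any single person's participation can shift the probability of any outcome by a factor of e^ε. A quick sanity check (nothing Mozilla-specific here):

```python
import math

# Differential privacy bounds the ratio of outcome probabilities between
# "your data included" and "your data excluded" by e^epsilon, so for small
# epsilon the worst-case multiplicative disadvantage is roughly epsilon.
epsilon = 0.1
max_ratio = math.exp(epsilon)
print(f"e^{epsilon} = {max_ratio:.4f}")            # e^0.1 = 1.1052
print(f"worst-case change: {max_ratio - 1:.1%}")   # worst-case change: 10.5%
```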

I think the better experiment that is going on here is the trial run of "we would like to take advantage of privacy tech, but we don't know how". I think there are a lot of people who might like to help you on that (not me), and I hope you have learned about how to do it better.

This is ridiculous. I use and recommend Firefox for pure ideological reasons, because frankly, Chrome/Chromium is miles ahead of them.

If they start opt-out tracking using the same approach as Google I do not see any reason to use it nor install it for my friends and family. That's some data for you, Mozilla.

Your stance is paradoxical, because Chrome has been improved based on data mined from users, and not in as nearly a considerate way as Mozilla is proposing.

You want Firefox to succeed as a browser, but to be able to better compete it needs better usage data.

Wouldn't you prefer for Firefox to be the best browser available, AND also be considerate towards your privacy rights?

Company A does bad thing which benefits them massively, allowing them to have a better product. Some people dislike that approach and flock to company B which promises not to do the bad thing. Now company B start doing the same thing 'for better good' but promises to 'keep it moderate'.

At this point why would anyone stay with the company B which broke its promise once, just in the hope that it won't break the promise again? It has already lost the trustworthiness and it also has the worse product. Might as well use products from company A.

This is specious reasoning. Company B is not doing "the same thing" at all. Company B is collecting data, but not only is it far more limited (e.g. collecting domains instead of URLs), it's done in a way that protects privacy. You can't just throw up your hands and say "well, they're collecting some data, therefore we may as well just throw away all privacy protections and use the browser by the company whose business model is based on collecting all the personal information they possibly can".

Privacy is not a boolean.

Opt-Out vs Opt-In is a question of consent. Do you value your own benefits more than my own right to determine my own life?

If yes (and that’s what you get when you choose opt-out), then we’re done. There is no gradual change there, it’s a binary question if you value the user or your own benefit more.

The world is not black & white. If Firefox starts collecting a small amount of data in a privacy-sensitive manner and makes it opt-out, that does not at all make it equivalent to e.g. Google collecting all the user data it can.

But it means they have an equal value system: Convenience being always more important than Privacy.

And that’s strictly incompatible with mine.

Except that's not true. Firefox collecting a small amount of data in a privacy-aware manner does not mean "convenience being always more important than privacy", not by a long shot. I don't understand why you're insisting on such an absolute black & white viewpoint.

Firefox being so arrogant to presume I want to collect the data by default is a very rude thing. You don’t just assume someone wants it, and do it for them, especially if it might hurt them.

First ask, then fuck up. Is that concept so hard to understand?

If you’d do that IRL to someone they’d never talk to you again, it’s the same with Firefox if they do this.

Firefox collecting data in of itself isn't at all rude, or problematic. Nobody cares if Mozilla has "data". What they care about is if they collect data that violates the user's privacy. The whole point of RAPPOR and differential privacy is it's an approach to collecting data that is supposed to preserve user privacy. So the real question is, does it preserve user privacy sufficiently that it's ok to make something opt-out instead of opt-in? But that's not what you're complaining about, you're just ranting because they're collecting data, period, without actually understanding the extent to which your privacy is being violated (if at all).

And of course this all started with you saying that you may as well switch to another company's products, a company which you know violates your privacy quite significantly. You still haven't explained why Firefox collecting a small amount of data in a way that tries to minimize any privacy violations means you should just give up any semblance of privacy and use a product that tries to collect as much personal information as possible.

First off, I’m a developer myself. A developer in the EU. In Germany. Working on open source. In fact, on open source with goals to preserve privacy.

I’ve dealt with these issues before myself.

And I understand well what they collect, how, and why. I understand how painful it is when you have no data on what is used, and how, or not even crashreports.

But there also is a limit to how far you can go, and where consent is required.

And when transmitting anything, or collecting anything, consent is required.

You could make it dependent on situation. If a performance issue occurs, show a bar: "Is this website slow? Click [Here] to submit a report so it can be improved. [Details] [X] Always submit".

This gives the user a far better understanding of what is submitted, why it is needed, it is contextual, and it is still opt-in (but with far better conversion)

If Google does not respect my privacy, why is the proposed way to gather information based on Google's approach?

And if the way Mozilla gathers data is much more considerate, what results can I expect from it? Better parallel requests and data fetching, hardware acceleration, etc are all features that are missing for me as a Linux user. They don't need my dataset for that, it's probably all in their bug tracker.

Wouldn't you prefer for Firefox to be the best browser available, AND also be considerate towards your privacy rights?

I prefer absolute privacy over some minor advantages on irrelevant webpages.

How do you even think this system would work in restricted environments such as governments where even the presence of code that could collect data is an absolute no-go?

Your stance is paradoxical, because you already stated Chrome is willing to go farther to improve their browser.

How is Chrome miles ahead? Both seem to work just fine for me, neither being noticeably faster or better. I like a couple of minor Firefox features, so that's what I stick with.

Firefox -> Chrome is a sidegrade at best. Literally the only reason I use it is that I got fed up with weird little CSS quirks I couldn't replicate in IE or FF, but which were very much present in Chrome.

As you may have read in the feedback request, Mozilla is proposing to use differential privacy, which is very different from tracking.

For more information, see https://en.wikipedia.org/wiki/Differential_privacy for instance.
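The underlying idea can be illustrated with the classic randomized-response technique (a toy sketch with made-up parameters, not Mozilla's proposed implementation):

```python
import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true answer with probability p_truth, otherwise a coin flip.

    Each individual report is deniable: seeing `True` only slightly raises
    the chance that the respondent's real answer was True.
    """
    if random.random() < p_truth:
        return truth
    return random.random() < 0.5

def estimate_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Invert the known noise to recover the aggregate rate."""
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5
    return (observed - (1 - p_truth) * 0.5) / p_truth

# Simulate 100k users, 30% of whom "visited the site": no single report is
# trustworthy, but the aggregate rate is recovered accurately.
random.seed(0)
true_answers = [random.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(a) for a in true_answers]
print(round(estimate_rate(reports), 2))  # close to 0.30
```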

My point is not about how you label gathering information from your users, but rather that this means implementing something Google proposed.

If the mechanism works, fine, but why should I use Firefox over Chromium then? Opt-out data collection is in violation to my core beliefs and what I believed to be Mozilla's principles.

Collecting data without asking the user about it is - to me - in violation to the very definition of privacy and calling some way to anonymise data (who guarantees that the cryptographic approach to this is not obsolete in a few years?) "differential privacy" is at the very least dishonest.

Existing telemetry in Firefox already works on an opt-out basis. This changes nothing.

Existing telemetry doesn't collect browsing data.

So, I read that, and already see two problems. One - DP provides privacy by deniability. How does that apply to URLs (or even just domains)? For a domain to show up, I have to have visited it (unless Firefox will report back random domains).

Two - DP is only really private over a small data set per individual. If DP were enabled for even two days, you could get a very accurate picture of the sites I visit, since a majority of the domains reported would necessarily be accurate values.

One: I'm pretty sure that the idea is to report back random (existing) domains, yes.

Two: That's an interesting question. You'd need to ask it to someone with more domain knowledge than me.

> I'm pretty sure that the idea is to report back random (existing) domains, yes.

Here's a concern that comes up from that implementation option: any outliers from the set of existing domains (which would likely simply be implemented as a list of strings) could immediately be called out as "True" values, while a single report of a domain could reliably be called out as a "False" value. Unless, of course, you choose a randomization algorithm which exhibits a very strong clustering trait.

You could also limit reports to those domains which are in the whitelist, but that would voluntarily neuter the reporting; something they seem less-than-eager to do.

Ultimately, it will all come down to the implementation details, which are unlikely to be available until after the opt-in release, and auditable by a remarkably small number of people in the open source community.

RAPPOR uses a Bloom filter. It doesn't report the domain itself; it reports (a corrupted version of) a handful of bits of a hash of the domain.
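A rough sketch of that reporting step, with illustrative parameters (the real RAPPOR design also adds cohorts and a second, per-report randomization layer):

```python
import hashlib
import random

def bloom_bits(value: str, num_bits: int = 128, num_hashes: int = 2) -> list[int]:
    """Hash the value into a small Bloom filter (the signal RAPPOR encodes)."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        h = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(h[:4], "big") % num_bits] = 1
    return bits

def permanent_randomized_response(bits: list[int], f: float = 0.5) -> list[int]:
    """RAPPOR's first noise layer: each bit is kept with probability 1 - f,
    otherwise replaced by a fair coin flip. The client memoizes this result,
    so repeated reports of the same value don't leak more information."""
    out = []
    for b in bits:
        if random.random() < f:
            out.append(1 if random.random() < 0.5 else 0)
        else:
            out.append(b)
    return out

random.seed(0)
report = permanent_randomized_response(bloom_bits("images.google.com"))
print(sum(report))  # number of set bits; no longer identifies the domain
```

The parameters here (128 bits, 2 hash functions, f = 0.5) are made up for illustration; the actual values Mozilla would use are part of what's being discussed.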

Good info, thanks!

The single largest advantage of Firefox over other browsers is that despite all odds and occasional missteps they managed to respect users' desire for complete privacy.

  For Firefox we want to better understand how people use our 
  product to improve their experience. 
Sure thing. But the fact that they are unhappy that some (many?) people are opting out of the data collection is merely a sign that they don't want to understand why people are using Firefox in the first place. By opting out of the data collection, people effectively tell them over and over again that they don't want Mozilla "to understand how they use Firefox" or "to improve their experience", not at the expense of their privacy.

No phoning home. No telemetry, no data collection. No "light" version of the same, no "privacy-respecting" what-have-you. No means No. Nada. Zilch. Try and shovel any of that down people's throats and the idea of Firefox as a user's browser will die.

> No phoning home. No telemetry, no data collection. No "light" version of the same, no "privacy-respecting" what-have-you. No means No. Nada. Zilch. Try and shovel any of that down people's throats and the idea of Firefox as a user's browser will die.


And now this :-(

I have been using Firefox since before it was called that. I develop my apps in it, even though most of my colleagues have switched to Chrome years ago. Even though it is (or was for a while) slower than Chrome for things like Canvas.

But I use it because I believe in Free Software. But Mozilla keeps disappointing. DRM, bundled third-party apps, analytics, tracking... It is just so very sad. :-(

Also, I have 17 add-ons installed (11 active). At present, of these 17, only 2 will continue working after November when the switch to WebExtensions is enforced.

Where to go from here?


Mozilla fought DRM until the very end and lost. If Firefox is to have any chance at remaining a mainstream browser it needs to support Netflix and the likes. You can't seriously blame them for this, because they are damned if they do and damned if they don't.

EME is implemented as unintrusively, securely and privately in Firefox as possible. No DRM is downloaded or run on your computer until you specifically consent to it, and the DRM components run in a sandbox.

> Mozilla fought DRM until the very end and lost. If Firefox is to have any chance at remaining a mainstream browser it needs to support Netflix and the likes. You can't seriously blame them for this, because they are damned if they do and damned if they don't.

Yes I can, and I will, because they sold out. They sold out their principles for the sake of market share. (And looking at their marked share, fat lot of good that did for them anyway.)

I'd suggest you research the topic of negative and positive liberty. I'm all for a free and open source experience, but what about the liberties of content creators? What about my right as a user to be offered content with the knowledge that I won't and don't want to know its inner workings, as long as it's passive non-malicious code?

I will be happy to do so, once consumers and content creators (and specifically the companies they sell rights to) are on a level playing field in terms of legal protections and lobbying powers.

This isn't about the money or power you or I have. This is about freedom to distribute content and the agreement between the user and the creator while you're asking the browser to be the ideological arbiter of this transaction. If you're all for freedom, you should logically see that not including the DRM option is inhibitive of both the user's and the creator's freedoms. As a browser, it should be ideologically agnostic to my downloading of an executable or zip file that goes against freedom, privacy and all that we hold dear and it should still be my right and freedom to download and view as I legally please. The Richard Stallman approach does have its limits.

I don't buy that argument, sorry. Because it requires something as anti-freedom as DRM to exist in the first place.

> Yes I can, and I will, because they sold out

Excuse me, but did you support Mozilla with time/money?

> They sold out their principles for the sake of market share.

12% is still better than 1%, and the thing that mostly changed the landscape was the fact that mobile Internet heavily disfavors Mozilla (e.g. Android ships with Chrome, iPhone with Safari), and Google has a heavy advantage when it comes to advertising and engineering.

> Excuse me, but did you support Mozilla with time/money?

Yes, I have done. Thank you for the snark.

That's different. Netflix is optional. The AdSense and the telemetry discussed aren't.

Also, Firefox has been adding things like Pocket while removing simple options that have been part of Firefox since the beginning, claiming that they should be part of an add-on (like the option to disable JavaScript), and they are also adding privacy-invasive options like "Block dangerous and deceptive content"... Firefox is still my favorite but that can always change...

Even worse, in that discussion, it appears that there's a backdoor built into Firefox so that WebExtension-based ad blockers can't block Google Analytics. Only old-style add-ons can block it.

"It's as if the order to block/redirect the network request was silently ignored by the webRequest API, and this causes webext-based blockers to incorrectly and misleadingly report to users what is really happening internally."[1]

[1] https://github.com/mozilla/addons-frontend/issues/2785

This is a specific issue with that preference page. You can easily observe that the WebExtension version of uBlock does block Google Analytics, just not on the about:add-ons page.

There are probably security reasons why add-ons can't modify about:add-ons. Imagine an add-on that could hide itself by modifying that page.

Please don't spread FUD.

I'm not really sure what your concern is here. Let's assume for a moment that Firefox's implementation of differential privacy in this scenario is completely correct, and that as a result it's completely impossible (even in an information-theoretic sense) to learn anything about any individual user using this data; only about many users in aggregate.

In this scenario, how exactly would Firefox's actions here compromise anyone's privacy?

Why are they not letting people decide? If it is not harming anyone's privacy, and they make it clear that it isn't, then what is the problem with letting people opt-in to it?

Instead, it's telling that they are choosing to force people to opt-out. They know that their users don't want this, but don't care.

Opt-in inevitably results in data being heavily biased in favor of the small minority of users who go out of their way to opt-in. For some stuff that's fine, but for certain types of data you really do need a broad, unbiased sample of users in order for the data to be at all meaningful. (Usually to answer questions like "What percentage of users use x feature?" Or "What level of jank does the average user experience on facebook.com?")
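A toy simulation of why that bias matters, with entirely made-up numbers:

```python
import random

random.seed(42)

# Hypothetical population: 10% "power users" who use feature X 80% of the
# time; everyone else uses it only 10% of the time.
population = []
for _ in range(100_000):
    power = random.random() < 0.10
    uses_x = random.random() < (0.80 if power else 0.10)
    population.append((power, uses_x))

true_rate = sum(u for _, u in population) / len(population)

# Opt-in sample: power users opt in 30% of the time, everyone else only 2%.
sample = [u for p, u in population
          if random.random() < (0.30 if p else 0.02)]
sample_rate = sum(sample) / len(sample)

print(f"true rate:   {true_rate:.2f}")   # true rate:   0.17
print(f"opt-in rate: {sample_rate:.2f}")  # far higher -- a biased sample
```

With these (invented) opt-in rates, the sample massively over-represents power users, so the measured feature usage ends up several times the true population rate.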

They still _are_ planning to let people decide for themselves whether to participate (via opt-out), they're just using a default that's more likely to result in unbiased sample data.

Again though, what's your actual concern? Provided this feature doesn't compromise anyone's privacy even _if_ its enabled, what's wrong with having it be opt-out?

I have no way of knowing how this may or may not compromise my privacy without a deep understanding of the techniques being used. I am meant to trust Mozilla and hope that they haven't overlooked some weakness in the algorithms used. The obvious security choice is to not add this feature in. The premise 'provided this feature doesn't compromise anyone's privacy' is a fantasy, because no-one can be sure of that.

But that's true of _any_ new feature that gets added to Firefox. Anytime you change code, there's a chance you could be creating a new vulnerability that compromises users' privacy or security in some way.

If, as some commenters here [have suggested][1], this telemetry would help improve Firefox by significantly reducing the amount of time it takes Mozilla to fix bugs and performance issues in the browser, what makes you think that's not worth the risk when other features (such as the performance fixes themselves) are?

[1]: https://news.ycombinator.com/item?id=15072157

It's obviously far, far more likely in code that is designed to send my browsing habits to a 3rd party (in whatever encoding). Do you not see this, or are you just trying to extend out these arguments to some ridiculous extreme for the sake of it?

I don't know what level of risk this implementation carries with it. Probably more than a performance fix to the JavaScript interpreter, yes, but is it really a significant enough risk to make this feature not worth implementing? Maybe it is, maybe it isn't; I honestly don't know.

You just seemed to be arguing that _any_ amount of risk would be too much, which in my view is ridiculous since, as I said, all new features carry with them some amount of risk.

You just seemed to be arguing that _any_ amount of risk would be too much

Unfortunately that's exactly the kind of thing I was talking about, extending arguments to ridiculous extremes.

I have never said any amount of risk would be too much. In this particular instance, I think the risk and the unknowns are clearly too much.

> In this particular instance, I think the risk and the unknowns are clearly too much.

But why? I don't claim to know enough about RAPPOR to say for sure that the risk _is_ worth it, but it seems a little presumptuous to claim it isn't without knowing _anything_ about the project or Mozilla's proposed use of it.

That's why I assumed you were arguing that _any_ amount of risk would be too much; you didn't include any sort of analysis of the risk/reward in your previous comments, and without knowing the risk the only way to conclude this feature is definitely _not_ worth it would be if you already considered the acceptable level of risk to be zero.

Well, the alternative is not a Firefox without telemetry, it's Chrome. If Firefox can't do what it needs to do to stay relevant it's going to die. Developers are already treating Firefox as a second class browser, so this is not an abstract threat.

If I want a browser with telemetry, I can just as well use Chrome.

It’s Firefox without telemetry, or no Firefox at all.

Software was built for decades without this data and can continue to be built without this data for decades to come.

>>Provided this feature doesn't compromise anyone's privacy even _if_ its enabled, what's wrong with having it be opt-out?

The only Anonymous Data is data that is never collected. If they collect data it is a violation of privacy.


Because opt-in data is inherently biased and is a terrible indicator for common user behaviour.

It would, however, be useful data for the common user behaviour of people who opt in to tracking.

This doesn't really seem unreasonable to me. Obviously part of the inherent cost of not wanting to be tracked is going to be not having your raw user data included in evaluations of what people want.

Differential privacy does not ensure complete information theoretic security as you say. There is a parameter ε that determines the amount of privacy, and in this case you do not get to set it, somebody else does.

Interesting point. Admittedly, my understanding of differential privacy is very rudimentary, but isn't that only a risk under the assumption that you can ask the same user the same question multiple times, and get a new, independently chosen answer every time? If you can only ask each question once and every subsequent time you ask you just get the same answer, is that not secure in the information theoretic sense? Perhaps there's some other factor I'm missing?

> in this case you do not get to set it

Nothing's been decided yet. If this is something you want to advocate for, maybe consider suggesting that in the thread linked in the OP?

You are perhaps speaking of Google's RAPPOR protocol specifically, in which answers are sent through a series of BSC-like channels. These channels introduce noise, meaning the input signal is degraded, but by no means is it gone -- otherwise no statistics could be collected. Multiple independent reads would be an obvious attack; it's effectively a form of repetition coding; but there are many other coding strategies against noisy channels -- there is an entire field dedicated to that task alone. By contrast, encrypting with a one-time pad is information-theoretically secure.
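The repetition-coding attack is easy to demonstrate, and also shows why RAPPOR memoizes a "permanent" response (toy parameters again):

```python
import random

def noisy_report(truth: bool, p_truth: float = 0.75) -> bool:
    """Randomized response: tell the truth with probability p_truth,
    otherwise answer with a fair coin flip."""
    if random.random() < p_truth:
        return truth
    return random.random() < 0.5

random.seed(1)
truth = True

# If an attacker can ask the SAME user the SAME question many times and
# gets fresh noise each time, a simple majority vote (repetition coding)
# washes the noise out and recovers the true answer.
votes = [noisy_report(truth) for _ in range(1000)]
print(sum(votes) / len(votes) > 0.5)  # True

# Memoizing a single "permanent" noisy answer per value defeats this:
# asking again just returns the same bits, so repetition learns nothing new.
```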

Attacks aside, the point is really that in this age of statistical machine learning we should be vigilant against even this sort of data collection. A leak is a leak. Ideally people can opt into providing just enough information for the statistics they want to participate in and no more; realistically, more is always collected.

Ah, fair point. I guess it's incorrect to say it's impossible to learn _anything_ about a user as an individual using data generated using differential privacy. Just that what you do learn is more of a small statistical possibility than a sure thing. (E.g. "The user visited this site." vs "There is a 5% higher than average chance the user visited this site.") And that's even assuming you already know who "the user" is (which certainly isn't a given).

That is a massively unwarranted assumption, and the burden to show things are otherwise is on the party that wishes to push these changes.

Fair point. What would you accept as sufficient proof that their implementation is correct?

If your answer is "nothing" then I think you're being unreasonable. Firefox risks compromising security/privacy with _every_ new feature they implement, not just this one, and it's clear from [other comments][1] in this thread that this feature is just as important for the overall functionality of Firefox as any other feature would be.

[1]: https://news.ycombinator.com/item?id=15072157

> I'm not really sure what your concern is here.

You must be kidding me.

> Currently we can collect this data when the user opts in, but we don't have a way to collect unbiased data, without explicit consent (opt-out).

That to me suggests the problem isn't that too many people are opting-out, it's that not enough people are opting-in.

It's not even that not enough people are opting in, it's that the people opting in are "people that would opt-in", i.e. they match a certain profile that makes them not representative of the average user, and thus poorer sources to draw conclusions from. Because presumably, the users who opt-in are tech-savvy users who actually read dialog windows presented to them, and thus behave very differently from the average user.

Clearly those users that don't choose to opt in are wrong, and Mozilla needs to make this choice for them...

This trend towards paternalism in software, especially software that is supposed to be user driven, is frankly a steaming pile of garbage.

If you have any shred of pretense of being pro-privacy and pro-user, don't do this, Mozilla.

It's more that a lot of people really don't care one way or another, and will neither go out of their way to opt-in or opt-out.

Additionally, it's not that Mozilla just disregards user privacy here: differential privacy being used would mean that no user has to reveal their private information, but looking at all the data in aggregate would still allow Mozilla to gain useful information on how to make Firefox better.

So, let me see if I can follow your argument.

Because most people don't care, it was decided to implement a feature that is flat out contrary to people caring.

Management decisions like this don't exactly inspire confidence about the future of the browser.

The people that don't care one way or another are mostly using Chrome.

> What we plan to do now is run an opt-out SHIELD study [6] to validate our implementation of RAPPOR.

IMHO, this is a bad idea. Many people I know already use Firefox because they're wary of giving Google (Chrome) all their data.

Firefox should make this feature opt-in only.

This might finally prompt me to start compiling Firefox for all my devices, or at least evaluate some of the high profile forks.

It's not just about the data, it's about the lack of consent. If you just ask people for permission on the initial startup, I'm sure most people will be fine with enabling it. Last time I installed Firefox, it just showed a tiny bar at the bottom of the window, which is pretty easy to miss. I'd expect fewer dark patterns from Mozilla; that's the kind of shady behavior you see coming from Microsoft. I always try my best to disable or block anything which phones home without explicitly asking for consent.

Tunnelblick [0] is a good example of this being done well. On the initial run they ask if you want to enable automatic updates. It includes the option to disable sending anonymous system information, as well as including a disclosure widget with a brief explanation and a table showing the information that would be sent. [1]

[0] https://www.tunnelblick.net

[1] http://i.imgur.com/tWQX5aB.png

> Firefox should make this feature opt-in only.

I agree, but note that they are explicitly trying to get more info than they can from the small, biased sample that is users opting in.

Then they should fix that by politely asking in the GUI whether you are willing to share, and also explaining how to disable it easily.

Just starting to collect your browsing data is a bad idea (tm) especially if your main claim is "more privacy".

> they are explicitly trying to get more info than they can from the small, biased sample that is users opting in.

Maybe because most people using Firefox use it precisely because they don't want the browser vendor to track their behaviour?

I wonder how the Tor Browser folks will deal with this.

They claim it is biased, but is it really? How do they know? I think this is just making up excuses so they can collect more data.

They get good enough data from the people that have volunteered it. I don't know what makes them think it's biased but I seriously doubt that is true.

Because usually the kind of people who'd opt in are techies/power users, or people who work (volunteer or paid) for Mozilla themselves. Let's say only 1% of your user base opts in to this; how is that not biased? (As it currently stands, I believe this is a further opt-in in a tucked-away menu.)

Then they should make the possibility to opt-in more prominent, instead of switching to opt-out with the option tucked away in some menu where only techies/power users will disable it.

The Mozilla user base is already biased. Plus, they can run experimental test fixtures to canvass sites. This sounds like a solution looking for a problem.

>They claim it is biased, but is it really? How do they know? I think this is just making up excuses so they can collect more data.

Informed, constructive opinion there.

One clear sign of the bias is that the crash rate of the browser goes up massively every time a new version transforms from beta to release. Clearly, it's not renaming that string that makes the browser crash. The populations are just fundamentally different.

To give an obvious example, beta users are overwhelmingly more likely to have up-to-date video drivers. (Which can be seen in crash reports, but is also very logical.)

Absolutely. If this is helping users, it should be easy to convince them to turn it on.

That's not how it works. Most users don't care and will simply use whatever the default is; and when it comes to anonymous usage statistics, "most users" is _exactly_ the group of people you want to be collecting them from; otherwise your results will be skewed heavily in favor of a small minority of power users.

I always have a problem with this argument.

Most users do not care because they do not understand the true ramifications of their not caring. It is not as if they looked at all the data, then made an informed choice to share everything.

FF should be at the heart of caring for users privacy EVEN IF THEY THEMSELVES DO NOT.

The average person does not understand technology, how much data they are leaking about themselves, and how this data can be used against their interests.

Taking advantage of that ignorance for any type of gain is unethical, IMO. Most companies willfully exploit this collective ignorance; Mozilla should be better than most companies.

Why would optimising for the power users be wrong, though? In most cases, if it's good enough for the power users, who tend to break things more often than regular people, it is perfect for the regular users.

Quite the opposite: if they focus too much on the regular users, they might get too much noise and never notice issues in the more complex features that only power users tend to use.

You want the heavy users of your product sending in reports, not the average Joe, because he is less likely to even notice an issue.

Higher-level features are less likely to be covered by tests and more likely to break just because of their complexity; however, you won't have many average people using them.

Because there are far fewer power users than normal users, and browsers that only cater to power users are useless because they end up not working on many websites. That's webcompat for you.

Former Opera people can tell a few stories there.

Precisely. Most people don't have an explicit preference. And collecting data on every possible human would give better results -- useful even if they don't plan on justifying running any specific test. We should probably use our expertise in computer networks to create universal, unjustified surveillance. As long as there is an opt-out option (hopefully we can use a complex tracking method so people don't understand the implications of not opting out -- oh, wait, RAPPOR already does that! Mozilla really stepped up their game here).

[EDIT: Firefox branding used to use the word privacy a lot. I can't find it on their website much at all anymore.]

When I browse to firefox.com I get [1] which has this text:

  More privacy
  Firefox doesn’t sell access to your personal information  
  like other companies. From privacy tools to tracking 
  protection, you’re in charge of who sees what.
  Here’s how Firefox protects your privacy
So yes, they still advertise that as one of the major features.

[1] https://www.mozilla.org/en-US/firefox/?utm_medium=referral&u...

Indeed, I didn't find the subpage until later. I also like how it isn't that you _have privacy_, but _more privacy_, because access isn't being sold like other companies do. Someone must have noticed that they should only make promises they'll keep and toned down the language.

For instance, from the same page 8 years ago: "we have experts around the globe working around the clock to keep you (and your personal information) safe."


Or this quote from the equivalent site 6 years ago:

"And, as a non-profit organization, protecting your privacy by keeping you in control over your personal information is a key part of our mission."


> Most users don't care and will simply use whatever the default is

These users will be installing Chrome, not Firefox.

I hope not. If that's true, then Firefox is in serious trouble. There aren't nearly enough power users and privacy enthusiasts around to make Firefox a significant player in the browser market all on their own.

Why not?

The switch barrier is nonexistent. Most of the replies in this thread are ideological. Nobody is arguing about CSS rendering speed comparisons and such.

People use Firefox because they like it.

People are irrational but like is huge. Toyota over Hyundai. Vacations at the sea instead of skiing. Firefox over Chrome.

The like is a habit, but if all things are equal and free, a very flexible one.

That's exactly why Firefox is at around 5% global market share. Only the power users are left.

And this move will also drive them away.

While I do understand the allure of collecting this kind of data I find it highly disturbing to see this from Mozilla.

I think not having perfect information about the users is a trade-off that should be made in order to stay an alternative to most other browsers. There are still ways to get more data by other means, though. When it comes to most-visited websites, for instance, the Alexa ranking should give a good, if not perfect, idea.

Just want to add a little volume to the general opinion here that collecting user data, no matter how anonymous, is a terrible idea for a product whose only appealing quality is that it respects its users' privacy.

Data is both highly alluring and addictive, as evinced here by Mozilla potentially being willing to shoot itself in the foot to get some. What's to keep this from becoming a frog-in-boiling-water kind of situation? How can I trust that Mozilla is going to adhere to their own stated standards? The easiest answer is that I won't have to, because I can just use something else. Personally, the only reason I use Firefox is that it's slightly less convenient to set up a security-patched version of Chromium.

Other people in this thread have made the excellent point that not enough people opting in to data collection is in itself a critical piece of data. Moreover, questions such as "Which top sites are users visiting?" can be answered by looking at data from page-ranking services, and then they can go to those sites on their own testing equipment to answer their other questions. A little investment in acquiring this data by not spying, and maybe getting a wider array of testing equipment, is probably less costly than the potential loss in market share that they're already struggling to hold.

In the end, Mozilla is simply going to go through with it and there's nothing we can do about it. Just like with the killing of the XUL plugins: the company simply didn't care about the outcry. I mean, why would they? The number of people who care about things like 'customization' or 'privacy' is slim.

So we will toothlessly complain, but then the changes will be shoved down our throats, because obviously why would one care what the non-targeted demographic whines about. And of course it will be framed as being 'for our own good', and half of the people complaining will just deal with it, just like the majority already does.

The good news is that the project remains open source. The upcoming XULpocalypse already was set to make the existing Firefox forks permanent, so those of us that care will be moving to one of these anyway. Presumably these will not contain telemetry code.

I generally trust Mozilla, but I really don't understand what they are going to get out of the data. Their explanation leaves me scratching my head. Perhaps it's simply because I don't work on browsers?

How does seeing which sites users visit that need Flash drive their decision-making? Either they support Flash, or they don't.

And- ditto for "Jank" (not sure I understand that term, frankly- why is it capitalized?). Some developers don't optimize well- how is Mozilla going to use this? I think they do a good job over on MDN.

I guess I'd like to be sure I understand what problem they are trying to solve. Maybe they feel like without understanding their users they can't keep up with Chrome. I see people talking about how good Chrome is. And I must admit- it is sweet for me too. But that may be because (1) I don't have it loaded up with add-ons like I do Mozilla and (2) they have optimized for certain sites like youtube and gmail and I just can't get Firefox to work all that well on those sites.

But I'm not convinced that they need my data to fix that.

EDIT: On the other hand, Chrome seems to lose my passwords on every upgrade, so it won't be my main browser until it fixes that little issue, which has been going on for, what, 5 years now?

(Disclaimer: I work for Mozilla.)

"Jank" is our internal term for slow, non-responsive interaction with the browser (the capitalization of it in the original message is a little peculiar). If you click your mouse button, and then a second or more later, the item that you were clicking on the screen responds? That's jank. That input form that's not keeping up with your typing? That's jank. And so on.

We can (and do) collect statistics on how much jank people are experiencing, and we can look for ways to improve those statistics, but knowing what particular sites (not complete URLs, just eTLD+1 sites) jank occurs on is much more actionable. Browser developers can go visit particular sites to experience and analyze the jank for themselves, or we can see what janky sites are particularly popular in a given region and focus our efforts on improving those sites--either by doing things more efficiently in the browser, or reaching out to the site developers and asking them to consider changing things to make their site work better in Firefox. (Complete URLs would be even more actionable, but we don't want to collect your complete browser history.)

The argument for Flash is similar: we can get aggregate usage numbers for Flash, and perhaps see how that correlates to jankiness (or crashiness, or what have you), but having some information on what sites are using Flash makes the data even more actionable, for similar reasons as those given above.
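For anyone wondering what eTLD+1 granularity means in practice: it's the public suffix ("effective TLD") plus one more label, so images.google.com and mail.google.com both collapse to google.com. A minimal sketch of the idea (the tiny suffix set and the function name here are mine; real browsers consult the full Public Suffix List from publicsuffix.org):

```python
from urllib.parse import urlparse

# Illustrative subset; a real implementation loads the full Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "net", "co.uk"}

def etld_plus_one(url: str) -> str:
    """Return the registrable domain (eTLD+1) for a URL."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Find the longest matching public suffix, then keep exactly one extra label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host

print(etld_plus_one("https://images.google.com/search"))  # google.com
print(etld_plus_one("https://rss.example.co.uk/feed"))    # example.co.uk
```

Note that this is why two subdomains of the same registrable domain are indistinguishable in the proposed data, but a personal domain like rss.myname.com still reduces to an identifying myname.com.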

Spending resources optimizing Firefox for how sites implement their JS seems over the top. The heaviest sites tend to be the most mainstream anyway, imo, and those are easy to pick out.

I am a Mozilla supporter for more than a decade, but this is the wrong move.

> How does seeing which sites users visit that need Flash drive their decision-making? Either they support Flash, or they don't.

I'm not saying I support this proposal or not, but here's an example of why this could be useful: Chrome was considering deprecating some API, because it wasn't supported by other browsers and they didn't think that it was used very much.

They collected generic statistics about how much it was used, but the numbers turned out to be much higher than expected, so they were considering leaving it alone. What if some fairly popular website they just hadn't heard of used the API? You might not want to break it, or at least you'd want to get in touch with the site to see if they could move to a more widely supported API.

In the end, they somehow (maybe through spidering, or somebody just happened across it in their own browsing) figured out that the high usage was due to being used by some ad network for fingerprinting. Not only was this not a reason to keep supporting the API, it was a reason to stop supporting it!

I say it over and over. You can not completely anonymize data with any reliability. Please note the qualifier, many systems work for many vectors, but any sufficiently large dataset can be used to graph habits and correlate them. Maybe there is a safe way, but I put the onus of proving it on the person implementing it.

> You can not completely anonymize data with any reliability.

Well... there's actually a field for that. I forget what they call it because of how niche it is, but my friend at Google is doing just that.

He said there are math theorems to prove that it's sufficiently anonymized.

He gave an example of the Netflix competition, where researchers were able to deanonymize the data they were given. And his job was to prevent that at Google.

I can see why, if you're trying to sell users' data while maintaining privacy.

Mozilla currently uses Google Analytics for tracking, with "IP Anonymization" enabled.

Which, according to Google’s FAQ, https://support.google.com/analytics/answer/2763052?hl=en, just blanks out the last byte of the IP.

Which is useless, because it still includes enough personalized data as to be completely and utterly reversible.

Not to mention, if we're talking about IPv6 addresses, they often get allocated as /64 (or even /60 or /56) blocks. Such "anonymization" becomes useless at that point.
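For concreteness, here's a hedged sketch (function names are mine) of what this style of "IP anonymization" amounts to, using Python's stdlib ipaddress module; the /24 and /48 truncations mirror what Google's FAQ describes:

```python
import ipaddress

def anonymize_v4(addr: str) -> str:
    # GA-style "anonymization": zero the last octet, i.e. truncate to /24.
    # Only 8 of 32 bits are removed; combined with other request data,
    # this is usually still identifying.
    return str(ipaddress.ip_network(addr + "/24", strict=False).network_address)

def anonymize_v6(addr: str) -> str:
    # For IPv6, even truncating to /48 can still identify a single customer
    # allocation at many ISPs (which hand out /64 or /56 blocks per customer).
    return str(ipaddress.ip_network(addr + "/48", strict=False).network_address)

print(anonymize_v4("203.0.113.77"))           # 203.0.113.0
print(anonymize_v6("2001:db8:1234:5678::1"))  # 2001:db8:1234::
```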

Google Analytics has nothing to do with this. As clearly linked in the mailing list, you can read the paper and source code for the client-side differential privacy tech used.

The linked Wikipedia page actually mentions that competition: https://en.wikipedia.org/wiki/Differential_privacy#Netflix_P...

The linked RAPPOR paper is really, really noteworthy here.

In essence, Firefox will ask itself whether it visited website X, flip a coin, and if it's heads, lie to the server by sending a random boolean. If it's tails, it will send the true answer. This way there is no way for anyone (including Mozilla) to know whether you actually visited the website. But the statistics will work out such that the collective data from everyone gives a good representation of all users. I find this a neat technology for collecting data in a privacy-preserving way. And there's an opt-out (opt-in won't work because it creates bias and produces messy results).
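The coin-flip scheme described above is classic randomized response, which is the core intuition behind RAPPOR (the real design additionally encodes values in a Bloom filter and applies both a permanent and a per-report randomization step). A simplified, hedged sketch of how individual lies still yield accurate aggregates:

```python
import random

def report(truth: bool, p: float = 0.5) -> bool:
    # With probability p, ignore the truth and send a uniformly random bit;
    # otherwise send the true answer. Any single report is plausibly deniable.
    if random.random() < p:
        return random.random() < 0.5
    return truth

def estimate_rate(reports, p: float = 0.5) -> float:
    # E[observed] = (1 - p) * true_rate + p * 0.5, so invert that to get
    # an unbiased estimate of the population rate from the noisy reports.
    observed = sum(reports) / len(reports)
    return (observed - p * 0.5) / (1 - p)

random.seed(0)
true_rate = 0.30  # fraction of users who "visited the site"
reports = [report(random.random() < true_rate) for _ in range(100_000)]
print(round(estimate_rate(reports), 3))  # close to 0.300
```

The trade-off is visible in the math: a larger p means stronger deniability for each user but more samples needed for the same aggregate accuracy.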

I really, honestly don't understand why people are so upset.

I liked Firefox for years. I have lived through years of shenanigans such as broken extensions, forgetting what tabs I had open because Firefox accidentally closed without restoring them, moving icons and menus around for no reason, and recently, an update on my Ubuntu that broke scrolling of pages (with PgUp/PgDown). And now this..

I am starting to think that they just don't want people to use Firefox.

Yeah, I know it's free software, so I have no right to complain. I just wonder why?

Where governments and corporations are concerned, the "why" condenses down to two simple answers: commercialization (profit) or weaponization (control)... it is easily conceivable that both will result over time. I hope Tor & the EFF start giving more love to Pale Moon and its ilk, but that may just be mitigating the inevitable death by 1000 cuts to privacy.

I'm not sure why Mozilla needs to track what sites I'm going to, but if they add tracking into their browser then I'm just going to have to find another browser, or at least put together a build of Firefox without the tracking. It's not so much that I have anything to hide as that I'm not interested in being their product. If they can't remember that they're a nonprofit that's supposed to make a FOSS-based browser which doesn't spy on people and works well with web standards, then they just need to shut down. I know that's extreme, but I'm just frustrated with the further corporatization of the Internet, even on the margins like Firefox. Everything just has to be a product or a way to commodify the use thereof.

I am ashamed of the general "sky is falling" tone in this thread. I'm a privacy advocate. I know I'm not a fan of submitting my browser history (even domain-only) to another organization. Mozilla has always been the most privacy- and user-focused browser, and I think that history should be taken into consideration before the sky falls.

People are insulting the developers, saying Chinese-owned, VPN-operating Opera would be better for privacy... there is a lot of nonsense here.

IMO this is not the most needed feature, and I would be happy for Firefox to keep in mind its reputation as a product focused on user privacy.

This might not be as bad as I expected from the title, but implementation details will really matter. If, for instance, they collect exact homepage URLs, they cannot make it anonymous (some sites include usernames as URL components).

They are only considering collecting "eTLD+1, e.g. facebook.com or google.co.uk" so this should almost certainly not be an issue.

My homepage is my self-hosted reader, at rss.<myname>.com :)

They should ensure statistics are submitted per domain, in a way that nobody can know that the users of <my_name>.com are also using <kinky_site>.com.

Mine is <myname>.co.uk :)

1. Any data collection at all deanonymizes the user, cf panopticlick.

2. Frankly, even opt-out is not acceptable. I can't recommend any software that periodically asks users for data access, since there exist non-technical users who have a nonzero chance of clicking yes to everything. If they are related to me in some way, this compromises my privacy also.

1. Any data collection at all deanonymizes the user, cf panopticlick.

This isn't true. Panopticlick collects a ton of data about your browser that this proposal will not. There has been a lot of research done in this area and we know how to collect anonymous datasets. https://arxiv.org/abs/1407.6981

Look at it from a security-conscious user's perspective: I would have to verify that:

1. The concept is sound.
2. It is implemented as described.
3. It is implemented with no bugs.
4. Mozilla is trustworthy.
5. Any third parties Mozilla involves in this process are also trustworthy.
6. All of the above will remain true.

Doing this would take a tremendous amount of both time and expertise, if even possible. If every piece of software I use makes me do this every year or so, I would get nothing else done.

In practical terms, your argument is no better than just saying, 'trust us, we're good for it', regardless of the merits of your tech. And we know Mozilla baked Google Analytics into FF's addon page, so trust is in short supply.

Except if you actually read and understood the link, points #1, 4, 5 aren't a concern. Moreover, points #2, 3, and 6 apply to just about every piece of software used.

What percentage of FF users on the planet do you expect could read a paper on differential privacy and actually verify those points, while understanding all the ifs and gotchas, and be able to tell if any of the arguments are wrong? What percentage of that elite group would actually be willing to devote the time and energy, for free, for every one of the thousands of pieces of software they use?

Not many, certainly. Which is perhaps why it's better for this to be implemented (since differential privacy is a known, rigorous definition for privacy), rather than to leave it up to the larger majority of users who (by your implication) don't understand it and won't be bothered to understand it.
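Differential privacy does have a precise meaning: the released statistic changes by at most a bounded factor in distribution whether or not any single user's data is included. In the central model (distinct from RAPPOR's local, in-browser model), the canonical construction is the Laplace mechanism. A minimal sketch under stated assumptions (function names are mine; the inverse-CDF sampler stands in for a library routine):

```python
import math
import random

def laplace(scale: float) -> float:
    # Inverse-CDF sample from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float) -> float:
    # A counting query changes by at most 1 when one user's data changes
    # (sensitivity 1), so adding Laplace noise with scale 1/epsilon makes
    # the released count epsilon-differentially private.
    return true_count + laplace(1.0 / epsilon)

random.seed(1)
print(private_count(1000, epsilon=0.1))  # roughly 1000, noise on the order of +/-10
```

Smaller epsilon means more noise per release and stronger privacy; the individual contribution drowns in the noise, while large aggregates remain usable.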

...or you could just scrap the whole idea and not bother with it.

This is true for the user, too. If the only viable choices are 'verify claims at great cost and no gain every few months', or 'use some other privacy-respecting browser', I am going to recommend the second.

I still use Firefox specifically because of Chrome's privacy concerns and was under the impression after dropping FirefoxOS Mozilla was headed in the right direction.

It seems they've convinced themselves that the only way to improve the product is to collect data on their users, rather than continuing to push the idea of privacy - which, in my opinion, if marketed correctly, could win over a lot of users. The browser is still fundamentally awesome.

This seems like the kind of thing they could push through their TestPilot program and just market it, rather than pushing it to everyone by default. But I imagine they want to push it to everyone specifically so they can take advantage of those who are ignorant to the ability to opt-out.

I guess any browser wants to dominate the platform. It turns into another IE once it succeeds at doing so. Here comes the new boss, same as the old boss.

It seems your premise is wrong since Firefox's market share has been steadily declining for years. Privacy apparently doesn't matter to that many people.

I suppose you're right, I just know that privacy has become more of a known issue than it was 10 years ago. I guess users don't back that knowledge up with action by switching browsers.

I'm sure losing the only advantage over technically superior Chrome is going to help their market share!

Losing a major technical disadvantage will probably help.

Yeah, if you could keep your hands off from collecting my data without my consent, that would be great.

Otherwise I might as well just use Chrome. Hopefully some PR guy will pour some water on this before it turns into a dumpster fire.

I don't really understand why it is necessary. Can't they just take the top 100 sites from a ranking like Alexa? And if they want to evaluate the performance, they could buy a cheap Celeron- or Atom-based laptop with Windows and browse those top 100 sites. I am sure that this would give more information than any statistics.

"One recurring ask from the Firefox product teams is the ability to collect more sensitive data, like top sites users visit and how features perform on specific sites."

I would say that is none of the browser vendors business.

Please stay away with your opt-out stuff - it bothers me. Make it opt-in, always and forever.

Even opt-in is a problem. There's no way to be 100% sure the checkbox in the UI is, and always will be, respected. It may be something as innocent as a logic whoopsie, or something as nefarious as intentionally and quietly changing it to opt-out during an update. A better option: keep the data-sharing code out of Firefox; opt in to log locally; and if the user decides they want to share something with Mozilla, give them instructions on how to email or upload the files.

I've been using Firefox as my only browser for at least 12 years. If they go through with this, I'll switch to something else. I don't know how they could think that this is acceptable.

I think Mozilla's boldness in stating the plan to systematically leak user information is a tipping point. It will increase usage of Opera, Midori, and Chromium. As for Tor integration with Firefox, that really is a shame; I hope Tor integrates with something else.

It's not "stating the plan", it's calling for feedback. If you disagree, provide feedback :)

Ok, boldness in "seeking feedback" on systematically leaking private information. Same concern.

People universally stated their feedback about Pocket and it's still there.

Why participate in a no-op?

Because the Pocket feedback I remember was about privacy concerns, and those were addressed.

> those were addressed.

Source? I never saw anything addressed other than "don't worry about it, it's for your own good".

The code in the browser is a stub. No data gets collected let alone sent anywhere until the user adds a Pocket account. Pocket updated their privacy policy, and they open-sourced the browser integration code. https://venturebeat.com/2015/06/09/mozilla-responds-to-firef...

But based on the article, it wasn't addressed until users raised a stink about it. And it wasn't just privacy; it was also that it was closed source, an unnecessary feature that should be an add-on, etc.

Why would it be addressed before anyone complained? And it was planned as part of the Readability feature, which is very popular and not considered "unnecessary". But FF devs were having a hard time making a good read-it-later UI and decided to use Pocket instead of reinventing the wheel.

Edit: To be clear, I think the browser code was always a stub, and the privacy policy was modified before the feature launched as part of Firefox.

My concern is that Mozilla has been on a "Sure, you can provide feedback, but we're gonna do it anyway" streak. Pocket is quite unnecessary, and would be a great candidate for an add-on. I don't know their reasons for bringing it in, but it seems pretty cut and dry that there were a lot of users who didn't want it even after it was cleaned up, and Mozilla ignored them.

I think you're probably just underestimating how popular it is. Tagging activity tripled from 2012 to 2017. They had 10 million monthly active users in February when Mozilla bought them.

Does Mozilla look at any of the other top add-ons and implement them natively in the browser? Why was Pocket so special?

Oh, it’s simple. Just comply with EU law, and it’s all okay.

That includes:

1. You can not collect anything without explicit opt-in

2. You can not transmit any data to a third party

3. If a user requests it, you have to provide all data stored about them, and have to provide a way for them to delete all of that. (And you have to provide this at least once every 12 months via letter, fax or email for free) (compare §34 BDSG)

I wonder which kind of data it actually applies to though.

All and any.

That includes IP addresses (just connecting to a socket without a user explicitly starting that action), names, emails, hashed IPs, it includes usernames, CC data, messages, interactions with webpages.

Anything that in any way is connected to a person is covered by this.

This directive is also the origin of the cookie disclaimers, which require opt-in before collecting statistics or loading any third party tracking solution.

Does it only require permission if the IP address is being stored?

Read this comment, it cites the relevant laws for Germany: https://news.ycombinator.com/item?id=15072474

But be aware, in May 2018 it all changes as the new EU GDPR comes into force, and that's a bit more restrictive (and even applies to anyone processing or storing data of EU citizens, no matter where the processing entity is located).

You cannot legitimately suggest using Opera as a privacy-respecting alternative to this possible new Firefox.

"If Mozilla starts reading a tiny bit of user data using well-researched techniques for preserving user privacy and anonymity, I'm going to ditch my browser of 12 years and switch to the competition, which is perfectly happy to grab as much personal information as they possibly can from me in order to monetize me."

(well, unless you're going to switch to Safari, since Apple also cares about privacy, though, spoiler alert, Safari's also using Differential Privacy to collect data).

Why use the quotes? My guess is that if the person you're replying to thought of what Mozilla is proposing in the same way you think of it, they would have said so.

The quotes are because that's basically what suby said. I'm guessing that suby's not actually serious about switching browsers though, because doing so would be cutting off their nose to spite their face.

And the trend towards being a Google Chrome Clone continues...

First it was killing customization.

Now they are killing Privacy.

Why should I use this browser again?

Worth mentioning is that they are using https://github.com/google/rappor

Why is Firefox hell-bent on reducing any advantage it has over Chrome and becoming an unnecessary clone?

Who runs Mozilla? Do they understand why anyone would choose Firefox over Chrome?

Maybe it's time to put a spotlight on the management and decision making structures of increasingly important open source projects like Firefox to ensure they are being run in the public interest.

Because Firefox as it is now is slowly dying. It looks like data collection is such a huge advantage that anyone not doing it is doomed in the long term.

Why is this proposal hosted on Mozilla's main competitor's discussion platform? That seems unprofessional at best, an irrational blind spot given that that corporation is decimating Mozilla's market share with dubious marketing and monopolistic practices. Isn't an organization the size of Mozilla able to host policy discussions on one of their own domains? What are people who do not use Google products supposed to do?

By now people should be aware that it is not just the content that is important, but also the metadata. A browser that phones home with information on users' browsing habits is not acceptable to many of us, who will move to forks or a different browser altogether. This from one of the people who "doesn't complain, but just never goes back."

> "Which top sites are users visiting?"

Could someone explain to me how this information is useful to a browser vendor? It's not as if they are optimizing on a site-by-site basis.

They want to sell user data, plain and simple. "Improve user experience" is the standard excuse. Basically they're saying: we want to collect a bunch of information we already know people don't want us collecting, so we want to make it opt-out, and we'll pinky-swear it will be kept anonymous. There's no way for them to send data to their servers truly anonymously; there's no way for them to guarantee everyone who has access to the data before it's anonymized will not do something they're not supposed to. They're asking us to move from not having to trust anyone to trusting them by default.

I'm sure you already know all this and I'm sure people are getting sick of hearing rants about it every time it comes up. This is the second time in a week for a Mozilla product. I suspect they're trying to exhaust the ranters so they're just left with the users who don't care, "have nothing to hide", or think it's their duty to help the browser vendor squash bugs. No software or service should be trusted until it's absolutely necessary to get the job the user wants done, not the job the browser vendor wants done. It will never be necessary for a browser to send browsing data back to the browser vendor to get to a website.

Do not do that. Respect for privacy is the most important differentiating point of Firefox.

I wonder how valuable the data is going to be for the Firefox team? The cost in reputation may be large, so I'm guessing this must be pretty important for them.

I think it's worth approaching this with an open mind and giving Firefox at least a little bit of the benefit of the doubt. It's pretty plain to see how such aggregate usage data would lead to a better product for everyone.

How many people here use website/app analytics to improve products they work on?

It's pretty plain to see how such aggregate usage data would lead to a better product for everyone.

No, it is not. Especially not for something such as a browser which is mostly transparent to the content.

Browser performance is very closely tied to specifics of the content. That's why optimizing for e.g. JS benchmarks doesn't always result in a browser that feels any faster.

I can't think of a statement more vague than that.

Or why any external data would be needed at all, let alone why the opt-in data would not be sufficient?

Here's an example from Google. https://news.ycombinator.com/item?id=8495939

is mozilla planning on circumventing all of the methods outlined here for identifying unique users? https://panopticlick.eff.org/

Wouldn't an easy solution be to just give a right-click function that says "bug on this page"? You get a nice and easy way for the user to report a page, and you are non-intrusive. If you're concerned with what pages users visit the most, why not just check Alexa ratings?

I've removed all URLs from about:config and replaced them with localhost (search for "http"). This should help with privacy-related issues as long as no API endpoint is hardcoded.
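A persistent way to do the same is a `user.js` file in the profile directory, which overrides `prefs.js` on every startup. This is a sketch only: the pref names below are taken from Firefox of roughly this era (~55) and may have been renamed since, so verify each one in about:config before relying on it.

```js
// user.js — loaded on startup, overrides prefs.js.
// Pref names are assumptions based on Firefox ~55; verify in about:config.
user_pref("toolkit.telemetry.enabled", false);
user_pref("datareporting.healthreport.uploadEnabled", false);
// Point the telemetry endpoint at localhost, as described above:
user_pref("toolkit.telemetry.server", "https://localhost");
// SHIELD study delivery (the mechanism discussed in this thread):
user_pref("extensions.shield-recipe-client.enabled", false);
```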

I did the same, but used https://error.invalid/, as that URL is guaranteed to never resolve.

Unfortunately, it is not.


> Name resolution APIs and libraries SHOULD recognize "invalid" names as special and SHOULD always return immediate negative responses. Name resolution APIs SHOULD NOT send queries for "invalid" names to their configured caching DNS server(s).

It's only SHOULD, not MUST. And in fact, the glibc resolver (and I bet also other major implementations) does send such queries to the DNS server.
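You can check what your own resolver does with a `.invalid` name in a couple of lines of Python (the exact error text depends on your platform's resolver; the point is that resolution fails, while whether a query went over the wire first is invisible from here):

```python
import socket

# RFC 6761 says resolvers SHOULD fail "invalid" names immediately,
# but many implementations (glibc included) still forward the query
# to the configured DNS server before returning the negative answer.
try:
    socket.gethostbyname("error.invalid")
except socket.gaierror as exc:
    print("resolution failed:", exc)
```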

> And in fact, the glibc resolver (and I bet also other major implementations) does send such queries to the DNS server.

Using the glibc resolver as a baseline is a bad idea; it's broken beyond hope.

Try resolving http://-emmawatson.tumblr.com/, which is a valid URL under newer standards and works on all other systems. The glibc authors refuse to merge patches fixing this because they disagree with the standard.

Curious; I didn't know about that TLD: https://en.wikipedia.org/wiki/.invalid

I work at Mozilla, but I'm speaking for myself here, and not on behalf of Mozilla as a whole.

For those interested in understanding more about this project and why we're doing it, here is an introduction to Differential Privacy and what we're trying to do: https://twitter.com/Alexrs95/status/896366072240144385

What you guys just won't grasp is that:

1. You will absolutely obliterate any trust you have with actions like this. This is important, because if you continue to ignore it, you will have tons of data but be absolutely clueless as to why your product and brand have been completely abandoned.

2. This data isn't worth that much to begin with. Here is a crazy idea, try to make a better browser instead.

This data will be used in the pursuit of #2. As it turns out, a lack of understanding of what users are doing with their browsers is an obstacle to making a better browser. Performance issues in complex systems often only show up in production, and that's what Mozilla is trying to collect this data to fix.

Why is opt-in data not sufficient? Why can't Mozilla take the top-N sites and test them out for themselves?

We're already doing that. Experience shows that this is not sufficient to accurately catch regressions.

Also, just because a site is part of the top-N doesn't mean that it's part of the top-N for Firefox users.

Can you give any examples of sites and URLs that you've missed with opt-in data?

As it turns out, people use Firefox to display web pages.

As a user, I don't want to waste my time reading up on Differential Privacy. Am I really expected to study it and prove to myself that the theory works and is safe to use? Why should I take the risk that the mechanism might leak data? The only sensible, secure choice is to not allow it. Let users opt in; don't force them to opt out.

I am surprised that there is such a fundamental misunderstanding of differential privacy among the Hacker News crowd.

Meeting the standard of true differential privacy is one of the strongest known unconditional privacy guarantees. It will prevent Mozilla from being able to answer any user specific questions. For example, they might have an accurate count of how many people visit Google.com (say 60% of their user base), but they will be mathematically unable to point to exactly which 60% visited the site.

Differential privacy in the RAPPOR implementation is peer reviewed and well understood. We can also review the actual code that ships in Firefox, which is a big plus over the Chrome implementation. There are some caveats -- what epsilon are they setting, are they adding an appropriate amount of noise, how do they protect against repeated queries, etc. but all of these can and will be reviewed by the differential privacy community.
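For intuition, here is the classic randomized-response mechanism that RAPPOR builds on, as a toy Python sketch (my illustration, not Mozilla's or Google's actual code). Each report is randomized, so any individual answer is deniable, yet the population rate is still recoverable from the aggregate:

```python
import random

def randomized_response(truth: bool, p: float = 0.75) -> bool:
    """Report the true bit with probability p; otherwise report a fair coin flip.

    No single report can be trusted, but in aggregate
    E[reported] = p * rate + (1 - p) * 0.5, which is invertible.
    """
    if random.random() < p:
        return truth
    return random.random() < 0.5

def estimate_rate(reports, p: float = 0.75) -> float:
    """Unbiased estimate of the true population rate from noisy reports."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p) * 0.5) / p

# Simulate 100k users, 60% of whom visited some site.
random.seed(0)
true_rate = 0.60
reports = [randomized_response(random.random() < true_rate)
           for _ in range(100_000)]
print(estimate_rate(reports))  # close to 0.60, but no report identifies a user
```

The real RAPPOR scheme layers Bloom filters and permanent/instantaneous randomization on top of this idea, which is where the caveats about epsilon and repeated queries come in.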

I am not affiliated with Mozilla or Google, though I do work in the field of differential privacy. On mobile now, but I am happy to provide links or answer questions to people who might have any when I am back at a laptop.

I only skimmed the RAPPOR paper, but can you discuss the worst-case scenario where an NSA-like adversary is able to see each data point when it arrives? Assuming this happens from the start, how much information would she be able to obtain?

> We can also review the actual code that ships in Firefox, which is a big plus over the Chrome implementation.

Sure, but note that it's been implemented already and will be pushed to the users as an add-on, without going through the full release process. Even this HN post seems to have been prematurely buried.

It would have been possible for this to be deployed without anyone knowing. Post-hoc reviewing of functionality as sensitive as this is not the ideal solution.

I'm surprised at how difficult it is for certain posters in this thread to recognize the following about our behavior and device usage:

More Data Collection < Less Data Collection

I'll also contend this is a disturbing, terrible idea.

I design optimization algorithms and software professionally, and the majority of that software is released open source. Now, does my software likely run terribly on some problems that my users give it? Absolutely. That probably costs me business because they get frustrated, give up, and go somewhere else. And, to combat that, I could absolutely engineer my libraries to send anonymized information about their problem structure back to my company. Certainly, it would help me improve my software and algorithms. I also view it as horribly unethical, a breach of my customers' trust, and an unacceptable course of action. Look, I want my software to work well for everyone, but it's part of my job to figure out when things don't work well and fix that without automatically scraping information about my customers' usage.

I contend this is a terrible idea and very much would like Mozilla to abandon it.

Normally, I would vote for opt-in only, but I think that it could be opt-out for Firefox as long as there are no dark patterns that make it difficult to opt-out. The survival of Firefox is extremely important for the future of the Web.

If it is opt-out, then Mozilla would have to be extremely open about how to opt-out and exactly what is tracked.

From a technical perspective, I'm not quite so bothered by what data Mozilla collects about me, how often, &c. I'm happy to opt-in to Telemetry and don't mind if it's extremely "comprehensive" in what it measures.

The real issue here is with ethos and perspective. I use Firefox because the ethos of the company and its employees, and their general "take" on issues like this allows (or has allowed) me a general sense of trust in them.

Even the very existence of this discussion erodes that trust. It says to me: "the people making this browser don't understand the importance of consent, and have a vastly different perspective on the value of privacy from mine".

If your developers need more data from Telemetry, get consent and collect more data. Establish trust in users in what you do with that data.

What I have not seen here is a discussion of how, exactly, collecting browsing behavior will help Mozilla improve Firefox.

Start by actually bothering to read the link:

> One recurring ask from the Firefox product teams is the ability to collect more sensitive data, like how features perform on specific sites.

> [for example]: "Which sites does a user see heavy Jank on?"

That is, however, not a sufficient analysis of the problem domain and of how the data gathering is offset by potential privacy intrusions. Right now it reads as if dev comfort is prioritized over user privacy, which might not be the case, but you cannot propose sweeping changes like this in a few lines on a message board without proper analysis and communication. Mozilla should know better by now, especially since their market share is diminishing and target audiences are increasingly disappointed with Firefox.

Ask for it. Don't just collect the data. FFS, what has the team behind Firefox become? I need a new browser.

They did and too few people said yes.

On the other hand, everyone complains Firefox is slow.

So, few pay, few opt in, and everyone complains.

"They did and too few people said yes."

There's the answer. And the response? "Tough shit", we'll take away that choice granularly. For our own good, apparently.

Moz has been giving tough shit with caveats for more than a few years now. Perhaps that is why market share is falling?

There is a story about people getting driver's license having a check box to opt-in into being organ donors, and very few said yes. Once the box was changed to opt-out, very few said no :)

The question is: are people saying no because they are privacy conscious, or because they don't care? My money is on the latter. In general, more people care about Firefox being fast than about it being secure.

What's a bigger issue for Firefox is deprecating its add-ons. That's going to hurt its marketshare way more than telemetry data.

They have telemetry and it isn't enough.

"If Firefox is dedicated to preserving privacy, then no Opt-in data feature should be added. "

I really don't understand this from the user, when in the previous sentence he writes:

"but I will say that I believe Opt-in is pro-privacy, while Opt-out is anti-privacy."

I think he's saying that not having an opt-in/opt-out feature at all would show FF's dedication to privacy, but that if they are going to have one, opt-in is better than opt-out.

If you do not like the idea that by default Firefox will send data to Mozilla about your Firefox usage (no matter the privacy protection techniques being used), you should probably be aware that Firefox is already sending this kind of data to Mozilla. This is called Firefox Health Report and is pointed out in their privacy policy: https://www.mozilla.org/en-US/privacy/firefox/#health-report

FHR is also opt-out, i.e., enabled by default. If you do not like this, you may want to disable this as well.

Idea, pay people for their anonymized data. It's opt-in with an incentive.

If they do this, I will uninstall Firefox and start recommending some other browser. Mozilla has really sunk in my eyes over the last couple of years, but this is the final nail in the coffin for me.

What else is there, though, at least for mainstream users? There's Chrome which has been doing this since the beginning for Google, there's Safari but only if you're on a Mac, and there's a smattering of smaller browser projects on Linux that are good in their own right but not mainstream enough to have the features of the big four.

On Windows, Microsoft Edge is of course a lean and capable browser, but the OS itself is also collecting telemetry on you at all times, including browsing habits.

Hopefully someone will fork Firefox for Windows/Mac/'nix and strip out all the telemetry and data gathering bits, otherwise there's not much choice left for a privacy focused, full featured, fully supported browser on all platforms.

Most likely Brave, Edge, Opera or Vivaldi. I haven't really dug into whether they do telemetry, but some quick Google searches tell me that at least Vivaldi does not. Brave is also a browser all about privacy, so it would be weird if they spied on their users.

But you are correct, if Firefox falls as the last bastion in the web browser world to protect users privacy there is little choice left of really nice browsers.

If/when Firefox does this, however, there is really no reason to even pick Firefox to begin with, since all the other major players do the same.

Brave sounded interesting until I saw this at the bottom of the landing page:

"Brave makes money by taking 5% of any donations and -- after it is fully implemented -- a small cut of advertising that is placed. Brave even shares some revenue with you -- at least as much as we receive."

Then there's this:


If they are planning to inject ads into the browser and somehow pay their users a kickback, how do they expect to maintain a reputation as a privacy focused project? Even if they offer to pay in cryptocurrency, they are still tying browsing habits and targeted advertisements to a trackable user. No thanks.

I had switched from Chromium back to Firefox after Google was caught injecting binary blobs into Chromium at build time[1], and so last year when I decided to drop all Google products from my life, I already had a great (if slow) browser making it less of a hurdle. Now, though, I think I'll stick with Safari on my phone and Mac, and find a way to sync bookmarks from Safari to Midori on my other systems.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786909

Yeah, that makes me not want to run it either. But still, there are a few options, even if none of them is very good. I will probably just start running Edge or something, since I already use Windows.

But I much rather just continue to use Firefox.

Can someone more knowledgeable than me explain the benefits of this data collection, exactly? The link mentions finding URLs with jank and the like; is it not easy to find such sites?

The real problem seems to just be marketing though. Regular people either don't see any reason to consider looking for an alternative browser or don't understand the differences. Years ago Firefox had a larger market share because the internet as a whole had a larger share of tech-savvy people and they had IE as a competitor.

That would be a public relations suicide!

Greed and monetization of users' data - this is the only real business model which brings profit. There is probably already a long queue of customers willing to pay for the data.

And, of course, "anonymity" is nonsense. The whole point of collecting data is to classify users into target groups and model user behavior. In other words, to collect data for machine learning algorithms and sell access to the datasets and other services.

Why opt-out? I understand that if users have to make an effort to submit telemetry data, most won't. But it could be a dialog at update/installation time which requires the user to choose between yes and no. If a user says no, they clearly value their privacy more than additional features or the stability of their browser, and Mozilla's values include respecting users' choices about their own data.

It's my computer, my data, and my bandwidth. If they do not provide an opt-in/out I'm going to start using a Chromium fork instead.

Hey Dang, HN mods!

With 300+ upvotes and active discussion still occurring at hour 7 since posting, care to explain how the algorithm has relegated this discussion to the fourth page, still dropping? Is the much less active, day-old posting of an SF author's death more heavily weighted than an inconvenient topic that is important to a far larger share of readers?

You can search HN for other comments from the mods regarding the algorithm, but the short answer is that ranking is not a simple, transparent algorithm, nor is it independent of mod input. Ranking is dependent on time, commenting rate, user upvotes and flags (which don't result in a '[flagged]' tag until a threshold is reached), as well as mod input. Too much commenting activity can trigger the "overheated discussion detector", which can push a post down. Given that this has 213 upvotes and 319 comments (at the time of this posting), I'd say the latter is likely, though it's hard to say.

In my experience, the quickest, most reliable way to contact the mods is via the Contact link in the footer. You might want to try that as well if you're looking for an expedient response.
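For what it's worth, the base ranking formula (before flags, penalties, and mod input) is commonly approximated as `(points - 1) / (age_hours + 2)^1.8`. Here is a sketch of that publicly circulated approximation, not HN's actual code:

```python
def hn_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Widely circulated approximation of HN's base ranking formula.

    The production algorithm layers penalties on top of this:
    flags, the "overheated discussion" detector, and manual mod action.
    """
    return (points - 1) / (age_hours + 2) ** gravity

# Absent penalties, a 7-hour-old post with 213 points would still
# outrank a 30-hour-old post with 400 points:
print(hn_score(213, 7) > hn_score(400, 30))  # True
```

Which is exactly why a heavily upvoted post sitting on page four suggests a penalty kicked in rather than simple time decay.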

Thanks for response, grzm. Considering neither of Mozilla's recent gaffe postings appear in the 400+ 'recent' list today, I will forgo contributions to the site. Open info != cloistered nor censored. YC has interests to protect after all the market speak and spurious posturing is washed away. I will miss the less controversial content, but my clicks & data history seem to be my only vote that matters anymore.

Let's be honest here. In many people's opinion (including my own) firefox has gotten worse since they started all this telemetry stuff. Supporting this, marketshare has also been dropping. So, I don't think they know how to properly interpret the data they are getting. Either that or the data is so messy as to be worse than useless.

I'm amazed that no one in this thread has yet mentioned Waterfox as an alternative - it uses Firefox as a base and then strips out telemetry, "pocket", and more: https://www.waterfoxproject.org/

Thanks firefox. Not.

One of the last truly shining examples of open source is losing the plot. Not only does it require pulseaudio (alsa?), it is getting harder and harder to use it normally on FreeBSD. Now this.

I've had enough, testing links -g and it works well for most of my browsing needs.

May I suggest w3m-img, tmux and urxvt compiled with mouse support?

On one hand, Mozilla doing something like that is anti-privacy. On the other hand, how is Mozilla supposed to improve FF without detailed usage data?

In theory it can be done, but in practice they are competing with Chrome, and that team has waaay more data to use. And this data gives them an edge, at least in deciding which parts are worth improving.

So they can either start collecting some data and really piss off their most vocal privacy minded users and try to use this data to improve FF and steer it away from the death spiral it's on. Or they can keep the vocal privacy minded people happy continue to work in the dark and pretty much ensure that FF will become one of the insignificant .3% market share browsers.

Because somehow I kind of feel that multimillion fundraisers to make FF popular again aren't gonna happen a second time.

Why would Firefox be in a death spiral for not tracking its users? If they want browsing data, they can use their own.

(Disclaimer: I work for Mozilla)

The problem with our own browsing data--by which I'm assuming you mean the browsing habits of our ~1000 employees--is that it's wildly non-representative of the broader population. For instance, people here routinely have browser sessions with 10, 100, or even 1000+ tabs. These numbers also indicate that the browser is an application you start, and then you just leave up for a while, perhaps until you restart your computer or you have to update for whatever reason.

The latest statistics we collected on a broader sample of users indicates that the average number of tabs is...2. The average session length is on the order of minutes, not days. Such knowledge leads to very different choices when deciding what browser features to prioritize.

And it's not just browsing usage, either: most employees probably have a top-of-the-line (or close to it) Mac laptop, Windows desktop, or Linux desktop; developers have a machine with four, eight, or even more cores. These machines are hardly representative of the wider Firefox user base: a significant majority of our users (~70%) have a machine with two cores, and users with a single core in their machines outnumber users with 8+ cores. We'll not even cover graphics hardware or screen resolution here; see https://hardware.metrics.mozilla.com/ for more examples.

Using our own browsing habits and our own machine specs for making decisions is not feasible.

Then I realize Firefox is not a browser made for me. I don't care about the market share of my browser. I want it to make my browsing easier and faster, while never compromising on security and privacy. _Any_ outgoing connection without my action is not OK. Not even Google's Safe Browsing. If I have to decide between having both JavaScript and Google Safe Browsing, or neither, I would take the latter.

I value the expertise at Mozilla. Could you point to a browser that might fit me?

Thank you for the detailed explanation!

English is not my native language so maybe I didn't express myself clearly.

It is not on a death spiral because of the missing user tracking. It's on a death spiral because of Google and Chrome. And tracking is a way of catching up. It's much easier to improve when you know what your users do with your software.

Man, Mozilla has been planning to do this for a long time! One of my first open-source contributions was helping a team work on this exact issue, and that was in early 2010.

In the latest version of Firefox on Windows, it calls home when exiting and when visiting certain websites (Flash maybe?). I did not opt in to anything.

This is great. A principled well-considered approach to collecting useful information in a way that respects people's privacy. Go Mozilla!

What's your favorite open-source browser?

This doesn't matter. SimilarWeb, Jumpshot and other clickstream companies are already doing this, in an even more non-transparent manner by using browser extensions that track every URL you visit, and searches you do, let alone the domain. I say let Mozilla give them some competition!

Well, if it's really needed, why not.

What impact will this have on bandwidth?

How does Mozilla earn $?

Mozilla: you do not have a need to know for that information.

I tried to be unbiased in the submission title, and it's probably late enough that this will be buried, but here are some of my thoughts:

> They don't plan on collecting URLs, just (eTLD+1).

This is true as of right now, but can change at any time in the future. From the post:

> What we plan to do now is run an opt-out SHIELD study [6] to validate our implementation of RAPPOR. This study will collect the value for users’ home page (eTLD+1) for a randomly selected group of our release population

This test consists of collecting domains, indeed, but that doesn't say anything about what will happen in the future.
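For concreteness, eTLD+1 ("effective TLD plus one label") means images.google.com is reported as google.com, but news.bbc.co.uk stays bbc.co.uk, because co.uk is a public suffix. A toy sketch of the idea; real implementations must consult the full Public Suffix List (e.g. via a library), and the hardcoded suffix set here exists only to show why "last two labels" isn't enough:

```python
# Toy eTLD+1 extraction. Do NOT use this set in real code —
# the actual Public Suffix List has thousands of entries.
PUBLIC_SUFFIXES = {"com", "org", "co.uk", "com.au"}

def etld_plus_one(host: str) -> str:
    labels = host.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix down: the first known
    # public suffix found, plus one more label, is the eTLD+1.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host

print(etld_plus_one("images.google.com"))  # google.com
print(etld_plus_one("news.bbc.co.uk"))     # bbc.co.uk
```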

> Note: "planning" means "reaching out for feedback about".

Planning means planning. Today they're reaching out for feedback, and the plans might change or they might not.

> Hello, Redditors...

This is my fault, I suppose, for posting the link here :). Many of the angry comments are uninformed, but the users, educated or not, are stakeholders here and Mozilla should be prepared for the fallout. There have been situations in the past (Pocket, Google Analytics) where well-formulated feedback from users was readily dismissed.

> One recurring ask from the Firefox product teams is the ability to collect more sensitive data, like top sites users visit and how features perform on specific sites. Currently we can collect this data when the user opts in [...].

Does anyone know what this is about? Telemetry? Because I will disable it if so.

> Allow Firefox to install and run studies

This is from the Nightly settings page but is pointing to https://support.mozilla.org/en-US/kb/shield, which doesn't exist (yet?). For anyone interested, there's a wiki page about them https://wiki.mozilla.org/Firefox/Shield/Shield_Studies.

> What we plan to do now is run an opt-out SHIELD study [6] to validate our implementation of RAPPOR.

This still sounds bad enough to forever poison "SHIELD" for me. It's also terribly named because it doesn't "protect" anyone.

> No telemetry, no data collection.

Without telemetry it would be almost impossible for the developers to figure out what works or not, and what's fast or not in Firefox. There's a whole spectrum here from "no telemetry" to "creepy". Please don't ignore this.

> Now they are killing Privacy.

Please try to get informed. A Mozilla employee in this thread (alexrs95) posted a series of tweets about what's being proposed: https://twitter.com/Alexrs95/status/896366072240144385. It's short enough, so please read at least that before complaining.

> What's your favorite open-source browser?

Firefox :).

> I've removed all URLs from about:config and replaced them with localhost (search for "http"). This should help with privacy-related issues as long as no API endpoint is hardcoded.

Beware of SHIELD, as Mozilla may still have the ability to push extensions to the browser.

> He said there are math theorem to prove that it's sufficiently anonymize.

I've not dug deep enough into the RAPPOR paper, but they do consider in passing the possibility of an attacker that has access to all of the collected data (think https://en.wikipedia.org/wiki/National_security_letter).

> Everyone else

Please be kind.

EDIT: Looks like this post might have been pushed back from the front page by a moderator. I'm not sure I'm fine with that.

Comment threads are a useful way of organizing the discussion; why did you choose not to use them?

Indeed, that might have been better. The thing was that many of the arguments I took issue with were repeated by different people, and I didn't want to spam all those comment threads.

My reasoning is that people might search for my comment (I sometimes do when others post), but by the time I wrote it, the first comment page was full and it ended up on the second one.

This is really slick.

It's like describing to a spouse a system of sex with strangers that includes blindfolds and a hazmat suit. Such a system could be a great way for a person to learn more about their sexual tastes and improve coitus overall. If the spouse is anxious about the system, then all they have to do is find a problem with the hazmat suit that would endanger the...

Wait, spouse, you haven't even studied the system that I so carefully designed to protect you from the possibility of...


Has anyone seen my spouse?

Can we at least stop with the FUD please?


EME is not DRM, it's a fully open source spec to support third-party DRM modules. If you don't actively choose to install a DRM module, there is no DRM in your Firefox.

> 3-rd party apps

Like what? Pocket is fully owned by Mozilla.

> analytics, tracking

So far this has been 100% opt-in. It might change with this new thing, but even that's not for certain.

> Can we at least stop with the FUD please?

Please don't use uncivil internet tropes on HN. If you have a substantive point to make, make it thoughtfully. Your comment would be fine without the first sentence, but experience unfortunately teaches that flamebait has more impact than its accompanying substance does.

We detached this subthread from https://news.ycombinator.com/item?id=15072694 and marked it off-topic.

> Can we at least stop with the FUD please?

Can we please stop with using the word FUD for things that are not? The very idea of accepting DRM as a possibility in the browser was a slap in the face to those who believe in internet freedom.

> Like what? Pocket is fully owned by Mozilla.

Check your facts: it was added to FF 1.5 years before Mozilla bought it. It was an example of them just not giving a shit and adding it anyway.

> So far this has been 100% opt-in.

Check your facts: the Google Analytics in the extensions page that I linked is explicitly "opt-out" and even that happened only because people found out about it and (rightly) raised a stink.

https://news.ycombinator.com/item?id=14753546 and https://bugzilla.mozilla.org/show_bug.cgi?id=697436#c14

(And really, of all the analytics choices, they fucking picked Google Analytics?!)

DRM was already possible in browsers by using NPAPI or similar plugins. EME is no worse in that regard, and is from a technical perspective a much better solution.

Pocket's integration during that time amounted to a button which sent a request to a potentially useful service. Such buttons have existed in Firefox since long before Pocket existed, and you are of course welcome to not click on them.

Apparently Mozilla has a special deal with Google regarding their use of Analytics but it really doesn't make it that much better.

So freedom is not having the choice to use DRM?

I don't buy that argument, sorry. Because it requires something as anti-freedom as DRM to exist in the first place.

You would outlaw locks for the front doors of houses too?

No, because physical goods and 0's and 1's are not the same.

All 0's and 1's occupy real electrons in real space. They are not part of the ether. The distinction you are making between electronic information and the physical is artificial.

I think perhaps you are saying that a pattern of information should not be locked up, but instantiations of a pattern can be.

> EME is not DRM, it's a fully open source spec to support third-party DRM modules

So, it exists only to facilitate DRM. And I could have sworn that it defaults to enabled?

> Pocket is fully owned by Mozilla

It wasn't when it was added.

>> analytics, tracking

> So far this has been 100% opt-in

Other than the Google Analytics on the add-on pane, maybe.

> And I could have sworn that it defaults to enabled?

It does not.

Please don't turn Firefox into a botnet!

Does anyone know why Firefox always phones home to:

    firefox.exe	3588	TCP	pc-name	49172	ec2-35-167-184-4.us-west-2.compute.amazonaws.com	https	ESTABLISHED	3	667	5	3,334		
I have a tool called TCPView (a Microsoft Sysinternals tool) that inspects my traffic. I disabled all the Mozilla telemetry and it still phones home to this server. The connection is encrypted, going through port 443. It even appears in different forks of Firefox, like Waterfox. Is this unique to me, or does anyone else notice this? My best guess is that it's some sort of telemetry they're collecting. Keep in mind I disabled updates, so it's not pinging update servers either. Also: it happens only when I visit a website, and doesn't appear when I start up the browser for the first time. I also have no plugins installed. Since I see some Mozillians in this thread, I figured it's the best place to ask!
