Telemetry in Front-End Tools (timseverien.com)
130 points by gravitate on April 5, 2023 | 141 comments



Telemetry in webapps is incredibly tempting, because it's just so easy to collect. Even without JavaScript, web browsers offer a dazzling array of ways for applications to call home with fingerprinting and tracking information. Because it is easy, it is commonplace, and many will use this as a justification for their own decisions.

It is true that, in some cases, telemetry can make it faster and easier to improve software, but this is ultimately placing the convenience of the developers ahead of the privacy and autonomy of the users. I believe that software should serve the interests of users, and informed, affirmative consent for phoning home is an absolute baseline for respecting their wishes.


The site in question isn't even talking about web apps. They're talking about telemetry in tools commonly used to build webapps, which is perhaps more surprising.

The Next.js CLI phones home to Vercel with statistics about how many times e.g. `next dev` is being called, along with details of the developer's OS and CPU, as well as what Next.js plugins they're using in their project.
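For what it's worth, Vercel does document an opt-out. A minimal sketch for a shell or CI environment (the env var is documented by Next.js, but check the docs for your version):

```shell
# Opt out of Next.js CLI telemetry for this shell / CI environment.
# NEXT_TELEMETRY_DISABLED is documented by Vercel.
export NEXT_TELEMETRY_DISABLED=1
```

There is also a persistent per-machine opt-out via the documented `npx next telemetry disable` subcommand.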


Wow, I'm really disappointed by that in Next.js. I love the product, and I'm sure they're being ethical about what they collect and how they track it, but I deploy this on my production servers. The idea of them doing stuff on my production instances that I'm not aware of and don't approve of is not pleasant. Not that it would be fine on personal machines either - I don't think it is - but yeah.


What I really, really want is for distros to push for a centralized place for telemetry opt-out - something like /etc/telemetry.conf or ~/.config/telemetry.conf - and force all applications to either make telemetry opt-in or honor this config (perhaps by patching the applications and submitting the patches upstream).
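As a hypothetical sketch of what honoring such a config might look like: the file paths and the `telemetry=off` key below are assumptions (no distro standardizes them today), while `DO_NOT_TRACK` is an existing informal convention some tools already honor.

```shell
# Return success (0) only if telemetry is allowed. Both config paths
# and the "telemetry=off" key are hypothetical; DO_NOT_TRACK is an
# informal convention, not a standard.
telemetry_allowed() {
    # Existing convention: any non-empty, non-zero DO_NOT_TRACK opts out.
    case "${DO_NOT_TRACK:-}" in
        ""|0) ;;
        *) return 1 ;;
    esac
    # Hypothetical centralized opt-out files, user config first.
    for conf in "$HOME/.config/telemetry.conf" /etc/telemetry.conf; do
        if [ -r "$conf" ] && \
           grep -qiE '^[[:space:]]*telemetry[[:space:]]*=[[:space:]]*(off|0|false)' "$conf"; then
            return 1
        fi
    done
    return 0
}
```

A tool would call this once at startup and skip all reporting when it fails, making the opt-out a single, auditable decision point.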


The telemetry runs when you call next dev, so it does not run on your production instances.

(And you can opt out, and they display a warning, though of course that is indeed not an opt-in.)


Offering opt-out is odd from a GDPR standpoint.

If the Next.js project is collecting IP addresses together with this info, they are processing personal data under the GDPR. They need to do so under one of the six bases for processing, which in their case is either consent or legitimate interest. If consent, opt-in is required and opt-out is a violation. If legitimate interest, then opt-out is alright and in fact not even required, but they have a high bar for clearing that standard, especially since opt-out is offered (which somewhat disproves the claim of legitimate interest).

I assume the project is in non-compliance and one complaint to a regulatory authority away from a proceeding that may lead to a fine if they don’t switch to an opt-in model.


In this case I mainly wanted to assuage concerns that this affected production instances.

That said, I'm not sure if consent or legitimate interest are the only potentially applicable bases. Knowing when the software breaks so you can fix it seems like it might be in the data subject's interest. And if it's not PII (which I'm not sure it's not, given that an IP address can be exposed, even if not logged), those bases aren't even necessary.


The fact developers even have to worry about that is a problem. How do you know where they will draw that line?


I agree that it's annoying that you can't just plug any tool into your code and trust that it won't do malicious things, but alas, that's the world we live in.


> this is ultimately placing the convenience of the developers ahead of the privacy and autonomy of the users

this is a biased framing of the problem. first of all, the telemetry is often more useful for PMs (not developers) to make decisions on what efforts to prioritize vs not, and to assess the relative success of launches. less of a convenience thing, more an industry-standard way to get some meager signal from all your users, not just the ones that fill out surveys (and i'm sure there's plenty of complaints about surveys being unrepresentative too).

second of all, the work that developers and PMs do with this anonymized data is usually intended to help users.

pitting developer vs user is a false binary.


> pitting developer vs user is a false binary.

Not in the modern software ecosystem it's not. How often is telemetry used to make a product better vs. helping to add more dark patterns to increase "engagement"? No amount of telemetry seems to stop Microsoft from shoving more user hostile garbage into Windows 11.


>first of all, the telemetry is often more useful for PMs to make decisions on what efforts to prioritize vs not, and to assess relative success of launches. second of all, the work that developers and PMs do with this anonymized data is usually intended to help users.

OK, then it places the developers' and PMs' convenience ahead of the users' privacy and autonomy.


In 18 years, I've yet to see someone use telemetry in the interest of the user. I've seen it be used to save money, earn more money, and quite a bit for mental masturbation. What I've seen actually work in the interest of users is qualitative user research, e.g. user testing, and actively engaging with users.

Sure, one can make the argument that if booking.com upsells rental cars more effectively they earn more money and provide a better service and experience for users. Haven't seen this happen once though.


Former Firefox dev here. I was part of the Firefox performance team, which introduced Telemetry in Firefox. We used Telemetry to measure the performance of many operations, to determine, among other things:

- which operations were running slowly on real users' computers (and how commonly they ran slowly), so that we could fix these;

- whether users were faced with out of memory errors, or timeouts, etc.;

- whether users had abnormally large files (databases, preferences, etc.), which suggested that we should optimize for such cases;

- in some cases, whether some features were used at all because if they weren't, it wasn't worth optimizing them.

I'd qualify these as definitely in the interest of the user.

I know that not everybody enjoys having Telemetry collected on their usage, but without this, Firefox would never have been able to catch up with Chrome on user-visible performance.


Not a browser dev, but not long ago I spotted a discussion of Chrome potentially deprecating/removing a rarely used XML API. The discussion was halted when a sudden spike in usage (IIRC, from well below 0.1% of whatever the metric to just slightly over 0.1%) was noticed. I have strong reason to believe I contributed to the spike (I was prototyping a change that hit it rapidly in the same time period), and I have strong reason to believe this particular API was preserved because telemetry was in place where a more conventional outcry would’ve been needed to otherwise stop the change (ahem alert ahem). But there wouldn’t be an outcry, the API is long standardized but basically unused outside of very odd corners of the web that aren’t very web centric.

I don’t want browsers or tools to undermine the trust of their users by quietly tracking unknown things. But I agree collecting usage data matters in ways that aren’t always appreciated.

I don’t know how to do it transparently and preserve/establish trust. But I think the instinct to distrust any metric collection probably isn’t the right balance to strike.


FWIW, the one feature that I had removed because nobody used it was the yellow bar at the bottom saying "Add-on foo is slowing down your Firefox, do you want to disable it?", so nothing quite standard :)


But knowing which addons are slowing Firefox is quite useful, even if I decide to not disable the addon!


Yes, but I reimplemented the feature and moved it to about:performance.


Dammit I was using that! ;)


> - in some cases, whether some features were used at all because if they weren't, it wasn't worth optimizing them.

Terrible conclusion; by that reasoning a drunkard should only search for his keys under the streetlight.

https://en.wikipedia.org/wiki/Streetlight_effect


That’s good internetdebatesmanship, but if the post was “Firefox spent years optimizing thing no one ever uses at all”, it’s easy to imagine the ire that would raise for people who just wanted their keys not to be inexplicably lost under a streetlight.


Well, at some point, you have to make choices. If you don't have enough developers, you can either optimize features that users are using or features that users aren't using because they _might_ use them some day.

If you do the former, there is a chance you may be missing out. You do the latter, you are certain that you are missing out.


It's super useful in big companies. If you want head count, you gotta show numbers. Otherwise projects get canned. This is especially true for projects like the ones shown. Organizations want to know if people are actually using this stuff.

I disagree with the practice, but it's definitely the way it is.


You're missing the point by focusing on "developer". "Developer team and related teams" might be a better phrase because it includes the PMs.


This. When I use the term "developer" in these sorts of contexts, I'm referring to the company as a whole, not just programmers working on the product.


Exactly, both are developing a product.


If usage information is so valuable, why not ask for it instead of simply taking it?


The answer I've gotten to this question from multiple sources is: because if we ask, not enough users say yes.

Which speaks volumes.


I've learned that a lot of people somehow made it to adulthood without a single lesson on what actual enthusiastic true consent is.


The followup to that is: '...and they'll switch to alternatives that don't ask.'


because 80% of users don't opt in, don't opt out, they roll with the defaults, and when asked don't mind that you're getting anonymized telemetry back. most users are not privacy maxis like those motivated to comment 3 threads down on hn posts like these. they want to get their stuff done and expect the product to improve, and understand that we have evolved these practices as pragmatic tradeoffs.


They trust us and we are eroding that trust by invading their privacy for bs reasons.

They do not understand, they just shrug “I guess devs need this”. They expect us to know, to care, because it is our job. We don’t and it’s sad.


And because of this erosion, more people are pushing back on telemetry. Imagine if everyone decided to get together, detailed what telemetry they needed and why, outlined how users could help, and actually protected their privacy and Did The Right Thing(tm) - more people would be willing to care, and maybe contribute.


Not to mention it's usually possible to opt out (as it is with Next.js/Vercel).


The type of people who volunteer are different from the general population.

That’s why things like public opinion surveys have to be random and why jury duty selection isn’t volunteer.


“help us help you”?

If that is so, just ask and explain and be more specific than “this helps improve things”. How hard is it to say how exactly this vague ball of data is helping The Users?


> Even without JavaScript, web browsers offer a dazzling array of ways for applications to call home with fingerprinting and tracking information.

This is why I don't use web apps or web-based services unless I have no other option.


What are the other options?


Native binaries. They even come with the additional benefit that I can firewall them off individually, so I can selectively allow telemetry collection if I'm OK with it.


It's funny to see this comment, because any time someone complains about the Zoom desktop app, there's a comment bragging about how they only use Zoom through their web browser.

I'm surprised that anyone claiming to be security- or privacy-minded would prefer a native desktop app to a website running in a sandboxed web browser. Even if you build the code yourself, you're probably safer with the web app. At least in the browser, you can monitor and block any connections from the web app. Good luck doing that for a native binary without inspecting the source and build process for underhanded code exfiltrating data from your machine through some unknown number of obfuscation techniques.

Amusingly enough, the OP article is about telemetry of "frontend tooling," which does not refer to "web apps," but to "native binaries for building web apps."


> I'm surprised that anyone claiming to be security- or privacy-minded would prefer a native desktop app to a website running in a sandboxed web browser.

That shouldn't be so surprising, really. It's a question of what threats you are the most concerned about. That decision is pretty individual.

> Good luck doing that for a native binary without inspecting the source [...]

I don't have to do all of that. I firewall off all outgoing traffic by default. If a binary is trying to exfiltrate data, it won't get past the firewall. That's hard to do in a web context.

> which does not refer to "web apps," but to "native binaries for building web apps."

Indeed so! I'm not saying that just using native binaries all by themselves is sufficient. I'm saying that I have more tools available to mitigate the problem when it's a native binary.

Just look at the impossibility of effectively stopping browser fingerprinting for an example of the difference between the two things.


> I firewall off all outgoing traffic by default.

At what level? Unless you're running every native binary on its own hardware, or maybe within a VM on an isolated VLAN, how can you be so confident that your firewalling method is less leaky than the battle-tested sandboxing of Chromium or WebKit?

Also, why not both? The most secure option might be running a web app in an isolated Chromium process, with a Chromium extension allowlisting outbound connections, and then also firewalling the Chromium process itself at the operating system level.


I run exclusively Linux, and on each machine I've set up, all outbound traffic is dropped. I whitelist specific applications as needed. I do this by using rules that match --gid-owner, and change the group ownership of whitelisted applications to a special group that those rules will catch.
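A sketch of what the group-based rules described above might look like. The group name `allownet` and the example binary are illustrative; note that the owner match only applies to locally generated (OUTPUT-chain) traffic, and a whitelisted binary has to actually run with that group, e.g. via the setgid bit or `sg`. This is privileged firewall configuration, so treat it as a starting point, not a drop-in script:

```shell
# Default-deny outbound; allow loopback and anything running as group
# "allownet" (illustrative name), logging everything else before the
# chain policy drops it.
sudo groupadd -f allownet
sudo iptables -P OUTPUT DROP
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A OUTPUT -m owner --gid-owner allownet -j ACCEPT
sudo iptables -A OUTPUT -j LOG --log-prefix "blocked-out: "

# Whitelist one binary by giving it the group plus the setgid bit,
# so its processes match --gid-owner when they open sockets.
sudo chgrp allownet /usr/bin/curl
sudo chmod g+s /usr/bin/curl
```

The LOG rule is what feeds the monitoring described below: anything a non-whitelisted binary tries to send shows up in the kernel log with the `blocked-out:` prefix.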

> how can you be so confident that your firewalling method is less leaky than the battle-tested sandboxing of Chromium or WebKit?

I can't be 100% certain, of course - although that's equally true of Chromium and WebKit. But I have a monitoring system that checks my firewall logs to catch anything suspicious.

> The most secure option might be running a web app in an isolated Chromium process

Because fingerprinting. I can't think of any way to stop fingerprinting. The best I can do (and I do) is to disallow JS from running, but that still leaves many possible signals.


> Because fingerprinting. I can't think of any way to stop fingerprinting. The best I can do (and I do) is to disallow JS from running, but that still leaves many possible signals.

This is even worse for native apps. Web apps do fingerprinting because they don't have access to low-level native information. A native app doesn't need to fingerprint; it can just read device IDs directly, which is much more reliable than any fingerprinting method.


> This is even worse for native apps

Not if those apps can't phone home.


Unfortunately, fewer and fewer apps nowadays can work in an exclusively offline mode. There is no way to distinguish between legitimate traffic and "telemetry" traffic.


How do you know the app is not using a side channel to exfiltrate data through the normal mechanism of operation? Take Zoom - I expect it to require a significant amount of bandwidth to operate. How am I to know that if remote call X fails it will not just route data through the video path?


What are you using to accomplish this? iptables?


Yes, iptables.


Not sure how effective this is unless you're exclusively talking about offline binaries rarely updated, coupled with an external firewall to mitigate workarounds.


I'm not sure what you mean by "offline binaries" here. Do you mean applications that don't need to talk over the internet? If so, then yes, that's all I can effectively cover. I'm extremely cautious about which applications I'll use that require talking over the network.

I in no way claim that my approach is airtight. It's purely a "best effort" sort of thing. But it's far better than nothing. I'm better off for it even if it doesn't stop everything.


> but this is ultimately placing the convenience of the developers ahead of the privacy and autonomy of the users.

Is it though? Putting aside the issue of whether you trust the person collecting telemetry for a second: you can collect telemetry completely anonymized. You don't have to store IPs or any sort of personal identifier. I know a certain crowd still won't like it, but in practice how does that harm the privacy of the user?


The very act of making a network connection (data points: network address + time of use) makes it impossible to be truly anonymous: https://kieranhealy.org/blog/archives/2013/06/09/using-metad...


> You can collect telemetry completely anonymized.

You can't.

You can decide to discard identifying information (although only if you aggregate and don't insert some sort of "anonymous" identifier), sure. But you're still going to collect it.

So it all boils down to trusting not only the developer, but any company or interest that might acquire the developer or product in the future. And I think it has been repeatedly demonstrated that this trust is not well-placed.


> You can't.

Of course you can! Just route telemetry through Tor.


Personally I find this unacceptable. It’s ok to do it within your own company environment, but not for a public open-source project.

This is a slippery slope toward "understanding more about our users", and I challenge anyone to show a meaningful improvement that derived from telemetry of this kind. There are better ways to figure out how a project is used without harming privacy (e.g. analyzers like BuiltWith). Projects doing this should get a red mark to discourage it.


> I challenge anyone to show a meaningful improvement that derived from telemetry of this kind

Which features of our software are people using, and where do we need to concentrate the most with our limited resources and bandwidth to provide our users the best possible experience.

What are the biggest build- and run-time errors, and are they a result of the developer experience, and can we fix this through either better documentation or better error messaging.

What are the paths to successful adoption of our software, and can we nudge potential users (through documentation or developer outreach) to get on one of those paths.

We polled our users and they said that they want feature A, but our usage data shows that a vast majority of our users need feature B to be more successful faster.

Our software is used most often with other-tool-A. Can we strike up an agreement with the team that makes A, in order to give our users a better experience when integrating both tools through documentation, tighter API integrations, etc.


> Which features of our software are people using, and where do we need to concentrate the most with our limited resources and bandwidth to provide our users the best possible experience.

The best possible experience starts with not being spied on, especially against my consent. No feature is worth that.

> What are the biggest build- and run-time errors, and are they a result of the developer experience, and can we fix this through either better documentation or better error messaging.

Make it easy to report bugs. This is a solved problem (c.f. any popular open source application).

> What are the paths to successful adoption of our software, and can we nudge potential users (through documentation or developer outreach) to get on one of those paths.

As a user, I never want to be "nudged". Ever. Do not "outreach" to me especially if you've made the decision to contact me by spying on my data.

> We polled our users and they said that they want feature A, but our usage data shows that a vast majority of our users need feature B to be more successful faster.

So your users told you what they wanted and you didn't give it to them and justified the decision based on analytics. This is where people say that analytics are good for the product manager's job in helping justify its existence but at the expense of the users.

> Our software is used most often with other-tool-A. Can we strike up an agreement with the team that makes A, in order to give our users a better experience when integrating both tools through documentation, tighter API integrations, etc.

Create great solutions. If that means doing biz dev or integrations to make that happen, go for it. You do not need to spy on your users to build meaningful partnerships.


> The best possible experience starts with not being spied on, especially against my consent. No feature is worth that.

This kind of knee-jerk reaction says that there is no comment in the universe that would make you entertain a different perspective. That's fine, that's not what this comment is going to be.

Instead, I'll say this: the default assumption that e.g. "collecting crash data equates to invading my personal privacy" is just wildly off-base. I will freely admit that there are plenty of companies who have abused the concept of the always-on network connection and are doing hardcore data mining almost at the keylogger level. That's bad, 100%.

But there is a large majority of data-driven companies who are actively trying to make your experience with their software better, with the small team that they have. To say "well f them, they can try to divine how I, Power Linux User, would even use their software" is reductive and puts an engineering team at a distinct disadvantage and slows their overall output. This is not what you, as a user of their software, want, and it is ultimately a self-defeating argument.

"Create great solutions" does not happen in the year of our lord 2023 without data. That data needs to come from somewhere, and "make it easy to report bugs" isn't it.


> there is no comment in the universe that would make you entertain a different perspective.

You're correct in that there is no comment in the universe that would make me ok with being spied on. That's not a "knee-jerk reaction" though, it's privacy and consent 101.

> "Create great solutions" does not happen in the year of our lord 2023 without data.

It's true that there's a trend toward dark patterns and spying, which is exactly why we as users should resist and not allow it to be framed as the status quo. Great software has been created for decades, a lot of it still in use, without a hint of spyware or analytics.


I've lived in the data-driven mindset for many years, and ultimately found that it still consists of mostly guesswork, opinions and strong biases everywhere. People make product decisions and they will seek the data to validate what they already want [1] anyway. It should complement, not replace talking to users, and is not a requirement to build great products.

When it comes to simple anonymized telemetry, one user or subset of users, spamming the same error over and over will skew your data, and unless you track them even more you won't be able to tell the difference. As I said earlier, it's a slippery slope to stand on.

Comprehensive telemetry/tracking is fine for say, an e-commerce website or any kind of cloud application where you're already receiving all the user input anyway. It's also fine, with consent, for desktop apps, especially complex ones like IDEs etc. It is not OK for third-party software packages that you'll ship to your own users - regardless of any promise to not add tracking to the resulting build, since at this point they already broke that wall and it only takes one "well-intentioned" PM to make it happen.

[1] a system that evolves automatically by running stochastic A/B tests is something many have dreamed up including myself. I'm really curious what that would end up looking like. That's being truly data-driven!


> Make it easy to report bugs. This is a solved problem (c.f. any popular open source application).

How can you make it easier to report a bug for the average user than providing a single button to the user that lets them upload a crash report after a crash?


Oh, man. Don't call user-initiated data collection "telemetry". You are going to lose that battle for no good reason if you insist on associating with automatic collection.


The problem is not data collection per se; opt-out data collection is the problem. If the user is informed and consents, there is no problem at all.


That seems like a fine way to do it. What we're talking about are apps that track you while you use them, not an opt-in one off bug report.


This comment is at the heart of so much that is wrong with modern software:

> Which features of our software are people using, and where do we need to concentrate the most with our limited resources and bandwidth to provide our users the best possible experience.

That a team of product managers and UXers can answer this question even half correctly with a bunch of metrics and A/B tests is one of the biggest fictions of our time (not to mention pure arrogance). Talk to your users, or better yet be serious users of your own product.

> What are the biggest build- and run-time errors, and are they a result of the developer experience, and can we fix this through either better documentation or better error messaging.

Invest enough in quality so that users see errors so rarely that they get in touch directly when they do.

> What are the paths to successful adoption of our software, and can we nudge potential users (through documentation or developer outreach) to get on one of those paths.

This is an anti-pattern. Your adoption is irrelevant, only the utility of your software and user happiness matter. If those things aren't compatible with you making money, you shouldn't make money. Using telemetry for this purpose decimates the already weak value of telemetry for improving your software.

If you are building applications with telemetry for this purpose, please stop.

> We polled our users and they said that they want feature A, but our usage data shows that a vast majority of our users need feature B to be more successful faster.

This is the height of arrogance. Quite incredible that anyone would think like this. I would immediately end the relationship with any provider who believes that their usage data, of all things, is enough to reliably decide that the users don't know what they want and need and the provider knows better.

> Our software is used most often with other-tool-A. Can we strike up an agreement with the team that makes A, in order to give our users a better experience when integrating both tools through documentation, tighter API integrations, etc.

The level of telemetry required to know not only how I use your software but what other tools I'm also using is quite terrifying. Pretty much total surveillance. Try talking to some users instead.


> This is the height of arrogance

"If I had asked people what they wanted, they would have said faster horses." Henry Ford


All that is true to a degree.

But I offer two interesting and related observations.

The first is that it's not that uncommon for truly terrible feature decisions to be made on the basis of telemetry. Things like removing features that turn out to be really important even though they are rarely used, etc.

The other is that I see some people who are using software that they know is collecting telemetry alter their use of the software in an attempt to influence decisions based on that telemetry. Things like using critical features more frequently than they otherwise would, in the hopes that the feature won't be cut.


I actually think it would be OK if the telemetry were opt-in. At this point, I'd even settle for just having the software actually warn you about the data collection on first use.


I agree opt-in telemetry (with no dark patterns, nagging, etc.) is fine.

We do this (even then, strictly for non-personally-identifiable crash/error reporting only), and I would encourage everyone who isn't a friend/family of the team, a close community member, an investor, etc. not to opt in. It isn't necessary. (Though having a small number of tens of people from those groups who do opt in is reasonably useful.)


you opt in by using the product. especially if it's open source.


Tracking my usage was never part of the social contract of OSS. This post is a reminder of that.


What social contract is that?

As far as I'm aware, the contract of OSS is "you can fork this to change it, if you don't like it" (license permitting).

A big part of the reason people advocate for OSS is so they can read the source code to understand what's going on, and change the parts they don't like. Telemtry sits squarely under that usecase.

I agree that it's good manners for projects made users aware of telemetry, but there's no "contract" that implies they have to.


He was talking about the social contract, not the license. OSS is a community that has mores, expectations, and social contracts like any other community. Spying is very much in opposition to that social contract.


If you haven't provided informed consent, you haven't opted in.


Ahh fair, I was under the impression this was disclosed telemetry.


As long as its disclosed I have a hard time with the pearl clutching going on over this. If you don't like something the maintainer of an OSS product is doing, fork it or don't use it.


I couldn't care less if it's phoning home by default; the data helps build a better product. Opt-in telemetry is basically useless, as you're self-selecting for people that discover/care enough to enable the feature.

In a world where we have companies like Google/Facebook and massive levels of surveillance it's bizarre to me that this is the hill some of you want to die on.

The tools themselves provide a way to disable it, so I don't see what any of the fuss is about.

This happened recently with a telemetry proposal in the Go community and everyone threw their toys out of the pram. The Go team was forced to backpedal, and now the telemetry collected by the toolchain is worse off because of it.


Opt-out telemetry is illegal in Europe.

Any connection that is not necessary for the purpose of the tool exposes the IP of the user without their consent.

Pinky promises that the IP of the connection will not be (ab)used do not count.

See https://rewis.io/urteile/urteil/lhm-20-01-2022-3-o-1749320/

> It is sufficient that the defendant has the abstract possibility of identifying the persons behind the IP address. Whether the defendant or X. has the specific opportunity to link the IP address to the plaintiff is irrelevant.


Go famously does not listen to users. Am I to believe that the one thing holding back Go development was tabulated data?


Why do you say that when this is a prime example that they actually do listen to users?


Ha, fair point.


Except this time they did listen to users (to the detriment of their telemetry data)


That's cool that you don't care, but some people do care, or else this site wouldn't exist in the first place.


Just because Google or Facebook does something doesn’t mean everyone else should get away with something a little less bad.


There's an obvious difference between companies collecting and selling your data for profit and developer tooling collecting some metrics to improve the product.


The difference doesn't justify spying on me though.


I don't seem to understand the "let's take back our data" theme here. If someone builds a frontend website that you use, do you have the right to not have any metrics gathered about how you use their frontend that they built?

What about server logs? If I send an http request to a server, do I have any right to say they can't log that request? What about the database metrics?


> If someone builds a frontend website that you use, do you have the right to not have any metrics gathered about how you use their frontend that they built?

Data about me, my machines, or my use of my machines is private data. Nobody has any right to it besides me. If it's being collected without my informed consent, that's just spying.


If, walking into Walmart, you were stopped by a greeter who turned out your pockets and recorded your car registration, DOB, and favorite Electron app ("just having a look bud, now carry on"), would you go back?


Yes, because wanting to collect metrics on which devices your users are using, to better aid in the design and development of your application, is similar to giving your DOB and VRN to Walmart.


I'm not disagreeing with your point, but did want to add in that what a developer's intentions are don't mean a great deal, because there's a serious trust problem.

It's very difficult to tell what data is really being reported by an application. The majority of the time, all we have to go by is what the developer says is being collected.

But the developer claims cannot be considered trustworthy by default. Many developers have simply lied in the past about what they collect. And even if the developer is truthful, they can only speak for what the software is doing at the moment. There's no guarantee that the developer won't change their mind in the future, or the product won't be sold to another company that isn't so considerate of their users.


This is a poor analogy - while it's certainly a debatable practice, nobody is being 'stopped', and this isn't comparable to going through anybody's pockets or digging up other unrelated info about their lives. Even the article here explicitly assumes that all the data is not personally identifiable, and I'd be very surprised if that's not correct (because otherwise they'd be begging for a GDPR problem).

Also though, interestingly Walmart actually really does do the equivalent of this: https://bernardmarr.com/walmart-big-data-analytics-at-the-wo.... They have real-time detailed metrics for individual customer transaction behaviour in stores, and I would not be surprised at all if they tracked lots more, e.g. the total number of cars in parking lots, how long different cohorts of customers spend in stores, etc.


In fact, the article is wrong. It doesn't matter what data they collect: they need to establish a connection to the telemetry server, and that exposes the IP (which is personally identifiable information) without the user's consent. https://rewis.io/urteile/urteil/lhm-20-01-2022-3-o-1749320/

And the problem is even bigger: how do I know what data is being collected? Even if it is declared publicly somewhere, do I need to check on each update whether the collected data has changed?

> otherwise they'd be begging for a GDPR problem

They do in fact have a GDPR problem, but as in the case above, where it was not Google that had to answer for the GDPR problem but the site that used Google's CDN, here too it will most probably not be the toolmakers who are taken to court but the companies asking their employees to use these tools without disclosing this IP leak. If this becomes the norm, how much of a burden will it be to verify each tool and decide whether we need to amend the employee GDPR consents? I can see this already having an effect: just standardize on Microsoft tools, so we only need to gather GDPR consent for exposing data to Microsoft...


Privacy matters, no matter what layer of the tech stack we are talking about. The developer is not legally required to build an on/off switch for telemetry in most jurisdictions, but they should anyway. It’s the right thing to do. It could be a switch for the whole stack or you could provide more granular options.

Personally, I like to implement a “Report a bug” button which sends detailed telemetry when it really matters. To keep an eye on more general usage data, I use Cloudflare Web Analytics. It’s easy to build a toggle for analytics. Personally, I find that more than sufficient, especially when combined with traditional user studies.

Having the backend respect the toggle too wouldn’t be that hard.

https://www.cloudflare.com/web-analytics/


This is talking specifically about the frameworks people use forwarding telemetry to the developers of those frameworks.


>If someone builds a frontend website that you use, do you have the right to not have any metrics gathered about how you use their frontend that they built?

Yes, absolutely.

That website's code is being executed on my machine, consuming my power, then using my internet service to phone home with a data package of arbitrary size.


Applications should not have network/internet access by default. Operating systems should be competing on features that put you in control[0].

To do anything less than blocking every connection until you've allowed it is putting a lot of faith in software (vendors) that they have already demonstrated they will abuse. It's nice that some software grants you the privilege of asking it politely to not do a telemetry[1], I guess. But it (and the vendor) already demonstrated they are happy to breach your trust (and, arguably, your security) for their own arbitrary reasons.

So many of these tools present themselves as free software. But if you're not in control of it, it's not free.

[0] Unfortunately, it's really nice when you control the operating system, productivity software suite, game development studios & publishers, email, world's best and most helpful good boy with your best interests at heart online chat assistant, scalable web services provider, programming languages, software forge, proprietary 3D graphics stack, laptops, game consoles, ...

[1] What does `telemetry = no` even mean, anyway? Does the tool report that telemetry was disabled? Does the tool now do nothing on the network that isn't explicitly declared, or only what is required to perform the actual work you ran it to do?


People who keep whining about privacy have never tried to build a business, where you need insights and data to keep your company alive. It’s all a matter of perspective, and it’s not all black and white. The truth is to be found in the middle.


This isn't true at all. I've been in this business a long time, and have developed several very successful products.

I know that this sort of data is very useful to a business. But I also know that it's abusive and wrong to extract this data without my customers agreeing to it.

While this data is very valuable, not having it isn't an existential issue for any company (or, if it is, that's because there's something very wrong with how the company operates.)

> The truth is to be found in the middle

Hard disagree. This is an issue of consent, and I don't see how there's a middle ground for consent.


The success of your business isn't my problem.

The fact that you may have incentives to do bad things is just a reason society should make laws against you doing them, not a reason I should feel sorry for you and let you spy on me.


Okay, I'll offer up a perspective for you: the company should just go out of business then. If you need it that bad for success, then that is a signal that you aren't supposed to succeed. Full stop.


Just make it opt-in. The real problem here is that these products send data by default, without explicit consent. Ask yourself this: do you think most people who use these tools know that their data is being sent? Probably not, right? OK, of those people, do you think 100% of them will be totally OK with their data being collected once they find out? Probably not, right? Then why the fuck do this?

> But then no one opts in!

OK that is your business' problem to solve.


There are lots of other types of privacy issues in open source out there; this wiki page documents some of them:

https://wiki.debian.org/PrivacyIssues


Hopefully, this sort of thing can at least be stopped by very restrictive outbound firewall rules, but it's getting to the point where you have to seriously consider only using dev tools on air-gapped machines.


This is the only way. Just deny by default.

I never liked Postman, and that's mostly how I stopped using it. It completely stops working when you deny its analytics requests, and then it starts sending reports to Sentry (even if you allow those, they don't learn; I've tried).
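For the deny-by-default approach on Linux, one option is an egress-filtering nftables ruleset. The sketch below is illustrative only, not a hardened config: the allowed host address is a placeholder, and the DNS rule should be tightened to your actual resolver.

```
# Sketch: deny-by-default outbound filtering (placeholders throughout)
table inet egress {
    chain output {
        type filter hook output priority 0; policy drop;

        oif "lo" accept                           # keep local traffic working
        ct state established,related accept       # replies on allowed flows
        udp dport 53 accept                       # DNS (tighten to your resolver)
        ip daddr 192.0.2.10 tcp dport 443 accept  # one explicitly allowed host
        log prefix "egress-drop: " counter        # log whatever tools try to reach
    }
}
```

A side benefit: the dropped-packet log is a cheap way to discover which of your dev tools are phoning home in the first place.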


This site would be a lot more useful if it talked about _what_ data was collected, rather than just assigning 25 for "Data sensitivity".

It's also pretty interesting to see the comments here, in a privacy focused thread, and contrast them with comments elsewhere about software performance. If you like faster software, developers need to measure it.


I tried doing that on the tool-specific detail page, but they’re broad categories, e.g. “Device, Environment, Usage” for Next.js. Do you think that’s sufficient?


I saw (after posting) that all the tool specific pages do link specifically to each project's documentation about what telemetry data is collected.

Those categories feel a little too broad to me, especially without clearly stated definitions. "Environment" in particular made me second guess if dev tools were leaking secrets. But, thankfully, the projects I looked at were very clear that they don't collect anything from the shell environments.

Maybe all it really needs is some detail for what's in the category. "Device (OS, core count, available memory)" or "Usage (command, plugins, timing)"?


some stuff I remember on mac:

docker ce - collects telemetry in the installer before you have even installed anything

brew - telemetry on by default

balena etcher - telemetry

anything apple


> balena etcher

Holy shit! A glorified dd if=/of= that uses Electron and has telemetry – now that's something


I guess this is only what you can find out from tcpdump and strace? Reading the EULA/privacy policy is way scarier: for example, contents of files, network information, other apps on your computer, and web searches.

If you are using a free - as in free beer - and not free as in free software - you are the product!


> If you are using a free - as in free beer - and not free as in free software - you are the product!

True. And increasingly, paying for the software doesn't make this any better.


The currently available information all comes from browsing documentation and scanning source code. Some people suggested looking into JetBrains IDEs, which aren’t open-source and thus would require packet sniffing.


To provide an extreme example; if you have sex with someone without asking them, it's rape. "But baby, I assumed you wanted it because you had a glass of wine and you were wearing that sexy outfit." - Still rape. Opt-out telemetry is like that, too, you stop only after penetration.

Or another example, if architects decided to start putting cameras in the house everywhere. The camera, somehow, blocks out your face but monitors everything else. "But, we need this data to figure out which part of the house you're using and how often you're in these rooms. PINKY promise it's anonymous and no one will know who you are."

Opt-in consent is the only right way.


There should be laws against the collection of data without explicit opt-in consent. Oh wait, there already are; they’re just being completely ignored.


I don't know which law you're referring to specifically, but often they relate to personally identifiable information, and have carve-outs for non-PII data.


IP address is personal info.


That's fair enough, and I believe just not storing it is not even enough. I'm not sure how they deal with that.


The GDPR, for example, which enforces opt-in for anything not strictly necessary for the provision of the service (and no, telemetry doesn't count as "strictly necessary"), enforces the concept of data minimization (meaning you must collect and process as little personal data as possible to fulfil your objective) and considers IP addresses personal data.


I reviewed the source code before adding them to the site. I can’t say for sure that IP addresses aren’t stored server-side, so we have to trust them on that, but most tools do find creative ways to get some kind of unique and persistent ID. For example, Astro computes a hash of the git repository’s remote URL, and Storybook computes an ID by hashing the first commit hash in the repo.
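To illustrate the kind of "anonymous but persistent" ID described above, here is a small shell sketch. The actual hashing details in Astro and Storybook may differ; this only demonstrates the general technique, using a throwaway repo with a made-up remote URL.

```shell
set -e
# Throwaway repo for demonstration; normally you'd run this inside a real project
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "init"
git remote add origin https://example.com/acme/app.git

# ID derived from the remote URL (the approach attributed to Astro above):
# contains no name or email, yet is stable and unique per project
remote_id=$(git config --get remote.origin.url | tr -d '\n' | sha256sum | cut -d' ' -f1)

# ID derived from the first commit hash (the approach attributed to Storybook above)
first_commit=$(git rev-list --max-parents=0 HEAD)
repo_id=$(printf '%s' "$first_commit" | sha256sum | cut -d' ' -f1)

echo "remote_id: $remote_id"
echo "repo_id:   $repo_id"
```

Note that both IDs survive reinstalls, machine changes, and cleared caches, which is exactly why "we collect no PII" claims deserve scrutiny.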


The intent of the GDPR is to prevent the act of non-consensual tracking and not some specific technical method of achieving that, so even a deterministic ID generation method that doesn't rely on any client-side storage would still be in scope, as would a hypothetical crystal ball if it worked better than random chance at reidentifying a unique user.


> (and no, telemetry doesn't count as "strictly necessary")

I'll start with the caveat that I'm not a lawyer, but I wouldn't be so sure. For example, knowing when your tool breaks for a user so you can fix it sounds like something that you could argue is necessary to make sure the tool works for the user.


I think the standard for "strictly necessary" is whether that piece of data is necessary to provide the requested functionality.

Does submitting an order form for some goods in any way depend on the telemetry requests you sent before/after? Would the order still go through if those requests were missing or fed fake data?

If the answer to the above is yes, then "strictly necessary" would be difficult to apply. Of course, the point is moot, because GDPR enforcement so far is not only lacking to begin with, but the cases that are investigated aren't given the right technical expertise to adequately determine those answers.


Are there any good examples of how telemetry actually improved experience for users? I don't mind if providing device hardware information will one day lead to improved performance on my old laptop. But when software tries to collect all the data and does nothing - that's just a waste of bandwidth and CPU cycles.


Products by JetBrains are notoriously missing.


Meaning that they don't collect telemetry, or that they DO collect telemetry and it's not mentioned?


Opt-in telemetry, as well as unspecified telemetry while phoning home for license checks. I recall that IDEs based on the IntelliJ platform scour the local network in search of other network-connected machines running instances of their IDEs, e.g. in order to detect forbidden duplication of licenses. It is not at all clear what details are(n't) transmitted back to the mothership.

This example requires a finer-grained analysis than the broadly chosen "opt-in" and "opt-out" categories to describe telemetry collection, and I wish for this information initiative to improve in clarity and thoroughness (e.g. does VSCode also scour the local network?).

Addendum: Thanks for asking for clarification. The grandparent comment could've been ambiguous in intent.


I did browse VS Code’s source code and I can’t recall seeing anything other than straightforward, albeit verbose, telemetry. When I find the time, I’ll iterate on clarity! I’ll try to be more thorough, too, if time allows. I want to verify everything before it goes online, which is time-consuming, and time is already scarce.



I’ll add anything worth adding, regardless of whether it has telemetry or not. The tools on the site are just the ones I tested!


That’s very different, isn’t it? Telemetry on an application you are a customer of, especially one with a GUI, is expected. Tracking in a framework that you install, maybe even as a third-party dependency, is not. Can you picture every major software dependency you use phoning home when you start a project?


The only real differences are who the user is (possibly a developer vs. an "end user") and how many hands are rifling through the user's pockets without permission. In an audience composed of mostly developers the attitude toward these scenarios may be different, but the fundamental ethical calculus is the same.


While tracking is never ideal, I don't personally mind tool-level tracking like this if it helps the dev, especially for CLIs.

Which command-line switch I prefer seems less problematic than, say, my browsing habits. One hell of a slippery slope, I know, but it does seem to me like a spectrum.


The CLI is the last place I want to be tracked. There's so much sensitive data there, and there currently exists a culture of privacy (that's under attack). Most timeless and excellent CLI tools have had zero tracking in them. Git and Vim did not get designed by spying on people's usage.


> Assuming no personal identifyable information is saved, this is legal.

This is no longer true in Europe. Any opt-out telemetry is illegal, because the tool has to establish a connection to the telemetry server, thereby leaking the user's IP (which is PII under the GDPR) without the user's consent.

See the latest case that even linking a font from CDNs is illegal. https://rewis.io/urteile/urteil/lhm-20-01-2022-3-o-1749320/

relevant part: > It is sufficient that the defendant has the abstract possibility of identifying the persons behind the IP address. Whether the defendant or X. has the specific opportunity to link the IP address to the plaintiff is irrelevant.

Any connection that is not necessary for the primary purpose of the tool/app requires the consent of the user; thus all opt-out telemetry is illegal, because the tool/app can fulfil its purpose just fine without the telemetry.

Note that this is not only a problem for the toolmaker; it is also a problem for employers that expose their developers to these tools, creating an unnecessary burden of gathering the consent of their employees.


Maybe it is time to revive https://consoledonottrack.com, an initiative to unify the convention of having one standard environment variable to opt out of any tracking.
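The convention is a single environment variable, `DO_NOT_TRACK`. Below is a sketch of how a user sets it and how a tool honoring the convention might check it; this is illustrative logic, not any specific tool's actual code.

```shell
# User side: one variable, set once in your shell profile
export DO_NOT_TRACK=1

# Tool side: check before any telemetry is sent (illustrative logic only)
if [ -n "$DO_NOT_TRACK" ] && [ "$DO_NOT_TRACK" != "0" ]; then
  echo "telemetry disabled"
else
  echo "telemetry enabled"
fi
# prints "telemetry disabled"
```

The appeal over per-tool opt-outs is that one setting covers every tool that honors it, instead of hunting down a different flag or config file for each.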


This doesn't seem to be correct for PowerShell

https://telemetry.timseverien.com/opt-out/


I may have not tested this thoroughly enough — I’ll be sure to revisit it soon.


A better solution for Visual Studio Code would be to just switch to https://vscodium.com/ instead.


I’ll be sure to list that as an alternative!



