I guess a big WARNING banner might scare users away, but it's still a bit disturbing to see such a lax attitude towards tools which developers use to work with a software company's most valuable assets. A lot of people, developers included, don't really read EULAs, and it's the same reason (traditional) spyware could thrive: no doubt they all specify in their license agreements the fact that they all collect information, but approximately no one reads those.
Put another way, would you want your compiler or other parts of your toolchain sending information about all the source files it processed? I wouldn't consider myself particularly paranoid when it comes to security, maybe even looser than the average on HN, and even then I wouldn't use such tools. I wonder how many companies have already banned their use...
Companies have basically DoS attacked the public by attaching pages of legalese to anything and everything.
It's DDoS. You could read one EULA, but there are way too many, it's a distributed attack.
I mean obviously I wouldn’t like any of the information I handle to be sent somewhere, but I also don’t mind statistics about that info (number of files, file sizes, feature use count) being sent to Microsoft. If I found out that file contents were transmitted then obviously I’d be outraged - but I’m rational and assume Microsoft is too.
Exactly, telemetry on a broad level is important to find out if your software has problems. The problem is what kind of telemetry you have.
I didn't know until I saw it myself. Maybe I forgot some announcement, but I don't need Chrome to role play as an antivirus.
According to "Matt": "It scans the folders related to Chrome. Please do not worry as your data wouldn't be affected by the clean up tool." ... I think reality speaks for itself. My Steam Library just isn't related to Chrome.
Or they forgot and their stance on it changed. I vaguely remembered (or thought I did) having seen it. Just checked my settings and it's disabled. I've gotten way more privacy minded in the last 24 months, so I figured I might have said "yeah, you can collect data" back when I first installed it, but now I'd prefer they didn't.
A middle way would be to periodically remind people, "hey, you've consented to sending anonymous usage data to us so we can make better decisions. I just wanted to confirm that you're still cool with this" once a year or something, because at least for me, "does this app collect telemetry data?" is not something I constantly have in my working memory.
Almost nobody will do this because, if they do, they know people will turn it off. It's the same reason commonly given for making tracking (or "telemetry", as they call it) opt-out and on by default, instead of requiring users to opt-in -- "if we have it off by default and ask users to enable it, pretty much nobody will".
They may very well be correct but, to me, that doesn't make it okay.
Most data collection for "analytics" is just about you as the product owner optimizing the product for whatever goal you want to, engagement, sales, ad clicks, retention or what have you. I as a user that provides the data gain nothing from that and, being the target of those manipulations, I might actually lose (by spending more time on your site than I wanted to etc), but if you can show that I give data = I get better product, I don't think I'd be that hard to convince.
I don't know if VS Code does something in that regard (if they do, they don't communicate it effectively enough to reach me), but I do like their approach in general - I once read through an issue where they discussed changing some component for a more advanced one and pretty much everybody agreed that the newer one was better, but they still had a few words on "how can we measure that? how can we make sure that it's not just us devs that work on this thing all day and it won't be the same for the people using it in their daily life on all kinds of experience levels". I liked that a lot (so much so that I still remember it months after reading), but apparently not quite enough to enable telemetry data ;)
2. VSCodium is just a FLOSS VSCode binary. You could build it yourself from the available VSCode source. However, the VSCode binary is not FLOSS so you cannot be sure what it is running.
It's not like VS code is the next PRISM-- I'm sure MS has better ways to spy on users ;). The real pull is whether you prefer FLOSS by default.
That's equally true of the VSCodium binary. It's not a reproducible build, I have no way of knowing from which source code the binary was generated.
Of course I could build from source, but VSCodium is just VSCode built from source with a build flag set. So in this regard it's not contributing anything notable (and doesn't claim so either).
Sorry, but I don't think I am qualified to explain clearly
Seems to just bring another problem by trying to solve one.
If you dislike outdated software, try switching to a Testing or Unstable version of your operating system, or choosing a distro that packages less conservatively (Arch or Fedora, for instance).
In short: Once you've worked with a large enough number of nightmarish Linux installations, you treat them as adversarial systems and wish you could install software just by clicking through a few screens.
The difference is whether you have to click through one of those wizard things to get the program you want.
So they have created a fork of the project.
I would not trust an installer from a third party without knowing what was really changed, that's scary as hell if you ask me. Just look at the bootstrap 4 backdoor that was introduced but luckily was caught.
I don’t quite understand your point. Do you think that it’s hypocritical to use and benefit from OSS work and investment? It seems important to respect the creator’s intent and if software is released under an OSS license it’s not dissonant or hypocritical or bad to reuse, or even make money under many licenses. It’s a feature, not a bug.
In the 90s there used to be these companies that sold “internet in a box” in Waldenbooks and other stores in the US. It was about $70 but it was all just OSS stuff: Trumpet Winsock, Eudora, Mosaic, etc. I thought it was really crazy because it’s all available for free. One day I was in line behind an old guy who was buying it and he was so happy because it was conveniently packaging everything together and he had no idea how to bootstrap all these tools onto his PC. OSS is designed to allow this.
If you have trust issues with Microsoft, you shouldn't be using software authored by them, as software can have backdoors and security issues hiding in plain sight (the Heartbleed bug, for instance).
Atom has/had the exact same kind of telemetry, but hasn't attracted this type of hysteria, because GitHub wasn't Microsoft. This is all plain and simple dogma.
So why is it problematic now for other people to build on top of what Microsoft added?
There are absolutely zero problems with people using OSS as intended. But for me personally, as I stated in the original parent comment, the value I get from sticking with Microsoft's VSCode, despite telemetry, makes it worth continuing to use, as opposed to an entity's fork whose USP is furthering unfounded FUD.
Telemetry lets you shed all the baggage of supporting the minority.
Less severely, there's an awful lot of long tail business productivity served by obscure software features that is very difficult to satisfy with modern hyper-engagement-optimized tools.
Software products may be developed for a mass audience or they may be developed for a narrow niche.
In either case, making the product accessible to people with disabilities is something developers should try to do.
And in every case, having data on user behaviour, software performance, bugs, crashes, etc, will enable the developer to do a better job of catering to their users' needs.
Have I missed something about how these objectives must be mutually exclusive?
No need to be sorry, it is pretty obvious that it is my opinion, since I made no attempt to support the statement, that is all it could be. That said, it would have been more polite to ask me why I believe what I wrote rather than being dismissive.
> "less effectively" ... "greater expense"
I feel that in order for me to provide satisfactory support of my claim to you, we'll have to first agree upon a strict definition of these terms. You're right to call them out in quotes as my use of them was intended to be qualitative and informal. How about more "effective" development being development that is more focused on serving the needs of its users & the goals of its developers, and "expense" include direct monetary cost, manpower, and any other resources whose use incurs an opportunity cost?
> There is NO telemetry in Linux (the kernel) and many other great software.
1. As I mentioned above, my comment was informal. It was not intended to make a strong claim about all software, without exception.
2. Unless a majority of pre-internet software was developed as effectively and cheaply as Linux and the other software you were thinking of, it is possible that my claim is still correct in the general case.
3. Software used primarily by those who actively contribute to it (such as Linux during its early development) has a very different communication dynamic from other software.
4. Linux is not representative of software developed pre-internet, considering the project was first announced by Linus in the comp.os.minix newsgroup in 1991.
* To be clear, any additional claims I have made above are _also_ my opinion.
Telemetry not being disabled
I have disabled telemetry as described in the FAQ. I have set the following properties in the settings:
Despite this, when I log my network traffic (with Wireshark) I can see that Visual Studio Code periodically contacts vortex.data.microsoft.com.
For the record,
I still see connection attempts by Visual Studio Code to marketplace.visualstudio.com and vortex.data.microsoft.com at startup.
I'm pretty sure we [VS Code Core] are doing the right thing here and we would love any pointers (all code is OSS).
Which almost sounds like a complete deflection of the issue. The ticket was locked.
I suppose it's not clear why vortex.data.microsoft.com is still on the list of servers being contacted at this point, but it seems quite plausible that this is coming from an extension.
The VSCodium docs (linked in an above comment) mention baked-in telemetry. Presumably this refers to the binaries rather than the source code, but then the issue is two-fold: telemetry in the source and telemetry Microsoft may insert into the binaries. So following some FAQ to disable telemetry will not address the latter if you're not building from source.
This is the main problem as I see it. MS hasn't been transparent about what can actually be added to the binaries, which brings us to the necessity of a fork such as VSCodium.
Though there's no explanation for why VS Code continues to ping vortex.d.m.c even after updating is disabled. Hmmm.
I hope this project motivates MS to open source the runtime and make tracking a user-selected option.
I really like Code and think MS has really helped the dev community by making it so great and free. But I would like to see them embrace f/l/oss for all their non-core products and stay on track for customer/dev-friendliness.
* How many times you clicked Edit -> Paste vs Ctrl/Cmd + V. This is not personal information. It cannot be used in any way to identify you.
* Your location/medical information. Guard this with your life if you must. Share it with literally no one. If it's uploaded somewhere, delete it asap.
All our information falls on a spectrum between these two extremes. Let's please acknowledge that not all information is super sensitive, that it's ok to share information like the paste example.
Second, this information can be useful, and can change the direction of the products we're building. For example, the Office team wanted to re-design the old menu bar interface to surface the features people were using the most. You know what people used the menu bar for the most? EDIT -> PASTE. You'd think everyone knows about Ctrl + V, but apparently not. The vast majority of the world used to click Edit, then click Paste. Knowing that users were doing this allowed the interface to be moulded to suit the needs of the silent, non shortcut using majority instead of HN power users - https://imgur.com/a/WLm4UJd.
Third, if you acknowledge that this information is useful, then it follows that it's only useful when it's opt-out. If it's opt-in and only 0.01% of your users opt-in, you can't make any reasonable conclusion from the data because it wouldn't be representative. If you tried to convince folks that no one uses keyboard shortcuts to paste based on this tiny data set, they'd laugh you out of the room. Collecting opt-in data in this case is almost useless.
Fourth, products that collect this info become better. If you're ok with webapps collecting this but not native apps (like VS Code), then the consequence might be that web apps end up being much better than native apps. Whether you prefer that or not is moot, but personally I like having options. Let's not force native app makers to stop collecting non-identifying information when the downside is minimal/non-existent.
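To put rough numbers on the opt-in point above, here is a minimal sketch. Every figure in it is made up for illustration (the 90% menu-paste share and both opt-in rates are assumptions, not anything Microsoft has published). It only shows the arithmetic: if the people who opt in differ from the people who don't, the opted-in sample can misstate the population badly.

```python
def observed_share(true_share: float, optin_a: float, optin_b: float) -> float:
    """Share of group A among users who opted in.

    true_share: fraction of ALL users who belong to group A
                (here: people who paste via the Edit menu).
    optin_a, optin_b: hypothetical opt-in rates for group A and for
                everyone else (group B, the shortcut users).
    """
    a = true_share * optin_a          # opted-in users from group A
    b = (1.0 - true_share) * optin_b  # opted-in users from group B
    if a + b == 0:
        raise ValueError("nobody opted in, so there is no sample to look at")
    return a / (a + b)


# Assumed scenario: 90% of users paste via the menu, but they opt in
# 20x less often than the keyboard-shortcut crowd.
biased = observed_share(0.90, optin_a=0.0005, optin_b=0.01)

# Same population, but both groups report at the same rate.
unbiased = observed_share(0.90, optin_a=0.01, optin_b=0.01)

print(f"menu-paste share seen with skewed opt-in:  {biased:.2f}")
print(f"menu-paste share seen with uniform rates:  {unbiased:.2f}")
```

With these assumed rates the skewed sample reports roughly 31% menu users against a true 90%. The same arithmetic cuts the other way too: opt-out data is only representative if the people who bother to opt out resemble the people who don't.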
Search engines are recording really detailed information as you type, as you browse other websites using their invasive trojan-like analytics. With this data you can probably identify anonymous users from their behaviour (even if the ipv4 address is "encrypted" hah). All it takes is for data from one company to end up at another company and I believe this happens very often.
I think the problem is that you need less and less information to personally identify someone as you have more data. We are not intelligent enough to know what data is identifying or not so therefore the only option we have is to stop giving out our data.
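A toy illustration of that point, using entirely made-up records and field names (this is not what VS Code or anyone else actually collects). None of the rows holds a name or an ID, yet the more fields you combine, the more rows become one of a kind.

```python
from collections import Counter

# Hypothetical "anonymous" telemetry rows: no names, no user IDs.
records = [
    {"os": "linux", "locale": "en-US", "timezone": "UTC-5", "font_size": 14},
    {"os": "linux", "locale": "en-US", "timezone": "UTC-5", "font_size": 12},
    {"os": "linux", "locale": "en-US", "timezone": "UTC+1", "font_size": 14},
    {"os": "linux", "locale": "de-DE", "timezone": "UTC+1", "font_size": 14},
    {"os": "macos", "locale": "en-US", "timezone": "UTC-5", "font_size": 14},
    {"os": "macos", "locale": "en-US", "timezone": "UTC-5", "font_size": 14},
]


def unique_count(rows, fields):
    """Count rows whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    return sum(1 for row in rows if combos[tuple(row[f] for f in fields)] == 1)


all_fields = ["os", "locale", "timezone", "font_size"]

# Add one field at a time and see how many rows can be singled out.
singled_out = [unique_count(records, all_fields[:n]) for n in range(1, 5)]

for n, count in enumerate(singled_out, start=1):
    print(f"{n} field(s) {all_fields[:n]}: {count} of {len(records)} rows are unique")
```

In this small made-up set, one field singles out nobody while four fields single out four of six rows. Real data sets are far larger, but the direction of the effect is the same: each extra attribute shrinks the crowd a record can hide in.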
Tell me, who exactly is running ML on this data? They're going to great lengths to anonymize this data. If they wanted identifying information, they'd just upload your unixname, they wouldn't anonymize and then de-anonymize with ML.
I don't even know how you conflated the metrics on menu clicks with uploading paste data. That's ridiculous. Did you even notice you made that leap?
I don't know what data they collect now or what they will collect later so I am just speculating. But I am sure the ToS has a section about how they can change the terms however they wish.
Please tell me about the great lengths they go to anonymize this data because I believe it to be very difficult to do, and absolutely not in their interest.
Edit: With pasting habits I meant if you paste with shortcuts or go through the menu, not the actual paste content. Sorry for the confusion.
So what? VS Code can also be used as a web app already; it's not about what others do or don't do with their online services. The question is about the expectation of and respect for the end user, and having telemetry on by default is disingenuous to say the least.
One single link to your identity, and you now have user tracking. Of course, there are probably easier ways to achieve it.
Very few users change the defaults. To the point where you don't get enough data for useful insights.
This is literally the only reason that telemetry is almost always on by default. There is no illuminati secretly buying developers to learn your secrets via application telemetry, and it's laughable to think that's the case, in my mind.
Tl;dr; people usually accept the default so your prompts yield very different results.
If you don't like that, switch to a non-open source editor.
As grandparent OP said, in recent years this has moved waaaaay beyond theory territory and been shown time and time again that <Corporate Sector> + NSA + FBI + CIA + intelligence agencies around the world all employ different ways of collecting analytical data broadcast over the internet.
NSA just taps the servers without telling anyone.
FBI sends National Security Letters containing gag orders preventing companies from telling you that the federal government is now a data sharing partner.
CIA just pays companies for it.
FISA court issues secret rulings justifying the legality of it all.
Whether that bothers you or not is up to you. Most people don't care. I usually don't. I wouldn't say "logic dictates this won't happen" when it is pretty much only the multi-national multi-billion dollar companies subject to this kind of tampering, and most incentivized to monetize the analytics by allowing these and unknown third parties in.
To avoid the security exhaustion, some people would simply prefer their text editor not be "smart", which is a euphemism for internet connected.
It’s not like it’s a human sifting through it by hand.
I recall a recent thread here on HN where I learned that the Windows Calculator was sending data ("telemetry") back to the mothership.
Nowadays I think I'd be more surprised if a Microsoft product were released that didn't phone home.
I mean, sure, it might be the case that there were indeed tens of thousands of ALSA users with tracking turned off, but... From my perspective, it seems more likely that it was just a handful, and really there's no way to tell the difference. If you turn off telemetry, be aware of and accept the downsides.
I trust Mozilla, so my telemetry is on, but for many other applications, I often opt to turn off tracking - with the understanding that it's harder for them to tell what my needs are.
There really isn't such a thing, in the way you've used that phrase.
Capturing telemetry on how I use a tool from within that tool is perfectly fine, to me. Collecting telemetry on my search history in the browser by that same tool isn't. THERE ARE NO INTERIM STEPS that make the second of those ok. There is no slope. If there is, it isn't slippery. There is a series of discrete decisions and at some point (which is different for everyone) a line is crossed. There was no slope or slip that brought you there, only a series of mostly unrelated decisions.
To think that Microsoft's long-term goal is to install a keystroke logger via a multi-decade and multi-phase plan that begins with application usage telemetry in a free developer tool thanks to "a slippery slope" is just simply not realistic.
By itself no, but it is often used fallaciously. Such as in this case, when someone is opposing some good thing on the basis that that good thing might, some day, lead to bad things.
Our entire legal system is predicated on common law precedent. So it is very valid in many cases to argue that allowing something good now, might set us up for something very bad later.
You think people elect for bad things? Bad governments, bad software or privacy violations? They choose things that are full of rhetoric and promises of good things, then those bad things get snuck in off the back of relaxed regulation, or existing software adoption, etc.
Take Facebook as an example, people didn’t sign up to it thinking “I want to be tracked around the internet so I can have personalised adverts” nor did Zuckerberg think “Wouldn’t it be good to create a platform that could later be used for rigging elections”. No, instead we got there because of a series of good ideas that slowly got abused.
There is a saying that goes “The road to hell is paved with good intentions.” I’m not a religious man but I think that beautifully illustrates how slippery slopes are not a logical fallacy.
No, that still falls into fallacious usage.
The user doesn't justify why the steps of their assertion follow after the other. Just that they... do.
Over the course of the last 20 years, we've seen that once a data collection and digital surveillance framework is put in place, the surveillance tends to expand.
Slippery slope arguments, sans good reasoning, tend to be fallacies. However, don't fall into the trap of thinking that an argument backed by historical record is a slippery slope just because it's predicting an outcome. We might call that the "history is all slippery slopes" fallacy. Stating "this has happened multiple times before, and each time has led to x" is a very different argument to stating "this has happened, so the logical extrapolation is x".
For example, I slept in 'til 8:30 today. OMG, a slippery slope. Next thing you know, I'll be sleeping until 3PM. Til midnight! I will never wake up again. But as it happens, sleeping in isn't a slippery slope. I don't think there's any solid evidence that telemetry is either.
“... and that was me before we had children.”
> Lol, probably just an oversight because they made the search a lot better.
> So I think it only does this on the settings file and not on other files ;)
>> it only does this on the (VSCode) settings file and not on other files ;)
The "page" in settings consists of just two options, and the complete descriptions of the types of information they collect are "crash reports" and "usage data and errors". That seems the opposite of transparent and granular. Am I missing something?
There is an application in the Microsoft store you can install that lets you view all telemetry collected by Microsoft, and the actual data is encrypted. The metadata (which tells you WHAT is being collected) is not. Knowing the people I have worked with in the past, this is to securely prevent modification by users before the data is actually sent. I've worked with lots of people who would modify that data to attempt to get a feature added that they wanted or just to screw with MS.
That tool also gives you the option to delete all telemetry sent from that machine in Microsoft's possession.
Setting that to "Trace" will show the telemetry being sent in the Output pane of the interface, amongst a bunch of other stuff, I am sure.
The data could be accidentally broadcast or left in a vulnerable place. Even the payroll data for the national security establishment was once reported to be compromised. Everything is vulnerable. Computer science is in such an abysmal state.
Even with properly configured servers, OSes, and databases that are up to date, they are still vulnerable to zero-day attacks because they are not formally verified and have enormous and largely unnecessary complexity. Then throw in the crazy complexity of processor instruction sets, creative side-channel attacks, and stuff which exploits the physical properties of the hardware (rowhammer).
It is for reasons like these that we should never really trust transmission of sensitive data over the internet. The concept of secure voting systems, for instance, is literally a joke. (insert obligatory xkcd here)
"Logic and critical thinking textbooks typically discuss slippery slope arguments as a form of fallacy but usually acknowledge that "slippery slope arguments can be good ones if the slope is real—that is, if there is good evidence that the consequences of the initial action are highly likely to occur. The strength of the argument depends on two factors. The first is the strength of each link in the causal chain; the argument cannot be stronger than its weakest link. The second is the number of links; the more links there are, the more likely it is that other factors could alter the consequences.""
I can also see them collecting code searches done within the app as a way to check if their search system is working well for real use-cases.
Neither is outside the realm of possibility - you just have to put yourself in the mindset of a dev who is assigned to track down a rare crash or to “improve the search experience” who might want a little more data to work with.
Not saying I agree with any of this collection - it’s terrible and definitely falls under “the road to hell is paved with good intentions”. Companies should be extremely clear about what they will and won’t collect - and never cross the line even if it would be useful.
Telemetry is used either with naivety or malice. There is always some risk to the user.
That being said, it’s somewhat amazing we are now in a time when a Microsoft product could have aspects users don’t approve of and rebuild it without it: everyone wins.
I can think of two reasons for the data to be made public:
- for trust and transparency purposes: It is for the same reason that an election system should be observable and reproducible, from data collection to final decision, including counting methods, etc. Otherwise it would be like "trust us, the data shows it" and you don't show the data to anyone to prove it.
- for coordination and sharing knowledge: some people might interpret the data differently, choose to focus on a niche market by making different bets than MS, and create a complementary editor to the one from MS. MS has no obligation to support minorities, but someone else might be interested, and those minorities are detectable in the data.
Not only that, it's the main problem. And I couldn't simply trust it even from a company without the dark M$ reputation and EEE history.
I hope this is sarcasm. If not, what did I miss? This is the same empty phrase that Facebook, Google, etc. use. Why is Microsoft more trustworthy in that regard? I am 100% sure they use the data to make money in the short term or in the long run. They for sure use it to make VSCode better, but only to get more people to use VSCode and make them dependent on it. VSCode is a prime example of the Embrace, Extend and Extinguish strategy. I already see them grasping for the Python community.
A need for telemetry indicates that contributing is too difficult.
Consider doing that before running binaries from more or less unknown sources.
I don’t remember the exact gulp command off hand, but if you check the gulpfile there are myriad build configs for full minified packaged builds.
I think I would be okay with companies like Microsoft collecting data on me if they made it clearer what data they are collecting and what they’re using it for, gave me the ability to disable data collection (preferably disabled by default), and let me download my own data along with a guide to understanding it.
As a dev myself I know all of that is difficult and sounds ridiculous, but I really do think we have a right to the data collected on us and on our behavior. Transparency, ownership, and access, that’s all I ask.
Does anyone know what exactly is being tracked?
But, speaking as a developer of an open source software product that includes telemetry, I expect they're tracking really basic stuff, like: DAU, MAU, edited file types, project size, crash reports, etc. Basically, information that helps to internally justify the continued existence of the project, and data that lets them better prioritize resources on the project.
And then what they do with the data is also left unrestricted by phrases like "ways we use the data include..."
And then when you point this out, everyone tells you that you're being paranoid and they're just covering themselves and don't be silly.
And then when they do precisely what their policy legally enables them to do (cough Facebook cough Cambridge Analytica cough) everyone is aghast.
This is basically down to the reputation of the vendor and what information I can guess they gather based on what sort of outrage they would face if they cross a line.
There are two messages here:
1. Legal. Basically a catch-all that says they might sample your blood in the future.
2. Non-legal, e.g. developers. Says they gather harmless statistics.
Obviously #1 smells. But that’s how US corporate legal culture works. The judgement I have to make is whether the vendor can be trusted to do only what they say in message #2. I wouldn’t trust all companies in this respect, especially not those that trade in information like Facebook, but I do give Microsoft the benefit of the doubt.
I've got one Windows machine here, just so I can run one specific client that I have to use for an internally-hosted application. That machine doesn't have a default route, just a single static route that lets it communicate with the (internal) things it needs to and, just for good measure, there are firewall rules (on the router connected to my upstream) that block any traffic to/from this machine and the Internet. (Sadly, I would not be surprised to learn that it can "fallback" to using DNS queries or some such to report back to the mothership.)
I think we'll eventually get to the point where, in general, devices won't have a default route. It might take a while, though -- currently, way too many people are still completely okay with every device and application they use spying on them and reporting back on what they do.
So-called "default deny" firewall policies for incoming traffic are pretty common nowadays. I can't wait for "default deny" policies for outbound traffic to become standard as well.
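For anyone who hasn't worked with one, the logic of an outbound "default deny" policy is tiny. Here is a minimal sketch of the decision itself. The subnet and addresses are hypothetical placeholders, and a real setup would express this in the router or host firewall rather than in Python.

```python
import ipaddress

# Hypothetical allowlist: the only destinations this machine may talk to.
# Everything not listed here is denied, which is the whole point.
ALLOWED_DESTINATIONS = [
    ipaddress.ip_network("10.20.0.0/24"),  # assumed internal application subnet
]


def outbound_allowed(dest: str) -> bool:
    """Return True only if `dest` falls inside an explicitly allowed network."""
    addr = ipaddress.ip_address(dest)
    return any(addr in net for net in ALLOWED_DESTINATIONS)


# The internal application server is reachable...
print(outbound_allowed("10.20.0.15"))  # True
# ...while anything else, including arbitrary Internet hosts, is dropped.
print(outbound_allowed("10.20.1.15"))  # False
print(outbound_allowed("8.8.8.8"))     # False
```

Note that a check on destination addresses alone does nothing about the DNS fallback worry mentioned above. If the machine can still resolve names through an allowed resolver, data could in principle leave that way.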
From my experience code (Arch) is entirely debranded, so I can't imagine telemetry was left in.
Both pull from MS source and are prebuilt, so there should be no telemetry in either. This explains it well:
In case anyone wants to compare the minutiae:
This can certainly be the right call for a project. But maybe it's an untapped opportunity for the broader community?
Has anyone looked at running a tracking fork, say mechanically massaging vscode into a monorepo?
As I explore opportunities for coding inside VR, having a more integrated ecosystem for creating IDEs would be nice.
I just don't think there's a big evil conflict of interest or whatever for this stuff to get slippery.
Not yet anyhow.
there is no hope for you then I suppose
Additionally, a URL is most likely not copyrightable.
Flathub and Arch Linux are both releasing open source builds of vscode with this configuration:
The commit was quickly reverted, but they didn’t rewrite history to totally remove it, and now here we are.
If at some point I want to turn my application more commercial friendly by amending the license or changing it completely, and people tell me "No haha sorry you released as MIT at some point so it's now free forever mate" I would get pissed off and stray away from open source altogether. At least this is how the "you can't revoke a license" argument feels to me. But like I said, past versions that include a specific license should still be governed by that license.
A better way to do that would be to add some trivial access protection (like a password they didn't accidentally publish). No matter how weak, any attempt to circumvent it would violate anti-hacking laws in most jurisdictions.
> "No haha sorry you released as MIT at some point so it's now free forever mate"
For your existing code that's exactly how it works. Otherwise the concept of licensing something becomes close to meaningless. Imagine Google releases Kubernetes as open source, you build your business on it, and suddenly Google turns around and says "just kidding, everyone who wants to use Kubernetes after next Monday has to pay us absurd licensing fees". Using anything open source would be an insane risk if that was possible.
Instead what people usually do is to say "everything I do from now on is closed source. You can maintain a fork of the old version, but good luck keeping up with my version". Or alternatively "everything I do from now on is under [GPL/AGPL/similar restrictive license], if you want to use it beyond that contact me for a more permissive license deal". You can give people more permissions on things you own, or attach fewer permissions to new things than you did in the past, but you can't take back permissions you already gave away.
Yes this is what I was describing as reasonable. "Everything after this is governed by X terms" is reasonable. But the whole thing can sound like even if you change terms, previous licenses would still apply, which would be wrong.
Though it should be added that I'm just expressing the common understanding, barely anything surrounding open source licenses was ever actually tested in court. There are also some obvious legal positions that would completely change this: does every change need to state the license, are open source licenses actually legally binding etc. However nobody would ever argue those positions because they are detrimental for everyone (ok, the latter one was once argued in a GPL trial, but the court decided not to decide on that)
It doesn't help anybody except probably the one guy who 'Showed Microsoft'.
Telemetry helps improve products and opt-out means the product company will miss out on the behavior of power users thereby not being able to optimize their software for their usage.
Did you mean to write "opt-in", or are you suggesting it's OK for software to refuse the choice to opt out of data tracking? "Power users" simply do not use such software or block connections via firewall.
You can hardly dismiss all telemetry concerns as invalid when "telemetry" is a catchall term for most any data collection. And the other side of it is the user and being in control of the software they run, which entails opt-in or at the minimum leaving the option to opt-out.