Data about which commands are used most, whether they succeed or throw errors, possible typos, unclear command flags, etc., is useful when shipping software frameworks used by tens of millions to build products that run the entire Fortune 500 and thousands of other businesses.
This is not ubiquitous spyware, and is about as generic as it gets while still remaining useful. Every software company (many of which are featured here on HN) does the same analytical tracking to figure out how their products are being used and it results in better features. What exactly is the danger here, especially with the lack of personal info and an opt-out method?
They even make the data public along with insights into how it helped: https://blogs.msdn.microsoft.com/dotnet/2017/07/21/what-weve...
Exactly. Command line tools don't usually collect your data and send it off somewhere without asking. You wouldn't expect ls, cat and grep to spy on you. That's why it's so surprising.
But even big interactive applications like IDEs don't normally do this without asking. IntelliJ had a dialog pop up just a few days ago to ask me nicely to participate, defaulting to non-participation (!). Visual Studio and Office used to do the same thing, as I recall.
(Additionally, once you figure out how to opt-out in .NET Core, does it tell you somehow that you did so successfully? I think I've set the environment variable correctly, but how can I be sure?)
Except in Ubuntu Linux, where apt-get has done so for ages without anyone complaining:
I think it comes from Debian actually, where it's also opt-in – my Debian machine doesn't have it installed. Like it says in the Debian package description: "Vote for your favourite packages automatically". It's voting, not spying – strictly voluntary. Those of us who don't want to vote don't have to.
Fair enough, but it wasn’t always like that.
> “Vote for your favourite packages automatically". It's voting, not spying
That’s just playing with words. What it does is exactly the same: provide the upstream author statistics about what is being used and not.
Surely you can’t mean that this is a serious argument.
When people go to vote for their government, is that implemented by sending government agents to eavesdrop on people's conversations to suss out what their political preference might be? No. People voluntarily (this is the core of my analogy) go to vote, to make their voice heard.
That's the difference between voting and spying.
Of course, if popcon were enabled by default, then that would be different and not like voting at all! But that's not what it said on the page you linked.
Imagine I ran a command "make.exe my-secret-project-name" or "make.exe process_gdrp_removal_request SOME_SSN".
And as the linked page admits, they really do collect all command line arguments, not just a white list of valid commands. Perhaps someone accidentally pastes an AWS API Secret into the wrong terminal window and it ends up as "dotnet.exe MY_AWS_SECRET". ("You will notice misspellings, like “bulid”. That’s what the user typed. It’s information").
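To make the "white list of valid commands" idea concrete, here is a minimal sketch of what allow-list reporting could look like. The command set and the function name are hypothetical, chosen for illustration; nothing here is taken from the actual dotnet CLI source.

```python
# Hypothetical allow-list; a real tool would derive this from its own command table.
KNOWN_COMMANDS = {"build", "restore", "run", "test", "publish", "new", "pack"}


def telemetry_command(argv):
    """Return the value to report for a command line, never the raw arguments."""
    if not argv:
        return "<none>"
    first = argv[0].lower()
    # Anything not on the allow-list is collapsed to a fixed placeholder,
    # so a pasted secret or project name never leaves the machine.
    return first if first in KNOWN_COMMANDS else "<unrecognized>"


if __name__ == "__main__":
    print(telemetry_command(["build", "-c", "Release"]))
    print(telemetry_command(["MY_AWS_SECRET"]))
```

The trade-off is that misspellings like "bulid" would no longer show up in the data, which is exactly the information the linked page says it wants.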
"My understanding is that this was recently discussed with our privacy team and we concluded that collecting the arguments themselves (hashed or not) is not acceptable per our privacy policies. Not sure whether the code already reflects that, but it's being worked on."
Somebody in the thread for the issue said it's "not acceptable per our privacy policies" and "being worked on" two years ago, but did they ever actually stop collecting command line arguments?
Pushing a secret API key to github, even accidentally, does not happen covertly or as an unexpected side effect of some other tool written by github, and you'll soon realize what happens because you'll see your key sitting there. You also have a general idea of what you are playing with when you start using github, whether it's private repos or public repos.
Having a local CLI tool submit your secret API key as part of telemetry collecting command line arguments is totally unexpected and invisible for the user, and you won't even see it has happened.
While I'm not defending the practice, I don't see how, when you make a copy-paste mistake into a wrong app, you blame the app for receiving the result of that mistake?
Yes. People all make mistakes. Good tools mitigate mistakes, while bad tools (like this telemetry) exacerbate problems.
Does it happen in other apps? Sure. But I would not expect my command-line builds to phone home exactly what I am doing.
Is the difference not obvious?
When you run make.exe some-program-that-uses-aws YOUR_AWS_SECRET intentionally it is troublesome by itself.
You have a strange notion that small troubles and big troubles are all the same.
OpenJDK is fine however.
The installers I've been getting directly from the Oracle JDK pages haven't installed anything extra, or asked to.
It's of course possible that some users don't experience these "sponsored offers" due to geotargeting, A/B-testing, lack of available "offers" at a specific time, or any other reason controlled by the server side of the installer.
And also for what it's worth, it appears that they are using this ID to correlate what users are doing across their products - as indicated by this comment: "// The hashed mac address needs to be the same hashed value as produced by the other distinct sources given the same input. (e.g. VsCode)"
Edit: looks like it can be disabled with the "DOTNET_CLI_TELEMETRY_OPTOUT" environment variable, following https://github.com/dotnet/cli/issues/3093#issuecomment-22997...
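For anyone wondering whether the variable is actually set in the environment their builds run in, a rough check could look like the sketch below. It assumes the accepted opt-out values are "1" or "true", which is my reading of the documentation rather than something confirmed in this thread, and the function name is made up. It only tells you what a process launched from the same environment would see, not what the CLI does with it.

```python
import os


def telemetry_opted_out(env=None):
    """Report whether DOTNET_CLI_TELEMETRY_OPTOUT looks set to an opt-out value.

    Assumption: "1" or "true" (case-insensitive) means opted out.
    """
    env = os.environ if env is None else env
    value = env.get("DOTNET_CLI_TELEMETRY_OPTOUT", "")
    return value.strip().lower() in {"1", "true"}


if __name__ == "__main__":
    state = "set" if telemetry_opted_out() else "NOT set"
    print("Opt-out variable is " + state + " in this environment")
```

Run it from the same shell, service, or CI job that launches dotnet, since environment variables are inherited per process and a setting in one place does not apply everywhere.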
A little while back, an update to Office for Mac caused each application to pop a dialog on startup to share "diagnostic data" with Microsoft. The only options were "Full" and "Basic" data sharing. The buttons in the dialog were "Accept" (once you'd clicked an option) and "Learn More". No "Don't Share Data" option.
Many users complained in the support forums about this, saying they wanted a way to completely opt out. Worse: if you closed the dialog without selecting any option, you were automatically "opted in" to "Full" data sharing (as you could see if you then went into settings and looked at the privacy tab).
This week they rolled out another update, which sneakily looks like it fixed this: now the dialog has buttons "No" and "Yes", but if you read it, the question it's asking is "Can Office send enhanced error reporting?" There still is not a "do not share data" option.
What bothered me is that when I asked questions back, I would get no reply. If they are going to collect this data, the least they could do is help developers out with some aggregate data. I've been selling this product for 10+ years, and to this day, I still vacillate on which version of the .NET framework to target.
There are fair arguments against the nature of opt-in telemetry, but saying "they violate GDPR" is just hyperbole, imo.
No need to panic.
I know all the spin-doctors that Silicon Valley and the privacy-hostile tech companies can afford are trying to make it seem like GDPR makes everything into a giant mess where you need a team of lawyers just to write basic web-server logs.
But just because the spin-doctors are trying to spin things that way, doesn't mean it's true.
This is of course big-scale privacy violators trying to give the general population the impression that the GDPR is "ridiculous", something to disregard, and in the process trying to sabotage a law which will make it unlawful for them to keep tracking, mining and selling the shit out of your personal information.
They are trying to make you turn your back on someone who is actually fighting for your privacy. That's just scummy as heck.
GDPR is about common sense and basic decency. It works out just fine for everyone not into scummy user-hostile shit. You just need to be up front about what user-data you collect and how you intend to use it.
No user-data? Move along. No GDPR.
Timestamp              Occurences  Command       Geography   OSFamily  RuntimeID     OSVersion   SDKVersion
5/8/2017 12:00:00 AM   3           fable         Madagascar  Windows   win10-x64     10.0.14393  1.0.3
6/8/2017 12:00:00 AM   1           fable         Germany     Windows   win7-x86      6.1.7601    1.0.1
4/11/2017 12:00:00 AM  3           user-secrets  Vietnam     Windows   win10-x64     10.0.14393  1.0.0
5/1/2017 12:00:00 AM   1           user-secrets  Thailand    Windows   win10-x64     10.0.14393  1.0.0-preview2-003131
4/3/2017 12:00:00 AM   1           restore3      Peru        Linux     debian.8-x64  8           1.0.1
That argument can be extended to anything and as such cannot be deemed valid.
I could accidentally post AWS secrets as my user-agent in my browser. Does that mean everyone on the internet should be held accountable to GDPR because I might visit them and my user-agent end up in their server-logs?
Microsoft is clearly showing a reasonable effort w.r.t. limiting what gets logged and how. That should be more than enough to appease any GDPR-concerns. They can't be held accountable for all possible errors or error-modes in the known universe.
If anyone here is claiming they should, what they then also claim is that any software handling user-data must be 100% bug-free to be GDPR-compliant. And that's obviously a ludicrous position.
Stop the nonsense.
There's a difference between not being bug-free, and purposefully implementing a user-hostile feature like this.
Please, try to avoid crediting me for absurd arguments that I haven't made.
Not all metadata is personally identifying. A very careful bundle of metadata might be.
But it still makes a shitty user experience to do something that is counter to what your users expect and want.
Edit: also from : "Hashed MAC address — Determine a cryptographically (SHA256) anonymous and unique ID for a machine. Useful to determine the aggregate number of machines that use .NET Core. This data will not be shared in the public data releases."
I think a machine could be a user in this circumstance? Also, a hash isn't anonymous if it's a deterministic function of the MAC address alone, as it's merely a derived value.
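A quick illustration of the "derived value" point, using the quoted "Hashed MAC address" description as the starting point: an unsalted SHA256 of a MAC address gives the same ID every time, so it works as a stable pseudonym for the machine. The normalization and encoding below are assumptions made for the sketch, and the MAC addresses are made up; I don't know exactly how the CLI formats the input.

```python
import hashlib


def machine_id(mac):
    """Unsalted SHA256 of a MAC address (normalization here is an assumption)."""
    normalized = mac.replace(":", "").replace("-", "").upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


if __name__ == "__main__":
    first = machine_id("00:1A:2B:3C:4D:5E")
    second = machine_id("00-1a-2b-3c-4d-5e")
    # Same hardware address, same ID, on every run and in every product
    # that hashes it the same way.
    print(first == second)
```

Because there is no salt or per-install secret, anyone who hashes the same address the same way gets the same value, which is what makes correlating records possible.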
Source: I work in IS for healthcare providers
 - https://www.hhs.gov/hipaa/for-professionals/privacy/special-...
One way to improve shitty user experiences is to study them. This requires information.
How do I answer questions like: what is the most frequently-used command? Which command causes the most people to give up? Which argument is being asked for, which already exists? What effect does a change in documentation have? How many projects do people work on and do we need to invest in tools for repositories with many projects?
This is all Product Management 101: form hypothesis and test it.
I really like VSC. But things like this really call all that positive experience into question. I doubt product management is going to help here.
Will now scan network transfer VSC is causing. In many other cases I wouldn't even bother anymore.
This is exactly what they do.
I'd prefer opt-IN but I fully understand if Microsoft doesn't.
Having something be opt-OUT and not displayed to the user (meaning they are in no position to opt out because they aren't aware they even need to) is of course entirely unacceptable no matter which way you turn it. It can't be in fine print; it needs to be in big bold letters on first run, install, etc.
When one of my kids asks for something and I say no, they might ask again. But the more they continue to ask, the more determined I am to say no. The difference between me and Microsoft (in this context) is that I'm trying to bring up respectful children.
This is just what Microsoft are doing - judging by the +1's, there's a huge majority of users who just don't want this. Instead of giving their customers what they're asking for, they just dig their heels in further. This is one of the main reasons I started the transition away from Windows since the 8 preview.
They said that making it opt-in would mean less data. Well, doesn't that tell them something?
I'm sure the .NET team would have got the same answers if this were opt-in - and let's face it, the discoveries aren't exactly groundbreaking.
If you ask me, I probably would have opted in. I'm sure other people would do the same (if they could). But it's not, so I jumped through the hoops to disable it (added it to /etc/environment). You don't ask, you don't get.
And another thing, why not disable it using the registry when running on Windows? Personal experience tells me that only a minority of Windows devs even know what an environment variable is, let alone know how to set one globally and persistently.
It's arguable whether or not this is personally identifiable. Even if it's not, it still leaves a bad taste as it's still classed as "spying" - they're sneaking the data out by making people take effort to disable it; it's beyond a simple Y/N, people have to actually learn how to do something. And it's also giving people the impression that their own apps will be contaminated.
It's a command-line tool I'm running on my local computer for professional use, not a click-bait website or social media platform.
The only way to sort this out is someone suing Microsoft and then let a judge declare whether telemetry not being opt-in is a GDPR violation or not.
I'm all for continuing the fight, but I'm surprised this very old news is on the front page again.
PS: this should be opt-in due to community request and not by some philosophy or law
I suspect the answer is that no concrete improvements have been made and all it's accomplished is to make them look bad.
"Hashed MAC address — Determine a cryptographically (SHA256) anonymous and unique ID for a machine. Useful to determine the aggregate number of machines that use .NET Core. This data will not be shared in the public data releases."
I wonder why they didn't use something like bcrypt/scrypt with lots of rounds.
That cuts the search space down to 23000 vendors * 0xFFFFFF = 385875945000. With a hashrate of 60000MHs, you could SHA256 hash that entire space in 6.5 seconds. If you have an NVidia GTX 1080, you can do it in ~2 minutes 16 seconds.
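The arithmetic above is easy to reproduce. This sketch uses only the figures from the comment (23000 vendor prefixes, 0xFFFFFF device suffixes, 60000 MH/s, and 2 minutes 16 seconds); the variable and function names are mine, and the hash rate the last line prints is simply what the 136-second figure implies, not a benchmark I ran.

```python
VENDORS = 23_000      # vendor prefixes, as quoted in the comment
SUFFIXES = 0xFFFFFF   # per-vendor device part, as quoted in the comment

search_space = VENDORS * SUFFIXES


def seconds_to_exhaust(hashes_per_second):
    """Time to SHA256 every candidate MAC address once at the given rate."""
    return search_space / hashes_per_second


if __name__ == "__main__":
    print(search_space)                               # 385875945000
    print(round(seconds_to_exhaust(60_000e6), 1))     # 6.4
    # Rate implied by "~2 minutes 16 seconds" (136 s), in MH/s.
    print(round(search_space / 136 / 1e6, 1))
```

Either way the whole space falls in seconds to minutes, which is the reason a deliberately slow function like bcrypt or scrypt comes up as the alternative.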
This is nothing they couldn't have gotten through more traditional and less invasive feedback mechanisms.
> Production apps running on machines you don’t control?
Yes, I have. Last time I did it via a dialogue that would pop up asking the user to submit the information, along with a text area that contained all the information being sent.
It's possible to get the information you want and respect your users.
Versus "screw you we're taking it anyway".
In some environments this is a massive compliance issue.
Unfortunately it's still opt-out, doesn't show what data it's sharing and looks to be purposefully misleading, so no info from me.
Wasn't it always that way? I remember those opt-in crash dialogs from a long time ago – I think they already had them in Netscape, or at least the (pre-Firefox) Mozilla suite.
I'd risk a guess that the average user doesn't have any problem with aggregate, anonymized data being collected for statistical and debugging purposes. A lot of people here on HN do and they like to extrapolate that to the whole society, but it's a stretch.
So add "it saved the company buckets of money" to the list of reasons this is a better approach.
PS: according to documentation of the feature
Edit: manigandham's answer is more correct
They haven't changed a lot. They're still patent trolls, and if they still exist, it's mostly because they're still living off the momentum of their 90s monopoly.
This is the type of BS that comes out of the great minds at Microsoft: https://www.cbsnews.com/news/hiybbprqag-how-google-tripped-u... <- a great example of how your analytics data might be used.