Microsoft/.Net Foundation added telemetry to the dotnet command line last year (opinionatedgeek.com)
142 points by mel919 on July 24, 2017 | 92 comments

For reference, they collect[1]:

    The command being used (for example, "build", "restore")
    The ExitCode of the command
    For test projects, the test runner being used
    The timestamp of invocation
    The framework used
    Whether runtime IDs are present in the "runtimes" node
    The CLI version being used
I'm actually OK with this to be honest.

Here is the telemetry code itself: https://github.com/dotnet/cli/blob/5a37290f24aba5d35f3f95830...

They also publish all the telemetry data (change 2016 and q3 in the URL to see other periods): https://dotnetcli.blob.core.windows.net/usagedata/dotnet-cli...

1. https://docs.microsoft.com/en-us/dotnet/core/tools/telemetry

Also, when you run `dotnet restore`, you get the following message:

  Welcome to .NET Core!
  Learn more about .NET Core @ https://aka.ms/dotnet-docs. Use dotnet --help to see available commands or go to https://aka.ms/dotnet-cli-docs.

  The .NET Core tools collect usage data in order to improve your experience.
  The data is anonymous and does not include command-line arguments. The data is collected by Microsoft and shared with the community.
  You can opt out of telemetry by setting a DOTNET_CLI_TELEMETRY_OPTOUT environment variable to 1 using your favorite shell.
  You can read more about .NET Core tools telemetry @ https://aka.ms/dotnet-cli-telemetry.

  A command is running to initially populate your local package cache, to improve restore speed and enable offline access. This command will take up to a minute to complete and will only happen once.
Sure, it's enabled by default, but at least they clearly notify you about it. So it's strange that the author says: 'I’ve been using the dotnet core since well before then and I never knew about this.'
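For anyone scripting builds, here is a minimal sketch of applying the opt-out from the message above. The variable name DOTNET_CLI_TELEMETRY_OPTOUT and the value 1 come from the welcome text; the helper names are hypothetical, and treating only the literal "1" as opted out is an assumption, since the message documents nothing else.

```python
import os

# Variable name as printed in the .NET Core welcome message.
OPTOUT_VAR = "DOTNET_CLI_TELEMETRY_OPTOUT"


def env_with_optout(base_env=None):
    """Return a copy of the environment with the telemetry opt-out set to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env[OPTOUT_VAR] = "1"
    return env


def is_opted_out(env=None):
    """Report whether the opt-out is set.

    Assumption: only the literal value "1" counts, because that is the
    only value the welcome message mentions.
    """
    env = os.environ if env is None else env
    return env.get(OPTOUT_VAR) == "1"
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, so a build script sets the variable for the child process without touching the shell profile.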

The author must not be used to the new spyware-by-default mentality coming from Microsoft.

Hard to believe, but they used to sell products a while ago and had no telemetry.

If you want to see how it's done properly, look at OmniGroup: their apps have toggleable telemetry and it's off by default.

@blub can you explain to me how exactly it's "spying on you"?

There is a difference between collecting information about how many people are using something and whether a particular person is using it.

Collecting diagnostic information about Windows application failures (how many failures there are, etc.) has been around ever since the Windows 95 era.

Similarly, collecting information about how many people are using dotnet core build/test/publish is similar to how Google/Mozilla tracks how many users are running which version of their product and experience issues.

If Microsoft/Google/Mozilla or any other company used that information to identify a specific person, that would be "effectively spying on you". Until that happens, this is the same functionality that exists in almost every product. Just a clickbait article.

Spyware is software collecting information about someone without their consent.

Doesn't have to be malicious, doesn't have to be what's legally defined as personal information. The fact that many companies are doing it doesn't make it less inappropriate.

Reputable companies will clearly inform users and ask for their confirmation. Then they respect their choice.

Disreputable companies such as MS or Google take without asking, use dark patterns to trick users, default to always on, reset privacy settings, etc.

As someone who has removed my fair share of spyware infections I'll say "easy now".

I think I'll be happy the day EU and American consumer protection agencies start looking closer into Google's business.

I'd also applaud even more visible information about what exactly gets collected and sent (the old gds "Read very carefully - this is not the usual yadda yadda" would be a good start).

However IMO we shouldn't call legitimate telemetry "spyware". I think that is what you call "crying wolf".

Mozilla asks you whether you want to send the telemetry.

If you say no, it won't send anything.

No, the settings do not mysteriously reset themselves.

Firefox tracks users with Google Analytics in the add-on settings | https://news.ycombinator.com/item?id=14753546

  "Someone submitted a PR to Mozilla to fix this, and the Mozilla devs closed it"
Impossible to opt-out until about 2 weeks ago.

Come on, that was a bug in the new preferences pages.

The telemetry I was talking about is exactly the one, where you get a bar at the bottom during first launch. Try it, you will see it.

Perhaps this instance was an honest mistake.

The specifics of a custom deal with Google and the circling of the wagons (specifically opinions expressed by multiple Mozilla employees in an official capacity) prior to reversing course do not strengthen that case.

> If you say no, it won't send anything.

This simply wasn't true; I am glad that the implementation was fixed.

> The author must not be used to the new spyware-by-default mentality coming from Microsoft. Hard to believe, but they used to sell products a while ago and had no telemetry.

Yeah well you and the author's first clue should have been when you stopped paying for said products.

And in this specific case, it's really not spying; it reveals pretty much nothing about you and helps them figure out what is used and what fails.

I use a ton of software I don't pay for which also doesn't spy on me.

Or are you making some weird accusations against the FSF and the GNU Project?

Give them time to find new "opportunities" to monetize...


What use is off-by-default? Who turns telemetry on?

Debian pop-con is opt-in. http://popcon.debian.org/

This says that telemetry is the wrong solution then.

Why? Defaults are important and the vast majority don't care (assuming correctly selected telemetry data) and the majority can't be bothered to change the default in either case.

Again I am making a huge assumption about correctly selected telemetry data here but opt in mechanisms won't get even 10% of the data they currently do.

Defaults should respect the user first. Consent has to be given, not taken as a default.

Sure, ask up front explicitly, but don't quietly trigger the first capture before consent has been given. That's a shitty tactic.

Collecting basic usage data is not disrespecting the user.

It is when you know you can't persuade them you have a good-enough reason to need it, so instead you don't even try.

In your opinion. Mine differs.

That's not quite correct. More is collected, but the docs are still being updated. [0]

The other things being collected are:

* Geographical location

* Operating system and version

[0] https://github.com/dotnet/docs/pull/2706/files

> For reference, they collect

That's not all that matters. IMO the real decision is: do you /trust/ MS ? Do you trust that they anonymize collected data and that they won't secretly change collected data? Do you trust future MS with that information.

> I'm actually OK with this to be honest

That's perfectly fine if you trust them. Many people don't. Personally I wouldn't trust any dev tool that uploads my usage.

>That's not all that matters. IMO the real decision is: do you /trust/ MS ? Do you trust that they anonymize collected data and that they won't secretly change collected data? Do you trust future MS with that information.

You don't need to trust them. The telemetry code is open source AND they release the aggregate data it collects for anyone to use/inspect.

If it's completely open, how do they keep it from being spammed?

> do you /trust/ MS ?

Why do you have to trust MS? You can read the source code to check for yourself whether sensitive information is sent. You don't have to take Microsoft's word for it.

> That's not all that matters. IMO the real decision is: do you /trust/ MS ? Do you trust that they anonymize collected data and that they won't secretly change collected data? Do you trust future MS with that information.

Bear with me. This seems like the wrong question, but not for the reason you might expect. Rather, I think that it might be wrong because, even if Microsoft acts in completely good faith, it is damn near impossible to anonymise collected data properly [obligatory citation of the 'anonymised' AOL search data]. It doesn't matter whether I trust someone to do something if they (probably) can't do it.

So I assume you don't use web apps.

These tools are not web apps. They work entirely on a local machine. Their fundamental mode of operation is not to run on a remote machine.

Thanks. As I was scanning through the article, this is exactly what I was looking for but couldn't quite see for all the salt.

And, the 'secret' environment variable to disable it is actually printed in the text of the last (installation successful) dialog of the install wizard, at least on OSX for the 2.0.0 preview...

Do you actually inspect every GitHub commit to make sure this won't change?

One of the items added:

> +- Geographical location†

I feel like that's one of the pieces of information I'd expect a new opt-in or notification to appear for at the very least. Did that happen?

well even if they do that now, there is no guarantee that a future release won't remove the notifications.

just look at the automotive industry in germany. if you give them trust, they probably will do shady stuff, no matter how good their initial behavior was.

never trust a company.

Well, as long as you make sure that the project name doesn't give away anything that could compete with a Microsoft product or that would leak information about some confidential product you are working on...

It's not just independent devs that are using .net. And the name of the company appears often in the assembly.

So this is yet another case of someone blowing something completely out of proportions and spending their time working on something completely useless that will never benefit them.

"Out of proportions" for now. Nothing stops them from changing this later, updating the small print saying "oh we changed that" and blaming you for not checking for changes to their EULA regularly.

If this is your fear how do you use any software?

MS could update your OS to do anything tomorrow, Canonical could hide some literal malware in any number of packages for Ubuntu tonight, Intel could write a backdoor into your machine in its next microcode update.

And OSS doesn't fully prevent this either. GCC could add some kind of nefarious exploit in the next version of its compiler (knowingly or otherwise). Just take a look at the Underhanded C Contest for just how scary easy it is to hide exploits in plain sight!

I can't even fathom the amount of work it would be to personally review every line of code that goes into your machine from the microcode up to the newest NPM module (even if it were all open and it was possible to do). At some point you need to trust someone else.

That's why betrayals of trust - such as adding spyware that takes data without the user's informed consent ("opt-out") - are such a big problem.

You're right - there isn't enough time to audit everything, so we have to rely on trust. "Relying on trust" means instead of reviewing code, you have to review trustworthiness.

Coming from Europe, I'm a little worried by the general attitude here. We tend to side with privacy first. There are some real genuine concerns from real people like myself who have to work with this tooling. I'll detail my thoughts:

1. It's setting a bad precedent for data collection by default. Name one other tool of the same class that actually sends telemetry data home by default?

2. It's much harder to ensure that the tooling is compliant with data protection policies within an organisation if the tooling by default sends telemetry. We now have to assume it's going to send stuff by default and configure all build infrastructure, every developer workstation and every piece of the toolchain independently. This is particularly of concern in the finance sector. It also costs us time and money.

3. There are no test cases to cover the telemetry functionality at all. Check the code. What happens if it starts reporting command lines due to a trivial defect?

4. There is a crudely defined document which describes what the telemetry does, but not what it will do in the future. What happens is a PR appears, gets merged and gets pushed out to a new version. To find out what happens you have to read every merge, every PR for a release.

This is a loaded gun waiting for any security conscious team to shoot themselves in the face with. Really this will gate the product into the bin at the first technical review stage for a lot of companies. There is no appetite for being milked.

I'd also like to add the absolute zero communications on this front from MSFT. People have asked directly via PRs to turn this off because they do not want it and they have been ignored for over a year. The usual response from MSFT is never to respond directly to this question and instead outline what the telemetry does, expecting that to count as an answer. If there's anything I've learned over the years, it's that you can't trust anyone who won't answer a direct question.

If you are so against telemetry and google analytics specifically maybe you should remove it from your own site?[0]

[0] https://imgur.com/a/NX2Gc

I'm not the author of the blog post. I think you're comparing apples and oranges, also this kind of reasoning is an example of "tu quoque" logical fallacy.

> tu quoque

No, this is not that. The "tu quoque" logical fallacy follows this pattern (from Wikipedia):

   Person A makes claim X.
   Person B asserts that A's actions or past claims are inconsistent with the truth of claim X.
   Therefore X is false.
They are not saying their claim is false. They're saying that if they care so much, why are they subjecting their users to tracking that they are unable to opt out of?

I've struggled with this before on this site. People love to pull out fallacies. But they forget that fallacies are only fallacies if they are used as a counterargument. And even when they are used in such a way, the other side then has to deal with fallacy fallacy. You can't immediately discredit an argument just because it contains a fallacy.

It wasn't stated explicitly. The assumption that I made was it related to the discussion about validity of the topic. Assumptions can be misleading but human language operates in a context. Formally, should the context be taken out, to operate only on the words of the post - you're right - it is not "tu quoque".

What's with this exaggerated blog post?

1. It was announced in the open in June 2016 that .NET Core includes telemetry: https://blogs.msdn.microsoft.com/dotnet/2016/06/27/announcin...

2. If you use something you could at least follow changes between major releases, no?

When did engineers stop being responsible people who read before using things? :-O

Here's what Microsoft have learnt from the telemetry [1].

[1] https://blogs.msdn.microsoft.com/dotnet/2017/07/21/what-weve...

I think it's noteworthy that they even include command line arguments that are mistyped, for example "bulid".

What happens if you accidentally paste an AWS secret key or similar in the middle of a command line argument? Will that too appear in public csv files a year later?

Hi. Team member here. We used a simple algorithm to prevent that. We essentially got the data itself to vote on what a real command was for exactly this reason. This means that a lot of people typed "bulid" since the vote passed on that one. I don't have a count, but many rows were not included in the data since they didn't pass the minimum threshold for being a real command. Imagine you spelled "build" backwards for some reason. That would have been quite uncommon.
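A minimal sketch of the kind of "vote" described above, assuming it amounts to counting how often each command string appears and keeping only those that reach a minimum count. The function name and the threshold value are hypothetical; the team member did not state the actual threshold or implementation.

```python
from collections import Counter


def commands_passing_vote(raw_commands, min_count):
    """Keep only command strings seen at least min_count times.

    A common typo such as "bulid" survives because many users typed it;
    a one-off string (for example an accidentally pasted secret) does not.
    """
    counts = Counter(raw_commands)
    return {cmd: n for cmd, n in counts.items() if n >= min_count}


# Illustrative data only; min_count=3 is a made-up threshold.
rows = ["build"] * 5 + ["bulid"] * 3 + ["pasted-secret-value"]
published = commands_passing_vote(rows, min_count=3)
```

With these illustrative rows, `published` holds "build" and "bulid" with their counts, and the one-off string is dropped.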

This is one of my objections. There are also no test cases for this piece of code either.

They don't include command line arguments (yet). They include the command verb (dotnet [build/restore/etc]) that was run.

Well, it looks like they are including "command verbs" even if they are mistyped, for example "bulid".

What happens if you accidentally paste an AWS secret key or similar in the middle of a command verb? Will that too appear in public csv files a year later?

See my comment to the grandparent comment on our approach to only including common command strings (which wouldn't include anyone's AWS key). Also, and more importantly, we will only collect known arguments. From the blog post:

> Only known arguments and options will be collected (not arbitrary strings).

We don't want your AWS secret key in this data as much as you do. We have put systematic mitigations in place to ensure that this doesn't happen.
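The "only known arguments and options" rule quoted above can be sketched as an allowlist filter. This is an illustration under stated assumptions, not the CLI's actual code: the function name and the contents of the allowlist are hypothetical.

```python
# Hypothetical allowlist for illustration; the real list is not given in the thread.
KNOWN_OPTIONS = frozenset({"--configuration", "--framework", "--verbosity"})


def collectable_args(argv, known=KNOWN_OPTIONS):
    """Return only the tokens that appear on the allowlist.

    Option values and arbitrary strings (paths, project names, pasted
    secrets) are never on the list, so they are dropped.
    """
    return [token for token in argv if token in known]
```

For example, `collectable_args(["--configuration", "Release", "my-secret"])` keeps only "--configuration". Because the filter is an allowlist and not a blocklist, anything unexpected is excluded by default.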

We are struggling, philosophically, with anti telemetry posts (like this one).

We are turning on telemetry in the next release for our open source tool. https://github.com/getgauge/gauge

We are a small team with limited resources.

In our tool, it's easy to turn telemetry off, inspect what data is sent and the data collected is public.

The data "really" helps to make the tool better and an opt-in skews the data.

We've published a blog post https://blog.getgauge.io/why-we-collect-data-b19df366b677 and will put it up in the release notes and the download section.

What else can be done so that users don't blow up?

Let's just be clear that it's entirely OK to add telemetry to your code. The objection here from most of us, I suspect, is that it is on by default. If you package a tool so it does an unattended installation in some way, e.g. via a package manager, telemetry should be off by default. If you have a GUI installer, ask the user if they want it and outline the benefits and what you collect.

If you get an uptake of say 5-10%, if that's worth it then problem solved. If it's not then don't bother adding telemetry to start with.

But before you do this, you have to ask the question: how did the software industry get by before the sudden rise of telemetry? It engaged the customer.

I think in a lot of cases it is used as a substitute for engaging the customer.

Indeed. Though on-by-default telemetry gets a different set of data than engaging with the customer.

If adding telemetry is faster and easier than engaging with the customer, then you'll see projects that add telemetry that wouldn't otherwise have the bandwidth to engage with the customer.

In general, I think the best way to go is to ask in the installer or initial setup whether you want to send telemetry, and have a sane default according to whether you gather potentially personal information (location: personal; commands run without args: not personal).

Example Prompt: Send telemetry (commands used, version) (y/n)[y]:
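A minimal sketch of handling that prompt, assuming an empty answer selects the bracketed default. The prompt text is the one from the example above; the function names are hypothetical.

```python
PROMPT = "Send telemetry (commands used, version) (y/n)[y]: "


def parse_consent(answer, default=True):
    """Map a typed answer to True/False; an empty answer takes the default."""
    text = answer.strip().lower()
    if text == "":
        return default
    if text in ("y", "yes"):
        return True
    if text in ("n", "no"):
        return False
    raise ValueError(f"expected y or n, got {answer!r}")


def ask_consent(input_fn=input, default=True):
    """Show the prompt once and return the user's choice."""
    return parse_consent(input_fn(PROMPT), default=default)
```

Passing `input_fn` makes the prompt testable without a terminal, and flipping `default` to False gives the off-by-default variant with the same code.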

We've experimented asking users on install or initial setup. But most of the time our tool runs in non interactive environments e.g. on a CI/CD set up (install and execution).

An additional flag for non-interactive installs can solve the problem, but that's a broken setup experience, someone has to look up the documentation after a failure to install.

Turning it off by default in case of a CI/CD setup means losing most of the data.

If you're setting up automated non-interactive installs, your job is to check for install failures and consult the documentation; I know that's part of my job.

I'd recommend a required installer flag forcing the user to make a decision, but I'm a user who generally leaves telemetry on.
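A required flag like that could look roughly like this with argparse. The flag name `--telemetry` and its on/off values are hypothetical, chosen only for illustration.

```python
import argparse


def build_parser():
    """Installer argument parser that refuses to run without a telemetry choice."""
    parser = argparse.ArgumentParser(prog="install")
    parser.add_argument(
        "--telemetry",
        choices=["on", "off"],
        required=True,  # no silent default: the caller must decide
        help="whether to send usage data (hypothetical flag)",
    )
    return parser


def telemetry_enabled(argv):
    """Parse installer arguments and return the explicit telemetry choice."""
    args = build_parser().parse_args(argv)
    return args.telemetry == "on"
```

When the flag is missing, argparse prints a usage error and exits with a non-zero status, so an unattended install fails loudly and the person automating it has to make the decision once.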

Thanks for all the suggestions.

Here's what we are doing next release (out this week)

* Ask if the user wants to opt out in the graphical installers.

* Print a message after non-graphical installs about data collection and link to documentation on how to turn it off.

The minimum should be a clearly presented option to turn off telemetry either during install or at first startup.

Just a mere suggestion: you may look into how it's done in yeoman (https://github.com/dotnet/cli/issues/3093#issuecomment-22034...) and read this https://github.com/dotnet/cli/issues/3093 as there's lots of user input there.

On mac you can always use little snitch (https://www.obdev.at/products/littlesnitch/index.html) to reliably block outgoing connections. No need to muck around with environment variables, and you don't have to guess which domains dotnet uses, little snitch will tell you, even if they change them in the future.

I'll just set the environment variable thanks.

Haha, who makes sure that dotnet actually honors the env variable? It could still connect to servers and exfiltrate data.

OTOH nobody gets around a firewall which blocks all outgoing connections ;)

This probably feels more unusual in the world of shell-based development tools - not many these days blink an eye for this sort of behaviour from an IDE package. Still, as a .NET core fan, definitely not a fan of this practice. To be expected from Microsoft, though - they bet big on telemetry in their tools and encourage developers to do the same (through tools like App Insights, for example).

My impression is that no-one uses Application Insights. Total of 643 questions about Application Insights on SO, either it's the easiest tool to use ever, or no-one uses it.

Though this data is more or less benign, the point remains. I don't think it's appropriate for a tool like this to phone home, and if it did, it should at least be opt-in, not opt-out (especially considering the opt-out mechanism is something as clumsy as setting an environment variable rather than a config somewhere).

This tool compiles code. Why does it need to make a network call at all? That's going to slow down your builds for the sake of phoning home to Microsoft, a company we don't exactly trust for being good stewards of our information.

Come on folks, this is printed out on the use of the command and basically any site today does more intrusive telemetry.

I think they should ask people like Yeoman, but I don't think they deserve this much shit for such a small thing.

> and basically any site today does more intrusive telemetry

So the next version of Bash should have telemetry?

No, but there are big differences between the projects. If PowerShell had gotten telemetry I would understand the objections.

There is an earthshattering difference between a website, a place I go to let someone else run code, and a build tool I use to run code I write.

What? You run the code in the browser when it comes to JavaScript just as you run the .NET SDK. The difference is that the .NET SDK tells you that it sends telemetry and how to disable it, and what it stores is not really sensitive information. Most websites run code with the sole purpose of identifying you.

JS engines are supposed to be sandboxed, and have limited APIs to draw from. Unless you use a jail, a local application can do just about anything.

The difference is expectation. I expect websites to run things I don't control. I expect a local application to behave in a certain way.

Well yes, and it behaves as expected, doesn't it? They are very open with what they are doing and you can build from the sources if you do not trust the binaries.

The point is, I understand why people don't want telemetry. I don't. I think they should ask before they do it, a lot of people are probably willing to share the data. BUT I also understand why they are doing it and I think they've done it in a good manner still.

You should also think about your expectations, one shouldn't have to expect that every site is trying to track you.

My point is:

> and basically any site today does more intrusive telemetry

has absolutely nothing to do with a local application.

This has been discussed for over a year on this issue: https://github.com/dotnet/cli/issues/3093

They are just ignoring it to let the issue die silently.

It's somewhat ironic that he feels so strongly about privacy but when I hit his site I get this message: "This website uses cookies to ensure you get the best experience on our website - More info" that links to Google's policy. Regardless, as folks point out, you're notified https://news.ycombinator.com/item?id=14837097 so it's not clear when he missed this.

My favorite actually is https://github.com/dotnet/cli/pull/3494 . Of course sending things like IP addresses is unavoidable. I should also mention https://twitter.com/NerdPyle/status/863456558172168192

"You should be able to run a command that doesn’t use the network, knowing that it won’t open a network port." Is the reader supposed to stop reading there? Because they must be using a different dotnet than everyone else, considering microsoft's dotnet does package management and download iirc?

"I don’t want your tools spying on you either." how virtuous. Some people don't care though, some people actually prefer it

"I don’t want your tools spying on you either." how virtuous. Some people don't care though, some people actually prefer it

Then it won't be a problem to disclose exactly what is proposed, get those people's informed consent, and leave everyone else alone, will it?

"telemetry", what a euphemism.

I'm not sure what their goal is with this data.

Do they want to use this data to create a good tool?

Or do they want to use the data to create a tool that appeals to the average user?

tl;dr please?

Microsoft introduced telemetry enabled by default to .NET Core CLI.

The OP is not happy with the fact that Microsoft collects telemetry data when you use their .NET tools and demands that they stop. (I'm almost sure this post will be featured on n-gate.com.)


> And that's easy - it's a setting. (Non) problem solved.

I still don't think this is a non problem. When you are using many different tools that are updating constantly, it is easy to not notice one adding telemetry. And even if you disable it, it very well may be silently reenabled in the future.

My opinion is coloured by the fact that I think the telemetry gathering is harmless and in fact useful, so not worth getting worked up about.

In fact if you use a product why would you want to conserve your 'precious body fluids' (telemetry) instead of helping improve the product? Beats me.

"Telemetry" ... nice spin.

People remain the same people and companies remain the same companies.

It's in Microsoft's DNA to build stuff that captures and watches and monitors and logs.

Just because they've started to be more open, won't change the fundamental company attitude and approach to doing things.

Microsoft will simply be bringing more "Microsoftiness" to the open source world. Get used to it, there's more coming cause that's the way they build software.

I would suggest that it is time to rethink some of those outdated assumptions that tools won't spy on you. Microsoft have arrived at the open source party, so open source isn't the same any more, just accept that the world has changed and now it's entirely possible that your open source is logging and watching.
