Want to spell check? Read the fine print (samnewman.io)
248 points by exolymph 482 days ago | 126 comments

there is a rather large amount of traffic generated from my machine for things like Sonos and Dropbox and the like, but eventually I tracked down what was being sent. Sure enough I could see all the text being sent, unencrypted, over HTTP

Another consequence of the "cloud everything" trend. I feel like it's almost a deliberate plan to make everyone's machines constantly send and receive data over dozens of active connections, so the odd one occasionally sending out something that shouldn't be will easily "get lost in the noise"... You could even say that at least this time they were nice enough to tell you and send data in cleartext so you can easily see what's going on. Imagine if it used HTTPS, added another layer of encryption/obfuscation on top of that, and the notice was buried deep in a long license agreement, how long would it take someone to discover it?

What astounds me is that 20K(!) Visual Studio users (so presumably NOT the "average barely-computer-literate" user we often like to think of as the one who gets fooled by schemes like this) probably saw the notice but didn't give a second thought to installing something like this. These are developers, the people writing today's and tomorrow's software. That makes me sad and scared for the future of privacy/security.

Then again, Microsoft's official Visual Studio privacy policy isn't all that much more reassuring:


For pre-release and free versions of the software, users cannot opt out of usage data collection.

Bluntly stated, "You're the product."

What if I want to be the product? I could pay for Visual Studio Enterprise/Ultimate or whatever, or just accept that my usage is being tracked by MS. As long as they aren't being so egregious as to send the contents of my files, or sending over an unencrypted connection, I don't see why the makers of the product, offering it for free, can't track basic data.

They're not offering it for free then, they're offering it in exchange for your data, and that should be made clear when you make that transaction. You can't have a fair economic exchange when one party is deceived about what the costs are to them.

Who's being deceived? As GP points out, it's in the privacy policy: "For pre-release and free versions of the software, users cannot opt out of usage data collection."

There isn't enough time in the day to read all the privacy policies and terms of service for all the things that we use. Just because something is written in the small print does not magically make it OK.

I agree mostly, but come on, whenever a big company like MS, Google etc. puts this in a privacy policy, anyone on HN learns about it pretty quickly, and tbh I don't think anyone will upgrade to the paid version just to prevent basic usage data collection.

Your data is a form of currency. Spend it however you wish.

It's probably worth pointing out that this is Visual Studio Code, rather than Visual Studio - the audience for the former is fairly small by comparison!

I've never used wireshark, but fwiw it's trivial to filter by destination with, for example, tcpdump.

The problem isn't filtering but determining what to look for: a lot of these are hosted on things like AWS or some CDN, which means machines with very generic hostnames, and you'd have to catch a meaningful DNS lookup to get started. If the traffic is encrypted, you still have no great idea what's actually being sent (is it fragments of the file you're working on, which keys you've pressed in the last 10 seconds, or an automatic update check? They could all be similar sizes), and if the application is doing security "correctly" it will be very hard to MITM.

It's actually pretty easy to mitm your own https with tools like mitmproxy: https://mitmproxy.org/

But in this case getting the application to use the proxy may have been tricky.
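For the mitmproxy route, here is a minimal sketch of an addon script, assuming the application can be pointed at the proxy and made to trust mitmproxy's CA certificate (the tricky part, as noted). mitmproxy loads such scripts with `mitmdump -s watch_hosts.py` and calls a module-level `request(flow)` hook for each client request; the file name, the `WATCHED_HOSTS` set and the `seen` list are hypothetical choices for this example. Logging the scheme is what would reveal plain `http` traffic like the spell-check requests in the article.

```python
"""watch_hosts.py: a minimal mitmproxy addon sketch.

Run with:  mitmdump -s watch_hosts.py
"""

# Hosts to watch; a hypothetical choice for this example.
WATCHED_HOSTS = {"service.afterthedeadline.com", "www.afterthedeadline.com"}

# Summaries of matching requests, kept so they can be inspected later.
seen = []


def request(flow):
    """mitmproxy calls this hook once per client request."""
    req = flow.request
    if req.pretty_host in WATCHED_HOSTS:
        body = req.content or b""  # content may be None for bodiless requests
        summary = (req.scheme, req.method, req.pretty_url, len(body))
        seen.append(summary)
        # Printed lines appear in mitmdump's console output.
        print("%s %s %s (%d body bytes)" % summary)
```

Filtering on the hostname inside the proxy sidesteps the "generic CDN hostname" problem somewhat, since mitmproxy sees the HTTP Host header rather than only an IP address.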

Wireshark's display filters are much nicer than tcpdump's BPF; a much simpler language definition.

Known as "capture filters" in Wireshark and as easy as: "host www.afterthedeadline.com".

yep cloud everything... noticed a while ago (after installing little snitch) that anything typed (and pasted?) into OSX ⌘-space search also calls out to the cloud to get suggestions. not cool.

This is explicitly called out when you start using Spotlight in OS X, and it links to an option which allows you to turn the feature off. This is a complaint about nothing.

I have never seen that dialog. I started using Spotlight before it had any internet feature that sent data to third parties, and have upgraded since then.

I guarantee you that you did see the dialog the first time you pressed ⌘-Space after upgrading to a version of OS X that includes Siri Search. You probably just skipped past it without thinking and then forgot.

Yep, it's still there in that case.

You also get told before you are mugged; telling you about it has absolutely no bearing on whether the action that follows is acceptable.

Even if they tell you about it, it's still a bad idea. There's plenty of legalese that people don't read because life is too short. It would be better if this wasn't the default.

Okay but in this case the mugger (spotlight suggestions) can also be told that no, actually, I don't want to be mugged, and it won't mug you.

I know, no analogy is perfect. I am saying that getting mugged or having your keystrokes sent somewhere is inherently bad as default and should be opt-in, not opt-out.

I think the post is an overreaction. The plugin author clearly stated the consequences of using his plugin. The blogger clearly wants to make this a scandalous "exposé", but it just isn't one, because there was no effort to deceive anyone about anything.

I also decided not to use the plugin a few weeks ago, but was impressed that the author was open and transparent about its shortcomings. Labeling his efforts as "shocking" or "insane" is a tad overdramatic, isn't it?

Besides, all this is on GitHub. Fork, pull, and push your alternative, then post it on HN. Done!

It's not at all obvious to 95% or more of the plugin users that this is just an API for a web service. I wouldn't say it's intentionally deceptive or anything, but it's not an appropriate amount of notice.

I guess when one is considering to use a package they're supposed to read its README file. If they can't be bothered, well, that's their problem, even pages of notices won't help.

Some concepts have upsides and downsides such that notice may be required in order that the individual can make an informed decision and if they choose poorly it's on them.

Some ideas, like this one, are so bad that they shouldn't exist at all. A spell checker that sends all your documents over HTTP is such a bad idea that anyone who runs it must either not know or not understand what it does, because understanding would lead any reasonable party not to install it.

Therefore the add-on exists only as a trap for the unwary or the stupid, makes the plugin ecosystem worse by existing, and ought to be deleted.

It's a clear and concise notice that both explains the issue, and warns about the implications, in less than three sentences. To me, it's perfectly appropriate. If you're in the 5%, then you're done grokking the notice, and you can move on however you choose. If you're in the 95%, then you have all of the keywords with which to do your due diligence before proceeding.

A lot of people sign NDAs when they work as programmers. Installing a spell checker and accidentally breaking one's NDA seems like a valid surprise. That, and passwords in wp-config and similar files occasionally being commented out, make me actively avoid cloud solutions for spell checking.

> Installing a spell checker and accidentally breaking one's NDA seems like a valid surprise.

No, that is a shocking lack of due diligence IMO.

Seriously, if we as developers can't be trusted not to install random junk without checking the consequences, how do we expect the user-on-the-street to stop installing malware because they just have to take that "what Disney character's left testicle are you most likely to find in your coffee tomorrow" quiz?!

I've done projects under NDA for the Android platform, where using Android Studio today is almost mandatory.

After reading this post I remembered that Android Studio already has some kind of spell checking active by default, and to be honest I haven't read the complete source code of Android Studio and all the packages that ship with it by default. Who does that?

Maybe you do, but who is going to pay you for completing that task (as a programmer) and reviewing every single update in the future?

There is a wide gulf of difference between installing an IDE from a trusted corporate entity who would be sued if they did this kind of thing by default without warning vs. installing a 3rd party open source plugin from a developer you've never heard of without reading the description.

>a trusted corporate entity who would be sued if they did this kind of thing by default without warning

Microsoft does key-logging by default in Windows 10. They don't hide the fact that they do key logging, but they don't advertise it to users either. And yes, it's key-logging even if they claim it's for "telemetry purposes only guys, for realsy".

There is a difference but it doesn't change anything for me as a contractor. I'd imagine suing Google or Apple would be no fun when I'm sued by some bank for a breach of NDA.

As a software dev we ideally should keep in mind for what kind of target audience we are developing our tools, and in this case it is obvious that this tool would be a problem for many if not most companies for security reasons.

Is there even a reason why anyone would want to have all his source code sent to some unknown third party? Or why this would be necessary for something like spell checking?

I doubt that the dev has bad intentions with this plugin, but imho this tool is badly designed and unusable. Doesn't matter how visible this information is, there are no justifications I can think of why I should allow my source code to be sent to a third party.

That is a lot of trust in a corporate entity that, in all honesty, tries to profit from you wherever possible, and they would not be sued if such items were spelled out in the TOS. In this plugin, the plugin author placed a notice where it could be found by anyone, not hidden away. Here, just as dspillett said, do your own due diligence and select products accordingly.

At the same time, even large trusted corporations can impose interesting license agreements (runtime or otherwise) that you had better be well versed in before you start creating releases of your product.

> but who is going to pay you for completing that task (as a programmer) and reviewing every single update in the future

If you can't factor it into your costs of doing business (that you pass on to the client) then you either have to factor it into your costs of doing business (that you have to eat) or decide to take the risk of not bothering.

The risk is yours to take should you choose, of course (or, if you don't work alone, the risk is your company's to take), and depending on the 3rd party involved that risk might not be particularly high (as others have pointed out, the risk profile of relying upon Google is vastly different from that of a small add-in developer hardly anyone has heard of), but if the worst happens and you end up in court you won't be able to just dismiss it as "well, how was I to know?".

If you leak NDA covered client information through your use of a tool or service and the client finds out, "but you'd have never paid me enough that I could afford to be more careful" is not going to be a defence that will get you very far, unless of course you have paperwork that states they were aware (perhaps you included the time in your quote but they asked for that bit of work not to be done due to the expense).

> If you can't factor it into your costs of doing business (that you pass on to the client) then you either have to factor it into your costs of doing business (that you have to eat) or decide to take the risk of not bothering.

Thanks, I know this myself. But there is a reality in which no one will accept you factoring in the costs of analysing every tool used.

You try to do this -> someone else gets the contract

You try to change the contract to cover for this -> someone else gets the contract

So you have no choice but to take the risk. Fine. But that doesn't make it a "valid surprise".

In an ideal world it wouldn't matter: we'd have time to properly analyse everything we use and clients wouldn't mind paying to have things done properly. We don't live in an ideal world, so someone somewhere needs to decide if the risk is worth taking. If you don't push that decision on to the client (because your competitors don't and you fear it will reduce your edge too much) then you have to make the choice and take responsibility for it.

But in this case the very top of the description was the warning that it is sending this data to a 3rd party to do the spell checking.

You don't need to read the whole source code, but you should read the description...

My point is that this software is badly designed since almost no company would ever accept that the source code of their products will be sent to unknown third parties.

I mean the person that developed this plugin will probably also develop for some company and should know that.

So I can't see for which target audience this plugin is because almost everyone doing software development for a company would be excluded.

It's as if you designed a gun that will explode in your hands once you pull the trigger. What is the target audience here?

> You don't need to read the whole source code, but you should read the description

Would you dare to test this in a court? I wouldn't.

> I mean the person that developed this plugin will probably also develop for some company and should know that.

The author of this plugin works at Microsoft - he is the PM for Visual Studio Code. I have no further words to express my astonishment at this fact.

> who does that?

The person(s) who decided that "using Android Studio today is almost mandatory".

If the plugin author was really honest, he'd mention the fact that this spell checker is web based in the title or at the top of the description, where people actually read it.

Nobody would complain if this was called the "Web Based Spell Checker" or "After The Deadline Spell Checker".


From the article: "But then, at the top of the description, I found this message greeting me: [...]"

It's this way on the linked extension's page as well: https://marketplace.visualstudio.com/items?itemName=seanmcbr...

I think that's a fairly prominent place, and the only explanations for installing this extension are either that it's acceptable (for example, if one's using MSVSC to edit Wikipedia articles or something like that) or negligence in reading anything but the title (which can't be helped).

(Well, author could've put "[INSECURE!!!! WILL STEAL YOUR CODEZ!!!!1one]" in the title... but should he?)

Apparently asking people to read the descriptions of plugins they are installing is too much effort

Look at the package file: https://github.com/Microsoft/vscode-spell-check/blob/master/...

The description, which is the only part visible in the search interface, contains no mention that this uses a web service.

See this comment how it should be done correctly: https://news.ycombinator.com/item?id=11805189

Interestingly it's just had a version bump which makes the description a lot more obvious:

"Uses a web service to detect mistakes and suggest fixes - great for Markdown or any text file."


> I think the post is an overreaction.

Thank you for writing this! By reading the article I had exactly the same feeling.

The author of the extension was open and clear; the author of the post needs to save this anger for secret but explicit surveillance programs, not optional online spell checkers...

Yeah really, it just means now maybe people will have more options.

You can choose an easy simple not-private way of doing things, or you can do things in a private way.

Being up front about it is 100% the only way to manage this properly so what else can be done?

It's not like we should suddenly stop making useful tools because they don't use end-to-end encryption on everything; not every action needs that level of protection, but some do.

> The plugin author clearly stated the consequences of using his plugin.

Only if you happen to read the readme, right? It doesn't pop up a warning when you install it (if it did I think that would be fine).

I don't think it's an overreaction at all. Simply building the extension with those shortcomings, regardless of the documentation, was a terrible, irresponsible idea.

Why a plug-in has to send everything through the web instead of just having a dictionary is beyond me.

Jesus, it doesn't. So go fork it and change that! It doesn't mean other people aren't free to design their plugin as they see fit.

Because it's more time and effort to make a service that checks spelling (are you accounting for stems? Possessives? Other uses of the single quote such as contractions?) and suggests corrections than to wrap an already-existing service that does the same. I find it highly plausible that the author of the plug-in wrote it to solve their needs and just made it available because hey, it doesn't cost them anything.
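To give a sense of what "stems, possessives, contractions" means in practice, here is a toy sketch of a purely local checker. The word list, the regex tokenizer and the suffix rule are hypothetical simplifications; real checkers such as Hunspell describe word forms with affix rule files instead of ad hoc stripping. Note that stems are not handled at all, so a plural like "plugins" gets flagged, which hints at why wrapping an existing service is tempting.

```python
import re

# A tiny hypothetical word list; a real checker would load a dictionary file.
WORDS = {"the", "plugin", "send", "sends", "text", "author", "do", "not", "it", "is"}

# Letters plus internal apostrophes, so "author's" and "don't" stay one token.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

# Contractions are listed explicitly rather than derived from rules.
CONTRACTIONS = {"don't": ("do", "not"), "it's": ("it", "is")}


def known(token):
    """Return True if a token is covered by the word list."""
    word = token.lower()
    if word in CONTRACTIONS:
        return all(part in WORDS for part in CONTRACTIONS[word])
    if word.endswith("'s"):  # possessive: "author's" -> "author"
        word = word[:-2]
    return word in WORDS


def misspelled(text):
    """List the tokens in text that the local word list does not cover."""
    return [t for t in TOKEN_RE.findall(text) if not known(t)]
```

For example, `misspelled("The author's plugin dont send text")` flags only `"dont"`, since the possessive is stripped and the missing apostrophe is not.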

I want to believe, but https://news.ycombinator.com/item?id=11805189

Instant uninstall. This is exactly the kind of crap I expected when MS announced they'd invest in open source.

I've found local spellcheckers really unreliable. They never catch obscure or niche words, or grammar. Chrome has an option to ask google for spelling suggestions and it makes it much more reliable.

I agree, also considering that virtually every spell checker these days uses an online API rather than offline dictionary + semantic grammar rule checks you shouldn't use any of those for sensitive documents regardless of them using HTTPS or not.

Fixed - at least in the teacher upstream source.

> This peaked my interest - who were the people behind this service?

I just had to check, and am sad to report After the Deadline wouldn't have caught this error ("peaked" for "piqued") even if the author had still been using it.

More relevantly, am I wrong in thinking that After the Deadline does actually support HTTPS? An HTTPS request [1] seems to work fine. The article muses on this point a little, but maybe it is just the "teacher" module at fault after all?

[1] https://service.afterthedeadline.com/checkDocument?key=test-...

"Peaked" is such a common spelling of "piqued" that their dataset of English text probably contains many examples of it, and so doesn't recognize it as an error.

Could be, but even so that wouldn't fix the "throwing your IP into a black hole" problem.

I'll update the post, thanks!

Since you're here:

  > ... any text opened in Visual Studio Code
  > with this extension loaded would be send ...
I suspect that should be "sent".

Thanks - fixed!

So I spotted the same problem on 6th May 2016 and sent a PR (which was merged) to update the description shown in VS Code to "Detect mistakes as you type and suggest fixes using a web service"

PR https://github.com/Microsoft/vscode-spell-check/pull/30

But it seems this was reverted in a more recent commit.


This was an oversight by me in a recent update. However, I've resolved it and, with any luck, made the statement even more visible for users and anyone who does an update.

They reverted it 'by mistake'.

This cloud thing has gotten really ridiculous lately. I was at the Maker Faire recently and this guy at one of the booths was pontificating on the virtues of their 3D printing platform which ran in the cloud. Finally he finished with, "I'd love to give you a demo but we've been having trouble with the wifi all day." If I was drinking milk it would have shot out my nose.

At least you weren't drinking the Kool-Aid.

The plugin author should definitely be blamed for this. But I think that the root problem is with the `teacher` npm package. An issue[0] was also opened just 6 days ago raising doubts about using http.

[0] https://github.com/vesln/teacher/issues/4

No, the root problem is using an online service for spellchecking when every other decent editor does it locally.

The plugin author is a Principal PM on Visual Studio.

That makes this even more shocking. But, after seeing Microsoft's recent stance on privacy in general with things like Windows 10, maybe this is just par for the course.

This is my thinking too. The use of HTTP is bad, but the real issue is trusting a third party with large amounts of potentially very sensitive information. Especially when the service in question was designed for people checking the spelling and grammar of their blog posts, something designed to be shared anyway.

Online spell checkers are better along many axes. For example, they can spell check names in the news and slang that's likely not in an offline dictionary.

Why can't you just add those names and that slang to the dictionary?

Because an online service can dedicate 10TB of disk space to spell checking, while on a laptop with a 512GB SSD you wouldn't want to store >100MB for the spellchecker.

If you had 100MB of data, I strongly suspect it'd return too many false positives: the likelihood of a misspelled string being in there would be too high. A spellchecker isn't very useful if it knows "Donald Trumo" is a real name.

For reference, the Hunspell dictionary file used in apps like LibreOffice and Firefox is about 400KB. That's effectively the whole of the English language.

Look up "bloom filter" and the computer science of spellcheck.
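A Bloom filter is one classic way to keep a word list small: it sets k hash positions per word in a bit array and answers either "definitely not a word" or "probably a word", trading a tunable false-positive rate for space. Below is a minimal sketch using the standard sizing formulas `m = -n ln p / (ln 2)^2` bits and `k = (m / n) ln 2` hash functions. The class name, the SHA-256-based double hashing and the sample words are assumptions for illustration, not how any particular spell checker is built. A false positive here is exactly the failure mode described above: a misspelling accepted as a word.

```python
import hashlib
import math


def bloom_size(n, p):
    """Optimal bit count m and hash count k for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, n, p):
        self.m, self.k = bloom_size(n, p)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, word):
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 odd, hence nonzero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word))
```

For a sense of scale, `bloom_size(100_000, 0.01)` asks for roughly 958,000 bits (about 117 KiB) and 7 hash functions for a hypothetical 100,000-word list at a 1% false-positive rate.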

You are off by a factor of 1000 or more on the cost of spellchecking.

I think "the cloud" is making people forget how simple some common features actually are. The default assumption seems to be that it needs to be in the cloud because it's too hard to do locally.

Besides the good points made by other commenters, I'll raise you another one: want to trade space? How about cutting out some of that fat modern software comes with? Change the configuration language from XML to JSON and you'll suddenly have space for 10 spellcheckers. Cut out that pointless high-resolution banner and you'll have space for 10 more.

> Change the configuration language from XML to JSON and you'll suddenly have space for 10 spellcheckers.

Change it from JSON to YAML or canonical s-expressions and you'll have room for even more …

> spell check names in the news and slang

Very useful features for writing code.

You'll probably have more luck raising issues on an "official" Microsoft repo: https://github.com/Microsoft/vscode-spell-check/issues/33

There's a PM at Microsoft called Sean McBreen who works with Visual Studio. Probably the same guy. He's posted his email address publicly before - smcbreen@microsoft.com (i.e. here https://github.com/Armitxes/VSCode_SQF/issues/2 )

Thanks Tim - I had just started trying to track Sean down, so this will help! Will update the post if I hear anything back from him.

What does it take to get listed in the Visual Studio extensions dialog? If there is a review process, it should probably include a requirement that "transmitting your code across the wire" requires explicit consent each time, or something similar.
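A minimal sketch of the kind of rule suggested here, written as a hypothetical policy function rather than any real VS Code or marketplace API (an actual extension would be written in TypeScript or JavaScript against the extension API): refuse unencrypted endpoints outright, and ask the user before any text leaves the machine. `guarded_send`, `TransmissionRefused` and the injected `confirm`/`send` callables are all invented for illustration.

```python
from urllib.parse import urlparse


class TransmissionRefused(Exception):
    """Raised when text would leave the machine unsafely or without consent."""


def guarded_send(text, endpoint, confirm, send):
    """Send text to a remote endpoint only over HTTPS and only after consent.

    confirm(message) -> bool asks the user; send(endpoint, text) does the upload.
    Both are injected so the policy can be tested without a network or a UI.
    """
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":
        raise TransmissionRefused("refusing to send text over an unencrypted connection")
    # Ask every time, and say how much is leaving and where it is going.
    if not confirm("Send %d characters to %s?" % (len(text), parsed.hostname)):
        raise TransmissionRefused("user declined to send text to %s" % parsed.hostname)
    return send(endpoint, text)
```

Keeping the check in one choke point means every code path that talks to the service goes through the same consent prompt.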

The plugin is managed by a Microsoft employee, but unlike other MS plugins, it has the author name instead of "Microsoft", which makes it even more suspicious.

The description was changed to reflect that it uses a web service at one point, but the change was reverted.


So no, the post is not overdramatic at all.

I work for a large organization and make and manage personal programs all the time with absolutely no relationship to or input from my parent organization. Occasionally I am allowed to put them on the Org's GitHub if they're useful and relevant to the mission. It's a cool little perk because it can give personal projects more visibility.

I'm not saying it's not suspicious (I don't think it is though), but I don't think that a Microsoft employee creating their own plugin, even under Microsoft's Github, makes it suspicious.

And what about the secretly reverted plugin description? Must have been a weird accident.

If you are on OS X, buy and run Little Snitch. The number of outbound connections it reveals is quite amazing.

It gets really annoying really fast though.

I was also annoyed to tears by it. That's why I built an alternative called Radio Silence (https://radiosilenceapp.com).

Radio Silence doesn't use popups or alerts. It's completely passive and invisible when you don't have the app's UI open. If you want to monitor connections, you can open the app and take a look at the monitor tab.

Good opportunity to market your app you got there, hehe. Anyway, I thought the point of Little Snitch was that you could cancel a connection even before it was made for the first time; you would miss out on that with your app.

Can you also disallow certain requests with your app? It seems to be app-wide only, but if I wanted to block, e.g., Chrome (or all apps) from connecting specifically to "adWare.com", would that be possible?

Nope, Radio Silence only blocks whole apps/processes. Little Snitch is probably still the best choice for fine-tuning and reactive blocking.

Apropos of anything else, I definitely appreciate your ability to acknowledge and recognize a competing/separate product as better suited than your own for certain things.

I agree. Run Charles Proxy on their free trial for 15 minutes with the OS X proxy turned on and find out what your Mac is really sending out and taking in. Then buy it because it's an incredible tool that proves extremely useful when debugging your own work!

Technically I think they are different beasts. Charles proxy is for, as you say, inspecting and debugging. Little snitch is for making a white/blacklist of connections.

That being said I 100% agree that Charles is a fantastic application.

Getting offtopic here - if I wanted egress filtering at the router level, what could I add to my network that wouldn't force LAN traffic through the same port? OpenWRT isn't an option on my router because the 802.11AC radios aren't (and probably will never be) supported.

Happy to add another {mips32,armv7} box to my network, though.

I'd love to help you but this is not my area of expertise. (Just responding as you replied to me, hopefully someone else can chime in).

You could buy a different router. OpenWRT does support the ac chipset in my TP-Link Archer C7 and many other routers.

Charles Proxy and Little Snitch offer insights into what is going on.

They both happen to have different use cases.

Charles is my go to for tracing what is going on when I need to snoop SSL traffic.

A decent free alternative to Charles is https://mitmproxy.org/

I'll give a reason why I would have chosen this route, by way of example. A HoloLens hackathon I went to had us using Unity and C#. Not knowing C#, I still wanted to participate, so I learned enough of it to be able to post to a Python script in AWS where I could actually do some work.

Could I have done that locally? Sure but I didn't know the language. I just wanted to get something done and it did the job pretty well.

Maybe that applies in this situation? I don't know. Of course, what I was doing was for a throwaway project that I wasn't planning on releasing to the world.

We could really use a decentralized spell checking service. I'm thinking a blockchain, maybe Ethereum.

Edit: Poe's law.

Spell checking is already "decentralized" and has been for the last three decades or so, in the form of a local dictionary file on each machine you want to run spellcheck on.

I wonder what would be a good storage structure for the dictionary? Oh.. a dictionary.

Probably a trie, although tries are sometimes implemented with dictionaries.
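A minimal trie sketch, using nested Python dicts as the child maps (so, fittingly, a trie built out of dictionaries). The `END` marker key and the helper names are assumptions for this example; the point of the structure is that shared prefixes are stored once and prefix lookups only walk as deep as the prefix.

```python
END = "$"  # marks a complete word; assumes "$" never appears inside words


def build_trie(words):
    """Build a trie as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, word):
    """Return True if word was inserted as a complete word (not just a prefix)."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def with_prefix(trie, prefix):
    """Return all stored words that start with prefix, sorted."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found = []
    stack = [(node, prefix)]  # depth-first walk below the prefix node
    while stack:
        node, sofar = stack.pop()
        for key, child in node.items():
            if key == END:
                found.append(sofar)
            else:
                stack.append((child, sofar + key))
    return sorted(found)
```

With `t = build_trie(["spell", "spelling", "spelt", "check"])`, `contains(t, "spel")` is false because "spel" is only a prefix, while `with_prefix(t, "spel")` lists the three words under it.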

Please make a compelling case as to why a distributed solution is warranted for spell checking?

I don't actually think it should be, but I can make a compelling case.

While it's possible to have a dictionary of all of the words of a language, it will always be missing the proper nouns. It will also be missing the data needed to take my very poor spelling and figure out what I'm trying to say.

Combining these two problems, I've found that oftentimes my local spell checker can't figure out what I'm trying to say, but Google can figure it out no problem.

I don't mind having to go to Google once in a while to do this manually, but that is the case for a distributed spell check service.
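The "figure out what I'm trying to say" part is commonly done with candidate generation plus word frequencies. Here is a minimal sketch in the style of the well-known edit-distance-1 approach popularized by Peter Norvig's toy spelling corrector: generate every string one deletion, transposition, replacement or insertion away, keep the ones in the vocabulary, and pick the most frequent. The frequency table is made up for the example; how good the suggestions are depends heavily on how much usage data sits behind those counts, which is the advantage a service like Google has.

```python
import string

# Hypothetical word frequencies; a real corrector would count a large corpus.
FREQ = {"spell": 120, "spelling": 80, "smell": 40, "shell": 30, "check": 150}


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, freq=FREQ):
    """Return word if known, else the most frequent known word one edit away."""
    word = word.lower()
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq]
    return max(candidates, key=freq.get) if candidates else word
```

For instance, `correct("chekc")` recovers `"check"` through a single transposition, while a word with no known neighbor is returned unchanged.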

Doesn't Google use a large corpus of documents along with machine learning/statistical analysis to do spelling and grammar checking? The same way they do language translation.

You are confusing index building with index lookup.

Yup - this is my fundamental problem. HTTP or HTTPS isn't the issue - sending everything I open in my editor to a third party is the problem.

Hadoop cluster?

Well.... Don't know if you already know about it... but After the Deadline was written by Raphael Mudge, the creator of Armitage and Cobalt Strike. See the 1st video on this site http://www.hick.org/~raffi/afterthedeadline.html Also if you go to 02:40 he says that Hacker News is his favorite site. Inception...

I kind of wonder why no one just forks the (Libre/Open)Office spell checker and designs a plugin around that. Doesn't that work totally fine offline?

I have finished a few projects as an external contractor for companies in the financial sector and many would be surprised how paranoid these companies are about the source code.

Aside from the (sometimes insane) checks they have in place, I've also had to sign an NDA.

I doubt that the creator of this plugin had bad intentions, but using this plugin could cause some programmers to be dragged in front of a court.

I believe the warning should be more visible.

I guess a spell checker once again is a major feat of software engineering: http://prog21.dadgum.com/29.html

Well this seems to be an exaggeration. Does this plugin also send anything for non-text based files at all?


Wait a minute. Microsoft did something to deceive the users of its "free" Visual Studio? Why does this not surprise me...

I can understand why people used Visual Studio 20 years ago - decent free alternatives were lacking. But in 2016? And no less than for markdown!

What alternatives are there? Just recently there was a HN story about a great add-on for Emacs that would find the definition of a symbol. That's been in VS for a decade or two but open source editors apparently don't have it. Same goes for renaming things and having them automatically renamed everywhere else. Background compiler that underlines and even corrects your errors as you type - what else does that?

> Just recently there was a HN story about a great add-on for Emacs that would find the definition of a symbol. That's been in VS for a decade or two but open source editors apparently don't have it.

ctags has existed since 1979, which is almost 40 years ago. The thing you read about implements a different way to find definitions.
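For reference, the "find the definition" lookup that ctags enables is driven by a plain text `tags` file (generated with, e.g., `ctags -R`). A minimal sketch of reading one: each non-header line is `tagname<TAB>file<TAB>address`, optionally followed by `;"` and extension fields, and header pseudo-tags start with `!_TAG_`. The parsing is simplified (it ignores escaping inside search patterns and scans linearly, whereas real tools exploit the file being sorted), and `find_definitions` is a hypothetical name.

```python
def find_definitions(tags_text, name):
    """Return (file, address) pairs for every tag entry named `name`."""
    results = []
    for line in tags_text.splitlines():
        if not line or line.startswith("!_TAG_"):
            continue  # skip blank lines and the pseudo-tag header
        parts = line.split("\t", 2)
        if len(parts) < 3 or parts[0] != name:
            continue
        # Drop the ';"' marker and any extension fields that follow it.
        address = parts[2].split(';"', 1)[0]
        results.append((parts[1], address))
    return results
```

The address is either a line number or an ex search pattern such as `/^def main():$/`, which an editor then uses to jump to the definition.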

The IntelliJ IDEA Open Source edition?

1. It is a third-party plugin. 2. VSCode is OSS. 3. VSCode != VisualStudio. Conclusion: you clearly dislike Microsoft, and confirmation bias has led you to post an inflammatory comment without even reading the article.

Have you read the comments? Who is the author of that "third-party" plugin? No connection with Microsoft?

It would help inform your discussion of the article if you read the first paragraph. A) This is VS Code, not Visual Studio, B) This is a third party open source plugin

It’s not third party – the dev is a microsoft employee, managing the VS Code team.
