Google Docs Is Randomly Flagging Files for Violating Its Terms of Service (vice.com)
79 points by gridscomputing on Oct 31, 2017 | 57 comments

It was nice when everything was saved on your PC... unless you forgot to back things up.

I think services like Dropbox/Mega are the happy medium. All the files are actually on your computer. If they go away, you don't lose your files, they just stop syncing. When you put all your crap "in the cloud" (rolls eyes), it means the services you like may stop working, or they could release a new crappy version with no ability to revert to the previous one.

Make sure you have separate backups too. If Dropbox deletes a file from your account, the client will delete the file from your computer!

One of the reasons I switched out of Dropbox was because they got rid of the version history feature, which I only learned about after desperately trying to recover an old file. I think now they give you 120 days of version history, but only if you pay for the newly created “professional” tier.

I have a little Nextcloud server set up with unlimited file history now. Or at least it’s only limited by how many old hard drives I have plugged into it. Couldn’t be happier with my decision to switch.

Supposedly the free accounts still have 30 days: https://www.dropbox.com/help/security/version-history-overvi...

Nevertheless, definitely agree about version control being a killer feature. Might need to go down your route at some point.

OneDrive has a "Recycle Bin". Worked well when I needed to restore some accidentally deleted photos. But their paid tier is horribly priced. Ugh.

I use an encrypted container in Dropbox. I retain complete control over my files but they are still backed up, and Dropbox is smart enough to only upload the diff if something in the container changes. It's a little inconvenient in that I can't for example access individual files on my phone without a lot of hassle but overall it's an adequate solution for my needs.

A delta of changes in an encrypted container should be very close to the size of the full container, shouldn't it?

It seems this will only be feasible when we have homomorphic encryption ready.

If someone is interested in that kind of encryption, see "A FULLY HOMOMORPHIC ENCRYPTION SCHEME" [1] and "Fully Homomorphic Encryption without Bootstrapping" [2].

[1] https://crypto.stanford.edu/craig/craig-thesis.pdf

[2] https://eprint.iacr.org/2011/277.pdf

Well the whole hard drive is not overwritten every time a file changes when using disk encryption, so definitely not necessary.

Any encrypted disk image will work this way. It doesn’t require anything like homomorphic encryption, just a bit of metadata and key management.

No. AES encrypts data in small fixed-size blocks (128 bits each, regardless of key size), so only the blocks that changed have to sync.

If you use plain AES ECB, then little penguins will show up in your encrypted data. You would want to use a disk encryption scheme, like XTS, where the encryption is not just based on the data of the block, but also on the index of the block, to prevent identical blocks at different locations from looking identical. You also want it to be based on a nonce of some sort, to prevent attackers from reverting a block to an older copy.
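To make the "penguins" point concrete, here is a minimal sketch of why identical plaintext blocks are a problem and how mixing in the block index hides them. It is only an illustration under loud assumptions: `toy_block` is a hypothetical keyed function built from HMAC-SHA256, standing in for a block cipher. It is not AES, it is not XTS, and it cannot even be decrypted. It only models the fact that a block cipher is a deterministic function of the key and the block.

```python
import hashlib
import hmac

BLOCK = 16  # bytes; chosen to match AES's 128-bit block size


def toy_block(key: bytes, block: bytes, tweak: bytes = b"") -> bytes:
    """Hypothetical stand-in for a block cipher (NOT real encryption).

    It is keyed and deterministic, which is the only property this demo needs.
    """
    return hmac.new(key, tweak + block, hashlib.sha256).digest()[:BLOCK]


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_like(key: bytes, data: bytes) -> list:
    # Every block is processed independently with no position information.
    return [toy_block(key, b) for b in split_blocks(data)]


def index_tweaked(key: bytes, data: bytes) -> list:
    # The block index is mixed in, loosely imitating the idea behind XTS tweaks.
    return [
        toy_block(key, b, tweak=i.to_bytes(8, "big"))
        for i, b in enumerate(split_blocks(data))
    ]


key = b"k" * 32
data = b"A" * BLOCK * 3 + b"B" * BLOCK  # three identical blocks, then one different

ecb_distinct = len(set(ecb_like(key, data)))
tweaked_distinct = len(set(index_tweaked(key, data)))

print("distinct outputs, ECB-like:     ", ecb_distinct)
print("distinct outputs, index-tweaked:", tweaked_distinct)
```

In the ECB-like case the three identical input blocks produce three identical outputs, so the structure of the plaintext shows through. With the index mixed in, each position produces a different output even though the content is the same.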

Penguins? What are you talking about?

I don't care about low-level encryption details. That's why I use encryption software instead of coding my own.

You should absolutely care about the difference between ECB and a chaining mode like CBC. ECB is far less safe.

That would depend on the cipher mode, no? Assuming the container is using something like CTR, the diff should only consist of changed bytes.
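A rough sketch of the CTR claim, for anyone who wants to see it. This assumes a counter-mode construction where the keystream comes from SHA-256 over key, nonce, and counter. That is a stand-in chosen so the example runs with only the standard library. It is not AES-CTR and not what any particular container format uses. The names `keystream` and `ctr_xor` are made up for the illustration.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Hash-based counter-mode keystream (illustrative stand-in for AES-CTR)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same operation: XOR with the keystream.
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key = b"\x01" * 32
nonce = b"\x02" * 8

old_plain = b"x" * 1000
new_plain = bytearray(old_plain)
new_plain[500] = ord("y")  # change a single byte
new_plain = bytes(new_plain)

old_cipher = ctr_xor(key, nonce, old_plain)
new_cipher = ctr_xor(key, nonce, new_plain)

changed = [i for i in range(len(old_cipher)) if old_cipher[i] != new_cipher[i]]
roundtrip_ok = ctr_xor(key, nonce, new_cipher) == new_plain

print("ciphertext positions that changed:", changed)
print("decrypts back correctly:", roundtrip_ok)
```

Only the byte at the edited position differs, so the diff really is tiny. The catch is that this only holds when the key and nonce are reused across versions, and in that case anyone who sees both ciphertexts can XOR them to learn the XOR of the old and new plaintext. That trade-off is a reason to prefer a mode designed for disk encryption over plain CTR for a synced container.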

May I ask which container? Something like VeraCrypt or?

I have thought about doing this too. How does the syncing work? Does it sync only the changed parts of the container, or does it have to sync everything when you make a small change?

I'm not the original commenter, but I just started using cryfs in Dropbox and it's working very well: https://github.com/cryfs/cryfs

KDE Plasma has just started incorporating this into their desktop as a product called 'Vaults'.

I have been using software called BoxCryptor for many years. Quite convenient.

Containers like VeraCrypt would be quite inconvenient, but work if you don’t make changes on many computers at the same time.

Dropbox only syncs the difference of the encrypted files. TrueCrypt uses AES by default, which encrypts in small fixed-size blocks (128 bits), so a small change only touches a small part of the container. It's almost as efficient as syncing without encryption.
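For the question of whether a small change means re-uploading the whole container, here is a minimal sketch of block-level change detection. It is not Dropbox's actual algorithm. The 4096-byte block size and the function names are assumptions picked for the example. The idea is just to hash fixed-size blocks of the old and new file and upload only the blocks whose hashes differ.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical sync block size, not any real client's value


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """SHA-256 of each fixed-size block of the file."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Indices of blocks in `new` that are different from, or absent in, `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [
        i for i, h in enumerate(new_h)
        if i >= len(old_h) or h != old_h[i]
    ]


# A 10-block "container" in which one byte inside block 3 is modified.
old_container = bytes(10 * BLOCK_SIZE)
new_container = bytearray(old_container)
new_container[3 * BLOCK_SIZE + 7] = 0xFF
new_container = bytes(new_container)

to_upload = changed_blocks(old_container, new_container)
print("blocks to upload:", to_upload, "of", len(block_hashes(new_container)))
```

This only pays off if the encryption keeps changes local, which is the case when each sector of the container is encrypted independently of the others. If a one-byte edit rewrote the whole ciphertext, every block hash would change and the full container would be uploaded.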

I am very happy with cryptomator.org

It's open source.

Oblig. Plug for cryfs.

Sorta off-topic, but I have had nothing but positive experiences with Mega. They have an enormous 50GB free tier, but I've been thinking of upgrading anyway and just using it as a full backup for my media hard drive.

I've read that Kim Dotcom has no involvement with the site anymore and it's been taken over by a combination of the NZ Gov't and Hollywood. He says he doesn't trust it. Whether he's just fuming or it's a legit concern of trust is up to you to decide.

Figures it'd be too good to be true. Got any links for that? It really depends on what "trust" means in this case.

Yup. "Nice files you have there. It would be a shame if something were to...happen...to them, right? Oh, and btw, we have another, more expensive, cloud tier, much safer, sure."

Agreed. I like Dropbox Paper a lot too.

Just a solid all around company.

I keep a virtual server for precisely this reason. All my docs sit in a git repo and sync to the server. It even works for binary stuff with git-annex! I know this kind of thing isn't for everyone, but I love it.

I made a Google Docs alternative where not only the documents are private, but you also have your own private installation of the app, even though it's hosted in the cloud: https://www.airbornos.com. If you enable "Notify me before updating Airborn OS" when registering, Airborn OS won't update for you unless you want it to. Theoretically, since you have your own installation, you could even modify the code you're running - but there's no UI for that yet.

I don't know if it's fair to say that they are "randomly" flagging items - I'm guessing they're using a set of tools similar to the ones they use to algorithmically determine if, say, YouTube videos violate their TOS.

Probably a recent update / bug increased the number of false positives more than expected.

Not that I'm sympathizing with Google - just seeing if there's a less click-baity reason for what's going on.

I think “randomly” is being used here in the sense of “without a good reason.” Regardless of how the content gets classified, they shouldn’t be policing private files.

The anecdata from Twitter were stories about shared docs getting flagged as a ToS violation. Did anyone see the false ToS violation on a totally private, unshared file?

Where do you get that from? I randomly checked a few of the tweets and didn't see anything about shared. Nothing about private, either, but that seems like a reasonable default assumption.

Yup. It's a big leap from "Google has a bug" to "you don't control any of your content on the Internet."

Well, even a PRNG is deterministic at some level. I think it’s a fine descriptor given the context.

Scary stuff and can only hurt Google's move to become your go-to office suite. Imagine your PC decided which files it did and didn't like and removed them without your permission? Sounds like the reality now if you own a Chromebook.

I wouldn't be surprised if it ever started happening with Windows/Office, they already have tools that scan for unlicensed media.

Microsoft has an extremely long history of doing business with companies of every size, from small firms to very large enterprises. Google, as a productivity software company that deals with business clients, is still using training wheels by comparison. Why does that matter? Microsoft is less likely to make mistakes like this around any of its productivity tools businesses (including storage).

Besides that, Google is currently on a fanatical binge about censoring content they disagree with. Microsoft is not, which vastly reduces the likelihood of such a mistake in the first place.

Article provides a good chunk of insight beyond the original tweet currently with less discussion on the front page:

Draft of a story about wildlife crime was frozen for violating Google Docs' TOS | https://news.ycombinator.com/item?id=15593750

When will they start flagging Gmail accounts and locking you out from Android phones?

That happens to me often. When my home ip changes and k9 mail connects for the first time to gmail, they always lock my account out until I log in from a desktop they recognize. Freaky freaky.

I actually like it when services that host my data have some protection against unrecognized devices accessing it.

I disagree. It should treat unrecognized devices as any device. In other words, it shouldn't remember devices. Because you know, privacy.

The only thing I ask these services is that they won't let anyone in who doesn't have the right password. I think it's not too much to ask.

In terms of user security, that's just not a good idea. Google has likely prevented an absurd number of account compromises (and therefore identity theft, fraud, personal information leakage, espionage...) by recognizing logins from new devices and unfamiliar locations. Google's user account security practices are pretty much the best in the business.

It's silly to think Google doesn't already know everything about every device you log in from, so that horse is already out of the barn and running on the highway privacy-wise. They might as well use that information to actually protect their users since they're already using it for advertising.

I'm sure that Google's decision has improved the account security of the average user, but I'd really like it if there were some way I could signal them that I'm not an average user. My password likely has more entropy than the hash they check it against; if that gets compromised, the attacker also has access to any other information Google would use to identify me. Which is a joke anyway, since "which city do you usually log in from" is hard to answer when you've been using a VPN for more than a year. I dread the day when they make 2FA mandatory and my account security becomes vulnerable to a social-engineering attack hijacking my phone number.
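The "more entropy than the hash" remark can be checked with a bit of arithmetic. This is a back-of-the-envelope sketch with assumptions stated up front: the password is chosen uniformly at random from an alphabet of 94 printable ASCII characters, and the stored hash is assumed to be 256 bits wide. Nothing here reflects how Google actually stores passwords.

```python
import math

HASH_BITS = 256      # assumed hash output size, purely for comparison
ALPHABET_SIZE = 94   # printable ASCII characters, excluding space


def random_password_bits(length: int, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Entropy in bits of a password drawn uniformly at random."""
    return length * math.log2(alphabet_size)


bits_12 = random_password_bits(12)
bits_40 = random_password_bits(40)

for length, bits in ((12, bits_12), (40, bits_40)):
    verdict = "exceeds" if bits > HASH_BITS else "is below"
    print(f"{length} chars: {bits:.1f} bits, {verdict} a {HASH_BITS}-bit hash")
```

Under these assumptions a random password needs roughly 40 characters before it carries more entropy than a 256-bit hash. A typical 12-character password falls far short. The estimate does not apply to passwords a human made up, since those are not uniformly random.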

I thought I had a way around that, but no.

You CAN add a phone number, then ask to use a FreeOTP token, then delete the phone number. Great, right?

No. Because if you click that "I forgot my password / don't have access to my 2FA" button, they do let you use your phone number to identify yourself, even though you've deleted your number from your google account.

Fuck these people.

It'd be great if it worked right.

This is something you'd expect in China, not the US. It raises real questions about whether Chromebooks are viable for business.

All your data are belong to us.

When stuff like this happens at Google it's tempting to think that it is in fact random, and Google uses complaints as feedback for their derp-learning algorithms in order to become better at finding content that is truly violating their terms of service.

> Google ... derp-learning algorithms

Probably a typo, but I laughed pretty hard. You're not wrong.

Regardless of the bug, we need a service that cannot read your docs/email/data.

GSuite has to be the most paradoxical Google product - you pay Google and "trust" them to not go through your company data!

As opposed to Office 365?

Office 365 reads your data. They just don't advertise against it because their ad business isn't significant.

Why would their ad strategy, or lack thereof, prevent them from enforcing TOS in their document tools as Google apparently does?

Were GSuite customers affected?

This is what happens when you tolerate walled gardens.

This means Google is scanning your stuff one way or another. How much would one like to bet that they're surreptitiously stealing secrets to use to make money?

"Google Drive Terms of Service [google.com]

We may review your conduct and content in Google Drive for compliance with the Terms and our Program Policies.

When you upload, submit, store, send or receive content to or through Google Drive, you give Google a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our services, and to develop new ones. This license continues even if you stop using our services unless you delete your content. Make sure you have the necessary rights to grant us this license for any content that you submit to Google Drive."

Oh, look, right there it says they can use your stuff for pretty much anything. Funny how many of the people having their file access suspended are people in the middle of very large important projects or journalism reports.

Oh, would you look at that! One of my accounts which is used for my energy efficiency research is suspended. And my LED lighting. That's pretty suspicious.

