I have had a Google sheet I created years ago and edit periodically that has also been flagged but not removed. It has a number of URLs. I suppose it's possible at least one of the URLs is no longer legitimate.
There is a yellow banner, "This file looks suspicious. It might be used to steal your personal information", and an option to "Request a review".
It also says "This file can still be viewed, edited, and shared, but users will see a warning that alerts them that the content may be harmful. These restrictions were put in place because this content violates Google Drive's Phishing policy."
There is no indication of why the file was flagged.
Despite my concerns about a human looking at my personal file, I bit the bullet and clicked review several times but the banner remains after several months. Google support hasn't been helpful even though I'm a paid user.
I have now received 5 emails from "Google Drive Safety" notifying me of this alleged violation.
Screw this technocratic neofeudalist garbage. I've been encrypting my entire Google Drive for a while now because of exactly this kind of overreach in the past.
More people need to be as angry as you are about this kind of shit. If people keep viewing these issues as one-offs, we'll soon find ourselves "owning nothing, and being miserable."
It would likely be called a "new technological freedom" from hackers and scams etc. Like how the marxists like to call marxist monetary policy a "new economic freedom".
This is a notification that your requested “being miserable” service will be billed to your government tokens account on a recurring basis at a rate of $23,465 per month.
Thank you for your purchase. Sincerely, your caring daddy, government.
The only people who are being miserable are the ones complaining about this shit. Rest of the world is happily leading their lives or have other real world problems that they have to deal with and not worry about some elite, upper-class problems
I can care about politicians trying to legislate me out of existence and about the privacy and integrity of my data at the same time. In fact, the two concerns are tightly interwoven.
Perhaps the problem is that the privacy conscious have already deplatformed & do their own syncing to private cloud, whereas the others are still using the platform & don't care as much.
I'm in the first group, but have become increasingly weary of keeping servers online (I self-host everything) and I'm seeing a lot of promising results with Cryptomator... but there's one inevitable gotcha: those very platforms can cut you off at some point in the future. So you'll need to sync across multiple platforms for redundancy.
Yes. The issue is that security and privacy are tragedies of the commons.
Individuals rarely have the time or skill to improve the situation, and it's almost never rational for individuals to do so — the benefit to them is probabilistic and low-probability, and the cost in hours of time is high.
But the benefit to society is great of doing better on security and privacy.
This is why governments should regulate security and privacy. The whole point of governments is to solve collective problems together.
Just got a Synology 4-bay NAS (their SHR hybrid raid is pretty neat) to keep a full backup of everything I've ever put in cloud storage for this exact reason.
Plus side is plugging in hard drives whisked me back to building computers as a kid.
I've been using a Synology DS418 (the current model is the DS420, and the DS422 is probably coming soon) and it has been an absolute joy for the last four years. Of all my technology devices it requires the least care and feeding; it is the closest thing to a technological toaster I own. It backs up to AWS Glacier, but it's really nice having all my stuff on a 2 x 1 gigabit interface here in the house.
There is a whole ecosystem of plugins and stuff, but out of the box it works fantastic.
420+ with 4 8TB IronWolf drives providing just under 24TB of usable space (with SHR-1 which is single drive failure). Way more than I'll ever need personally. Also desperately trying to pretend I didn't read the comment below about a 422+ (last 2 digits are the release year).
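For anyone checking the "just under 24TB" figure, here is a rough sketch of the arithmetic. It assumes equal-sized drives, where single-redundancy SHR gives roughly the capacity of all drives but one, and it ignores filesystem and system overhead. The function name is made up for illustration.

```python
def single_redundancy_usable_tb(drive_count, drive_size_tb):
    """Approximate usable space with one drive's worth of redundancy.

    Assumes all drives are the same size and ignores filesystem overhead.
    """
    if drive_count < 2:
        raise ValueError("need at least two drives for redundancy")
    return (drive_count - 1) * drive_size_tb


usable_tb = single_redundancy_usable_tb(4, 8)  # decimal terabytes, as sold
usable_tib = usable_tb * 10**12 / 2**40        # binary tebibytes, as many UIs report
print(f"{usable_tb} TB usable, about {usable_tib:.2f} TiB")
```

The gap between the decimal and binary figures, plus overhead, is why the number shown in the NAS interface tends to look smaller than the sticker capacity.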
I turned my old desktop into a NAS drive with 3 8tb drives shucked from a discount bin at Best Buy a few years ago. Total cost (out the door) was about $400 for the drives and the time to set it all up, and it has way more storage and convenience than I think I can use in a long time.
You mean "just" require everybody else to read that article and follow the setup instructions with shared keys _and_ set up a different sharing folder and shared key for each distinct set of people you share with.
If I were you I'd back the whole of your Google account up ASAP. It wouldn't be unheard of for Google to suspend your entire account for a violation on a single product.
I suppose you're right that it doesn't explicitly say a human is doing the review.
It says:
Review process
If you think this is an error, or if you've modified the file to comply with Google Drive's Terms of Service, you can request a review.
1. Your file will be reviewed
This file will remain restricted during the review.
2. A decision will be made
If the file is found to be safe, all restrictions will be removed and you’ll be able to use and share it with others. If it's found to be unsafe, the restrictions will remain in place.
I find this kind of language positively disgusting:
> Your file will be reviewed
> A decision will be made
> If the file is found to be...
> ... restrictions will be removed ...
It is the archetypical "mistakes were made".
Note the constant use of passive voice, intended to hide the actor, and to keep you from even thinking about who is doing these things and thus who is responsible.
By the way, no better is the most likely alternative "Google will review...". I'll leave deducing the reasons why as an exercise for the reader.
Google has created the idea that "an algorithm is responsible" for something. This is never the case. An algorithm is programmed by human beings, therefore the humans who created and approved the algorithm are responsible for its decisions. It is just a way of hiding their intent under the disguise of an algorithmic intelligence. I'm pretty sure that if something is bad for Google, it will not be approved despite what "the algorithm" thinks.
Or someone who understands that it's not magic, and there's still humans choosing the dataset, labelling the data and picking the loss function. This is exactly the kind of gaslighting GP was talking about.
Just because there are more edge cases where the human has no idea what their algorithm does, doesn't absolve them of responsibility. You don't get to go free after running a bunch of pedestrians over by claiming you were too drunk to know where the road was or which direction your car would turn when you moved the wheel. If I put a running metal lathe in a kindergarten, I don't get to throw my hands up and say 'you clearly don't understand machining' when some children get dismembered.
Because if it's measured in watt hours, this computational process is almost certainly drastically cheaper than using humans to do the same work, especially if you account for 20+ years of unprofitable training time per compute unit and 76% weekly downtime even at peak productivity.
I think the parent comment meant "a longer automated check than the one that flagged the content". I think he meant no human would make the call, but another routine.
but ultimately, the "executives" making the decisions of what software to develop, which bugs to hunt, and so on, are all basing their decisions on data, reports, and many sorts of 'metrics'.
Additionally, I suppose in a company as big as Google, all this decision making is backed by written communications.
All this written stuff is subject to legal scrutiny. All must abide by 'approved' rules, guidelines, etc..
Squinting a little bit (maybe some may need to squint harder) it's clear that in the end everything is being done according to a computational (written) process.
this is essentially what 'skynet' (from the terminator) or the matrix actually means. if every human involved is following data and written guidelines, the decisions are being made by a computer program running across many hundreds of humans pushing "paper" around.
Far scarier is that Google already has automated systems to classify content and probably works very closely with governments to keep a close eye on citizens.
> It has a number of URLs. I suppose it's possible at least one of the URLs is no longer legitimate.
> "These restrictions were put in place because this content violates Google Drive's Phishing policy."
> There is no indication of why the file was flagged.
Isn't your answer in the warning (albeit lacking some specificity)? If you have a bunch of URLs in there and admit not knowing about their legitimacy, it seems reasonable that at least one of them matches a database of known phishing URLs. Isn't it also reasonable to expect (and in many cases want) a company hosting publicly-shared files to notify users when content matches some heuristics for unsafe content, especially given the prevalence of phishing attempts?
That said, there may still be a case to argue that the review process, heuristic, lack of transparency, and other implementation details are flawed. But I don't have a problem with them posting a warning message on content that looks suspicious.
Complete removal, on the other hand (as in the case of OP) is another story, especially given Google's laughable process (or lack thereof) for appealing such cases.
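To make the "matches a database of known phishing URLs" guess concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption for illustration: the blocklist entries, the function name, and the idea that matching is done on exact hostnames. Nothing here is known about how Google actually does it.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklist; a real service would use a large, frequently updated feed.
KNOWN_BAD_HOSTS = {"login-example-bank.invalid", "lapsed-campaign.invalid"}

# Deliberately simple URL matcher; good enough for a sketch, not for production.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def flagged_urls(cells, bad_hosts=KNOWN_BAD_HOSTS):
    """Return (cell_index, url) pairs whose hostname is on the blocklist."""
    hits = []
    for index, text in enumerate(cells):
        for url in URL_PATTERN.findall(text):
            host = (urlsplit(url).hostname or "").lower()
            if host in bad_hosts:
                hits.append((index, url))
    return hits


cells = [
    "Candidate site: https://lapsed-campaign.invalid/donate",
    "Census data: https://www.census.gov/data.html",
    "no link here",
]
print(flagged_urls(cells))
```

Note that a check like this has no notion of intent: a sheet that merely lists a lapsed domain is indistinguishable from one that is distributing it, which is the false-positive problem being discussed.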
You're right - a URL is either a technically valid URL or it isn't, but in this context, we're clearly talking about links to phishing sites (or not).
As someone who operates a service that allows people to create and share lists of links on a public site, I can tell you from experience that it's not as innocuous as you might think. Scammers routinely use a trusted domain to host a link to their malicious final destination, since the initial, trusted domain is often the one given the most scrutiny (e.g., from an email). It sounds silly to more technical users who understand how the web works, but unfortunately it's effective and super popular in phishing campaigns.
To your point, yes - a list of phishing URLs would be useful in a lot of cases, but it's difficult for automated tools to tell the difference between those legitimate use cases and the much more common cases used for phishing, so they err on the side of caution. As mentioned, the human review / appeal process surely has room for improvement.
It's a shame that Google are choosing to focus on policing what people think of as their private data.
Meanwhile, they seem to be taking their eye off the ball when it comes to the spam filtering that many of their users might wish for. In just the past week, I've received noticeable amounts of spam via: Gmail, Google Drive, Google Calendar, and Google Photos.
I'm puzzled about how Google are choosing to allocate their resources. It doesn't seem likely that governments would be asking Google to specifically police spreadsheets for possible phishing data. The owners of the files definitely aren't asking for their access to their own data to be cut off. So what are the origins of this effort?
I have to admit, I originally thought that Google Drive was the obvious choice over every alternative that existed, but I can see now that I would prefer an offline or privacy-driven alternative. The risk of losing files to a Google Drive machine learning black hole & then facing Google's customer service black hole might be small, but it's also nightmarish.
The spam that's allowed to be sent through Google Docs @mentions has been horrendous lately. What's even worse is they only let you ban the individuals. You can't turn off the notification email messages.
Thankfully I'm about ready to close down my Gmail account (still stuck needing a Google account for now) but it's a good reminder of what I won't be missing when I finally get out.
The risk of facing the "black hole" may be larger than you think. There's a person who posts a common "recovery how-to" in the support forums when people get locked out and that page got 83,878 views last month.
I've never encountered this @mentions Spam, are they using it as some mechanism for verifying a Gmail account?
I received a notification from Gmail the other day that "Someone added <made-up-email-address@my domain> as their recovery email.". And my only option was to disconnect the email. By disconnecting the email, I would be confirming to the account owner that the email landed in someone's inbox. By not confirming the email, I would be allowing my domain to be associated with a Gmail account potentially used for nefarious activity.
Given Google's scorched earth approach to deactivating associated Google Accounts when it comes to things like the Play Store, I felt I needed to disconnect the account.
Just for sending plain old spam because they know these notifications bypass a lot of the filtering. Removed the names in case they've come from hacked accounts.
Google Drive
> Random Name (randomcharactershere@gmail.com) has invited you to view the following document:
Private__file-Nyela-(random characters)
> If you don't want to receive files from this person, block the sender from Drive
The painful part is the last bit, you can only block that single sender, you can't turn off all notifications.
Google Docs
> Random Name (randomcharactershere@gmail.com) mentioned you in a comment in the following document
Direct_message_with_Salma
> I have to admit, I originally thought that Google Drive was the obvious choice over every alternative that existed, but I can see now that I would prefer an offline or privacy-driven alternative.
I'm not being miserable or spiteful when I say I'm really glad this happens. Mono-cultures are dangerous and undesirable, and whether the cause is "network effects" or more active monopoly tactics it's good that there's some drivers towards alternatives.
Ultimately, when we have interoperability (by legislation if necessary), people will have the genuine choice to share documents via Google, where their work may be censored or arbitrarily deleted, or through another service. It will not matter whether others in the group use AcmeDocs or GoogleDocs, we will all be able to share and edit via interoperable protocols. But when Google screw up like this, they will lose customers who have no barrier to migrate to another Docs client.
That will raise software quality. It would be a real marketplace.
> It doesn't seem likely that governments would be asking Google to specifically police spreadsheets for possible phishing data.
I would think any first class modern day regime would be on the lookout for anyone who might have it in their mind to challenge that authority, and election related keywords could certainly fall into that filter depending on how coarse it is. I wouldn't expect they would just blatantly take down documents though, so I suspect I am being excessively conspiratorial here.
I suspect it's security "hysteria": everyone wants to market their shit as secure. On a website where we allowed users to host their PDF files, some assholes were making misleading PDFs that pretended to come from some company, and they put a link to some bad stuff in the PDF.
One report and the entire website was blocked. We cleaned the website immediately, but all the shitty antivirus and security software was already blocking our domain, and it took us weeks to get those bastards to fix the issue. One of the big security companies even had the form you need to use to submit reports broken for weeks... So unfortunately, if a "bad" link appears on your website it will cost you a lot of time to get it all back to normal. My advice: for user-generated content, just use a different domain, so if shit happens your main domain does not get blocked by the browser makers and antivirus companies.
Google was responding to groups that complained that the spam filter was too heavy. Now some spam is getting through, and people are starting to complain about that, which will make the spam filter heavy again. Not sure what the right answer is... Fastmail has seemingly figured it out.
Removing files in google drive will kill drive. How can you trust them?
The spam that is getting through seems like really obvious spam. Not just for a human, but for a classifier, too.
I looked at an "out of the blue" example just now:
1. Sent from a Gmail address where the name doesn't come close to matching the name suggested by the sender address.
2. Email body is in Latvian, a language I have no association with.
3. About 20 other recipients, none associated with me.
4. Subject line is nonsensical (not even words) and an email address.
5. Email body is one line.
6. PDF attachment, with no mention of it in the email body.
7. Looks automated, but came from a Gmail address.
Of course, none of these in isolation is a definitive indicator of the email being spam, but given that there's at least 7 anomalies, considering the amount of data Google has & that they pride themselves on machine learning, shouldn't they be catching something like this?
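The "none of these alone, but all seven together" argument amounts to an additive score. A toy sketch, with signal names paraphrasing the list above and with weights and a threshold that are pure assumptions (real filters are learned, not hand-weighted like this):

```python
# Hypothetical weights; the signal names paraphrase the seven anomalies listed above.
SIGNAL_WEIGHTS = {
    "name_mismatches_address": 1.0,
    "unfamiliar_language": 1.0,
    "many_unrelated_recipients": 1.5,
    "nonsensical_subject": 1.5,
    "one_line_body": 0.5,
    "unmentioned_attachment": 1.0,
    "automated_from_free_webmail": 1.0,
}
SPAM_THRESHOLD = 4.0  # assumed cut-off, chosen only for illustration


def spam_score(signals):
    """Sum the weights of every signal that is present."""
    return sum(SIGNAL_WEIGHTS[name] for name, present in signals.items() if present)


def looks_like_spam(signals):
    """No single signal reaches the threshold; several together do."""
    return spam_score(signals) >= SPAM_THRESHOLD


all_seven = {name: True for name in SIGNAL_WEIGHTS}
only_short_body = {"one_line_body": True}
print(looks_like_spam(all_seven), looks_like_spam(only_short_body))
```

Even with made-up numbers, a message tripping all seven signals clears the bar comfortably, which is the point: this should not be a hard case.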
Arriving in the inbox is one thing, but the thing really throwing me is they're getting the "important according to Google magic" flag applied. How can mail that fails all the tests you've mentioned (I just ran through them on some spam I've been getting) get such a high importance rating?
tbf, the Latvian bit probably counts in its favour because I doubt many spammers bother with Latvian (or that it's particularly unusual for Latvian speakers to have English Gmail accounts).
Still scratching my head at how their calibration can let something like that through and stick a random genuine circular from Twitter in my spambox though...
I think no matter what google does here they won't win - if they increase the sensitivity of spam filters, people will complain (rightly) that legitimate content is getting blocked. If they lessen the sensitivity people will complain (rightly) about too much spam.
I taught data journalism at the graduate school level (not to the OP) and I regularly used Google Sheets as a channel for hosting data, which were typically U.S. public datasets, like city crime reports, Census demographics, election results, etc. This is in addition to hosting CSV and .sqlite files on the class homepage and Github repo.
The reason why Google Sheets was so handy was because it's a great interface for data exploration. I could show students features about the data, and sort/filter/highlight, without having to distribute/create spreadsheet files. I guess if Google Sheets ends up being draconian with its content filters, I could link to Github-hosted CSVs, but it's not quite the same.
Haven't checked my classroom Google sheets recently, but if they're untouched, it might have helped that they were all set to public view. Maybe the flagging algorithm looks at private sheets with much more suspicion.
For sure no. If you are storing a trove of copyrighted movies, revenge porn, or child porn then you would for sure not have it marked for public access. Google clearly doesn’t want their storage to be used for these nefarious purposes but anyone who has tried to train a ML algorithm knows, false positives is the name of the game so they should really only be used in places where false positives do little to no harm.
Have you tried using anything non-Google recently? Everything else is much, much worse IMO. Remember we all mostly moved to Google because they were the only ones who could actually deal with spam, gaming, and the more sophisticated attacks at scale. Now that they're the only game in town, adversaries are focusing hard on getting past their filters etc, and some are getting through.
When you have literally millions of people whose livelihood depends on gaming algorithms, or getting past content filters, then some highly intelligent and motivated individuals are going to get through, that's a fact of life.
When someone figures out how to train their robots better than Google, then Google will have some competition, but they seem to be about a decade ahead of everyone else, and only accelerating relative to the rest of the field, so I'm doubtful that anyone is going to catch up any time soon.
In terms of spam, using website-specific email addresses (e.g. hackernews@mydomain.example) has solved it for me. No spam filter required.
Of course Google and Microsoft both punish me by sending my mail to spam (even years later) for using my own domain but that's what I'd expect from the big players.
I've done the lot. SPF, not blacklisted, DKIM signed, and DMARC set to reject.
From what I can tell it's just that small domains are guilty until proven innocent. However it seems at lower volumes you can never send enough email to get past the thresholds they've set so you're forever guilty in their eyes. Very convenient.
I've also signed up to both Microsoft's and Google's email tools and neither shows any information because I don't send enough mail, while they still keep marking my mail as spam. It's getting to the point where I'll probably have to use a third-party SMTP relay, as the family don't appreciate their mail going to spam while all their friends on Gmail go straight to the inbox.
A nice new addition is Microsoft is flagging any links I include in my message as "suspicious" and giving the users scary warnings. Meanwhile I can see they still take the opportunity to scan and fetch the URL and yet still can't determine it's safe.
Oh yes, the scan and fetch on Outlook is magic, especially when you are sent links that can be accessed only once before self-destructing. So fun. It was at a previous job; I forget if I was able to deactivate it or just had to use another mail.
I'm thinking of switching to an email address on my own domain, but I think I'll use Migadu [1] instead of even trying to host anything myself.
Interestingly, when I logged into Microsoft's Smart Network Data Services I could see my VPS provider had claimed the IP block as well, so they were clearly invested in making sure spam was under control. But nope, still not enough for Microsoft. At one point the mail was even getting a hard 550 refusal. I contacted their support who, to their credit, replied, but to less of their credit gaslit me and claimed there was no blocking occurring despite me pasting them the full error message with all their tracking IDs in it. The next day mail was allowed in again but it still ends up in spam.
> Of course Google and Microsoft both punish me by sending my mail to spam (even years later) for using my own domain but that's what I'd expect from the big players.
If only they would send everything to spam. Microsoft outright blocks all mail from my VPS because some other IP on the block allegedly sent spam. Of course since I'm not running a commercial business I can just tell people to use a better mail provider if they want to create an account - not worth my time to deal with shitty megacorps.
> Remember we all mostly moved to Google because they were the only ones who could actually deal with spam, gaming, and the more sophisticated attacks at scale.
No, we moved to google because they provided three orders of magnitude more storage than other free mail providers. Also, the interface was slick and fast.
Spam is an overrated problem - it doesn't take much time at all to look over a dozen or so mail subjects a day to determine if they are legitimate. Scammers are a problem for the more vulnerable but Google is never going to block all of their own kind either.
> Have you tried using anything non-Google recently?
Yes. I went from self-hosted to Proton recently. Spam filtering is better, UI is better, and customer support / transparency is way better than anything I've seen from Google in the past decade. Of course the various outages / issues over the past three days have been frustrating so there's that.
"Sadly it won't even be the next generation as they're all rusted into the ecosystem with Chromebooks provided by their school districts. "
This is so true that it hurts. I went back to college later in my life starting in 2018. It's astonishing to me how ingrained Google services are to the kids I'm in school with. Even though we have Office 365 and a full suite of apps available to us for free, when we do group work the de-facto decision that students come to is to create Google drive shares, or shared Google docs that are tied to personal accounts. It gets so stupidly messy and the apps just seem so inferior. Everyone uses the same Slides template so everything looks the same. Ugh.
Sometimes it is also due to the university changing its systems too often. When I was at uni, we had to switch from self-hosted + Roundcube webmail to the full Google suite to full Microsoft 365 for mail and tooling in less than 9 years, with transition periods where everything was half-broken.
> Is this something researchers using Google Drive should be aware of?
This is something EVERYONE using Google Drive should be aware of. Your data on Google Drive is scanned by Google (and not only for policy violations), meaning it is not secured FROM Google, and false positives can and will cause your data to be lost and cause your account to be flagged. Using Google Sheets means that is your only copy of the data, so you also have no backup.
This is why I use local storage as my backup method. Just upgraded my total storage capacity to 28TB with 3 copies. I may add cloud backups once I figure out how to make sure my uploads are encrypted, but owning your own storage is the only surefire way to make sure nobody else can play games with your data.
3 copies is great, but is one of them offsite? Without an offsite location I can't consider my personal data safe, and offsite is generally harder if you are buying your own disks/using local storage. If all three copies are in your house, you are still vulnerable to thefts/acts of god etc, regardless of who owns the disk the data is sitting on.
Personally I think local storage is a bit of a losing battle for long term personal backups; the running costs of expanding and replacing disks at 28tb size start to get significant if you do it properly with offsite compared to the range of affordable encrypted cloud backup services out there, many of which let you provide your own private key for storing the data encrypted at rest on their side (how I use backblaze right now for all my machines).
I'm honestly not sure I could do local storage well at less than the $65 per computer per year I pay Backblaze - one new drive a year for a NAS is already more than I pay Backblaze annually, before other costs and my time to maintain it are factored in. While I'm not accusing you of this, my experience is that many people running their own NAS do so because they enjoy running their own NAS, if they are totally honest with themselves, not because it's necessarily the best or most convenient option.
I do keep one copy offsite. However, I'm not against cloud backups, I just prioritized setting up my local setup first so that I'm not dependent on the cloud. If I have 2 HDDs + backblaze and one HDD dies, it would be a pain to download 10TB of data from backblaze to set up the replacement HDD. I totally understand your point about running a NAS and I've actively tried to avoid setting one up. My storage system is entirely external HDDs. I want to expand to a NAS in the future but that'll be for fun, not backup (at least until I can afford 3 separate NAS instances for backup).
That being said, do you have any advice on how to encrypt data before uploading to cloud? I'm not at that stage yet so I haven't done deep research, but I've heard of things like veracrypt/truecrypt? And I've heard enough good things about backblaze (and enough bad things about other companies like google) that I'd be comfortable going with them if the price is in my budget and if I can just figure out the encryption angle.
With Backblaze it's as simple as typing your own private key in a box on the app preferences page, assuming you trust Backblaze etc. That's all that's required to encrypt your backups such that Backblaze can't decrypt them server side. The caveat is of course that if you lose that key, you lose access to your backups and there is nothing Backblaze can do to help.
Following 3-2-1 backups is definitely recommended, and if you have a good NAS setup, it can be pretty easy, even with encrypted backups.
My NAS runs TrueNAS Scale. Any data I need to keep around long term goes into a personal share. I have restic installed on the NAS, with a USB drive connected to the NAS hosting the restic repository. At this point, a copy may or may not exist on my personal computer, does exist in the personal share, and is copied to an encrypted backup repo locally. Finally, there's a sync job pushing the encrypted restic repo to Wasabi. I only follow 3-2-1 on things that need it, so things like my media library aren't part of it (I can just rebuild this stuff). At less than 1 TB, Wasabi can be pretty cheap.
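As a sanity check, the 3-2-1 rule (at least three copies, on at least two kinds of media, with at least one offsite) can be written down in a few lines. This is only a sketch: the `Copy` class, the medium labels, and the way the setup above is mapped onto them are illustrative assumptions, not anything restic or TrueNAS provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str      # human-readable label for this copy
    medium: str    # assumed label for the kind of storage it lives on
    offsite: bool  # True if it is stored away from the primary location


def satisfies_3_2_1(copies):
    """At least 3 copies, on at least 2 different media, with at least 1 offsite."""
    media = {c.medium for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and any(c.offsite for c in copies)


copies = [
    Copy("personal share on NAS", "nas-disk", False),
    Copy("restic repo on USB drive", "usb-disk", False),
    Copy("restic repo synced to Wasabi", "cloud-object-storage", True),
]
print(satisfies_3_2_1(copies))
```

Dropping the cloud copy from the list makes the check fail twice over (only two copies, none offsite), which is the gap the sync job is there to close.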
In terms of documents, though, I tend to use tools like LibreOffice rather than cloud office tools. Having real documents that I can include in my 3-2-1 backups, as well as being able to share, is convenient and useful. Maybe not the best for collaboration, but that's easily worked around.
I still work with cloud office documents from time to time, as documents shared to me, not vice versa. If it's something important that I need to keep a copy of, I export it.
This isn't necessarily aimed at you but a bit of a rant, because honestly I feel like encrypted backups for personal use are evangelized to the point that it's ridiculous. If my entire hard drive was dumped on the internet, nothing would happen. That isn't to say people shouldn't use them, but people should consider their threat model.
I prefer to encrypt my backup before sending it somewhere I don't control. Having an offsite backup implies not having control over it. "The Cloud" is just someone else's computer, and they can do with it what they want. It's not just about protecting the data from being leaked/stolen, but also to protect it from being tampered with, such as injecting unlawful material, malicious software, etc.
I don't have a problem with you saying they're ridiculous; I'm sure a lot of things are over-evangelized. I have no emotional attachment to anything except my data. However I am very private so I simply feel more comfortable having my data encrypted in the cloud and ideally encrypted on my HDDs. The former for backup privacy, the latter for privacy if I need to RMA an HDD one day.
The key point, to me, is that Google monitors all content. I don't think it's much of a surprise, but in the context of Google Drive, it's pretty terrible. Drive is an anti-product in this respect. It's as if Western Digital decided to update the disk firmware to just ship your data to them for scrutiny.
> The key point, to me, is that Google monitors all content.
Google monitors and restricts shared content. This is reasonable, IMO. If someone is hosting documents publicly on the Google domain, it's reasonable for them to make judgment calls about what can and cannot be hosted.
Note that the file wasn't removed, it just had public sharing and other public features restricted. (Read the dialog in the Tweet closely)
... that might be a stretch. If this were a private file, I don't think it is evaluated in the same way. But he's sharing it, so if it is linking to malicious content using Google, they could be perceived as complicit so they need to take action. Otherwise any spammer in the world can host whatever link farms they like.
It's that we don't really know what is being inspected. I don't, at least. I've been looking, and I can't find any official statement or specific language in the myriad policies.
This is a pretty pessimistic comment. Cloud storage companies should, can, and have been held accountable for violating their users' privacy, including when they themselves are the violators.
Using data to detect abuse is a very different situation from using it in something like ad personalization. The companies that provide cloud storage still need to make a profit, and that means reducing overhead on things like human spam and DMCA processing; a few people abusing free/cheap accounts will just increase everyone's data costs.
This isn't an acceptable justification for what is happening. Removing people's legitimate private, non-spam, non-abusive documents should never ever happen.
There is no reason a priori that companies should be looking at private data, even automated agents thereof. The fact that they do monitor that content represents some kind of deviation from the null hypothesis that is important to identify and explicate.
In fact, the opposite is true: companies care about what their servers are being used for, both to prevent abuse and to limit legal liability. Any impression users may have to the contrary is a result of marketing and the widespread concept of "the cloud" as a nebulous entity.
E2E encrypted services are the exception, not the rule. This mischaracterization extends so far that when Apple announced plans to turn on E2EE for photos (necessitating an on-device scanning tool for banned photos before upload), the uproar against it was vast and unanimous. So now we still have the same photo scanning on Apple's servers, and no E2EE.
Stab in the dark but I imagine some of those campaign website domains lapsed, were picked up by domain expiry people because they probably have very good visibility (i.e. a lot of links to those pages from official/.gov sites), and had phishing or sketchy landing pages put on them.
Google then prunes these malicious pages from their search and then also flags any docs that have those links in them.
A question with no answer. It's an unnamed node tuned with automated routines. Classifier curators will scratch their heads too; meanwhile, victims have to wonder why this is acceptable.
The cloud is someone else's computer. Act accordingly.
It's a shame this person lost their work. More and more stories like this need to come to light and we need to convince others the cloud is a risk, not a solution.
It was a solution. Google Drive was great until they turned user/customer hostile. Arguably, no one could really have seen this coming, because this doesn't even seem to make business sense. Google are seemingly acting against the reputation of their own services.
It does make business sense. Drive is facing serious abuse problems. See for example this article [0] for the other side of this coin. Google is trying to come up with automated solutions for shutting down 'abusive' items. The OP is an example of the automated solutions not working that well.
You are conflating spam from 3rd parties with having a private file that Google thinks they need to block you from accessing.
It is 100% possible (and not even difficult) for Google to prevent me from receiving spam from other Google Drive users whilst also allowing me to access my own Google Drive files.
I suspect OP has not really lost access to their own files. Per the blog post about this system [0], "When it’s restricted, you may see a flag next to the filename, you won’t be able to share it, and your file will no longer be publicly accessible, even to people who have the link."
My guess is OP has multiple accounts, created this file on account A, and can no longer access the file while logged in as account B, so it seems to be "removed from Drive." But I could be wrong, in which case I agree with you, this doesn't make sense as an anti-spam measure.
> Arguably, no one could really have seen this coming
Maybe 15 years ago, when all this was new... but it's been pretty obvious that this was the course of things over the last 6-7 years. It makes business sense when there is no competition.
> Would that be harmful to Google to have Google Sheets be used to distribute? Yes.
If it's a private file (not shared), the question doesn't even make sense.
Some degree of control might make sense for 100% publicly shared content on Google Drive, but how does it make any sense that they are cutting off access for the owner of the content?
Because part of being big & successful if you're Google involves operating a business. If your approach to everything is "why risk it?" then you eventually just choose to shut everything down.
One of the killer features of a QNAP NAS is the built-in cloud backup utilities for Google and Microsoft. It always amazes me how often I have to fight with people to turn them on.
You'd think after the original Photobucket meltdown people would be more cognizant of stuff like this, but here we (still) are :(
Part of the response is for users to trust the cloud less, but I wish part of the response would be to increasingly hold cloud storage companies accountable such that in the future they can't get away with this kind of abuse without severe consequences for them.
It's getting more and more apparent that moving Office/productivity apps from your local system to the web is a huge mistake. In 20 years, I'm going to be like the guy who writes manuscripts in WordPerfect 4.
It's always been apparent to some of us. Unless I have an explicit need to actively collaborate with another person on some document I've continued to use locally saved LibreOffice documents for everything, and even then I still download a copy from Google Drive when finished/periodically.
Am thinking that a few of the domains on the list expired and were snapped up by malware/scammer sites causing the spreadsheet to seem super sus to a machine scanning/checking urls.
If this were the case, then the reason for flagging should be fully disclosed in a transparent fashion to the end-user, with specific details.
Otherwise it's super confusing to have the entire document marked "suspicious", and an end user may reasonably (naively) still paste and visit the link. When this happens, the entire effort to keep them safe is subverted by Google's own poor design.
How do you know 'Google takes election stuff seriously'? Correlation != causation. TBH I'm surprised the OP even asked the question 'why did an algorithm remove my stuff?'. No Google employee is going to tell you (even if they knew, which is quite doubtful given the complexities of machine learning). If you sign up for a service which is machine-managed, you are subject to the whims of the machine. Occam's razor suggests it is unlikely there is a conspiracy or malice at play. The machine is just doing its job for its masters, possibly badly, possibly not, but the general populace will never know one way or the other.
I can provide another datapoint. When I was at Arist (YC S20) we experienced tons of content filtering at the carrier level around election time. Messages that merely mentioned the word "vote" or "election" were getting blocked by many carriers. A few months later none of these restrictions seemed to exist. So not just Google.
Oh yeah carriers regularly filter based on content. I've posted about it extensively on HN before. I only have line-of-sight into them doing this for SMS messages arriving from businesses i.e. via Twilio, Telgorithm, Bandwidth, etc. I don't know one way or the other whether they also do this for regular SMS messages but I imagine automated 10DLC are under more scrutiny than a regular SMS user.
The filtering is phrase and word based generally, but sometimes also seems to use an ML model. We were able to use our own ML analysis to figure out the exact words causing content blocking and which carriers in a lot of scenarios. But yeah, it's definitely a thing at every major carrier.
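To give a rough idea of how you can narrow down trigger words from delivery logs, here is a much simpler stand-in for the ML analysis mentioned above: a plain per-word blocked-fraction count. Everything here is an assumption for illustration, including the `word_block_rates` name and the `(text, was_blocked)` log format; real carrier filtering is phrase-based as well and will not separate this cleanly.

```python
import re
from collections import Counter


def word_block_rates(messages, min_count=2):
    """Estimate how often messages containing each word were blocked.

    messages: iterable of (text, was_blocked) pairs (hypothetical log format).
    Returns {word: blocked_fraction} for words seen in at least
    min_count messages.
    """
    seen = Counter()
    blocked = Counter()
    for text, was_blocked in messages:
        # Count each word once per message, case-insensitively.
        words = set(re.findall(r"[a-z']+", text.lower()))
        seen.update(words)
        if was_blocked:
            blocked.update(words)
    return {w: blocked[w] / n for w, n in seen.items() if n >= min_count}
```

Words whose blocked fraction sits near 1.0 across many messages are the candidates worth testing by hand; running the same count per carrier would show which carrier objects to which word.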
What did your voting tool do? I suspect Google, like many other tech companies, is trying to avoid having its platforms used to manipulate election activities, and the definition of "manipulate" might be very broad. Its being funded by a campaign might not matter to Google at all.
Being funded by a campaign might even be a negative, as it would likely increase the odds that the address is being used for election manipulation.
It would be easy to register <state>voting@gmail.com, start using it as a contact for public-facing election-oriented documents, and get at least some people confused about whether I was officially associated with <state>'s election apparatus. Given the current environment, I'm positive Google very much doesn't want their systems being used to (even accidentally) impersonate election officials.
This is some heavy slippery slope shit. If, back in 2007, you had told people this would be happening in 15 years, you'd have been called a paranoid conspiracy nut.
Yeah it was something people laughed at you for predicting would happen in the age of "everything online". Which is a synonym for "everything on the cloud". Which really means "all your stuff on someone else's computer"...
> It has a number of URLs. I suppose it's possible at least one of the URLs is no longer legitimate
Just a guess, but since it is a file which lists website URLs, one of those domains was probably flagged as a phishing site, and since your doc had a URL with the same domain, it got flagged as well. It wouldn't surprise me to find that one of those candidates' sites had actually been hijacked by some phisher. Unfortunately, just visiting each site wouldn't necessarily reveal which one it was, since they'd likely have tucked the PHP shell or other phishing stuff at some path away from the root of the site to avoid detection by the site's maintainer.
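If the guess is right, the check could be as crude as the sketch below: pull every URL out of the document text and compare its host against a list of flagged domains. This is purely an illustration of domain-level matching, not a description of what Google actually runs; `flagged_urls`, the regex, and the blocklist are all made up for the example.

```python
import re
from urllib.parse import urlsplit

# Loose pattern for http(s) URLs in free text (an assumption, not a full URL grammar).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def flagged_urls(text, bad_domains):
    """Return the URLs in text whose host is a flagged domain or a subdomain of one."""
    bad = {d.lower() for d in bad_domains}
    hits = []
    for url in URL_RE.findall(text):
        host = (urlsplit(url).hostname or "").lower()
        # Match the domain itself or any subdomain, but not lookalike suffixes.
        if any(host == d or host.endswith("." + d) for d in bad):
            hits.append(url)
    return hits
```

Note that a match like this fires on the domain alone, so a link to a perfectly innocent page on a hijacked site would flag the document just the same, which fits the "no indication of why" experience.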
This is our regular reminder from Google that "the Cloud" means "someone else's computer". Whatever you store in the cloud is not under your control.
The cloud can be convenient, but it's not under your control, so always also keep a copy that's under your control. Of course your local copy isn't secure either, so having a copy in the cloud is still a good idea, but it shouldn't be your only copy.
I've found mega.nz to be a great alternative to Google Drive. I've never had issues with feature parity and it even has a proper Linux client.
The combination of flag-happy AI and the way Google will nuke your entire account without recourse* or a human ever being in the loop makes them a non-starter for me.
*Google might give you recourse if you can get a public uproar going on Twitter or HN
Lol, if only there were an open office like suite available with a libre license which could generate files in some sort of open documented format which could be shared by all through a clickable interface within a browser...
I'll bet that some of the candidate websites this file linked to had been taken over by malicious actors and WERE being used for phishing. By linking to them and sending them traffic, you were perceived as a bit complicit.
Sooner or later, and I suspect later, companies and individuals will realize that they can’t trust cloud providers, period. And all the connected gadgetry like Siri, Alexa, et al. is not in anyone’s best interest.
Just call their support to sort out what happened and get it fixed.
Oh wait.
Welcome to our glorious future of “internet scale” companies which make money by taking advantage of the information YOU GIVE THEM, don’t have to care, and can’t be made to.
Well, as has been pointed out many times in the past, private companies have no obligation to provide any kind of service, for free or for profit. See YouTube removing videos, Twitter banning users, etc.
You mean like back in 1987 - when Joe D. User could install MS-DOS from a couple floppy disks, then WordPerfect and Lotus 123 similarly, then just use his PC for years - withOUT knowing what Windows Registry entries, security patches, spam, phishing e-mails, virii, typosquatted domain names, SaaS, etc., etc., etc. were? Let alone having to regularly pour time & grief into dealing with those?
That I am aware of, every one of my family members who has retired in the past 15 years was stuck using MS Windows at work. Every one of them has been happy to delete MS and Windows from their life - getting rid of their home computer, buying an Apple iPad, and running their modest on-line lives from that iPad.
I'd really like to know if Google does process every single file. I'll operate as if it does, but I wish someone from inside would confirm and elaborate.
Google can't even spare humans to rescue their paying Google One customers from automated account bans. [1] There's no way a human is involved in this ban.
I’ve always assumed that Google Drive cannot be used for anything important/critical. I store files on local disk and back up to Dropbox. Other backup services should be OK too, I presume?
Strictly speaking, it's better than the alternative, but the warnings imply that anything in Office saved to a network location is "public facing", and if that implies Microsoft has inside information that it is...
You have to make stuff public on purpose in 365, of course they know what is visible.
A while ago there were a lot of fake login forms being made and hosted on official Microsoft domains, via Forms and Flow. Don't see that one as often anymore.
>You have to make stuff public on purpose in 365, of course they know what is visible.
You are saying that it truly groks the permissions and other security applied to every accessible shared drive, every SharePoint location, anything not a local file, before it warns you about PII?
As well as all the relevant regulations from HIPAA, to European privacy laws, to company & corporate classifications...?
Sure. It's called DLP, and if you pay them, you can have it warn you about your own specific regulations before you commit a share, based on the content of the file. There are premade options for a lot of different countries' regulations, or you can roll your own.
I think we may be at cross purposes, because nothing you have written makes any sense to me. We might just be thinking of different things.
First of all, I wasn't talking about a situation where people use OneDrive to store their files. Was that what you meant by "You have to make stuff public on purpose in 365"?
Secondly, "of course they know what is visible" doesn't track. Microsoft isn't a person and I assume has internal silos. I also have not gone over all the user agreements with a fine-tooth comb, plus all of the privacy and other options configured in particular instances. I do not believe there is a general artificial intelligence in anything from MS that can manage the complete context, from firewalls to hackers.
Finally, in your follow-up comment, to which I'm directly replying, it sounds like you are talking about a specific configurable feature, whereas I was originally referring to default behavior and it obviously not being useful or intelligent.
I highly recommend everyone stop using Google, including Google Drive. But self-hosting "cloud" file storage is complicated for the average user, and easy to break, so I recommend using Dropbox (which for the past few years has used MS Office Online as the editor for MS Office docs) and backing that up on your own. Why Dropbox? It's super reliable, doesn't force its own filetypes on you (mostly), and has native clients for everything, including a Linux CLI. So this is something you can set up for your entire family, with backups running reliably in the background, so that if something like this happens you can restore. And of course, any sensitive files you don't want Dropbox to read, you have to encrypt yourself, but I'd say for the average user that encryption is only needed on rare occasions.