Google Safe Browsing can kill a startup (gomox.medium.com)
1714 points by gomox on Jan 16, 2021 | 543 comments



This is actually funny, because I was involved with the creation of this list, way back in 2004. The whole thing started as a way to stop phishing.

I was working at eBay/PayPal at the time, and we were finding a bunch of new phishing sites every day. We would keep a list and try to track down the owners of the (almost always hacked) sites and ask them to take it down. But sometimes it would take weeks or months for the site to get removed, so we looked for a better solution. We got together with the other big companies that were being phished (mostly banks) and formed a working group.

One of the things we did was approach the browser vendors and ask them whether, if we provided them with a blacklist of phishing sites (which we already had), they would block those sites at the browser level.

For years, they said no, because they were worried about the liability of accidentally blocking something that wasn't a phishing site. So we all agreed to promise that no site would ever be put on the list without human verification and the lawyers did some lawyer magic to shift liability to the company that put a site on the list.

And thus, the built-in blacklist was born. And it worked well for a while. We would find a site, put it on the list, and then all the browsers would block it.

But since then it seems that they have forgotten their fear of liability, as well as their promise that all sites on the list will be reviewed by a human. Now that the feature exists, they have found other uses for it.

And that is your slippery slope lesson for today! :)


This is an amazing story. It really demonstrates the way we pave our road to hell with good intentions...

We should really do something about this issue, where so few companies (arguably, a single one) hold so much power over the most fundamental technology of the era.


Here-here! I really wish there was more human involvement in a lot of these seemingly arbitrary AI-taken actions. Everything from app review to websites and more. This heavy reliance on automated systems has led us down this road. Shoot, keep it, just give us the option to guarantee human review - with of course transparency. We don't need any more "some human looked at this and agreed, the decision is final, goodbye."

I know it's easier said than done, especially when taking the scale of the requests into account, but the alternative has, does, and will continue to do serious harm to the many people and businesses caught in this wide, automated net.


It's interesting how closely the unfolding of this awful scenario has followed an entirely predictable path based on the shifting incentives: now hundreds of thousands of businesses face the same massive hazard of being blocklisted without adequate human review, and with mediocre options to respond to it if it occurs.

Without a shift in incentives, it's unlikely the outlook will improve. Unless the organisations affected (and those vulnerable) can organise and exert enough pressure for Google to notice and adjust course, we're probably going to be stuck like this -or worse- for a long time.


Blacklisting a site incorrectly seems like a perfectly adequate reason for a defamation lawsuit. So, I think the real issue is with the legal system.


> this awful scenario has followed an entirely predictable path

The interesting thing about predictable paths is that at the start there are a LOT of them, then over time there is just one of them. I don't see that this path was any more predictable at the start than any other.


It feels like the need for automated systems is a result of the ever-increasing size of the world (there are now nearly 5 billion internet users[0]). For Apple, app review can take days, mainly because doing human review [consistently] well and constantly for 8 hours a day isn't easy[1], leading to staffing issues when bad reviewers get weeded out and only a small percentage of hires stick around. Outside of hiring 10,000 employees just to endlessly review phishing links for 40 hours a week, you need automation to triage these phishing sites and deal with the outcome later such as via on-demand review by a human (which worked in this case, but won't always work - humans still make mistakes). I'm not sure if there is a solution for this problem outside of just not having the safe browsing product if 'makes no errors' is a requirement.

0: https://en.wikipedia.org/wiki/Global_Internet_usage

1: https://www.businessinsider.com/heres-why-it-really-sucks-to...


There's no reason the number of humans dealing with these problems can't scale alongside the number of humans creating them.

But it's a lot cheaper to pay for a few really expensive programmers to make a just-good-enough AI than to pay for thousands of human moderators. So we end up with a stupid computer creating tonnes of human misery all for the sake of FAANG's already fat profit margins.


"So we end up with a stupid computer creating tonnes of human misery all for the sake of FAANG's already fat profit margins."

I don't want to blame this entirely on the big companies, though. People also want and expect "free" things on the internet. This is how we ended up like this.


> There's no reason the number of humans dealing with these problems can't scale alongside the number of humans creating them.

I would think the attackers are using automation also, to spam attacks as in other areas of fraud. It can only be a battle of AI ultimately.


Depends on which problem you're tackling. With app reviews, for example, it is very easy to rate limit the 100 USD developer licenses. And in cases like the one the Medium article describes, businesses would gladly pay a hundred bucks to get real humans to produce competent answers/reviews/decisions. And if you dislike this solution because it creates a Google tax (pay us or we'll block your site), make it not a service payment, but a security deposit which they'll only keep if you are fraudulent in some way.


Is it just me, or, the way things are currently stacked, is human insight still the best line of defence? The OP and other anecdotes in the comments are examples of why we’re not quite at “AI vs AI” yet.


:s/Here-here/Hear hear/


Hear hear!

Where where!?

Here here!


> I really wish there was more human involvement in a lot of these seemingly arbitrary AI-taken actions.

Narrator: but it was only ever to get worse


Couldn’t agree more, the transparency is key. It enables faith in the system and outcome.

The counter argument to transparency will be that it provides too much information to those who aim to build phishing sites not blocked by the filter.

That said, we’ve experienced systems in which obfuscation wins out over transparency and it would be nice to tackle the challenges of transparency.


Are you implying that the list no longer has a good intention? I wouldn't be surprised if there are multiple orders of magnitude more phishing and hacked websites in 2021 than there were in 2004. Even with human checking, I doubt you'll ever have a 0% failure rate. Is the solution to just give up on blocking phishing sites?


The failure rate doesn't need to be 0%. If the solution is good, the rate will at least be close to 0%, which means it'd be possible for the vendor to provide better support for the small number of mistakes, so that they can be clearly explained to the affected party and rectified more quickly. If the failure rate is so high that better support is infeasible, then the current solution is not really a good one and we need to consider a revision.


> Are you implying that the list no longer has a good intention?

Most of the time I run into blocked sites they seem to be blocked because of copyright infringement, not phishing. The only phishing sites I've seen in the last year or so are custom tailored. For example, I had to deal with a compromised MS365 account last year where the bad actor spun up a custom phishing site using the logo, signature, etc. of the victim.

So IMHO the intentions are no longer pure plus the effect is diminished and being worked around.


The solution is for the legitimate sites that are driven out of business by Google AI to sue Google for tortious interference and libel.


This helps one group and hurts another. If Google is liable for blocking potential malware and phishing pages, they'll either stop blocking it, or adjust their algorithm to strongly err on the side of allowing phishing sites.

Businesses become safer, but more regular people will get phished.


>or adjust their algorithm to strongly err on the side of allowing phishing sites.

It's not the role of Google to disallow phishing sites (as a browser) just like it's not the role of the ISP.

Make it hookable so people can choose their own phishing protection service.


People wouldn't know or care which to pick. They would see the pop-up asking to select a phishing protection provider, would get confused and angry and think "where do I click to get past this pop-up, I want to go on Facebook and this stupid computer is nagging me with stuff again!"

Phishing protection is mostly needed for people who have no clear concept of phishing or technicalities. They just want to do things on the internet, like social media, they don't care about things behind the scenes, that's boring uncool nerd stuff.


And then they will choose the same block list and sites will have the same problem.


All? I doubt it. Not to mention they could offer control to override whatever you like.


> they could offer control to override

Chrome lets you override and proceed to the site. The problem for the small business is that a large fraction of their customers see the scary red warning page.


Well enough that it will still be a blocker.


Well, that goes without saying. If you want a blocker, you want a blocker. So all the Nigerian princes and the like should still be blocked.

You just don't want to give control over the blocking blacklist/whitelist to a single entity, even less so to a huge powerful one, possibly in a country other than your own (which e.g. forces their foreign policy dictums to your blacklist), and even less so the one that already makes your browser, that should be a totally neutral conduit.


I don't think this solves the problem from the article, since small businesses will still have to deal with getting mistakenly blocked by whatever the popular blockers are. With 40,000 new phishing sites per week, it's not an easy task. If the blockers are free (I imagine they'd have to be to get widespread adoption), who's going to review the false positives? Volunteers?

But also, it would leave the people most vulnerable to phishing unprotected, namely those not tech-savvy enough to install a phishing protection service. Most internet users don't even have ad-blockers.


The problem isn't the company that blocked it. The problem is the company that reported that there was a problem when there wasn't. In this case it sounds like Google is both companies.


>Is the solution to just give up on blocking phishing sites?

IMHO yes. It's too much power for one company to wield. And especially a company with such questionable morals as Google. This cure is worse than the disease.


I thought you said, the curse is worse than the disease... which also would've made sense.


" Is the solution to just give up on blocking phishing sites?"

But maybe not do it by default on browser-level.

But if you do, then there really needs to be ways to combat wrong decisions in a timely manner.


The solution is simple: Liability. As soon as it becomes legally infeasible to let algorithms block people, it will stop happening.

Make it easy and affordable to submit legal complaints for tech misbehavior and make the penalties hurt.


Ah, so you suggest liability for the vendors of the software blocking websites, with, in practice [1], no liability for the operators of a compromised website, if it is phishing/malware?

This is a great approach, if your goal is to optimize for increasing the amount of dangerous crap on the web. But, eh, that's surely worth it, because the profitability of startups is more important than little things like the security of the average netizen...

[1] Even if you make the operators liable [2], in practice, you'll never be able to collect from most of them. Whereas the blacklist curators are a singular, convenient target...

[2] If you can demonstrate how the operators of compromised websites can be held liable for all the harm they cause, I will happily agree that we should do away with blacklists. Unfortunately, the technical and legislative solutions for this are much worse than the disease you are trying to treat.


Since phishing is not going to go anywhere with or without blacklists - for obvious reasons, e.g. lists can't cover everything, and you can't add sites to the list instantly - I am willing to tolerate a slight increase in phishing, which is going to exist anyway, in exchange for not having Google (or any other megacorp, or any other organization for that matter) as a gatekeeper of everybody's access to the internet. The potential for abuse of such power is much greater and much more dangerous than the danger from a tiny increase in phishing.


> I am willing to tolerate a slight increase in phishing

According to Google's most recent transparency report[1], as of December 20th of last year they were blocking around 27,000 malware distribution sites and a little over 2,000,000 phishing sites.

In your view, would turning off those blacklists and allowing those >2,000,000 sites to become functional again count as a "slight" increase?

(edit: That's a real question, incidentally, not a disagreement or an attempt at a 'zing'; I have no knowledge in this area but went to look up the numbers, and am curious whether 2,000,000 is truly a vanishingly small amount, relative to everything else that's out there that's not already on the list)

[1]: https://transparencyreport.google.com/safe-browsing/overview


I'm not sure what is counted as "sites" - i.e. if Google closes foo.bar/baz123 and the same server gets assigned bar.foo/zab345 and continues to serve malware, is it 2 separate sites? Did Google really achieve this much by forcing the changing of the URL? Sure, bunch of people that got the phish link in the mail that was sent before switch but then shut down won't be phished, but I have no idea how much that changes the picture - I'm sure phishers are well aware that their domains are short-lived and already adapted for that, otherwise they'd be extinct. However, I'd be glad to read some field-validated data about how much closing those 2M sites, whatever is meant by "sites", actually helps against phishing.

I mean, if we could trust Google (or anybody else of that kind) to keep the blacklist strictly limited to a reasonable definition of malware and phishing, and knew that usage of such a list was strictly voluntary and under the control of the user, it would be an acceptable, if decidedly imperfect, remedy. But we know we can't trust any of this. Even if whoever works on this at Google right now is sincerely, ironclad committed to never letting any mission creep or abuse happen, once the means exist, these people can always be replaced with others who would use it to fight "misinformation", or "incitement", or "blasphemy", or whatever is in fashion to fight this week. There's no mechanism that ensures it won't be abused, and abuse is very easy once the system is deployed.

Moreover, we (as, people not in control of Google's decisions) have absolutely no means to prevent any abuse of this, since Google owns the whole setup and we have no voice in their decision making process. Given that, it seems to be prudent to make all effort to reject it while we still can. Otherwise next time you'd want to make a site questioning Google's decisions about the malware list, nobody would be able to read it because it'd be marked as a malware site.


You can also be certain that these numbers include all the false-positives. One of the Open Source pages I maintain got blocked as well, because too many AV reported one library package as malware.

There's no "report as false-positive" button at Google, so these reports likely have a lot of false positives in them...


This was the case with railroads too: only a few controlled the biggest, most transformative and business-integral tech of the 1800s.

Prior to that it was those that controlled the printing presses.

...

History continues to repeat itself.


2 million phishing sites and counting... with 40,000 websites added each week.

https://transparencyreport.google.com/safe-browsing/overview...

I guess the automation started in 2007 or so.


Like some kind of perverse blockchain, no site is ever removed, even though most phishing sites don't live long.


I think you mean 2017? 2007 is when the feature launched.


January/February 2007 looks like the time the list jumped from a few hundred to tens of thousands of sites.


That was the normal volume of manually identified sites at the time. Before 2007 there weren’t a lot of participants because it was in beta.


Malware may be easier to catch, but for phishing, the list was fairly small (under 150k) until around 2016, when it started growing linearly.


I've just read something similar in Zero to One (by Blake Masters and Peter Thiel). Peter argues that computers can't replace humans, and that it'd be foolish to expect that for at least the coming decades; strong AI replacing humans is a problem for the 22nd century. He proposes complementarity and describes a successful implementation of this idea in PayPal's fraud detection system way back in 2002, when purely automated detection algorithms were quickly overcome by determined fraudsters. He went on to found Palantir based on the same idea.

>>> In mid-2000, we had survived the dot-com crash and we were growing fast, but we faced one huge problem: we were losing upwards of $10 million to credit card fraud every month. Since we were processing hundreds or even thousands of transactions per minute, we couldn’t possibly review each one—no human quality control team could work that fast. So we did what any group of engineers would do: we tried to automate a solution. First, Max Levchin assembled an elite team of mathematicians to study the fraudulent transfers in detail. Then we took what we learned and wrote software to automatically identify and cancel bogus transactions in real time. But it quickly became clear that this approach wouldn’t work either: after an hour or two, the thieves would catch on and change their tactics. We were dealing with an adaptive enemy, and our software couldn’t adapt in response. The fraudsters’ adaptive evasions fooled our automatic detection algorithms, but we found that they didn’t fool our human analysts as easily. So Max and his engineers rewrote the software to take a hybrid approach: the computer would flag the most suspicious transactions on a well-designed user interface, and human operators would make the final judgment as to their legitimacy. Thanks to this hybrid system—we named it “Igor,” after the Russian fraudster who bragged that we’d never be able to stop him—we turned our first quarterly profit in the first quarter of 2002 (as opposed to a quarterly loss of $29.3 million one year before). The FBI asked us if we’d let them use Igor to help detect financial crime. And Max was able to boast, grandiosely but truthfully, that he was “the Sherlock Holmes of the Internet Underground.” This kind of man-machine symbiosis enabled PayPal to stay in business, which in turn enabled hundreds of thousands of small businesses to accept the payments they needed to thrive on the internet. 
None of it would have been possible without the man-machine solution—even though most people would never see it or even hear about it.
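
A minimal sketch of the hybrid idea described in the quote, not PayPal's actual "Igor" system: software scores every transaction and surfaces only the most suspicious ones, and a human makes the final call on those. The `Transaction` fields, the `risk_score` heuristic, and the 0.5 threshold are all hypothetical, chosen only to make the flow concrete.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    amount: float
    account_age_days: int


def risk_score(tx: Transaction) -> float:
    """Hypothetical heuristic: large amounts from very new accounts look riskier."""
    amount_risk = min(tx.amount / 1000.0, 1.0)
    age_risk = 1.0 if tx.account_age_days < 7 else 0.2
    return amount_risk * age_risk


def triage(
    txs: Iterable[Transaction], threshold: float = 0.5
) -> Tuple[List[Transaction], List[Transaction]]:
    """Auto-approve low scores; queue the rest for a human, most suspicious first."""
    approved: List[Transaction] = []
    flagged: List[Transaction] = []
    for tx in txs:
        (flagged if risk_score(tx) >= threshold else approved).append(tx)
    flagged.sort(key=risk_score, reverse=True)
    return approved, flagged


def review(
    flagged: Iterable[Transaction], human_decides: Callable[[Transaction], bool]
) -> List[Transaction]:
    """The human verdict, not the score, decides which flagged items are cancelled."""
    return [tx for tx in flagged if human_decides(tx)]
```

The point of the split is that the machine only ranks and filters; nothing flagged is cancelled until `human_decides` says so, which is the part the Safe Browsing discussion above argues has gone missing.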


Liability was my first thought. How is an assertion that a site contains malware not libel? The site would easily be able to demonstrate lost revenue.


Can someone dig out that old agreement to see if Google can be sued big time for this?

I doubt it but I must say it would make me happy and that would be weird because Schadenfreude normally isn't my thing.


> since then it seems that they have forgotten their fear of liability

They most likely have offloaded the liability to a “machine learning algorithm”. It’s easy for companies to point the finger at an algorithm instead of them taking responsibility.


Which then leads them to the awkward place of having to be transparent about how their algorithms work.

Either take responsibility, or be transparent.

But we all want to have our cake and eat it.


I take offense to this. Sure, I like to eat cake.

But if I liked to eat cake as much as Google does, I'd have died of obesity (= have my life ruined by legal issues) a long time ago.


Simple solution = let Google use their imperfect (false-positive-prone) filter, allow them to collect $12 / year from sites that don't want to be blacklisted, and have Google send all the revenue to the Electronic Frontier Foundation or similar internet-defending foundations.


Another road to hell paved with good intentions. Once everyone’s paying, who’s to stop them from pocketing the money instead?

“After careful review, we’ve concluded that the Electronic Frontier Foundation no longer aligns with the goals of Google or its parent company Alphabet Inc. to the extent we require from recipients of our Freedom Fund. We will place these funds in a separate account and use them in ways we believe will be in the best interest of digital freedom, both now and in the future.”


Worse, with that kind of money flowing in, the EFF would become corrupt very soon.


Absolutely. There’s nothing like guaranteed money or power to corrupt people.


"For years, they said no, because they were worried about the liability of accidentally blocking something that wasn't a phishing site."

Can anyone explain how a web browser author could be liable for using a blacklist? Once past the disclaimer in uppercase that precedes every software install, past the Public Suffix (White)List that browsers include, how do you successfully sue the author of a software program, a web browser, for having a domain name blacklist? Spamhaus was once ordered to pay $11 million for blacklisting some spammers, but that did not involve a contractual relationship, e.g., a software license, between the spammers and Spamhaus.


I think the situation is actually exactly like the Spamhaus case you describe: it wouldn't be the browser user that sues, but the blocked website's owner. The website's owner need not have accepted any kind of agreement from the browser maker in order to be harmed by the blocklist.


Perhaps the website would sue the author of the list.

That does not explain why this comment suggests a browser author was afraid to use the list.

The browser author could easily require the list author to agree that the browser author has no obligations to the list author if the list author gets sued by a website, and that the list author must indemnify the browser author if the browser author is named in any suit over the list. The list author must assume all the risk.


That's very interesting. Did you not think for a moment that such a mechanism could be abused?


The internet was a much kinder, more trusting place back then. When the browser makers agreed not to use it for bad things, we believed them.


I think as always great ideas do not account for human nature...


> there is no chance in hell that the government will try to break them up.

Government is not the only option. Railroads were fixed by Congress. If you want to fix or split Google, writing your representative about your concerns might help.


After years of seeing developments like this, getting worse and worse, it fills me with rage to think about how clearly nobody in power at Google cares.

I naively used to think, "they probably don't realize what's happening and will fix it." I always try to give benefit of the doubt, especially having been on the other side so many times and seeing how 9 times out of 10 it's not malice, just incompetence, apathy, or hard priority choices based on economic constraints (the latter not likely a problem Google has though).

At this point, however, I still don't think it's outright malice, but the doubling down on these horrific practices (algorithmically and opaquely destroying people) is so egregious that it doesn't really matter. As far as I'm concerned, Google is to be considered a hostile actor. It's not possible to do business on the internet in any way without running into them, so "de-Googling" isn't an option. Instead, I am personally going to do the following (and advise my clients to do the same):

Consider Google as a malicious actor/threat in the InfoSec threat modeling that you do. Actively have a mitigation strategy in place to minimize damage to your company should you become the target of their attack.

As with most security planning/analyzing/mitigation, you have to balance the concerns of the CIA Triad. You can't just refuse Google altogether these days, but do NOT treat them as a friend or ally of your business, because they are most assuredly NOT.

I'm also considering AWS and Digital Ocean more in the same vein, although that's off topic on this thread. (I use Linode now as their support is great and they don't just drop ban hammers and leave you scrambling to figure out what happened).

Edit: Just to clarify (based on confusion in comments below), I am not saying Google is acting with malice (I don't believe they are personally). I am just suggesting you treat it as such for purposes of threat modeling your business/application.


Walter Jon Williams, circa 1987, wrote a story of humanity's far-flung future, "Dinosaurs," in which humans had been engineered into a variety of specialized forms to better serve humanity. After nine million years of tweaking, most of them are not too bright but they are perfect at what they do. Ambassador Drill is trying to prevent a newly discovered species, the Shar, from treading on the toes of humanity, because if the Shar do have even a slight accidental conflict as the result of human terraforming ships wiping out Shar colonies because they just didn't notice them, the rather terrifyingly adapted military subspecies branches of humanity will utterly wipe out the Shar, as they have efficiently done with so many others, just as a reflex. Ambassador Drill fears that negotiations, despite his desire for peace, may not go well, because the terraforming ships will take a long time to receive information that the Shar are in fact sentient and billions of them ought not to be wiped out ...

Google, somehow, strikes me as this vision of humanity, but without an Ambassador Drill. It simply lumbers forward, doing its thing. It is to be modeled as a threat not because it is malign, but because it doesn't notice you exist as it takes another step forward. Threat modeling Lovecraft-style: entities that are alien and unlikely to single you out in particular; it's just that what they do is a problem.

Google's desire for scale, scale, scale, meant that interactions must be handled through The Algorithms. I can imagine it still muttering "The algorithms said ..." as anti-trust measures reverse-Frankenstein it into hopefully more manageable pieces.


> Google's desire for scale, scale, scale, meant that interactions must be handled through The Algorithms

That's fine when you're a plucky growth startup. Less fine when you run half the internet.

If Google doesn't want to admit it's a mature business and pivot into margin-eating, but risk-reducing support staffing, then okay: break it back up into enough startup-sized chunks that the response failure of one isn't an existential threat to everyone.


This lack of staffing is something that really annoys me. It's all over the big tech companies, and is often cited as the reason why (for example) YouTube, Twitter, Facebook, etc cannot possibly proactively police (before publishing) all their user content due to the huge volume.

Of course they can; Google and the rest earn enough to throw people at the problems they cause/enable. If they can't, then they should stop. If you cannot scale responsibly, then you should not scale at all as your business has simply externalised your costs onto everyone else you impact.


There is a limit to which problems you can throw people at, though. Facebook’s and Youtube’s human moderators suffer from the trauma of watching millions of awful videos every day. Policing provocative posts that are dogwhistling while still allowing satire and legitimate free expression is incredibly challenging and requires lots of context in very different fields. It’s not as simple as setting up a side office in the Philippines and hiring a thousand locals for moderation.


Yes: skilled labor. This is not a novel problem. Other companies create internal training pipelines and pay higher wages to attract those sorts of employees, when they're critical to business success.


I agree. Google is such a large behemoth that it actively tries to avoid customer support if it can. Splitting it into smaller businesses with a bit of autonomy, not relying on ad money to fuel everything else, means those smaller businesses have to give a shit about customers and compete on even ground.

Same applies to Facebook and other tech companies. The root issue is taking huge profits from one area of business into other avenues which compete with the market on unfair ground (or outright buying out the competition).

However anti-trust in US has eroded significantly.


> However anti-trust in US has eroded significantly.

Perhaps compared to the 40s-70s, but certainly not compared to the Reagan era. Starting with the Obama administration, there's been a strong rebirth of the anti-trust movement and it's only gaining momentum (see many recent examples of blocked mergers)[1].

[1] https://hbr.org/2017/12/the-rise-fall-and-rebirth-of-the-u-s...


The Obama admin used it only to attack enemies.

Renata Hesse was part of that effort, and has since worked for Google and Amazon, and is now expected to be in charge of anti-trust at Biden's DOJ.


And as long as the internet giants are on the correct side of the culture war there will be scant appetite for breaking them up or reining them in. As long as you need 5 phone calls to silence someone and erase them online, there is no chance in hell that the government will try to break them up.


> That's fine when you're a plucky growth startup. Less fine when you run half the internet.

It's never fine.

The abdication of responsibility and, more importantly, liability to algorithms is everything that's wrong with the internet and the economy. The reason these tech conglomerates are able to get so big when companies before them couldn't is that it would be impossible to scale the way they have while employing the thousands of humans needed to do the jobs that are being done poorly by their algorithms. Nothing they're doing is really a new idea; they just cut costs and made the business more profitable. The promise is that the algorithms/AI can do just as good a job as humans, but that was always a lie and, by the time everyone caught on, they were "too big to fail".


> It's never fine.

It kind of is, though.

The idea is that the full algorithm is "automation plus some guy". Automation takes care of 99.99% of it, and some guy handles the 0.01% that's exceptional, falls through the cracks, and so on.

The problem is when you scale from 100,000 events per day to half a trillion, and your fallback is still basically "some guy". At ten failures a day, contacting The Guy means sending an email, and maybe sometimes it takes two. At a million failures a day, your only prayer of reaching The Guy is to get to the top of HN, or write a viral Twitter thread.
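
A back-of-the-envelope sketch of that scaling argument. The 0.01% exception rate and the two daily volumes are taken from the comment; the figure of 50 cases per reviewer per day and both function names are assumptions made up for illustration.

```python
def daily_exceptions(events_per_day: int, exceptions_per_million: int) -> int:
    """Events that fall through the automation, using integer arithmetic."""
    return events_per_day * exceptions_per_million // 1_000_000


def reviewers_needed(exceptions: int, cases_per_reviewer_per_day: int) -> int:
    """Ceiling division: people required to clear one day's exceptions."""
    return -(-exceptions // cases_per_reviewer_per_day)


RATE_PER_MILLION = 100   # 0.01% of events, as in the comment
CASES_PER_REVIEWER = 50  # assumed workload per human per day

for events in (100_000, 500_000_000_000):
    failures = daily_exceptions(events, RATE_PER_MILLION)
    staff = reviewers_needed(failures, CASES_PER_REVIEWER)
    print(f"{events:>15,} events/day -> {failures:>10,} exceptions -> {staff:>9,} reviewers")
```

Under these assumptions the small case really is "some guy" (one reviewer for ten exceptions), while half a trillion events at the same rate produces tens of millions of exceptions a day, even more than the round "a million" above. Either way the fallback stops being a person you can email.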

There are some things which are important enough that they can't be left up to this formula, and maybe you're thinking of those. I'm not, and I doubt the person you're replying to is either.
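To make the arithmetic concrete, here is a rough sketch in Python. It assumes the 0.01% exception rate quoted above (100 failures per million events) and a made-up figure of 50 cases per reviewer per day; both function names and the reviewer capacity are illustrative assumptions, not anything Google has published.

```python
def daily_failures(events_per_day: int, failures_per_million: int) -> int:
    """Expected number of events per day that the automation gets wrong."""
    return events_per_day * failures_per_million // 1_000_000


def reviewers_needed(failures_per_day: int, cases_per_reviewer: int) -> int:
    """People required to look at every failure once (ceiling division)."""
    return -(-failures_per_day // cases_per_reviewer)


RATE = 100           # 0.01% expressed as failures per million events
CASES_PER_DAY = 50   # assumption: what one human can reasonably handle

for events in (100_000, 500_000_000_000):
    failures = daily_failures(events, RATE)
    staff = reviewers_needed(failures, CASES_PER_DAY)
    print(f"{events:>15,} events/day -> {failures:>10,} failures -> {staff:>9,} reviewers")
```

At the small scale one person covers it with room to spare. At the large scale the same rate needs a support organization the size of a city, so the exact numbers matter less than the order of magnitude.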


This is probably a big part of why Google is invested in (limited) AI, because a good enough "artificial support person" means having their cake and eating it too.


The issue with (limited) AI is that it's seductive. It allows executives to avoid spending actual money on problems, while chalking failures up to technical issues.

The responsible thing would be to (1) staff up a support org to ensure reasonable SLAs & (2) cut that support org when (and if) AI has proven itself capable of the task.


> It simply lumbers forward, doing its thing. It is to be modeled as a threat not because it is malign, but because it doesn't notice you exist as it takes another step forward.

This is a concept that I think deserves more popular currency. Every so often, you step on a snail. People actually hate doing this, because it's gross, and they will actively seek to avoid it. But that doesn't always work, and the fact that the human (1) would have preferred not to step on it; and (2) could, hypothetically, easily have avoided doing so, doesn't make things any better for the snail.

This is also what bothers me about people who swim with whales. Whales are very big. They are so big that just being near them can easily kill you, even though the whales generally harbor no ill intent.


I'm curious if whales are more dangerous on an hour-by-hour basis than driving?

That's generally my rubric for whether a safety concern is possibly worth avoiding an activity over.


> I'm curious if whales are more dangerous on an hour-by-hour basis than driving?

It depends on how many passengers you pack in a whale.


My understanding is that a Chrysler as big as a whale can seat about 20. (Love Shack, 1989)


> “You will have killed us,” Gram said, “destroyed the culture that we have built for thousands of years, and you won’t even give it any thought. Your species doesn’t think about what it does any more. It just acts, like a single-celled animal, engulfing everything it can reach. You say that you are a conscious species, but that isn’t true. Your every action is... instinct. Or reflex.

Good story. I can imagine what the specialized humans did to the generalist humans eons ago.


Except in our case, Google's terraforming ships couldn't care less. It's just not part of their programming that there might be some intelligent life out there worth caring about that might be hurt by their actions, so there's no way for them to receive this information. It's not that it's hard to explain, there's nobody to explain it to.


Modern large corporations are just a more inefficient, less effective paperclip maximizer, with humans gumming up the works.

Google is striving hard to remove the "human" part of the problem.


After I finished reading the parent comment,

> Google's desire for scale, scale, scale, meant that interactions must be handled through The Algorithms. I can imagine it still muttering "The algorithms said ..." as anti-trust measures reverse-Frankenstein it into hopefully more manageable pieces.

I immediately pressed C-f to search the string "paperclip maximizer", and was not disappointed. Thanks for mentioning it.


You're making another perfect case for why Google should be broken up. It’s important that we can choose again.


Sounds like a non-aligned AI.


It essentially is a non-aligned AI. AIs don't need to be implemented in silico. Bureaucracy is by itself a computing medium too.


That makes me wonder if someone has ever written a scientific paper proving that the bureaucratic processes in place at their company are Turing Complete. You can imagine some sort of Rule 110 cellular automaton being implemented in TPS reports.
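For anyone who wants to try it, a minimal Rule 110 step in Python looks roughly like this. Treat each cell as one TPS report stamped 0 or 1. The fixed zero boundary is a simplifying assumption of this sketch; the Turing-completeness result needs an unbounded row, so a finite stack of reports only approximates it.

```python
from typing import List

RULE = 110  # binary 01101110: bit k is the output for neighborhood value k


def step(cells: List[int]) -> List[int]:
    """Apply one Rule 110 update to a finite row, treating the edges as 0."""
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        center = cells[i]
        right = cells[i + 1] if i < n - 1 else 0
        neighborhood = (left << 2) | (center << 1) | right
        out.append((RULE >> neighborhood) & 1)
    return out


row = [0, 0, 0, 0, 1]
for _ in range(4):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```

Each printed line is one generation, so the pattern grows leftward from the single stamped report on the right.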


A cellular automaton over office documents would be a nice thing to try! That said, a proof of Turing-completeness of bureaucracy is relatively trivial:

  FROBNICATION QUERY      ID [#1234]

  1. Requester data
     [bunch of boxes]
  1a. (*) Details on stuffs
     [bunch of boxes]
  1b. (*) Details on different stuffs
     [bunch of boxes]
  (...)
  4. Additional documents
    - [Frobnication Registration #432]
    - [Frobnication Query #1111]

  --
  (*) - Fill section a) if $something. Fill section b) if $somethingelse.

With sections 1a/1b implementing conditional branching, and section 4 implementing storage.
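As a loose illustration of those two ingredients (not a proof of anything), here is a toy clerk in Python. The `Form` fields, the `remaining` counter and the "file a follow-up query" rule are all invented for this sketch; only the branch-on-a-condition and attach-earlier-forms ideas come from the form above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Form:
    form_id: int
    remaining: int  # stands in for section 1 data (hypothetical)
    attachments: List[int] = field(default_factory=list)  # section 4: storage


def process(cabinet: Dict[int, Form], form_id: int) -> Optional[int]:
    """One clerk step: return the follow-up form's ID, or None when the case closes."""
    form = cabinet[form_id]
    if form.remaining == 0:
        # Branch "1a": nothing left to do, so the case is closed.
        return None
    # Branch "1b": file a follow-up query that cites every earlier form.
    next_id = max(cabinet) + 1
    cabinet[next_id] = Form(next_id, form.remaining - 1, form.attachments + [form_id])
    return next_id


def run(cabinet: Dict[int, Form], form_id: int) -> int:
    """Keep processing follow-ups until a form takes the closing branch."""
    while True:
        nxt = process(cabinet, form_id)
        if nxt is None:
            return form_id
        form_id = nxt


cabinet = {1234: Form(1234, remaining=3)}
last = run(cabinet, 1234)
print(last, cabinet[last].attachments)
```

The filing cabinet is the memory and the section choice is the `if`. A real proof would still have to show that the paperwork can grow without bound.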


> proving that the bureaucratic processes... are Turing Complete

It's called COBOL


Thanks for mentioning this story; I just finished it and it's a great read.


Thanks for the story recommendation!


"never attribute to malice that which is adequately explained by stupidity" and all that, but after the events and the almost perfectly orchestrated behavior we've seen in the past and last couple of weeks it's becoming increasingly difficult, at least to me, to not attribute this to malice. Probably deliberate negligence is a better term. They know their systems can make mistakes, of course they do, and yet they build many of their ban-hammers and enforce them as if hat wasn't the case.

This approach to system's engineering is the technological equivalent of the personality trait I most abhor: the tendency to jump quickly to conclusions and not be skeptical of one's own world-view.

[1] https://en.m.wikipedia.org/wiki/Hanlon%27s_razor#cite_note-m...


"Consciously malicious" is not a good rule of thumb standard to measure threats to yourself or your business; it only accounts for a tiny bit of all possible threats. GP isn't claiming that Google is consciously malicious, they are claiming that you should prepare as if they were. These are not the same thing.

A lion may not be malicious when it's hunting you, it's just hungry; look out for it anyway. A drunk driver is unlikely targeting you specifically; drive carefully anyways. Nobody at Google is specifically thinking "hehehe now this will ruin jdsalareo's business!" but their decisions are arbitrary, generally impossible to appeal, and may ruin you regardless; prepare accordingly.


"The decisions are arbitrary, impossible to appeal, and may ruin you."

This is a monopoly.


Google may be a monopoly, but this quote has nothing to do with monopoly status. It has to do with power.

As a local businessman I can ruin someone’s life by applying the right legal pressure. Likewise, if one of my customers is reliant on my product to run their own business, and I drop them suddenly (akin to what Google sometimes does), that could ruin them. But it’s not because I’m a monopoly, only because people rely on me. Monopoly implies there’s no choice, and while that IS true with Google and search, it is not implied by “arbitrary, impossible to appeal, and may ruin you”. The two are distinct (though often related) problems that are both exemplified in Google.


Yes, exactly what I meant, thank you.

And very well said I might add. I don't mean to leave a vapid "I agree with you" comment, but your analogies are fantastic. They are accurate, vivid, and easily understandable.


I think mistakes just happen and are possibly just as helpful as they are harmful to Google. If they find something they particularly hate or find damaging, they can just "oops" their way to the problem being gone. Take Firefox[1]: each time a service went "oops" on Firefox, Chrome gained market share.

I have no doubt they'd use similar "oops" for crushing a new competitor in the ad space. Or perhaps quashing a nascent unionizing effort. It's all tinfoil of course because we don't have any public oversight bodies with enough power to look into it.

[1] https://www.techspot.com/news/79672-google-accused-sabotagin...


That's the nature of a dominant position. It gives you the power to engineer "heads I win, tails you lose" dynamics.


Well, I think the stupidity and laziness are exacerbated by their ill will towards customers and users. This is also what prevents them from reforming. The general good will and sense of common purpose was necessary in Google's early days when they portrayed themselves as shepherds of the growth of the web. Now they are more like feudal tax collectors and census takers. Sure, they are mostly interested in extracting their click-tolls, but sometimes they just do sadistic stuff because it feels good to hurt people and to be powerful. Any pseudo-religious sense of moral obligation to encourage 'virtuous' web practices has ossified, decayed, been forgotten, or been discarded.


I was thinking about this this week in the context of online shopping with in store delivery. My wife recently waited nearly half an hour for a “drive up” delivery where she had to check in with an app. Apparently the message didn’t make it to the store, and when she called half way into her wait she wasn’t greeted with consolation, but derision for not understanding the failure points in this workflow.

It seems that the inflexible workflows of data processing have crept into meatspace, eliminating autonomy from workers' job functions. This has come at a huge expense to perceived customer service. As an engineer who has long worked with IT teams creating workflows for creators and business people, I see the same non-empathetic, user-hostile interactions well known in internal tools become the standard way to interact with businesses of all sizes. Broken interactions that previously would be worked around now leave customer service reps stumped and with no recourse except the most blunt choices.

This may be best for the bottom line, but we’ve lost some humanity in the process. I fear that the margins needed to return to some previously organic interaction would be so high that it would be impossible to scale and compete. Boutique shops still offer this service, but often charge accordingly, and without the ability to maintain in-person interactions at the moment, I worry there won’t be many left when the pandemic subsides.


Very poignant observation. I have run into this as well in situations in meat-space everywhere from the DMV queue to grocery pickup.

Empathy and understanding for fellow humans is at an all time low, no doubt exacerbated by technologies dehumanizing us into data points and JSON objects in a queue waiting for the algorithm to service.

As wonderful as tech has made our lives, it is not fully in the category of "better" by any stretch. You're totally right about margins being too high, but I do hope it opens up possibilities that someone is clever enough to hack.


One of the things I hate the most is people I'm transacting with telling me something has to be done in a certain way because that's how "their system" works.

A recent example: I forgot to pay my phone bill on time and network access got turned off. I came to pay it on Friday, and they told me the notice would appear in their systems only on Monday, and then it takes 2 days for the system to automatically reactivate my access. No, they can't make a simple phone call to someone in the company; yes, I will be charged the full monthly price for the next month even though I didn't have access for a few days; nothing we can do, ciao.


Systems (normally) model organizational processes, so companies with garbage processes usually have garbage systems in place too. This highly specific case reeks of fraud, and you should be able to report them to some kind of ombudsman so you could get your couple days' worth of fees back.


I would bet they have some terms & conditions the person agreed to that leaves them legally SOL.


I would also bet that the right kind of escalation leads directly to the desk of someone who will give them a refund.


Yes, the terms probably were written when it took two days for a check to clear.

No, the ombudsman probably can’t get legal to update the T&Cs.


Contracts by definition cannot bind people into illegal conditions, and there are degrees of neglect that can be considered illegal. The entire point of an ombudsman is to keep actors within "this is not illegal" lines; I'm guessing you could do this in small claims court too, but with the plague and everything it can take a lot longer.


I have not been noticing that.

I am finding the poorly paid workers who provide service to me polite and helpful.

Perhaps this is geography? Different in different places?


> "never attribute to malice that which is adequately explained by stupidity"

I keep reading this on the internet as if it’s some sort of truism, but every situation in life is not a court where a prosecutor is trying to prove intent.

There is insufficient time and resources to evaluate each and every circumstance to determine each and every causative factor, so we have to use heuristics to get by and make the best guesses. And sometimes, even many times, people do act with malice to get what they want. But they’re obviously not going to leave a paper trail for you to be able to prove it.


> I keep reading this on the internet as if it’s some sort of truism

I don’t believe this statement was initially intended to be axiomatic; rather, it serves as a reminder that the injury one is currently suffering is, more likely than not, the result of human frailty.


I'm not sure it's even attributable to stupidity (necessarily) so much as attributable to automation or, more long-windedly, attributable to the fact that automation at scale will sometimes scale in wacky ways and said scale also makes it nearly impossible--or at least unprofitable--to insert meaningful human intervention into the loop.

Not Google, but a few months back I suddenly couldn't post on Twitter. Why? Who knows. I don't really do politics on Twitter and certainly don't post borderline content in general. I opened a support ticket and a follow-up one and it got cleared about a week later. Never found out a reason. I could probably have pulled strings if I had to but fortunately didn't need to. But, yeah, you can just randomly lose access to things because some algorithm woke up on the wrong side of the bed.


>said scale also makes it nearly impossible--or at least unprofitable--to insert meaningful human intervention into the loop.

Retail and hotels and restaurants can insert meaningful human intervention with less than 5% profit margins, but a company with consistent $400k+ profit per employee per quarter cannot?

https://csimarket.com/stocks/singleEfficiencyeit.php?code=GO...

This is what I'm talking about in my original comment about the malice and stupidity aphorism.

Someone or some team of people is making the conscious decision that the extra profit from not having human intervention is worth more than avoiding the harm caused to innocent parties.

This is not a retail establishment barely surviving due to intense competition that may have false positives every now and then because it's not feasible to catch 100% of the errors.

This is an organization that has consistently shown they value higher profits due to higher efficiencies from automation more than giving up even an ounce of that to prevent destroying some people's livelihoods. And they're not going to state that on their "About Us" page on their website. But we can reasonably deduce it from their consistent actions over 10+ years.


Fair enough. Scale does make things harder but my $FINANCIAL_INSTITUTION has a lot of scale too and, if I have an issue with my account, I'll have someone on the phone sooner rather than later.


You're saying that as if it contradicts (“but”) what lotsofpulp said, but that was exactly their point: If your bank can do it, then so could Google. That they choose not to is a conscious choice, and not a benevolent one.

Conrad's corollary to Hanlon's razor: Said razor having been over-spread and under-understood on the Internet for a long while now, it's time to stop routinely attributing lots of things only to stupidity, when allowing that stupidity to continue unchecked and unabated actually is a form of malice.

(Hm, yeah, might need a bit of polishing, but I hope the gist is clear.)


I'd go with: "Sufficient stupidity[1] is indistinguishable from malice"

[1]: Where stupidity is further defined as "willful ignorance"


Meta: I’ve vouched for this comment. You have been shadowbanned.


I thought I was agreeing. "Fair enough."


I was just paying a bill online.

I had loading images turned off in my browser.

So I get the checkbox captcha thing, and checking it is not enough, so I have to click on taxis, etc. Which didn't initially show because of images being off.

I eventually did turn on images for the site and reload it. But at first, I was like "wait a minute, why should I have to have images on to pay a bill?" and I clicked a bunch of things I'd never tried before to see if there was an alternative. It appears that you have to be able to do either the image captcha or some sort of auditory thing. I guess accessibility doesn't include Helen Keller, or someone who has both images and speaker turned off (which I have done at some times).

Maybe this is hard for someone younger to understand, but when I was first using computers, many had neither high quality graphics nor audio - that was a special advanced thing called "multimedia". It feels like something is severely wrong with the world if that is now a requirement to interact and do basic stuff online.


Genuinely-handicapped users should certainly have accommodations that allow them to pay bills using the necessary accessibility tools. It's always tricky to keep those tools from being leveraged by spammers and phishers, though, as witnessed by how TDD services for the deaf were misused in the past. Hard problem to solve in general, either through legislation or technology.

But if you're an ordinary user without special challenges, why would you expect anything to work after turning images off in your browser? If you're that much of a Luddite, maybe computers and technology aren't appropriate areas of interest for you to pursue.


Once upon a time, it was not only easy to find the option to disable image loading, but you could easily load them a la carte, by right clicking on any placeholder.

With the browser I use now, it seems to only let you reenable images per-site and then you have to dig in settings to delete the exception.

There IS a Load Image menu item when I right click...but it does nothing! Neither does "Open image in new tab".

I think it's unfortunate if there is a "long tail" of features in a typical application these days that are not expected to work.


What frustrates me personally is that there used to be a Firefox extension to suppress displaying a particular image, which is no longer available. I can't see the utility of disabling all images, but that extension was nice because you could use it to remove things you were tired of seeing like obnoxious backgrounds, avatar photos, and even some ads. Once you right-clicked on an image and told it to remove it, you'd never see it again, even in subsequent browsing sessions.

This extension died during one of Firefox's periodic Purges of Useful Functionality(tm), and I've been looking for another one ever since. So to some extent I see where you're coming from, but a general jihad against sound and images in the browser seems pretty radical.


I would agree. It's not useful in the context of remediation or defense, but on a human emotional level it's extremely helpful.

When Google kills your business it doesn't help your business to assume no malice, but it may help you not feel as personally insulted, which ultimately is worth a lot to the human experience.

Humans can be totally happy living in poverty if they feel loved and validated, or totally miserable living as Kings if they feel they are surrounded by backstabbers and plotters. Intent doesn't matter to outcome, but it sure does to the way we feel about it.


The saying is for your own sanity. If you go around assuming every mistake is malicious, it’s going to fuck up your interactions with the world.

Everyone I know who approaches the world with a me vs. them mentality appears to be constantly fraught with the latest pile of actors “trying to fuck them”.

It’s an angry, depressing life when you think that the teller at the grocery store is literally trying to steal from you when they accidentally double scan something.


One does not have to choose between assuming everything is malice or everything is stupid. Situations in the real world are more nuanced, and hence the saying is inane.


It’s not though. Assuming malice is incorrect 99.9% of the time and correctly identifying that other fraction offers so little upside. What good does it do to realize earlier that the person is malicious and not incompetent?


I think you have a point, and it's important to not be naive as people out there will steamroll those around them if given the opportunity. Personally I try to not immediately assume malice because I've found it leads to conspiracy-minded thinking, where everything bad is due to some evil "them" pulling the strings. While I'm sure there are some real "Mr. Burns" types out there, I can't help but feel most people (including groups of them as corporations) are just acting in self-interest, often stumbling while they do it.


It's a truism not because people are never malicious, but because we tend to see agency where there is none. Accidents are seen as intentional. This tendency leads to conspiracy theories, superstitions, magical thinking, etc. We're strongly biased towards interpreting hurtful actions as malice.


I'd add to this that willfully refusing to remedy stupid can be an act of malice.


That's a very good point. Actually, I just thought about something in the context of this conversation: one's absolute top priority, both in life and tech, should be to stop the bleeding[1] that emerges from problematic circumstances.

Whether those problematic circumstances, and the harm they cause, arise due to happenstance, ignorance, negligence, malice, mischievousness, ill intentions or any other possible reason is ancillary to the initial objective and top priority of stopping the bleeding. Intent should be of no interest to first responders (customers or decision makers, in our case) when harm has materialized.

Establishing intent might be useful or even crucial for the purposes of attribution, negotiation, legislation, punishment, etc. All those, however, are only of interest, in this context, when the company in question hasn't completely damaged their brand and the public, us, hasn't become unable to trust them.

All this to say, yes, this is a terrible situation to be in, how are we going to solve it?

Do I care if Google is doing harm to the web due to being wilfully ignorant, negligent, ill-intentioned, etc.? No, not an iota; I care about solving the problem. Whether they do harm deliberately or for other reasons should be of no interest to me when the goal is stopping the bleeding.

[1] https://isc.sans.edu/diary/Making+Intelligence+Actionable/41...


I agree with your sentiment. Modeling intent is useful in two cases: (1) predicting the future, and (2) in court. When modeling intent has no predictive power, it’s generally irrelevant, as you said.


Employees and managers at Google get promoted by launching features and products. They're constitutionally incapable of fixing problems caused by over-active features for the same reason they've launched seven different chat apps.


We are all living at the whim of Google’s technical debt.


I personally find Hanlon's Razor to be gratuitously misapplied. Corporate strategy is often better described as weaponized willful ignorance. You set up a list of problems that shall not be solved or worked on, and that sets the tone of interaction with the world.

Plus financial incentive creates oh so many opportunities for things to go wrong or be outright miscommunicated it is not even funny.


Thanks, I totally agree. Just to be clear I'm not saying it's malice as I don't believe that. I'm just saying the end result is the same so one should consider them a hostile actor for purposes of threat modeling.

Given you're the second person who I think took away that I was accusing them of malice, I probably need to reword my post a bit to reduce confusion.

Accusing them of malice is irresponsible without evidence, and if I were doing that it would undermine my credibility (which is why I'm pointing this out).


> Thanks, I totally agree. Just to be clear I'm not saying it's malice as I don't believe that. I'm just saying the end result is the same so one should consider them a hostile actor for purposes of threat modeling.

No worries at all! I interpreted your post the way you intended; and I agree fully being also in InfoSec.

Going by how you phrased your original post, you're probably more patient and/or well-intentioned than me as I'm farther along the path of attributing mistakes by big, powerful corporations to malice right away.


They probably aren't malicious, but they are definitely antagonists.


Your comment made me think that they have the same attitude with support as they do with hiring: they are OK with a non-fine-tuned model as long as the false positives / negatives impact individuals rather than Google’s corporate goals.


I would argue that consistent behavior defeats the benefit of the doubt or the presumption of involuntary stupidity. Also, I believe most good-sounding quotes may be easy to remember but are not backed by much truth.


Author here. I don't think it's malice on their part, but their hammer is too big to be wielded so carelessly.


Yes, I agree with you, and thank you for your Medium post, by the way. Our only chance of ever improving the situation is to call attention to it. I fully believe Google leadership has to be aware of it at this point, but it clearly won't be a priority for them to fix until the public backlash/pressure is great enough that they have to.

Just to avoid any misreading, I didn't say I thought it was malice on Google's part. My opinion (as mentioned above) is:

> I still don't think it's outright malice, but the doubling down on these horrific practices (algorithmically and opaquely destroying people) is so egregious that it doesn't really matter.

So they are not (at least in my opinion without seeing evidence to the contrary) outright malicious. But from the perspective of a site owner, I think they should be considered as such and therefore mitigations and defense should be a part of your planning (disaster recovery, etc).


I do not trust management folks, whose paychecks and promotions are dependent on how successful such hostile actions are, to make the right decisions. I also do not think that they are deliberately ignorant/indifferent or that calling attention to it will do any good. These types of individuals got to where they are largely by knowing full well that their actions are malicious and legal. I used to work under such people, and currently interact with and work with such people on a very regular basis (you could even consider me as part of them tbh). It is very much possible that the management level folks at Google don't have an ounce of goodness in them, and will always see such decisions from a zero-sum perspective.

To make it relatable, do you care so much for a mosquito if it's buzzing around you, disrupting your work and taking a toll on your patience? Because your SaaS is a mosquito to Google. After a certain point, you will want to kill the mosquito, and that's exactly what Google execs think so as to get to their next paycheck.


They have the option of not wielding the hammer. I for one never appointed them the guardian of the walled internet.


So browsers should just let users go to obvious phishing sites?

It's easy to take this position when you're very tech savvy. Imagine how many billions of less tech savvy people these kinds of blocklists are protecting.

It's very easy to imagine a different kind of article being written: "How Google and Mozilla let their users get scammed".


I mean, it was barely a decade ago when my parents' computers regularly got filled with malware and popups and scams. They regularly fell for bullshit online. Maybe they have gotten more savvy, but I feel like this has overall greatly decreased, in a world where there are actually more and more bad actors.


Even if you're tech savvy. I've been phished. I was only saved by 2 factor and luck.

https://blog.greggman.com/blog/getting-phished/


> I for one never appointed them the guardian of the walled internet.

On the other hand, lots of chrome users most likely do trust google to protect them from phishing sites. For those ~3 billion users a false positive on some SaaS they've never heard of is a small price to pay.

It's a tricky moral question as to what level of harm to businesses is an acceptable trade off for the security of those users.


The trade-off isn't between increased phishing vs. increased false positives. It's being able to get a human on the phone vs Google's profit margins. Break them up already.


I actually don't think this is that hard to fix though.

I'm a fan of google doing their best to protect people from scammers. The real issue here is no way to submit an escalated help request when they accidentally mess up. eg they could build a service where -- and I doubt scammers would play -- $100 (or even $1k) would escalate a help request with a 15 minute SLA. I run a business; we would have no problem paying an escalation fee.


I can already see the headlines on HN:

"How Google Runs a Pay-to-Play Protection Racket"


I mean, that's their whole business anyway, so...

Format your site to suit google, or they don't index it.

Add headers to your emails or google reduces deliverability.

Pay for clicks on your own company's name or google sells ads against the name of your company! They monetize navigation queries.

Run your site through amp and let google steal your traffic or google pushes your search rank down the page.

Let google steal answers to questions contained on your site and display them as answers w/o sending people to your site, or they deindex you (see tons of examples, but also genius).

Let google steal your carefully curated and expensive photographs for google shopping and use them for the item from other vendors or you can't list items in google shopping.

etc etc etc... it's nothing new. So we may as well encourage them to do a more helpful job of what they were going to do anyway.


This was the old Microsoft support model: opening a case cost $99 (IIRC), but if the case was actually a MS bug/issue they’d waive the fee.


It might have started at $99 but it's much higher now. I think the last time I used it it was $299 but that was at least 2 decades ago. Fortunately it was their bug.


This. Why is there an implicit agreement that, okay, Google is the gatekeeper? It shouldn't be. The internet did not appoint Google as the gatekeeper.


>The internet did not appoint Google as the gatekeeper.

Uh, it kind of did, when internet-savvy early adopters (and developers) convinced all their friends, then family, then acquaintances, to switch to Chrome a decade ago.

I know there's probably a very large number of FOSS-only types on this site who would disagree with that assessment, and claim that they've always been in the Firefox camp, but the sheer market share of chrome clearly shows that they are the minority.

Everyone switched to chrome because they were tired of IE having too much power and not conforming to standards. Nowadays web devs often build chrome-first, using chromium-only features, and the shoe has almost migrated to the other foot.


> Why is there an implicit agreement that, okay, Google is the gatekeeper?

Because they run a popular browser and don't want their users getting scammed?

For each tech savvy person mad about this, there are 10 non-tech-savvy people, completely oblivious, who could get scammed by phishing sites we'd consider obvious.

Sure, they should do a better job, but that blacklist is probably millions of websites big at this point. It's the kind of thing where a perfect job is essentially impossible, and the scale means that even doing a decent job is going to be extremely difficult.


Have you considered not using a 3rd party for hosting your JavaScript? There is always going to be some risk if the code isn’t under your control.


Is this list only maintained by Google? Do Firefox and Bing use the same list, is their process better/different? Is there any sharing happening?


SmartScreen is a different list. (And has a "This website isn't malicious!" button.)


Agree, we can only vote with our clicks.

Sadly gmail and google docs are top notch products :(


No, we can't vote with our clicks. That's what it means when a handful of companies dominate most of the web and the web plays a dominant role in the global economy.

We have very little real choice.

Occasionally people will pretend this is not so. In particular those who can't escape the iron grasp these companies have on the industry. Whose success depends on being in good standing with these companies. Or those whose financial interests strongly align with the fortunes of these dominant players.

I own stock in several of these companies. You could call it hypocrisy, or you could even view it as cynicism. I choose to see it as realism. I have zero influence over what the giants do, and I do have to manage my modest investments in the way that makes the most financial sense. These companies have happened to be very good investments over the last decade.

And I guess I am not alone in this.

I guess what most of us are waiting for is the regulatory bodies to take action. So we don't have to make hard choices. Governments can make a real difference. That they so far haven't made any material difference with their insubstantial mosquito bites doesn't mean we don't hold out some hope they might. One day. Even though the chances are indeed very nearly zero.

What's the worst that can happen to these companies? Losing an antitrust lawsuit? Oh please. There are a million ways to circumvent this even if the law were to come down hard on them. They can appeal, delay, confuse and wear down entire governments. If they are patient enough they can even wait until the next election - either hoping, or greasing the skids, for a more "friendly" government.

They do have the power to curate the reality perceived by the masses. Let's not forget that.

Eventually, like any powerful industry they will have lobbyists write the laws they want, and their bought and paid for politicians drip them into legislation as innocent little riders.

We can't vote with our clicks. We really can't in any way that matters.

That being said, I also would like regulatory bodies to step in and do something about it. To level the playing field. If nothing else, to create more investment opportunities.


Do you think the 1982 breakup of AT&T would have been possible in today's political reality?


No.


Stock picking is not realism.


If by that you mean that valuations are not the result of a rational process, you are correct.

But investment strategy isn't so much about any underlying reality as it is about the psychology of market participants. You don't invest based on what you hope will happen, but what you believe will happen.


Great article. It’s not malice, it’s indifference.

Google’s execs and veeps don’t care about small businesses, because most are career ladder climbers who went straight from elite colleges to big companies. Conformists who won’t ever know what it’s like to be a startup. As a group, empathy isn’t a thing for them.


Don't a lot of startup founders go to elite colleges and come from big companies?


The funded ones with the 2 year timelines generally are. But most startups are more bootstrap/angel investor with a bright owner who has a fatal flaw.


Is this an excerpt from your woefully unpublished startup culture fanfic novella? You can't just leave us hanging.


That is malice.

Accidentally unleashing a process that harms people is negligence. Not caring that you are being negligent is malice.


IMHO, it sounds like it worked. The things you changed sound like they've made your site more secure. In the future, Google's hammer can be a bit more precise since you've segregated data.

And you don't know what triggered it. It's possible that one of your clients was compromised or one of their customers was trying to use the system to distribute malware.


It's only more secure from Google's blacklist hammer.

No significant security is introduced by splitting our company's properties into a myriad of separate domains.

This type of incident can be a deadly blow to a B2B SaaS company since you are essentially taking out an uptime sensitive service that a lot of times has downtime penalties written down in a contract. Whether this is downtime will depend on how exactly the availability definition is written.


To add to this - by splitting and moving domains you've hurt your search rank, eliminated the chance to share cookies (auth, eg) between these domains, and are now subject to new cross-domain security dings in other tooling. Lose-lose.


We're talking about user uploads into a ticket system. They should not be publicly available at all. It won't hurt search rank.


If you split up your user uploaded material into per client subdomains you will know which one is uploading the malicious files. And your clients can block other subdomains limiting their exposure as well. Is it a huge improvement? No, but at least it's something


It's not clear from other commenters that had similar issues that GSB would not outright ban the entire domain instead of specific subdomains.

In this case, the subdomain they banned was xxx.cloudfront.net, and we know they would not block that whole domain.

We might consider that approach in the future, but I foresee complications in the setup.


It's probably "scale thinking" that makes google seem like they don't care: Everything is huge when you're "at scale"; the impact of a small blunder can take down companies or black out nation states. It's part of the game of being "at scale". They probably believe that it's untenable to build the necessary infrastructure to where everything (website, startup, person, etc.) matters.

This will sound crass, but it reminds me of Soviets cutting off the food supply to millions of people over the winter, due to industrial restructuring, and they brushed it off as "collateral damage".


Your comment reminds me of the first 30 seconds of this scene from The Third Man https://youtu.be/vSc-91F5Wiw


Of course they care. They've taken over everything they've been able to take over and they're still going strong. This is not by mistake. They just care about different things than you do. This is why Google needs to be broken up.


> I am not saying Google is acting with malice (I don't believe they are personally)

I'd agree. The problem is there is no financial or regulatory incentive to do the right thing here.

It has zero immediate impact on their bottom line to have things work in the current fashion, and the longer term damage to their reputation etc. is much harder to quantify.

There's no incentive for them to fix this, so why would they?


They're never gonna care. They aren't incentivized to care. The only thing that can change the situation is the power of the American federal government, which needs to break Alphabet into 20-50 different companies.


> nobody in power at Google cares

My assessment might be “nobody in power has time to prevent the myriad of problems happening all of the time, even though they handle the majority, with help from businesses, government agencies, etc., and given the huge impact of some problems to society as a whole, they may feel as though they’re rising in the front seat of a roller coaster, unaware of your single voice among billions from the ground down below.”


> they probably don't realize what's happening and will fix it

“If only the czar knew!”


I'm with you on the rest, but what has DO done to not have the benefit of doubt?

Also, to your point, an organization becomes something else than the sum of its parts, especially the bigger it gets.

Google can be a malicious actor without necessarily having individuals act maliciously.


Yeah that's a fair question. I had a bad personal experience with them, but I've also seen plenty of issues too. There was a big one a little while ago about how Digital Ocean destroyed somebody's entire company by banning them with AI: https://news.ycombinator.com/item?id=20064169 Original Twitter thread: https://twitter.com/w3Nicolas/status/1134529316904153089

In their defense they acknowledged it and made some changes. I can't find the blog post now so I'm going from memory. But that only happened because he got lucky and it blew up on HN/twitter and got the attention of leadership at DO. How many people have been destroyed in silence?

In my case, Digital Ocean only allows one payment card at a time and my customer (for whom the services were running) provided me with a card that was charged directly.

A couple months later my customer forgot that he had provided the card. He didn't recognize "Digital Ocean" and thought he had been hacked (which has happened to him before) and called the bank and placed a chargeback.

When DO got the charge back they emailed me and also completely locked my account so I was totally unable to access the UI or API. I didn't find out about the locked account until the next day. I responded to the email immediately, and called my customer, who apologized and called the bank to reverse the chargeback. I was as responsive as they could have asked for.

The next day I needed to open a port in the firewall for a developer to do some work. I was greeted with the dreaded "account locked" screen. I emailed them begging and pleading with them to unblock my account. They responded that they would not unlock the account until the chargeback reversal had cleared. Research showed that it can take weeks for that to happen.

I emailed again explaining that this was totally unacceptable. It is not ok to have to tell your client "yeah sorry I can't open that firewall port for your developer because my account is locked. Might be a couple of weeks." After a day or so, they finally responded and unlocked my account. Fortunately they didn't terminate my droplets, but I wonder what would have happened if I had already started using object storage as I had been planning. This was all over about $30 by the way.

After that terrifying experience, I decided staying on DO was just too risky. Linode's pricing is nearly identical and they have mostly the same features. Prior to launching my new infrastructure I emailed their support asking about their policy. They do not lock accounts unless the person is long-term unresponsive or has a history of abuse.

I've talked with Linode support several times and they've always been great. They're my go to now.


I see where you're coming from. I've also had a bad experience with DO (CC arbitrarily blocked them which ended up with my droplets getting terminated and all data and backups wiped). That was at least as much an error on my part, though.

It does seem that they're unfortunately borrowing the playbook from AWS/Azure/GCP wrt over-automation as they scale. More old-school support could have been their differentiator, but it seems they're going for growth. They're getting close to the razor's edge.


I had a similar experience as well https://news.ycombinator.com/item?id=18145781

I no longer recommend them any production usage.


I'd go a step further and claim that most tech companies are ultimately a threat to people's freedom and happiness. Not the tech itself, but the people that wield and profit from it.


Massive bureaucratic nightmares never act with malice, but the people get crushed all the same.


Worms on the sidewalk.


They care, but the dominant policy in Google's calculus about what features should be released is "Don't let the exceptional case drown the average case." A legitimate SaaS providing business to customers might get caught by this. But the average case is it's catching intentional bad actors (or even unintentional bad actors that could harm the Chrome user), and Google isn't going to refrain from releasing the entire product because some businesses could get hit by false positives. They'd much rather release the service and then tune to minimize the false positives.

To my mind, one of the big questions about mega corporations in the internet service space is whether this criterion for determining what can be launched is sufficient. It's certainly not the only criterion possible---contrast the standard for a US criminal trial, which attempts to evaluate "beyond a reasonable doubt" (i.e. tuned to be tolerant of false negatives in the hope of minimizing false positives). But Google's criterion is unlikely to change without outside influence, because on average, companies that use this criterion will get product to market faster than companies that play more conservatively.


Nah-- I think you've got it all wrong. The problem isn't the false positive/false negative ratio chosen.

The problem is that there's false positives with substantial harm caused to others and with little path left open to them by Google to fix them / add exceptions-- in the name of minimizing overhead.

Google gets all of the benefit of the feature in their product, and the cost of the negatives is an externality borne by someone else that they shrug off and do nothing to mitigate.


One solution, perhaps, could be to have some kind of turnaround requirement---a "habeas corpus" for customer service.

By itself, it won't solve the problem... The immediate reaction could be to address the requirement by resolving issues rapidly to "issue closed: no change." But it could be a piece of a bigger solution.


Google Safe Search is only half the story. Another huge problem is Google's opaque and rash decisions about what sites throw up warnings in Chrome.

I once created a location-based file-transfer service called quack.space [0] very similar to Snapdrop [1], except several years before they existed. Unfortunately the idiot algorithms at Chrome blocked it, throwing up a big message that the site might contain malware. That was the end of it.

I had several thousand users at one point, thought that one day I might be able to monetize it with e.g. location based ads or some other such, but Google wiped that out in a heartbeat with a goddamn Chrome update.

People worry about AI getting smart enough to take over humans. I worry about the opposite. AI is too stupid today and is being put in charge of things that humans should be in charge of.

[0] https://www.producthunt.com/posts/quack-space

[1] https://snapdrop.net/


Isn’t it possible your service was hosting malware and you just didn’t know? This same problem killed Firefox Send: https://support.mozilla.org/en-US/kb/what-happened-firefox-s...


Google has a lot of control of the Web.

Much less control of the Internet.

One lesson is use IP and not the Web.


> I use Linode now as their support is great and they don't just drop ban hammers and leave you scrambling to figure out what happened.

Linode once gave me 48 hours to respond (with threats to take down the site) because a URL was falsely flagged by netcraft based on what looked like an automated security scan of software I was hosting. Granted, they did not take any action and dropped the report once I pointed out that it was bullshit, but I do not consider this great service. If there is no real evidence of wrongdoing I should not be receiving ultimatums.


(Googler)

You are only focusing on the negatives while completely ignoring the positives here.

Here are a few questions to consider that may give you better perspective:

1) Do you know the magnitude of financial and psychological damage caused by malware, phishing, etc on the web?

2) Do you believe that it is possible to have a human review every piece of automation generated malware on the internet?

3) Do you believe it is possible to build an automated system that provides value with zero false positives?

4) Do you think an open standards body or government bureau would perform any better at implementing protections from the threats described here?


Author here - I don't underestimate the complexity of the task that Google Safe Browsing tries to accomplish.

But: Do you believe there is no room for improvement in an automated, opaque system with clear evidence of malfunction, that quite succinctly decides if hundreds of people go unemployed when their company tanks for nothing other than an incorrectly set threshold on some algorithm?

That is the real question to ask. Google is nowhere near its limits in terms of capability, as is made abundantly clear by its extremely comfortable financial position.


I do agree that there's room for improvement. There's always room for improvement, but there are also limits to the transparency one should provide for an anti-abuse system. It's difficult for anybody except for an expert in this area to say what would be a safe and satisfactory way to expose appeal and remediation for false positives. In the example from the story it looks like the turn around time was just an hour for your case, which seems rather good. The fact that not all consumers of this data were as responsive looks out of Google's control, and should be taken up with those companies.

I don't agree with the premise of your last question. It's not Google's responsibility to protect the internet and provide a free anti-abuse database for other browser vendors, and yet Google does do this at significant cost. The fact that they don't do it perfectly is not a rationale for killing it or providing it with infinite resources.


> It's not Google's responsibility to protect the internet and provide a free anti-abuse database for other browser vendors, and yet Google does do this at significant cost. The fact that they don't do it perfectly is not a rationale for killing it or providing it with infinite resources.

I think that's a naive perspective. Google did not create the database to be nice to other vendors, and it also did not make it available to them for that purpose.

An Internet-wide blacklist represents strategic leverage over competitors (or maybe even dissonant voices, should the need arise) and a massive source of data collection probe points. These facts were certainly brought up internally and deemed worth the risk when the massive legal liability of this product was assessed.

Therefore, because of the pervasiveness of this system, it needs to be handled responsibly. They are not doing anyone a favor by making sure it functions correctly. Google is well aware of this, because they don't need regulators and lawmakers gaining yet another excuse to try and dismantle them.


2*) Do you believe that it is possible to have a human review every FALSE POSITIVE result from automated malware detection on the internet, when reported by those adversely affected by the false positive result?

Yes, yes I do. Banks do it for their customers today at scale.


So what happens when the fraudsters automate clicking the "request review" button? They can spin up as many phishing sites as they want, and request as many human hours in review as they want.

With banks, they only have to do that for their customers, whom they've at least had a chance of getting money from. But Google would need to provide it to every site which gets blocked, (as malware sites pretend to be legitimate). Which


There are plenty of mechanisms to tackle this problem. But you have to want to care.


Your clients will hate you for this as you are creating false positives. Sure, Google is sometimes unethical, but calling them a malicious actor? Really?


Following "Consider Google as a malicious actor/threat" with "I am not saying Google is acting with malice" is probably a strong indicator that you should have thought it through before posting it.


"Consider as" does not mean "is". Your lack of reading comprehension is not the fault of the poster.


It's a relatively long article - but it does not answer one simple question, which is quite important when discussing this: were there any malicious files hosted on that semi-random Cloudfront URL? I realise that Google did not provide help identifying it - but that does not mean one should simply recommission the server under a new domain and continue as if nothing has happened!

From TFA:

> We quickly realized an Amazon Cloudfront CDN URL that we used to serve static assets (CSS, Javascript and other media) had been flagged and this was causing our entire application to fail for the customer instances that were using that particular CDN

> Around an hour later, and before we had finished moving customers out of that CDN, our site was cleared from the GSB database. I received an automated email confirming that the review had been successful around 2 hours after that fact. No clarification was given about what caused the problem in the first place.

Yes, yes, Google Safe Browsing can use its power to wipe you off the internet, and when it encounters a positive hit (false or true!) it does so quite broadly, but that is also exactly what is expected for a solution like that to work - and it will do it again if the same files are hosted under a new URL, as soon as it detects the problem again.


Author here. Nothing was fixed, and the blacklist entry was cleared upon requesting a review, with no explanation.


They seem to be unable to answer this question since Google provided no URL. Without knowing what is considered malicious, how could they check if there was anything? What if it is a false positive?


I am just guessing here, but in case the author had their service compromised, maybe they can't disclose the information. Feels like they know what they are doing, and at least to me, reading between the lines, it looks like they fixed their problem and they advise people to fix it too:

> If your site has actually been hacked, fix the issue (i.e. delete offending content or hacked pages) and then request a security review.


Author here. We didn't do anything other than request the flag to be reviewed.

The recommended steps for dealing with the issue listed in the article were not what we used, just a suggested process that I came up with when putting the article together. Clearly, if the report you receive from Google Search Console is correct and actually contains malware URLs, the correct way to deal with the situation is to fix the issue before submitting it for review.


Yes, I guess if you're allowing users to upload arbitrary files that may contain viruses or malware, and you're not scanning the files, that makes you a potential malware host. That's how Google may see it. They're trying to protect their users, and you've created a vector for infection.


Too bad they don't ban googleusercontent.com.


Whether or not this author's site was or was not hosting malicious content is irrelevant to the thrust of the article, which is that due to browser marketshare, Google has a vast censorship capability at the ready that nobody really talks about or thinks about.

Think about the jurisdiction Google is in deciding that they want to force Google to shut down certain websites that correspond to apps that they've already had them and Apple ban from the App Store, for "national security" or whatever.

This is one mechanism for achieving that.


If there was malicious content, the search console would have provided a sample URL. It didn't.


Our company [0] was also hit by this.

We receive email for our customers and a portion of that is spam (given the nature of email). Google decided out of the blue to mark our attachment S3 bucket as dangerous, because of one malicious file.

What's most interesting is that the bucket is private, so the only way they could identify that there is something malicious at a URL is if someone downloads it using Chrome. I'm assuming they make this decision based on some database of checksums.

To mitigate, we now operate a number of proxies in front of the bucket, so we can quickly replace any that get marked as dangerous. We also now programmatically monitor presence of our domains in Google's "dangerous site" database (they have APIs for this).

0: https://www.enchant.com - software for better customer service
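
For readers wondering what that kind of programmatic monitoring could look like, here is a minimal sketch against the Safe Browsing Lookup API (v4) `threatMatches:find` endpoint. It is not the setup described above: the client ID, the chosen threat types and the function names are assumptions for illustration, and a real monitor would add scheduling, retries and alerting.

```python
import json
import urllib.request

# Public endpoint of the Safe Browsing Lookup API (v4).
LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_request(urls, client_id="example-monitor", client_version="0.1"):
    """Build the JSON body for a threatMatches:find call.

    client_id and client_version are placeholder values (assumptions).
    """
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def flagged_urls(response_body):
    """Return the set of URLs listed in a lookup response.

    An empty JSON object means none of the submitted URLs matched.
    """
    return {m["threat"]["url"] for m in response_body.get("matches", [])}


def check_urls(urls, api_key):
    """Send the lookup request (needs network access and a valid API key)."""
    req = urllib.request.Request(
        f"{LOOKUP_ENDPOINT}?key={api_key}",
        data=json.dumps(build_lookup_request(urls)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return flagged_urls(json.load(resp))
```

Running `check_urls([...], api_key)` on a timer and alerting when the returned set is non-empty would be one way to notice a listing before customers do.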


Author here. I'm not sure exactly how they actually decide to flag. Alternatively, Amazon might somehow be reporting files in S3 onto the Google blacklist.

It would seem surprising, but it's the other possibility.


> What's most interesting is that the bucket is private, so the only way they could identify that there is something malicious at a URL is if someone downloads it using Chrome. I'm assuming they make this decision based on some database of checksums.

Doesn't Chrome upload everything downloaded to VirusTotal (a Google product)?


> Doesn't Chrome upload everything downloaded to VirusTotal (a Google product)?

It doesn't, unless you opt for Safe Browsing "Enhanced Protection" or enable "Help improve security on the web for everyone" in "Standard Protection". Both are off by default, IIRC. Without it, it periodically downloads what amounts to a bloom filter of "potentially unsafe" URLs/domains.

On the other hand, GMail and GDrive do run the checks via VirusTotal, as far as we know - which means that OP's case may have been caused by some of the recipients having their incoming mail automatically scanned. It's similar for the Microsoft version (FOPE users provide input for Defender SmartScreen), at least last time I checked.


What happens if it is a hit against the bloom filter / checksum? Would it transmit the URL so that it can be blocklisted?


https://developers.google.com/safe-browsing/v4/update-api#ch...

TL;DR is you download a chunk of SHA-256 hashes and check if the hash for your URL is there. There is of course the chance of collision but that is minuscule.
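
A rough sketch of that local check, heavily simplified: the real protocol canonicalizes the URL, derives several host-suffix/path-prefix expressions from it, and confirms a prefix hit with a full-hash request to the server. The 4-byte prefix length and the function names below are assumptions for illustration.

```python
import hashlib

# Truncated hashes keep the local list small; 4 bytes is an assumed length.
PREFIX_LEN = 4


def url_hash(expression: str) -> bytes:
    """SHA-256 digest of a (pre-canonicalized) URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def build_local_list(bad_expressions):
    """Simulate the downloaded chunk: a set of hash prefixes."""
    return {url_hash(e)[:PREFIX_LEN] for e in bad_expressions}


def might_be_listed(expression: str, local_prefixes) -> bool:
    """True if the prefix is in the local list.

    A hit only means "possibly listed"; a miss means "definitely not listed".
    """
    return url_hash(expression)[:PREFIX_LEN] in local_prefixes
```

Because only truncated hashes are stored, a local hit can be a prefix collision, which is why a client would still want to confirm it against the full hash before showing a warning.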


Oh I know that's how that works, I meant, does Google transmit back the URLs once it does get a hit, to protect others from downloading that file?


Why would it need to do that? To protect others from the same url, the same hash checking method should work.


The blacklisted URL in this case is found in a downloaded file from a S3 bucket.

Other people downloading the same file would get the same "protection", but in this case this goes a step further:

The S3 bucket itself gets then blacklisted. As it was a private bucket, one of the ways this could happen is that once chrome found the blacklisted URL, it sent back to Google the url (s3 bucket) where the file with the blacklisted URL was found.


The hashes of all things that match a "probably evil" bloom filter, yes.

Hosting a virus on a domain and then downloading it a few times with different chrome installations sounds like a good way to get the whole domain blacklisted...


That's why user uploads are worth some thought and consideration. File uploads normally get treated as a nuisance by developers because they can become kind of fiddly even when they work and you are getting file upload bugs from support.

It normally isn't that much of a challenge to mitigate the issues, but other things get priorities. Companies end up leaving pivots to XSS attacks and similar bugs too.


Google has a great service for this called Checksum. You upload a file checksum and it validates it against the database of all known bad checksums that might flag your website as unsafe. The pricing is pretty reasonable too and you can proxy file uploads through their service directly.

I'm actually not telling the truth but at what point did you realize that? And what would be the implications if Google actually did release a service like this? It feels a bit like racketeering.


Real shame if this domain got blocked because of a contraband file, eh? Just pay us and we'll make sure you don't have any problems.


Ha! You got me. I was like, wow, that sounds really useful. I'd love to sign up for that, and built my app to use it, if that were the case.

But then, I realized: 1). I'd be integrating further into Google because of a problem they created (racketeering), and 2). They seem to really dislike having paying customers (even if they made it, they'd kill it before long).


And 3), they would later update their evil-bloom-filter and all of a sudden the file you paid to get verified is now an Evil File, and they blacklist you anyway.

They actually blacklist you even faster, because of course they have in their database that you have the now-evil-file.


I wonder if that could be triggered even when the certificate chain is not validated... you could MITM yourself (for example, using Fiddler) and make Chrome think it's downloading files from the real origin. In that case, an attacker could do that from multiple IPs and force Google to flag your whole domain.


Why isn't Dropbox blacklisted? Too big?


Dropbox actually provides a unique domain for each and every user - and separates the UGC from the web front code and Dropbox's own assets that way - that's where the files you preview/download are actually coming from. I have no doubt a fair number of those are blacklisted.


unique TLD? that should be very costly?

or does GSB not ban the entire TLD when a subdomain has malicious content?

Would be great if our overlords at least publish the overzealous rules we need to abide by.


Dropbox DL and Preview urls take a form of https://uc[26 character hex string].dl.dropboxusercontent.com/... and https://uc[26 character hex string].preview.dropboxusercontent.com/... - it does not have to be a separate TLD to avoid being blocked, but it has to be differentiated.

This is the same reason why the block of the TFA company did not cause an outage of everyone using CloudFront - GSB does not block full TLDs if it can be shown content is distinct. Same for anyone using S3, Azure equivalents and so on.


I wonder if there's a threshold here.. When I researched this issue (while we were figuring out how to mitigate it), I did encounter some people who had their entire domains blocked because 1 subdomain had bad content. In fact, this thread itself has mention of that happening to neocities


Author here. It's really not clear what criteria GSB uses to decide at which level the ban should apply.


Probably when the ratio of bad sites to good sites at a particular subdomain level passes a threshold.


Or when a significant litigation risk is perceived, if the domain level block review is human


My Google-fu is failing me right now, but there is a list of domains like dropboxusercontent.com that are treated as pseudo second-level domains for purposes like this.

e.g. u1234.dropboxusercontent.com is treated as a unique domain just like u1234-dropboxusercontent.com would be.

Edit: here we go, from another comment - the Public Suffix List: https://publicsuffix.org/
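
To make the idea concrete, here is a small sketch of how a suffix list turns a hostname into the "site" that gets treated as one unit. The suffix set is a tiny illustrative subset picked for this example, not the real list, and the real rules also include wildcards and exceptions that this ignores.

```python
# Illustrative subset only; the real list at publicsuffix.org is far longer
# and also contains wildcard and exception rules.
SUFFIXES = {"com", "net", "io", "cloudfront.net", "github.io"}


def registrable_domain(hostname: str) -> str:
    """Return the public suffix plus one more label to its left."""
    labels = hostname.lower().rstrip(".").split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            if i == 0:
                # The hostname is itself a public suffix.
                return ".".join(labels)
            return ".".join(labels[i - 1:])
    # No rule in this subset matched (a simplification of the real algorithm).
    return ".".join(labels)
```

With this subset, `assets.www.example.com` collapses to `example.com`, while `d111abc.cloudfront.net` stays a site of its own because `cloudfront.net` is treated as a suffix.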


Sounds rather too resource-intensive? I've just tried with current Chrome on Windows and a 32MB zip on my personal domain, Wireshark says the file has not been sent anywhere.


I believe there are limits on the virus checking size. You can see this when trying to download really large files from Google drive (> 100mb)


https://developers.virustotal.com/v3.0/reference#files-scan

Seems I might have just hit the limit? ... Nope, 8.1MB zip file also wasn't sent anywhere.


Wouldn't it be more efficient to grab it in parallel to your download?


That only shifts the bandwidth cost between the original server and the user, Google's resources are unaffected. And it's not what GP claimed.

I just checked the nginx access logs - both the 32MB and the 8MB zip files have been accessed only once (both were created only for this experiment).


Or you could screen your attachments for malware?


We do, but it's not good enough for Google.


Yes, the power of something like Google Safe Browsing is scary, especially if you consider the many many downstream consumers who might have an even worse update / response time. Responsiveness by Google is not great, as expected, we recently contacted Google to get access to the paid WebRisk API and haven't heard anything in a few months...

However, phishing detection and blocking is not a fun game to be in. You can't work with warning periods or anything like that, phishing websites are stood up and immediately active, so you have to act within minutes to block them for your users. Legitimate websites are often compromised to serve phishing / malicious content in subdirectories, including very high-level domains like governments. Reliable phishing detection is hard, automatically detecting when something has been cleaned up is even harder.

Having said all that, a company like Google with all of its user telemetry should have a better chance at semi-automatically preventing high-profile false positives by creating an internal review feed of things that were recently blocked but warrant a second look (like in this case). It should be possible while still allowing the automated blocking verdicts to be propagated immediately. Google Safe Browsing is an opaque product / team, and its importance to Google was perhaps represented by the fact that Safe Browsing was inactive on Android for more than a year and nobody at Google noticed: https://www.zdnet.com/article/mobile-chrome-safari-and-firef...

Lastly, as a business owner, it comes down to this: Always have a plan B and C. Register as many domains of your brandname as you can (for web, email, whatever other purpose), split things up to limit blast radius (e.g. employee emails not on your corporate domain maybe, API on subdomain, user-generated content on a completely separate domain) and don't use external services (CDN) so you can stay in control.


Of particular note:

"Don't host any customer generated data in your main domains. A lot of the cases of blacklisting that I found while researching this issue were caused by SaaS customers unknowingly uploading malicious files onto servers. Those files are harmless to the systems themselves, but their very existence can cause the whole domain to be blacklisted. Anything that your users upload onto your apps should be hosted outside your main domains. For example: use companyusercontent.com to store files uploaded by customers."


Pardon my ignorance as I have few years of web dev experience. What exactly does it mean to store data on a domain? Does he mean serve data via a domain URL? And if so, how does Google have discovery of that data?


Author here. Yes, "serve" is the correct interpretation. It is not clear how Google gets ahold of offending URLs within blacklisted domains (like the article says, there were no offending URLs provided to us).

Theories:

* Obtained from users of Google Chrome that load specific URLs in their browsers

* Obtained from scanning GMail emails that contain links to URLs

* Obtained from third parties that report these URLs


The main way is via the Googlebot crawler.

They also use user reports from Chrome, and links in "mark phishing" emails from Gmail. In those latter two cases the URL is considered private data, so it won't be reported in webmaster tools.


We’ve seen internal firewalled URLs in the webmaster tools, so I’m not sure the private data works as intended.


I've seen some bot of Google's in the server logs on my in-construction not-publicly-available page, a minute after I opened the page in Chrome. That was about five or six years ago, shortly before I stopped using Chrome.


Maybe there is some kind of "if multiple users see the same URL, it isn't private" logic going on.


We’re pretty sure they get reports from Chrome. A security researcher at my workplace was running an exploit against a dev instance as part of their secops role and got the domain flagged, despite the site being an isolated and firewalled instance not accessible to the internet.


Yes, I have noticed that when I create a brand new dev domain with a crawler-blocking robots.txt file, it is not found in any Google search until I open the dev URL in Chrome. Then bam! Their crawler starts trying to crawl the site, just from opening the URL in Chrome.

This is why I never use Chrome. They scrape the Google Safe Browsing data sent from Chrome browsers and just do not care about privacy.


Maybe it's from search suggestion API? Anyway, I turn that off as soon as I create a new browser profile, along with the safe browsing list and automatic search when I type unrecognized URL. When I want to search I use search input of the browser. (ctrl+k) URL bar is for URLs only.


You realize that robots.txt is an "on your honor" system, that anyone can write a script that doesn't look at robots.txt and posts anything it finds to the internet, and that other sites could therefore find your site via third-party data?

Chrome does not do what you claim it does
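To illustrate the "on your honor" point about robots.txt: the file only tells a crawler what the site owner would like, and it is up to the crawler to ask. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleBot` user agent and the `dev.example.com` URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a dev site that asks every crawler to stay away.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what robots.txt asks of this crawler. Nothing enforces the answer."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler runs this check and backs off when it returns False.
# A crawler that never runs the check can fetch the page anyway.
polite_answer = allowed_by_robots(ROBOTS_TXT, "ExampleBot", "https://dev.example.com/secret")
```

Nothing in this check stops a fetch, so robots.txt is not an access control. A page that must stay private needs authentication or network-level restrictions.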


I have trialed this several times, not using Chrome, and every time I then use it, the site can be found on Google. Remember, these sites are completely unlinked and fresh URLs. So, yeah, it really does.


But that means they can't verify it, right? Couldn't a malicious actor use this to attack their competitors?

Add an internal DNS entry for your competitor's domain, spin up an internal server hosting some malware and open it from chrome.


We use a fair number of google products, and you can turn on a lot of enhanced protection, and many businesses do. This means even password protected / private URLs may generate scans from what I've seen. I'm not sure how they actually fingerprint files (maybe locally) but it seems pretty broad

This seems to work across a lot of Google products (Gmail, Drive, Chrome etc.) so it scoops up a ton.

More here:

https://security.googleblog.com/2020/05/enhanced-safe-browsi...

Not sure if this is related to safe browsing. We also can turn on more scanning and other features of all email users.

The key though: if you allow users to PUT files onto your S3 (even private / signed in) then Google may scan them. That means if your user uploads a suspicious file to a trouble ticket system, if there IS a virus in there and Google sees it, wham. Obviously most folks will segregate those uploads off into their own S3 bucket by user/account to avoid contamination, but you really have to be careful not to host viruses AT ALL on your key domains.


How would you even “store” data on a domain?


Look up how DoH and ECH store public keys in the DNS system :)

Not what the author intended but DNS as a Database is a thing.


Ah yes, customer generated data sounds just like public keys


I imagine your service still won't have a great time when Google blacklists companyusercontent.com

A proper mitigation would be to serve user data from one domain per user, no?


“Don't host any customer generated data in your main domains. ”

This is extremely important for multiple reasons. One reason is the blacklisting mentioned in the article; the other is security: browsers typically implement security policies around domains as well, such as cookie scoping and whatnot. Putting all user generated content under a completely separate domain avoids a whole category of potential issues.


How do you do this in practice though? Let's say my marketing site is at turtlepics.com and then the pics, captions, feeds, etc are served off of turtlepicscontent.com.

So I can serve my app off of turtlepics.com, that's fine. But it can't load any content directly. I'd have to have a separate <script src="https://turtlepicscontent.com/feeds/erik"> or whatever that loads a user's feed. But that needs to be authenticated too, so I have to then authenticate the user on that domain (https://turtlepicscontent.com/feeds/erik?onetimekey=a9e79c58...) as well, at which point the credentials are present in the unsafe domain's cookies as well, and the jig is up.

Or do you continually generate fresh one time keys in the safe app, so that you don't need cookies on the content domain?

Even then, someone can still bring down the entire turtlepicscontent.com domain with malicious content. Which... well, at least your marketing site and your login still works. But the site is still fully down at that point. I guess that's better than nothing, but still pretty annoying.

Or is the idea just to wall off uploads specifically, but continue serving text off the main domain, presuming you're sanitizing text correctly?

I guess you could have some fallback to a safe-ish domain with an older read-only backup of the content database? Still not ideal. I guess sharding your users onto multiple domains based on account age might help a bit too.


You don't necessarily need to authenticate users on that domain with a cookie. An HMAC token would be ideal, because you don't have to maintain state.

Don't hardcore the content domain. In case the content domain gets flagged, it should be easy to change to a new domain.

The assets themselves (such as images, scripts, etc) can have any browser cache expiration time. HTML documents cache duration will matter, and once that has elapsed, browsers should start to use the new content domain.
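A minimal sketch of the stateless HMAC token idea, assuming the app server and the content server share a secret. `SECRET_KEY`, `CONTENT_DOMAIN`, the function names and the `expires`/`token` query parameters are all hypothetical, not taken from any particular framework. The content domain is a setting so it can be swapped if it ever gets flagged.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"   # hypothetical shared secret
CONTENT_DOMAIN = "turtlepicscontent.com"      # load from config, never hardcode


def sign_path(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """HMAC-SHA256 over the path and expiry, so neither can be altered."""
    message = f"{path}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def build_content_url(path: str, ttl_seconds: int = 300,
                      now: Optional[int] = None,
                      domain: str = CONTENT_DOMAIN) -> str:
    """Called on the main app domain to mint a short-lived content URL."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    token = sign_path(path, expires_at)
    return f"https://{domain}{path}?expires={expires_at}&token={token}"


def verify_request(path: str, expires_at: int, token: str,
                   now: Optional[int] = None, key: bytes = SECRET_KEY) -> bool:
    """Called on the content domain. No cookies or session store needed."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    expected = sign_path(path, expires_at, key)
    # Constant-time comparison to avoid leaking the token via timing.
    return hmac.compare_digest(expected, token)
```

The app on the main domain would call `build_content_url("/feeds/erik")` after checking the user's session there. The content domain only ever sees a token that expires, never the session cookie.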


*hardcode


For example, if someone manages to upload HTML and trick your system into serving it with a content type that browsers will interpret as HTML, then they can modify or exfiltrate your user's cookies. This could allow impersonation attacks, XSS, etc.

(Disclosure: I work for Google, speaking only for myself)
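One common defence against the "uploaded HTML gets served as HTML" problem is to never trust the uploader's content type. A rough sketch, assuming a small allowlist of types that are safe to render inline; the allowlist and the function name are illustrative, not a complete policy.

```python
# Illustrative allowlist: types we are willing to let the browser render inline.
SAFE_INLINE_TYPES = {"image/png", "image/jpeg", "image/gif"}


def upload_response_headers(declared_type: str) -> dict:
    """Pick response headers for a user-uploaded file."""
    content_type = declared_type.split(";")[0].strip().lower()
    # Tell the browser not to guess a different type from the bytes.
    headers = {"X-Content-Type-Options": "nosniff"}
    if content_type in SAFE_INLINE_TYPES:
        headers["Content-Type"] = content_type
        headers["Content-Disposition"] = "inline"
    else:
        # Anything else (including text/html) is downloaded, not rendered.
        headers["Content-Type"] = "application/octet-stream"
        headers["Content-Disposition"] = "attachment"
    return headers
```

This complements the separate content domain rather than replacing it. If something slips through anyway, it runs on an origin that holds no session cookies.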


Avoids the issue until your ugcweb.com is blacklisted and users who uploaded clean ugc are blocked from the portal.


Your upload action is hosted on a different domain from the domain that serves the content.


Yes, but when Google blocks either domain, your webapp will still be broken...


And in 10 minutes you grab a new domain and it's back up. You change the config setting in your app to use the new domain and, boom, done.

That's the point, it's a sacrificial domain. If you lose it you don't care, it's not your brand.


I learned the hard way that other companies than Google also contribute to the blacklist. A site I was working on got falsely flagged by netcraft.com (which they admitted after I spent a week explaining it to them). They do some kind of active AI cyber defence bollocks and have netflix as a customer. Their Automated Idiot classified our login page as trying to phish netflix.

The fun part of this is that I could have prevented this if I had seen the warning email that Google sent me, but since Gmail classified it as an email phishing attempt, I never saw it (straight to spam folder). How ironic.

Consequences:

- Our website was blocked in all major browsers, not just chrome

- AWS, who also look at the blacklist and were contacted by netcraft automatically, threatened to delete our account. I had to convince both parties that we did nothing wrong

- One week offline


If their claim is false, then is it, in any jurisdiction, libelous?

Maybe legislation to bring consequences for false claims will help ensure algorithms, and the support teams that monitor them, do a better job. In an internet focused world, especially one with lockdowns, wiping sites off of the internet with false claims is a heinously bad act.


I'm unsure whether it would be ruled libel, but I lean towards yes it would. There are two ways of seeing it:

1) It is a false statement by Google themselves (so no §230 protection) that caused material damage and is thus libelous.

2) It is an opinion protected under free speech, and the free behavior of a private company, and the words like "may have" show it is not a statement of fact, and "deceptive" is just an opinion.

Yet it feels very wrong and definitely Google's fault, and Google should be responsible for the damages, morally speaking.

It's more than just a false statement: the pop-up is keeping users from visiting the website. However, Google doesn't intend to harm these companies in order to gain competitive advantage, it just harms them accidentally, so the monopoly argument also has problems.

It seems to me that we need a new law, or that current jurisprudence has let this one slip through and perhaps there will (in America) never be a proper crime for this situation due to divergent jurisprudence in this space that left open this gap.

I would like to know whether it has been tested in court, or if anyone is in process of doing so.


As of today there is no legal protection framework for digital services.

Banking is heavily regulated; you are protected by hundreds if not thousands of laws.

For digital services? Twitter and Google can legitimately suspend ALL your accounts because you liked a Trump video on YouTube or tweeted something "hateful" to Biden.

You can try to go to court. You will lose 100% of the time. They are private businesses operating within their own terms; there is no "false" flag or wrong "ban".

They’re private businesses offering a free service, they can cease to offer that at any moment that they want.


In this case they do not provide a service to the OP. There is no agreement between OP and Google.

This is happening on browsers of their customers. And I'm quite sure that if Google hits a company that competes with Google services there must be a law that they will be breaking.

There was a big case in Poland where Google blocked a SaaS web shop provider using the same exact mechanism [0]. Polish courts decided that Google claims displayed on block page were untrue. Unfortunately, the suing company did not receive compensation, because Google Poland does not operate Chrome browser. The court indicated that the right party to sue is Google incorporated in USA...

[0] https://www.silesiasem.pl/iai-przegralo-proces-sadowy-z-goog...


Aside from abuse of a dominant position there is no law they would break.

When you download and use chrome you ACCEPT the Terms and Conditions of Google.

There is no law that prevents a web browser from blocking access to a website or modifying the page. If the TOS stipulate "pages may differ from the original or be subject to third party software", they are within their rights and the customer accepted it when he started using the product.

Don’t get me wrong. I’m on OP's side and everything, but we have let big tech become too big by giving us free stuff for decades.

Now they decide what’s good for us or not, with side effects that often damage small businesses.

But I insist that in 99% of cases, they operate within the law.


I'm not a US citizen, but just 5 min of scanning US laws makes me think that there is a basis for a lawsuit.

Essential facilities doctrine seems to be appropriate: https://en.wikipedia.org/wiki/Essential_facilities_doctrine


Google has already been fined by the EU for this type of issue, for 150M. That's literally nothing, at least not enough to hit their wallet.

This type of battle takes months if not years in court and costs millions of dollars.

Google products can be shipped and removed in a few weeks, far beyond the operating reach of the current judicial system.

Today the problem is GSB, tomorrow it'll be "GSuite Safe Account" or "Youtube Safe Video" etc...

There is no point in taking Google to court just for a "one time" condemnation, it's a systematic issue that is tied to Google itself.


EU is working on addressing these issues. They specifically want to label companies like Google as "gatekeepers" in The Digital Markets Act.

From: https://www.pinsentmasons.com/out-law/news/gatekeepers-face-...

> In case of violations, the new law would provide for fines of up to 10% of a gatekeeper's total global turnover. Lasseur said: "The fines are high, but they correspond to the usual sanction regime for violations of European competition law."

> In the event of a repeated offence, following a detailed market investigation by the Commission, the company could even be broken if the Commission finds that there is no effective alternative to ensure that the gatekeeper obligations are complied with.

The fines will no longer be just a cost of doing business, but an existential threat. It may take 10 or 15 years in the courts, but the EU will break up the likes of Google for repeated offenses even if the relevant products no longer exist.


You ignored the issue of libel, though. Do you have a reason it's not?


There may be a number of civil causes of action available...

But litigate against a multi-billion dollar tech company? good luck.

These companies are borderline immune to prosecution by the government, much less a small business.


This seems like something the FTC should be looking into, abuse of market position.


I can confirm everything that was said in that article. I run a free dynamic dns service (freemyip.com) and every time someone creates a subdomain that later hosts some questionable material, Google will immediately block my whole domain. Their response time for clearing these up varies from a few hours to two weeks. It feels completely random. I once had a malicious subdomain that I removed within two hours, yet the ban on Google lasted for more than two weeks. Now, this is a free service so bans like these don’t really matter that much to me, but if it was a business, I would have most likely gone bankrupt already.

I noticed that recently, they are only sending me the warning, but don’t block me right away. Perhaps after a few years of these situations I advanced to a more “trusted” level at Google where they give me some time to react before they pull the plug on my domain. I don’t know. But I would be truly petrified of Google if this was my real business.


Have you considered requesting that your domain be added to the public suffix list? https://publicsuffix.org/

If subdomains of your domain should be treated as independent sites, the public suffix list is (sadly) how you communicate that to browsers.

(Disclosure: I work for Google, speaking only for myself)


Fascinating. I had never heard of this, and cloudfront.net is in there, which might provide a clue as to why Google only blacklisted our subdomain and not the whole thing (imagine that!).

Is there any downside to being on this list?


> Is there any downside to being on this list?

If example.com were on the list then a cookie set on a.example.com couldn't be read on b.example.com. In this case that would probably be a good thing, since the subdomains represent independent sites, but if a site were erroneously added that could be a problem (mail.yahoo.com and groups.yahoo.com should share login cookies, for example).

The list was originally created to handle cookies, but more recently it's been used for other notions of "site", like cache sharding.


This is the first time I hear about https://publicsuffix.org/ Will definitely check it out. Maybe that will help me solve this problem. Thanks a lot!


> the public suffix list is (sadly) how you communicate that to browsers

Sadly, indeed. Had they never heard of DNS?


How would you propose handling this with DNS? Here are some things it covers:

* a.example.com and b.example.com are the same site

* a.co.uk and b.co.uk are not the same site

* a.cloudfront.net and b.cloudfront.net are not the same site

* a.higashikawa.hokkaido.jp and b.higashikawa.hokkaido.jp are not the same site

* a.example.higashikawa.hokkaido.jp and b.example.higashikawa.hokkaido.jp are the same site

There is a proposal to do something similar using response headers and .well-known urls: https://github.com/privacycg/first-party-sets
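The cases listed above can be sketched as a longest-suffix lookup. This is a simplified illustration, not the real matching algorithm: `SUFFIXES` is a tiny hand-picked set rather than the actual list, and the wildcard (`*.`) and exception (`!`) rules of the real list are ignored. The function names are made up.

```python
from typing import Optional

# Tiny hand-picked subset for illustration only. The real list is far larger.
SUFFIXES = {
    "com", "net", "uk", "co.uk", "jp", "hokkaido.jp",
    "higashikawa.hokkaido.jp", "cloudfront.net",
}


def registrable_domain(host: str, suffixes=SUFFIXES) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is none."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so the longest rule wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix in this toy list


def same_site(a: str, b: str) -> bool:
    """Two hosts are the same site if they share a registrable domain."""
    ra, rb = registrable_domain(a), registrable_domain(b)
    return ra is not None and ra == rb
```

With this toy list, `same_site("a.example.com", "b.example.com")` is true, while `same_site("a.cloudfront.net", "b.cloudfront.net")` is false because `cloudfront.net` itself is listed as a suffix.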


  _i_am_tld.cloudfront.net IN TXT "yes"
  _i_am_tld.higashikawa.hokkaido.jp IN TXT "yes"


This requires sites to opt in before it works, right? I think this would have been hard to introduce, because it requires so much coordination.


Isn't opting in how almost everything got on the list?


No: Mozilla wrote the initial list based on their understanding of TLDs, and they maintain it based on a combination of opt-in and people noticing that domains should be on the list.

Have a look: https://publicsuffix.org/list/public_suffix_list.dat


Author here. This is fascinating because I figured Google would definitely not ban cloudfront.net entirely and that's why they blacklisted the subdomain, but had this been hosted on our actual company domain, would we have been spared?


1- Ban self dealing.

Even the appearance of a conflict of interest should be treated as an actual conflict of interest.

Among all the other countermeasures being considered, breaking apart these monopolies' end-to-end integrations should be top priority.

For comparison: I'm a huge Apple fan boy. I'm in a happy monogamist relationship with Apple (h/t NYU Prof Scott Galloway).

There's no question their awesome products are largely due to their keiretsu, monopsony, and other anti-competitive practices. So despite my own joy, I also support breaking up Apple, for the greater good.

The same applies to Google's offerings. Google Chrome cannot be allowed to operate without oversight. Once a product or service becomes an important pillar in a market, it must be held accountable.

2- Fair and impartial courts.

Governments make markets. Google (et al) act as sovereign governments running their private markets. This is unacceptable.

We all must have the right to negotiate contracts, appeal decisions, and pursue other miscellaneous torts, adjudicated in open, fair, impartial courts overseen by professional and accountable judges.

In other words, I demand the rule of law.

Again using Apple as my example. As a customer, I benefit hugely from Apple's App Store, where they vet and curate entries. This is awesome.

But Apple must be held accountable for all of their decisions. All participants must have the right to sue for damages. In a fair and impartial court system, independent of Apple's total control over the market.

Similarly, however Google is administering the Safe Browsing infrastructure, it must be transparent, accountable, auditable.

--

I'm still working on this messaging, phrasing. Criticisms, editing, word smithing much appreciated.


> Criticisms, editing, word smithing much appreciated.

My loose thoughts, feel free to use. (Reordered 2 before 1.)

2. In any bigg-ish privately regulated market, the membership needs to be based on public, objective rules and under a real jurisdiction. If you paid and obeyed the regulations and have been banned/mistreated, you can sue.

1. For any market, if a company (Google or other) has a clear majority of it, they have additional responsibilities.

"Customer is free to go away to our competitors" does not tell a full story (illustrated by OP). The cost to switch is the real deal here.


Must be held accountable -> must be operated autonomously from the rest of the business


Yes. I'd like this better explained. Those "Chinese firewalls" meant to keep biz units apart always seem to be completely fictional. Ditto "self policing".


One of the apps my company makes is a chat app. When someone clicks a link in chat, we bounce them to a URL redirect page (a "Warning, you're leaving $app, don't enter your account password/information" type of phishing warning page) with a "Continue to $url" button. We also have a domain blocklist to block known phishing sites in our app. Because of this, Google blocked our entire domain due to malicious URLs (the "This link was blocked" page). It took us weeks to get it unblocked. Just an utter pain in the butt. We're an established business, but having our entire website blocked by Chrome for weeks nearly killed the entire app.
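For readers unfamiliar with this kind of setup, here is a rough sketch of the redirect-page decision described above. The blocklist entries, function names and return values are hypothetical. The real app's logic is not shown in the comment.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of known phishing domains.
BLOCKED_DOMAINS = {"phish.example", "malware.example"}


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the URL's host, or any parent domain of it, is blocklisted."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def interstitial_for(url: str) -> str:
    """Decide which page the redirect endpoint should render."""
    return "blocked" if is_blocked(url) else "warn-and-continue"
```

Note that the redirect endpoint itself ends up with known-bad URLs in its query strings, which may be what an automated scanner reacts to.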


> I received an automated email confirming that the review had been successful around 2 hours after that fact. No clarification was given about what caused the problem in the first place. ... We never properly established the cause of the issue, but we chalked it up to some AI tripping on acid at Google's HQ.

I expect more of this Kafkaesque experience to come in the future.

This is no longer a technical problem, but a social one. It can only be solved through legislation.


Author here. The second time around, the review confirmation email took around 12 hours to get to us.


Thank you for posting all the info here. And I’m glad that you managed to fix the problem. I think it must have caused you a lot of stress.


Yes, this was a massive headache and we got very lucky with the timing of the incident and the blast radius of the system in question. I can't really say the issue is fixed so much as it is mitigated, hence the writeup to gain some awareness. Some of the other comments have valuable anecdotes too.


This reminds me of email blacklisting. When I was "young" I operated an email server for 6000 users. Keeping that server and our domain away from blacklisting was a full-time job.

It wasn't enough to secure your server: Any spam or virus coming from the internal network through that email server could potentially blacklist us. Basically, you had to treat your users as untrusted, and run anti-spam and anti-virus filtering that was as good as whatever the rest of the Internet was running.

IIRC, although blacklisting was done by non-profits, it was still rather opaque: Blacklisting should be traumatizing, so that you (and your higher ups) are forced to do a proper risk assessment and actually implement it. It was also opaque to make it harder for the bad guys to move quickly.

I hate the increasing influence that big tech has on small tech. But keeping web and email safe and clean is a cat-and-mouse game, which, unfortunately, also adds burden to the good folks.


Today Microsoft is the worst. It blacklists your IP for unsuspecting customers using Outlook, live.com, etc., and there is no way to recover from it without becoming a customer yourself. It's vicious because the users of their products are mostly businesses, and they are acting as a gateway for doing business with them.


Definitely annoying. But how much of this is anti-competitive business practice, and how much is "raising the bar for the bad folks"? Unfortunately, the latter inevitably adds burden to good folks too.


The section about ants and Google shifting on its planetary chair is perhaps the best part of this article. A sobering way to look at it.


I run https://neocities.org, and safe browsing has been my nightmare overlord for a long time.

No way to manage reports via an API, no way to contact support. I haven't even been able to find a suggestions box, even that would be an upgrade here. Digging to find "the wizard" gets you into some official google "community support" forum where you learn the forum is actually run by a non-employee lawful neutral that was brainwashed somehow into doing free work for one of the wealthiest companies in the world. A lot of the reports are false and I have no idea how they are added (this would be an excellent way to attack a web site btw).

Google will sometimes randomly decide that every link to our over 350,000 neocities sites is "malicious" and tell every gmail user in a pop-up that it is dangerous to go to a neocities site. Users are partitioned to a subdomain but occasionally google will put the warning on the entire domain. It's not clear if it's even the same thing as safe browsing or something completely different, and this one doesn't have a "console" at all so I have no idea how to even begin to deal with it. When users complain, I tell them I can't do anything and to "contact google", which I'm sure just leads them to the same community support volunteer.

We actively run anti spam and phishing mechanisms, have a cleaner track record on this than google themselves with their (pretty neglected) site hosting, and because we block uploads of executable files, it is literally impossible for users to host malware on our servers. It is also impossible to POST form data on our servers because it's just static html.

None of that matters. Occasionally we also just get randomly, completely soft-blacklisted by safe browsing for no reason (they call this a "manual action", there's never any useful information provided, I have no idea what they imply and I live in fear of them).

If things ever got extremely horrible, I used to have a friend that worked at google but she no longer works there (I hated using her for this). The other person I knew that works at google stopped responding to my tinder messages, so I'm pretty much doomed the next time they do something ultra crazy and I need emergency support.

It's extremely frustrating and I'm hoping for the day when something gets better here, or they at least provide some way to actually communicate with them on improving things. In the meanwhile, if anyone happens upon the wizard at a ski resort or something, please have them contact me, I have a lot of improvement ideas.

edit: Just to add here from a conversation I had a year ago (https://news.ycombinator.com/item?id=21907911), Google still hasn't figured out that the web is their content providers and they need to support them, and treating their producers with contempt and neglect is a glorious example of how shortsighted the entire company is right now about their long term strategy (how many ads will you sell when the web is a mobile Facebook app?). They should as soon as possible, as a bare minimum, start providing representatives and support for the content providers that make people actually use the web and help them to be successful, similar to how Twitch has a partnership program.


> If things ever got extremely horrible, I used to have a friend that worked at google but she no longer works there. The other person I knew that works at google stopped responding to my tinder messages, so I'm pretty much doomed the next time they do something ultra crazy and I need emergency support.

Doing good for the sake of the web, even while dating, that's some next level dedication. :)


Sad nerd seeks third to join 24/7 D/s relationship with my internet daddy

(just lightening the mood a bit, but it's a true story. I wasn't asking for google support)


Hi Kyle,

I still have the reocities.com domain, would you like to have it? If so I'll be happy to donate it free of charge.


From the Geocities archive? Yeah actually, send me an email let's talk about it.


Yes, indeed. Ok, will do.

Edit: sent.


Just want to add that neocities is a cultural treasure and I appreciate the work you put into it! I'm sad to hear that Google "Safe" Browsing once again rears its ugly head blocking legitimate websites, yet I still see scams and phishing show up on ad sponsored links for Google search results.

I could foresee in the future all of us having to pay the toll so our hosted websites are considered "safe" too...


Thanks!

Yeah I don't think Safe Browsing shouldn't exist, but it definitely needs some improvements and feedback that's appropriate for how incredibly powerful/dangerous it is.


> The other person I knew that works at google stopped responding to my tinder messages.

This has to be the best anecdote for Google's broken tech support that I've ever heard. :)


Since you are putting different users on different subdomains, have you considered asking to have neocities.org added to the public suffix list? See my response to the person who runs freemyip.com and has the same problem: https://news.ycombinator.com/item?id=25804371


If Google is falsely claiming you're malicious and it's harming your business, it seems like a pretty clear case of slander/tortious interference.


I'm sure he could get a lawyer better than Google's.


I don’t think it would be as lopsided as you envision.

You might even be able to claim a strict liability standard since it’s an allegation of professional fraud.

Meaning the standard for proving defamation might be substantially lower than normal.

I’d guess google would settle in the blink of an eye unless they had some basis for the claim. And “computer says no” would not cut it in court.

Could still be expensive but not bankruptingly so.


At least for Facebook and Twitter, self-writing and faxing a C&D notice to the legal departments usually helps with getting accounts unbanned.


Curious why this is downvoted? Seems like the first step any counsel would take in this scenario.


I mean we could certainly use the money obviously, but it's not really my goal to sue Google (I probably can't afford it anyways). I just want them to improve. I see them as a partner and only ask that they see us as the same. They certainly have the resources for it.


I honestly wonder if you could take them to small claims court...


> because we block uploads of executable files, it is _literally impossible for users to host malware on our servers_

How does this stop bad actors from exploiting bugs in e.g. V8 with malicious JavaScript?


> How does this stop bad actors from exploiting bugs in e.g. V8 with malicious JavaScript?

You're correct: it doesn't. Blocking executable files isn't enough. Javascript files, zips containing executables, malicious Word files...all of these are vectors.


Look at the file types gmail blocks from being directly attached to emails for a comprehensive list.
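To make the point above concrete, here is a minimal Python sketch of why a check on the outer file extension is not enough. The extension sets and the function name `upload_risk_reasons` are my own illustrative assumptions, not Gmail's or anyone else's real policy, and a real scanner would need far more than this (content sniffing, nested archives, macro inspection and so on).

```python
import io
import zipfile
from pathlib import PurePosixPath

# Illustrative and deliberately incomplete (assumptions, not any vendor's real policy).
EXECUTABLE_EXTS = {".exe", ".dll", ".scr", ".bat", ".cmd", ".msi"}
SCRIPT_EXTS = {".js", ".vbs", ".ps1"}
MACRO_DOC_EXTS = {".docm", ".xlsm", ".pptm"}


def _ext(name):
    """Lower-cased extension of a file name or archive member path."""
    return PurePosixPath(name.replace("\\", "/")).suffix.lower()


def upload_risk_reasons(filename, data):
    """Return human-readable reasons an upload deserves a closer look."""
    reasons = []
    ext = _ext(filename)
    if ext in EXECUTABLE_EXTS:
        reasons.append(f"executable extension {ext}")
    elif ext in SCRIPT_EXTS:
        reasons.append(f"script extension {ext}")
    elif ext in MACRO_DOC_EXTS:
        reasons.append(f"macro-enabled document {ext}")

    # An extension check on the outer file says nothing about archive contents.
    stream = io.BytesIO(data)
    if zipfile.is_zipfile(stream):
        with zipfile.ZipFile(stream) as archive:
            for member in archive.namelist():
                if _ext(member) in EXECUTABLE_EXTS:
                    reasons.append(f"archive contains executable {member}")
    return reasons
```

A plain "no .exe uploads" rule would wave through a .zip that wraps the same binary; the loop over archive members is the part that catches it. It still would not catch a password-protected archive or a renamed file, which is roughly the problem being described here.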


I wonder if it would be faster to deal with this through legal. I’m not a lawyer, but I wonder if you could send a C&D to Google legal or something because this seems like an actual case of slander and reputation damage.


To any lawyers or even well-read armchair legal analysts, could this be a case of libel?


If your systems have any number of nines in their SLA, drafting a letter to Google's legal department is not a viable strategy.


If you are a big enough company your lawyers could have a stern but relatively friendly chat with Google’s lawyers.

I can neither confirm nor deny this myself...


Yeah my thought behind this was you are a large enough or wealthy enough company that you can afford lawyers. If you are an individual or mom and pop business whose blog or small e-commerce shop is blocked then you are probably SOL.


Once you enter litigation with Google, good luck accessing your Android.

You may believe this is extreme, but many people have had their Gmail account suspended without known reason. So if they also have a reason...


So de-google first, then sue.

Otherwise you might as well give up and conclude that google not just controls the internet but is also above the law.


I provide Windows builds of ffmpeg, linked via http://ffmpeg.org/download.html. The site is entirely static, no user data is collected or stored.

Starting in late October, lasting for around a month, users would get the dreaded red page upon visiting the site at https://www.gyan.dev/ffmpeg/builds/

Search Console would show a couple of files as 'install malicious or unwanted software'. Never mind that all files are plain archives (7z,ZIP) with no installers or even self-extraction, containing CLI apps. These file URLs when scanned via Virustotal (Google-owned) would be flagged by Google Safe-browsing and no other engine. Weird thing is, the same files mirrored at Github would be detected as clean. A review request at SC would get rid of the warning temporarily only to return after a day or two.

I found no support email so I opened a thread at Google Webmaster community (now called Search Central community). But there was no help and none of the regulars seem to be Google employees. Finally, I found an email through Mozilla's page on their use of Google's Safe Browsing blacklists at https://support.mozilla.org/en-US/kb/how-does-phishing-and-m... which leads to https://safebrowsing.google.com/safebrowsing/report_error/?t.... This page's title is 'Report Incorrect Forgery Alert' which would indicate a different purpose but I managed to get hold of human attention. After 10 days or so, the warnings disappeared. Till date, I don't know what triggered the warnings in the first place, and so how to prevent a recurrence.


We got hit by this as well. Very similar story to this and others shared in this thread: Use an S3 bucket for user uploads - and Google then marks the bucket as unsafe. In our case a user had clicked “Save link as...” on a Google Drive file. This saves an HTML file with the Google login page in some cases (since downloading the file requires you to be logged in). The user then proceeded to upload that HTML file. Then it was automatically marked since it looked like we were phishing the Google login page.

It should be noted that Firefox uses the Google banlist as well so switching browsers does not work!


We seriously need to break up Google. This is a chokepoint for innovation, should not be controlled by one company, and has serious downstream consequences on economic growth as a nation.


As a planet you mean.


"as a nation"?


I think another take away from this article is “don’t allow users to upload malicious files that you then host from your domain”

This seems easier to do than jumping domains.


> I think another take away from this article is “don’t allow users to upload malicious files to your domain”

I disagree. At which point did we all accept Google's role as de facto regulator and arbiter of the Internet? Why should we tacitly accept the constraints they deem as appropriate and modify the way we build the web?

In other words, those are our domains, our apps, our systems and we'll do as we please; that includes worrying about content moderation, or not.

When and why did we accept google as the Internet's babysitter?

Apologies if this sounds aggressive, but your takeaway reflects an appalling and quite fatalistic mindset; one which I sadly believe is increasingly common: big corporations know best, big corporations say and we do, big corporations lead the way.

On the other hand, probably I'm just biased and tired considering how tiresome it's been to explain to my friends and family why Signal is the better alternative after the WhatsApp/Facebook fiasco.

/EndRant


When you installed their browser.


I didn’t install their browser.


Your users did decide to use it, though - and this particular feature is one of the reasons why that particular browser is popular. It was one of the major differentiators of the "better" browsers in the sad old IE days.

For all you "use Firefox [etc], don't use Chrome" pundits: it also uses Google Safe Browsing [0], and for that matter so does Safari, which may compound it by using the Tencent version instead if you happen to be in China [1].

[0] https://wiki.mozilla.org/Security/Safe_Browsing [1] https://support.apple.com/en-us/HT210675


This is a weak take. Are we saying that any feature built into a web browser is desirable by virtue of the product's popularity? 99% of chrome users use it because they recognize the interface from school laptops. Do you really want to live in this world where massive corporations can put whatever they want in their products and the justification is “yeah well people still downloaded it?”


No, we are saying that a site owner should not get to choose which features of the browser the users decide to use. It's the same reason why HN is dogpiling on any site that announces "Only works in Google Chrome", "Best viewed in Safari" or, for older users, "Designed for IE".

One of the reasons why users decided to jump ship to browsers implementing more advanced security features (which invariably include some sort of malware/phishing filter) was the realisation that even a site that has been safe to visit before may serve you malicious content. PHP.net, for instance, was compromised in a way that is eerily similar to what the author here describes - JS files were variably serving malware depending on certain conditions [0], and the first warning anyone got was GSB blocking it. You can read and compare the "it can't be true" outrage that particular blocking caused at your own convenience [1].

Whilst you can convince the users to jump ship to some fringe browser that does not use the technology (and I do invite you to try to find one which does not use either Google, Microsoft or Tencent filters and has at least 0.1% of global usage!), it is a losing proposition from the start. The take is: the vast majority of users is actually comfortable and happy to get this message, as long as they can trust that it is warranted.

Should filters be hosted and adjusted by a major technology company like Google? Probably not, and some independent non-profit hosting them (for the sake of the argument, even StopBadware that kick-started the whole mess [2]) would be welcome to try to take that responsibility. But the filters are here to stay until we come up with something better as a solution.

[0] https://news.ycombinator.com/item?id=6604251 [1] https://support.google.com/webmasters/forum/AAAA2Jdx3sUpuLmv... [2] https://www.stopbadware.org/


The problem is that the process is opaque so you aren't even given a hint as to why the site is blacklisted. Security filters, fine, but at least tell the developers what the violation is so that it can be fixed. It's the same in the play store controversies, the developers aren't told what's wrong, the app is just taken down. This lack of transparency is the real issue.


> 99% of chrome users use it because they recognize the interface from school laptops.

The implication that less than 1% of Chrome users are old enough that Chrome didn't exist when they were in school is laughable.

Also, if that kind of familiarity rendered feature comparison irrelevant, Mosaic would still have a healthy share of the browser market.


Sorry, but you don't get to tell me I am obligated to browse your site without being notified if you have malware.


You are not obligated to browse anything. In fact, you as a human are obligated to very little. Perhaps keeping yourself alive (which somebody might even oppose as an obligation).

If you enter a site that hosts articles on malware and it allows you to download the malware assets to play with for yourself, you would be a fool not to understand that the site hosts malware and is not adversarial.


Assuming that this site "serving malware" isn't doing it purposely.

What if someone made a site that inspected malware and went in depth on how it worked and allowed you to download the malware to inspect yourself if you so desire? Google would flag this site as bad and blacklist it, but in reality it's a research site.


There are standardized ways to share malware downloads. Google likely respects them.


What is that standardized way?

Encrypted zip files with the password listed on the website is the easiest one that comes to mind. I wonder if googlebot will some day decrypt those files because a lot of pirated software is distributed in encrypted zip files. Scanning those files for viruses would be pretty useful for the average user.

I guess captchas are the only bulletproof solution


Usually people use a zip file with the password "malware".


Might have, judging from this story.


Pretty sure the main point was a private company can effectively delist you from the internet without any rhyme or reason. Most of us have heard Google horror stories from people who use their products; the fact that you can be free of them and still have any new customers bounce from your site in terror is, uh, terrifying.

I would like to emphasize of course they have good stated reasons for warning users before accessing websites. The issue is that they are a private company whose behavior affects all major browsers and (for kicks) they have an extremely opaque review process.

If you ran a "divest from Big Tech" website which started gaining steam they could delist like this and the only real force stopping them is public backlash. If you think you can effectively sue Google to stop them I have a bridge to sell you.


Author here.

That is definitely a good idea, and I recommend it. But that should not be the main takeaway.

In our particular case, that was not found to be the problem (we think it was some sort of false positive), and there are valid reasons for users to do that anyway (upload a phishing email attachment onto an IT support ticket, for example).


I think the author highlights the main issue at the end of the article. This is where pressure needs to be applied. I get it, Google’s process probably protects a lot of end users from malicious sites. Getting a real business added to this blocklist by a bot though is not cool. Perhaps a process to whitelist your own domains if this power can’t be wrangled from Google.

> Google literally controls who can access your website, no matter where and how you operate it. With Chrome having around 70% market share, and both Firefox and Safari using the GSB database to some extent, Google can with a flick of a bit singlehandedly make any site virtually inaccessible on the Internet.

> This is an extraordinary amount of power, and one that is not suitable for Google's "an AI will review your problem when and if it finds it convenient to do so" approach.


> Getting a real business added to this blocklist by a bot though is not cool.

Real businesses can (and often do) host malware too. There was a notable event where php.net was hacked and hosting malware, which Google flagged. The owner of php.net was pretty mad at first and claimed it was a false positive. It wasn't.


Not to mention thousands and thousands of unsecured Wordpress and other similar systems which were turned into malware delivering botnets.

At my local faculty there were at some point no fewer than 6 different malware serving sites (Wordpress, Drupal and some similar unpatched software), which were happily delivering all that data from a university domain.


Right, I’m not saying they aren’t a risk. I’m suggesting that if a real business is whitelisted, an automated process shouldn’t be allowed to blacklist it without some type of human interaction.


Easier?

What's the easy way to distinguish between "malicious" and "non-malicious" files?


Being completely blacklisted is very bad, but at least you know that something needs fixing. Imagine if Google partially punishes you and downranks you in search for no reason. This is harder to figure out. It took us several months to discover such a problem, until we finally registered with Google's webmaster tools.


What are you talking about? The article said that they didn't change anything, because they found nothing wrong with the site. The ban from google was totally random without any explanation. And it went away without any changes or explanations about what was wrong.


what was the problem?


> Proactively claim ownership of all your production domains in Google Search Console.

That's one of the first things you should do, when registering a domain and setting up a website. It takes about 2 minutes. So I wonder a bit why a business of this size would learn doing this through such a crisis.


This is sad. When you open a business in the real world, sure you have to tell the authorities about it (because it's the law!). When you open a digital business, you have to tell Google (via Google Search Console) about it... But Google is not the law, not even an authority; it just happens that Google owns google.com and Chrome and that makes Google the de facto Godfather of the internet: if you don't comply, your business is practically dead. Again, sad.


Author here. The impacted domain was a Cloudfront CDN subdomain with random characters in it, not company.com (thankfully!). I doubt anyone signs up for Search Console on that type of domain that they don't even really own.


Is there any reason that Google couldn't, or wouldn't, repurpose Google Safe Browsing to blacklist sites that are "unsafe" due to under- or poorly moderated content? E.g. doing this to Parler after they find hosting again? I can't think of a reliable one.


There's a very obvious reason not to do that: if you apparently maliciously cry wolf a few times, people won't trust your cries any more, and, for example, other browsers might choose to stop using the Google Safe Browsing list.


No, I don't think that's how it would play out.

1. Google bans parler.com on Jan. 8th by adding it as an "unsafe URL" to their blacklist.

2. Mozilla issues statement: "While we don't believe it was prudent to use the Safe Browsing blacklist for this purpose, given recent events, we will not be unblocking parler.com, and do not currently deem it necessary to maintain a separate safe browsing list."

3. Something similar happens a few months from now, and this time there's no statement from Mozilla or Microsoft. It has now become accepted that blacklisting less-moderated social media, which can cause real-world harm, is a normal use for the Safe Browsing list.

The problem is, if a mainstream browser goes against the flow, it becomes "The Nazi Browser." Its market share was already less than Chrome's, and now it's getting all these new users who are outcasts. This is a Hard Problem of moderation in a small market. You can't be the one out of three players who moderates less, lest you be overwhelmed by undesirables and less-desirables.


I can't tell if this is true or not - was parler.com actually blocked with this mechanism?


No, they were taken down by their cloud provider and by the two mobile app stores. My story was hypothetical, though disturbingly the companies involved don't entirely change when you talk about a take down from a different layer.


I guess just entirely inventing the slope and start points as well as a predicted trajectory is a new achievement in "slippery slope" arguments. Congratulations.

More seriously, maybe invent imaginary third parties rather than arbitrarily assigning your imagined bad motives and awful consequences to real people who did none of what you've suggested?

Google could, if they wanted, just add a new category to Safe Browsing. They could call it "Arbitrary censorship" or "Nazis are bad" or whatever you want. There are already several categories which even use slightly different parameters for the core technology so this wouldn't substantially change the system and yet would add much more flexibility if you wanted (as you might well) to protect against Phishing whether from Nazis or not, while still visiting a popular web site organising the overthrow of American democracy.


How is talking about mechanisms for taking parler.com offline "entirely inventing the slope"? It was taken offline by its cloud provider and its apps were removed. Google was even involved in the takedown. Nothing outlandish is being discussed here.

As for "bad motives and awful consequences", what are you talking about? Is wanting to take parler.com offline an objectively "bad motive"? Is succeeding in that endeavor an "awful consequence"? This is the heart of the problem: Weighing consequences is hard when faced with real threats. So when the two consequences are "parler.com becomes inaccessible" and "the integrity of the Google Safe Browsing URL list is slightly compromised", I think it's at least possible that executives would decide to compromise the list.


> The problem is, if a mainstream browser goes against the flow, it becomes "The Nazi Browser." Its market share was already less than Chrome's, and now it's getting all these new users who are outcasts.

This whole problem only started because browsers stopped being neutral to the content and basically adopted the harmful "if you're not with us, you're against us" stance that seems to be propagating through everything these days. None of the "smaller" browsers (and I mean smaller than Firefox - the Dillos, Netsurfs, and Lynxes) do anything like this.


Author here. I think it's too late in the cycle for that. This list is too widespread and anyone that is banned from it needs to immediately work around the issue somehow, therefore reducing the visibility of the problems.


So what would they use instead? It's not like there are any other free, real-time and mostly accurate malicious-URL databases around for people to plug into their browsers and products.


Nothing at all. Many people survive exposure to the internet without being protected by corporate firewalls, think-of-the-children filters and antivirus.

Or do we expect UK citizens to curl up in fetal position and start screaming as soon as they leave their country because they're no longer protected by their ISP filters?


As someone who tracks phishing pages I would disagree. The amount of really high-quality fast flux phishing put out every day on completely legitimate-looking domains is astonishing. I know plenty of people who would immediately fall for it, and I wouldn't blame them one bit.


I don't doubt that phishing exists, but it's still a tail risk, it's not like the majority of the internet population got scammed 24/7 before google stepped in. So if google were to abuse that power then we could choose living with an increased risk instead of trusting them. At least until another solution is found.


Perhaps “comes the hour, comes the man” would apply? It's a difficult problem, but if there was an urgent need for a solution, I'm sure one could be found.


I would agree, but "apparently maliciously" is too subjective.

According to US conservatives this is what Twitter, Facebook, Amazon, Google, Apple, Twilio, Snapchat, etc all did to Parler for political reasons.

According to US progressives/liberals it was absolutely not malicious, but rather the polar opposite: protecting people.

These days there is no common agreement on that stuff, and given the recent events I see no reason to believe that they wouldn't do as GP asked.


Sounds like a full inversion of terms "conservative" and "progressive/liberal" has happened?


Indeed, although I suspect it's just because of the politics here. If Parler had been a progressive/liberal haven conservatives would support censoring while progressives would be outraged at the violation of free speech.

The reason I think this is that's what happened with "private companies can do what they want." Giant corporations imposing their values on individuals is not a problem for progressives when it's big tech. Likewise Conservatives don't seem to support private property rights and no regulation anymore.


What other browsers? Almost all users of SB are using Chrome.


Firefox and Safari. I know, Chrome is huge these days and it's a problem, but it's not like anything can be done about Chrome.


As @gomox alludes in his article, Firefox uses Google Safe Browsing API.


That's the whole point of my comment, yes?


Users would start to ignore the warnings and proceed anyway, or even turn safe browsing off.


McAfee SiteAdvisor recently started flagging the website for my open source project https://datasette.io/ as "slightly risky" due to being a "Technical/Business Forums" site and a PUP ("Potentially Unwanted Programs").

I submitted a review a few weeks ago and I just checked and it's green now, which is a big relief. https://www.siteadvisor.com/sitereport.html?url=datasette.io


So, essentially they let someone host malicious content on their CDN, which led to Google blocking it. I don't see the scandal here. Also, it seems Google fixed the issue within 2 hours, which is quite good TBH.

There are many open-source & commercial IOC lists in distribution from vendors like Crowdstrike, Team CYMRU etc., a lot of them are being fed into SIEM systems, firewalls and proxies at companies. If you happen to end up on one of these lists it can take months or years to clear your reputation.


If you're going to comment that they did something wrong, you should consider reading the article and notice that the safe browsing flag didn't mention a URL and the block was removed without any follow-up once they requested the removal.


> losing access to their GMail accounts and their entire digital life.

This is why my email address is @ a domain that I own. Thus, if my hoster goes ventral fin up, I find another hoster. I might lose some time, but I won't lose everything permanently.

My mail reader (Thunderbird) is also configured to always download all new email and delete it from the server. Hence I have backups going back 25 years, which has turned out to be valuable many times. One case was when I was reconstructing the timeline for "History of the D Programming Language" I had a solid resource rather than my barnacle-encrusted memory.

https://dl.acm.org/doi/abs/10.1145/3386323


It's not just startups. I work at a major company and we’ve had internal domains flagged in the past due to internal security testing. We resolved it by making some calls to people at Google because the Safe Browsing dashboard is so slow to fix things.

This is especially troublesome if you allow customers to upload code to run on your systems (e.g. Javascript for webpages or interactive data analytics). You have to isolate every customer on separate domains.


> You have to isolate every customer on separate domains.

Allowing unvetted JavaScript to be served from your main domain is something of a security risk anyway.


But you can smother the damage; startups can't.


Do you need a real domain for each customer or is a subdomain sufficient isolation?


Real domain. If you have customer1.example.com and customer2.example.com, and customer2.example.com serves malware, all of example.com can be flagged.


apparently, submit your domain to https://publicsuffix.org/ to prevent this from happening?
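For anyone wondering what the public suffix list changes in practice, here is a rough Python sketch of how a "registrable domain" is derived from a hostname. The function name and the tiny hand-written suffix sets are assumptions for illustration only; the real list at publicsuffix.org also has wildcard and exception rules that this ignores. Whether Safe Browsing scopes its flags exactly along these boundaries is what commenters here suggest, not something this sketch can confirm.

```python
def registrable_domain(host, public_suffixes):
    """Return the public suffix plus one extra label (simplified PSL logic).

    public_suffixes is a plain set of suffix strings; wildcard ("*.ck") and
    exception ("!www.ck") rules from the real list are not handled here.
    """
    labels = host.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in public_suffixes:
            if i == 0:
                # The host itself is a public suffix; nothing to register under.
                return candidate
            return ".".join(labels[i - 1:])
    # Unknown suffix: fall back to treating the last label as the suffix.
    return ".".join(labels[-2:])


# Without a PSL entry, both customers collapse into the same site.
default_suffixes = {"com"}
# Hypothetical state after example.com is accepted into the list.
with_entry = {"com", "example.com"}
```

With `default_suffixes`, `customer1.example.com` and `customer2.example.com` both resolve to `example.com`, so anything that reasons per registrable domain treats them as one site. With `with_entry`, each customer subdomain becomes its own registrable domain, which is the isolation being asked about.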


This is not new; such things happened many times in the past (25 years ago Microsoft was the behemoth trampling small companies) and will happen again. I do not think Google is doing it consciously -- this is probably just collateral damage from some bot or rule.

The way to handle it is to reduce dependencies on the cloud. This does not mean cutting cloud services altogether, but once the company is big enough (and the author talks about 1000s of SMEs and millions of users), plan for graceful degradation with a fallback to a different provider and another fallback to owned servers.

This takes work and reduces capability during the crunch, but it is often a lot easier and cheaper than people think if planned properly and not in a shotgun style of crisis engineering. My 2c.


Author here. The scary bit is that the blacklist is enforced client side in Chrome and other programs. Our servers and systems were running just fine when this happened, but if Google Chrome refuses to open your website, you're still down.
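As a toy model of what "enforced client side" means, the Python sketch below checks a URL against a locally stored set of hash prefixes before any request reaches your servers. It is loosely modelled on hash-prefix blocklist designs and is not Chrome's actual implementation: the names, the 4-byte prefix length, and the very simplified URL expression logic are all assumptions for illustration.

```python
import hashlib

PREFIX_LEN = 4  # bytes of each SHA-256 digest kept locally (assumed value)


def hash_prefix(expression):
    """Truncated SHA-256 digest of a host/path expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:PREFIX_LEN]


def build_local_list(blocked_expressions):
    """What a client might store after syncing a blocklist (toy version)."""
    return {hash_prefix(expr) for expr in blocked_expressions}


def candidate_expressions(host, path):
    """Host-suffix and path combinations to test (heavily simplified)."""
    labels = host.lower().split(".")
    hosts = [".".join(labels[i:]) for i in range(len(labels) - 1)]
    expressions = [h + "/" for h in hosts] + [host.lower() + path]
    return list(dict.fromkeys(expressions))  # drop duplicates, keep order


def is_locally_flagged(host, path, local_prefixes):
    """True if any expression for this URL matches a stored prefix."""
    return any(hash_prefix(expr) in local_prefixes
               for expr in candidate_expressions(host, path))
```

The point of the sketch: the decision is made inside the browser from data it already holds, so a perfectly healthy server never even sees the visit. It also shows how an entry for a parent domain can take every subdomain down with it.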

The closest parallel I can think of is expired SSL certificates, but the level of transparency and decentralization of that system vs. this opaque blacklist is not really in the same league.


One derisking option may be wrapping your web app as a native client. E.g. an Electron app is technically Chrome, but you get more control over its settings. I know Microsoft (SmartScreen) and Apple may block apps for many reasons too, but at least you get more baskets for your eggs.


Yeah, I read stories that Yahoo in the 1990s called itself a media company and its product managers "producers" out of fear that once you call yourself a software company, Microsoft will crush you...

As for using clouds - there is absolutely no point in the world to use them for anything above staging level, or very very low level launches. People should switch away from cloud as soon as they see even tentative signs of a product-market fit.


You will save so, so much money switching away from clouds too.

No, you don't need to use a hundred different AWS/GCP/whatever services, and yes, managing your own infrastructure is a lot easier than you think (and sometimes easier/faster than AWS).

The Stack Exchange network, at least around 2018 or so, was hosted on 12 servers they own!


Completely agree. The clouds are still very comfortable for development though, and i use them a lot. But i'd never even think of using cloud in production.


> I do not think Google is doing it consciously -- this is probably just collateral damage from some bot or rule.

"Collateral damage" from some bot or rule just means that Google doesn't care enough about the edge cases (which, at Google scale, are particularly harmful): Google consciously decided this when implementing their algorithms.


> this is probably just collateral damage from some bot or rule

The point is, collateral damage and/or false positives are not acceptable for a service with an impact like this. In the real world, we consider them war crimes, etc. Bots and rules are implementations of policies and policies come with responsibility.


One corporation must not have so much power over billions of citizens of many countries. A power like that must only come from a transparent non-profit organization with a publicly elected management board.

We will get to that point sooner or later. But the road there will be long and painful.


Said NPO will be captured and subverted when the need arises and it is cost effective to do so.


> We will get to that point sooner or later.

Is there anything in particular that makes you believe that it'll eventually happen?

Because personally my outlook on things is a bit more pessimistic - oftentimes the main concerns of individuals and organizations alike are financially-oriented and few share the enthusiasm for transparency and openness like Richard Stallman does.

The trend of SaaSS ( https://www.gnu.org/philosophy/who-does-that-server-really-s... ) because of companies not wanting to invest time in engineering their own solutions or even using FOSS, alongside with how many of them are handling GDPR and even cookie compliance, with the use of UX "dark paths" (e.g. it being easier to accept advertising cookies rather than deny them) doesn't let me keep a positive outlook on things.

It feels like we'll all be reliant on the "tech giants" for a variety of things for the decades to come, even "de-Googling" oneself not always being feasible.


>Is there anything in particular that makes you believe that it'll eventually happen?

Humans have demonstrated the ability to eventually improve social systems to make them account for the needs and demands of the majority of stakeholders. In the offline world it has evolved into what is known as democracy. It started several centuries ago and eventually evolved into modern governments as we know them - publicly elected management boards.

Recently, there was an excellent article [1] on HN. It rightfully compared the current state of internet to the feudal times and warlords common in the offline world many centuries ago. From that point through a long and painful process we've come to elected governments as the most sustainable form of governing a large number of humans. All other forms of government turned out to be unsustainable (no matter how attractive they were to certain individuals or organizations) and inevitably led to all kinds of social catastrophes.

I believe, the same will eventually happen to the internet, our new brave world we used to love, but now seem to become increasingly disenchanted with.

[1] https://locusmag.com/2021/01/cory-doctorow-neofeudalism-and-...


I’ve been increasingly wary of Google’s offerings altogether. Their ban hammer seems to be driven by Mr Magoo, who looks at everything and sees threats, and makes judgements.


Yes, but an inverted magoo. Mr. Magoo assumed the best intentions of everything he bumped into (and misunderstood).


As a person who suffers from myopia, I find this analogy offensive.


Can anyone "in the know" objectively comment if Google Safe Browsing (GSB) has had a net positive result or outcome for the Internet, at large?

Has GSB helped users, more than it has hurt them?

The anti-Google rhetoric [on HN] is becoming more tiresome as of late. Personally, I welcome the notifications in my browsers that a domain is unsafe. I can't possibly be the only one.


The problem, from HN's perspective, is that false positives on GSB hurt businesses a lot more than they hurt users or the internet at large.

If I'm a random person browsing the internet at large, and a website I try to visit gets flagged as "possibly malicious", well, I probably didn't need the information or services on that particular website that badly anyway. I can find another website that offers the same information and services easily enough. Meanwhile, if my computer or browser is infected with malware, that's pretty bad for me personally. I could lose money, time and personal data and security. The potential consequences are bad enough that I really shouldn't risk it.

On the other hand, if my business is blocked by GSB, that is very bad for my business. The customers I don't lose are going to lose confidence in me. Meanwhile, the cost to me if I am accidentally hosting malware is pretty minimal. Even if a large number of my users are harmed by the malware, they're unlikely to be so harmed they stop paying me, and it's pretty hard to know where you picked up malware, so it's unlikely to be traced back to me. I've never actually heard of a lawsuit from an end-user against the website they downloaded malware from.

A false negative from GSB is a lot worse for internet users than a false positive; an internet business, on the other hand, would prefer a false negative to a true positive, let alone a false positive.

Add in that internet business owners (or people highly invested in internet businesses through their jobs) are over-represented on HN, and it's no surprise that HN is not a fan of Google Safe Browsing.


> an internet business, on the other hand, would prefer a false negative to a true positive, let alone a false positive.

[Emphasis mine]

This is crucial and it's why the sub-threads imagining suing Google aren't going anywhere. Google will very easily convince a judge that what they're doing is beneficial to the general public, because it is, even though some HN contributors hate it because they'd prefer to meet a much lower standard.

What I'm seeing a lot of in this thread is people saying OK, maybe a burger we sold did have rat droppings in it, but I feel like our kitchen ought to be allowed to stay open unless they buy at least a few hundred burgers and find rat droppings in a statistically significant sample and even then shouldn't I get a few weeks to hire an exterminator? Isn't that fairer to me?


I think GSB is great because there is no other product like it, it is very fast to respond to most threats and it can be used for free. The only thing about it that's not great is, in typical fashion, the lack of transparency about some of the processes. Not about how phishing verdicts are created, this should remain a closely guarded secret, but about what actually happens when you send a report or send a review request.


Author here. It's not really rhetoric, I wrote the post because it's downright scary that your business of over 10 years can vanish in a puff of smoke because Google didn't bother to require an offending URL field in an internet-wide blacklist. At the level they operate, there needs to be a semblance of due process.


Updated my post to make it more clear I was referring to HN and not your post specifically.


What about false positives?

From the fine article: one Google system was detecting emails coming from another Google system as phishing. This is ridiculous.


It's needed to make sure you can not claim bias. For example Google blocking competitors, or unfavourable information.


It's hard to argue against "safe". If they would name it "filtered browsing" it might be something arguable, but "safe browsing" who wouldn't want that?


If Safe Browsing were offered by some neutral internet organization (e.g., similar to IANA) I wouldn't mind. But it's offered by a private company: so it's naive to think that GSB benefits anyone other than Google itself.


I'd guess a large net positive among the general population but maybe neutral for the tech literate like HN readers. Most tech-literate people are careful enough to recognize tactics used by phishing sites and won't click on phishing links, or would click and immediately figure out it's phishing. That cannot be said for the general population.


It seems similar to the move from client side spam filters to the server side.

Spam filtering really didn’t get better with the change (for me), but now it’s orders of magnitude harder to run an email server.

Taking the article at face value, GSB makes it much harder to run a reliable web site. Has centralization of email into surveillance organizations hurt more than the benefit from saving bandwidth to download spams, and automatically deleting them at the client?

How much damage will (further) centralization of web hosting onto social network sites (Facebook, Twitter, GitHub, Stack Exchange, etc, etc.) hurt the internet?

It’s arguably already done more harm than good. I can’t even find a decent recipe that a high end laptop can efficiently display. I used to be able to download cookbooks worth of recipes, and my 386 could load them instantly.


The story he links to, about the "Online Slang Dictionary" being removed from google search because the founder of Urban Dictionary was friends with googlers (true) and (allegedly) used his influence is fascinating:

http://onlineslangdictionary.com/pages/google-panda-penalty/



My plans to trickle out details of my conversation with the Google employee were put on hold due to a massive change in my life responsibilities due to the novel coronavirus, but it’s my intention to resume soon.

As I say on the website, this will culminate in my releasing the MBOX formatted file of the conversation, with full headers.


Eventually, Google will get to the point when regulators will come to gut it and the crowd will be cheering


But, left with fewer resources, Google's security might become like the security of smaller companies, and the crowd will be crying.


Are there any no win, no fee law firms that specialize in these cases? What if for every hour offline, your SAAS loses X money? For this particular case, what if due to the service disruption, some customers decide to move their business elsewhere? Enforce an SLA?


Author here. That was exactly our situation with the impacted systems. We got lucky with the fast "review" and it happened late enough in the day that only PST customers were impacted meaningfully.

But still, quite frightening, hence the post. It's not a failure mode we had in mind when we established the SLAs.


Stupid question: Isn't this clear-cut grounds for a defamation lawsuit?

Also, is it possible to have a class-action defamation lawsuit?

The fundamental issue that the author gomox is not stating clearly in his article is that there are no consequences to Google for their actions. None. Literally zero.

I don't think the best plan is to wait and hope for a government to step in and take action. Hope is not a strategy.

Complaining on public forums has similarly done nothing to curb Google's careless wielding of the ban-hammer.

So sue them. Cost them money. Punish them in a material way that they can't ignore.

I can't imagine anything else working...


Teach people how to get past the scary warning one way or another, and spread that knowledge far and wide. With enough false positives their blacklist will be diluted to the point of uselessness and hopefully people will also become better educated in the process.

Google will of course do everything in their power to stop that from happening, but every little bit of opposition helps --- from recommending others to not install censorware browsers, to showing them articles like this --- because this is a fight for the freedom for the Internet. As big as Google is, the Internet is far bigger.


For desktop software, antivirus "industry" can be almost equally destructive.

For instance, Avast breaks installers of software made with a specific installation framework: https://github.com/wixtoolset/issues/issues/5593

The problem lasts for years. At one point I've tried to contact them, but people from Avast were either unable or unwilling to fix their software.


Doesn't Safe Browsing require every URL you visit to be sent to G$$gle first? I know Chrome users "have nothing to hide", but this looks like complete surrender.


Chrome automatically reports URLs and some page content to Google if "Help improve security on the web for everyone" is enabled. This is not enabled by default, even if 'help make Chrome better' is checked before install.

https://i.judge.sh/discrete/Rumble/WindowsSandboxClient_w4Ta...


There is the expected privacy-surrendering API in which you send all your URLs to Google, and a more defensible one in which you download some sort of database to then query locally: https://developers.google.com/safe-browsing/v4


I don't know what the implementation actually looks like in chrome, but it could work on a blacklist that's stored locally and updated on a regular basis.


No, it does local checks first, then only checks the full URL if there's a high probability of a match: https://www.chromium.org/developers/design-documents/safebro...
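To make the "local check first" idea concrete, here is a rough sketch. It is not Chrome's actual code: the 4-byte prefix length, the function names and the `fetch_full_hashes` callback are assumptions for illustration, and real clients canonicalize URLs and check several host/path combinations, which this skips. As far as I understand it, the client keeps only short hash prefixes locally and talks to the server only when a prefix matches.

```python
import hashlib

PREFIX_LEN = 4  # bytes of the SHA-256 digest kept locally (assumed value)


def url_hash(url: str) -> bytes:
    """Full SHA-256 digest of a URL string (no canonicalization in this sketch)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


def build_prefix_set(bad_urls):
    """Build the compact local list: only hash prefixes, not URLs."""
    return {url_hash(u)[:PREFIX_LEN] for u in bad_urls}


def check_url(url, local_prefixes, fetch_full_hashes):
    """Return "unsafe" or "safe".

    fetch_full_hashes is a hypothetical callback standing in for the
    remote lookup; it receives only the prefix and returns the set of
    full hashes on the blacklist that start with it.
    """
    digest = url_hash(url)
    prefix = digest[:PREFIX_LEN]
    if prefix not in local_prefixes:
        # The common case: no local match, so nothing leaves the machine.
        return "safe"
    # Possible match (or a prefix collision): confirm against full hashes.
    full_hashes = fetch_full_hashes(prefix)
    return "unsafe" if digest in full_hashes else "safe"
```

In this sketch the remote side only ever sees a hash prefix for the rare URLs that collide with the local list, which is the privacy argument for this design.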


why would it?

chrome can just load $black_list from time to time and just perform local check


Yep this happened to me too and I came to exactly the same conclusions.

We have a list of completely separate “API domains” that our scripts talk to and which also host the cloudfront CDN.

We also cohort our customers by Sift score and keep trusted enterprise customers away from endpoints given to new signups. This way if someone new does something sketchy to get you flagged it won’t affect your core paying customers.
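Roughly, the cohorting could look like the sketch below. The domain names, the threshold and the `Customer` type are made up for illustration (this is not our production code), and it assumes a fraud score where higher means riskier.

```python
from dataclasses import dataclass

# Hypothetical, fully separate domains so a flag on one does not hit the other.
TRUSTED_API_DOMAIN = "api-trusted.example.net"
NEW_SIGNUP_API_DOMAIN = "api-new.example.org"
RISK_THRESHOLD = 50.0  # assumed cutoff; higher score = riskier


@dataclass
class Customer:
    name: str
    risk_score: float
    is_enterprise: bool = False


def api_domain_for(customer: Customer) -> str:
    """Pick the endpoint domain handed to this customer's scripts."""
    if customer.is_enterprise and customer.risk_score < RISK_THRESHOLD:
        return TRUSTED_API_DOMAIN
    # New or risky accounts are kept away from the trusted cohort.
    return NEW_SIGNUP_API_DOMAIN
```

The point of the design is that whatever a sketchy new signup uploads or does can only get the "new signup" domain flagged.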


Some web hosts use Safe Browsing to automatically perm-ban any sites on the list. I've been banned from Heroku for a couple of years at this point because one of my sites got added to Safe Browsing as malware and Heroku's systems just automatically perm-banned me (and to make things worse, the address the ban email tells you to send appeals to just bounces).


My idea, which will be ignored as usual, is that the problem is the monopoly.

The reason we have a monopoly is because the web browser is now a full operating system that is so complicated that no group can replicate it.

Start over with a new protocol. Make it content-centric, i.e. distributed protocols with no central servers. Support download-limited lightweight markdown sites for information sharing.

Then for applications and interactive content, add a canvas-light graphics system to web assembly. Again, I suggest limiting download size to keep things snappy. And make sure not to load any applications automatically or put them in the same process as the markdown browser.

If you do it right, you will have a common protocol that is straightforward enough that there can actually be several implementations. And it won't be controlled by one company.


If customers using Google incurs a tax upon your business regardless of whether the business voluntarily does business with Google, why not work on changing that?

Start with a snazzy "our service works better in Firefox". Eventually offer trivial new features in Firefox but not Chrome, ending with a small discount for using Firefox. Over time, small price increases can make the discounted price the same as the current price, and effectively you are charging your users for using a vendor that costs you money to do business with.

Google views chrome as a moat around their business keeping other vendors from cutting them off from the revenue stream that powers their entire business. Attack the moat and you might see movement to make your life easier.


It is quite good that Google cares about users. But it does not care about website owners. There is one and only one reason: for Google, the WWW is competition for the Google Play marketplace.

Literally open internet is a competition for Google. That is why the company has no problem to issue domain wide ban, without informing website owner, without any explanation and with showing a scary message to website users to make them go away.

The author of the blog post seems to believe it was an AI action. But from what I can see, his company was hit with serious damage by a company that, I assume, has some competing apps on its Google Play platform.

I can believe AI can be the cause, but it should be a court to decide if there is no collusion and who should pay for the damage.


This is an area where regulatory action should be taken against Google. Google needs to implement a process with manual review in a reasonable timeframe, or they should be broken up for having monopolistic power over which sites are on the internet.


I wouldn't be surprised if this was done just in order to associate somebody with something interesting Google sees on the Internet and has no ownership information about so "that they know". Benefit of the doubt is already gone.


Can Google be held legally accountable for this behavior? Seems like they are hurting businesses by spreading false information. With their market power there need to be some incentive for them to react quicker and with human oversight.


If the business wants to argue that, they can sue Google for defamation/libel.


This reminds me of ugliest.app - there was an HN post on it a while ago. And then surprise, surprise, someone made a "paypal" login page which was hosted on the main domain. It was put on the blacklist, not sure if it still is.


I'll tell you a mini story about a coffee shop I visited a few days ago. That place was hidden in Yelp search when I looked for 'coffee & tea' in my area (their Yelp page existed). While I don't know the actual reason why this happened, I immediately found that coffee shop using Google (as a double check). It charmed me because it reminded me that if you have the 'right service', people will find it. Given this, I started to believe gatekeepers might begin losing their edge.


It seems like the FTC should be running this for US based customers and browsers should default to a local resource and/or let users override the default source of truth.


Cool, then we can complain about false positives at the FTC instead of at Google!

IMHO, it doesn't really matter who runs it, so long as they're not actively working in bad faith. False positives are a fact of life, guaranteed so long as we have an adversarial malware ecosystem. (For example, the fixes for bad decisions are pretty much indistinguishable from bad actors evading correct decisions.)

The other side of the coin is a web that looks like my missed calls list - everything is assumed to be spam and malware infested until proven otherwise. No one will use your startup anyway, because any given site is probably terrible. The whitelist becomes a thing that people maintain in their heads, and, again, you get a massive incumbent advantage.

The right balance is somewhere in-between, and involves fine tuning the false positive rate. The false positives are always going to be unhappy, and hard to tell apart from true positives trying to keep their scam going.


Google: "Don't be evil." Yes, don't be evil, just opaque and inconsiderate. It's amazing how a company as profitable as Google has such horrible customer service.


"Don't be evil" - It's been forgotten about a long time ago.

https://en.wikipedia.org/wiki/Don%27t_be_evil


I have to add that Firefox seems to be using the same logic/data for their safe browsing feature and will happily flag sites as malicious with no human oversight.


Before even imagining all the ways to start regulating a tech company, I think we desperately need a few basic regulations like:

- For every major service offered, company must provide 3 ways to contact live support, two of which must be immediate, e.g. chat, phone, E-mail. [As opposed to today’s “standard” of having none of these!]

- Every action that can be taken automatically by an AI must be possible for support staff to immediately reverse.


If algorithms they own are operating on a list they maintain and they are making you lose profit, exactly why can you not sue them for those lost profits? What's the legal theory here? A product they own, one that is entirely disconnected from you, is banning you. This is not and should not be OK, nor should you be required to do any special dances and magic gestures to try and mitigate the problem.


The mitigations suggested are easier said than done. In particular, domains can't share cookies, which means switching domains likely means logging out any users that are logged in, and losing any local settings. Likewise, splitting your site between different domains makes it much more difficult to share state (such as whether you are logged in) between the sites.


Add to the list of preventative measures:

- Establish a Twitter account for anything dev ops related.

Don't assume you'll have the ability to communicate via your internal infrastructure. It also helps customers to know there is a 3rd party medium for staying informed and getting in touch.

Knowing that such things exist, while minor, is good marketing fodder as well. It walks the comms are important talk.


As much as I like to give Google a hard time, this isn't really Google's fault. Always use your own URLs for everything. Also, why would you allow customers to upload files and then make them available? Unless you are Dropbox or similar, that's bad configuration.

This really sounds like "We made some configuration mistakes and now blame Google"


Maybe there should be a law that any business that has over ten billion dollars in annual revenues has to answer the phone when you call them and have a reasonable resolution process for complaints.

If that ruins your business model, cool. Just spin off parts of the business until each one is back under ten billion in revenue and do whatever you want.


That’s not a bad idea. In general I am believing more and more that businesses that exceed a certain size are harmful for the overall economy. They may be more efficient and generate lower customer prices but they also harm innovation and prevent smaller companies from succeeding.


Ideally, we would be able to choose the best company, such as the one that does answer their phones. In this case we can't, which is the real problem.


Am I missing something? Is there ever a reason to expose a CloudFront url to the end user instead of using a custom domain?


Is there a problem with doing it? I don't see how that would have helped in this case (if anything, it might have made things worse if Google decided to ban the 1st level domain, which they certainly won't do for Cloudfront.net).


It just seems less professional. It’s much like having a .blogger.com or .substack domain.

We have been trained for decades not to trust random domains. To the uninitiated, a CloudFront domain is random.

I know I’m taken a little aback anytime I go to Amazon’s credit card site - https://amazon.syf.com/login/ it looks like a phishing site.


This Cloudfront URL is not a customer visible URL, it's just referenced for some static assets (images/JS/CSS). The warning is shown instead of the actual SaaS app that is hosted on a "proper" domain, effectively taking the whole thing down.


Great, so legitimate businesses need to implement tactics commonly used by c2c and malware to operate successfully


Well, as long as you are spending 6 or 7 figures a year on advertising with Google, you'll have a account rep at Google that you can always reach out to. Your ad spending level works as Google's filter for which websites on the internet that they actually give any care about not killing.


We spend a nice buck on Google Ads but the impact of getting your SLA-sensitive SaaS app blocked from the Internet is not compatible with reaching out to "someone who might know someone" at a 100K employee company.


There's an effective monopoly on web browsing, so any private decision here becomes de facto censorship. How can this be constitutional? Ants need to rise and get some rulings down on this topic; the web needs to be brought back to how it was.


Seems like a good case for strict content security policies and self-hosting static assets.
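For anyone who hasn't set one up, a minimal sketch of what a strict policy might look like, built as a header value in Python. The directive choices and the `build_csp` helper are only an illustration, not a recommendation for any particular app, and a CSP by itself says nothing about how GSB decides what to flag.

```python
def build_csp(asset_origins=()):
    """Build a Content-Security-Policy header value.

    asset_origins: extra origins allowed to serve scripts, styles and
    images. Leave it empty when all static assets are self-hosted.
    """
    sources = " ".join(["'self'", *asset_origins])
    directives = {
        "default-src": "'self'",
        "script-src": sources,
        "style-src": sources,
        "img-src": sources,
        "object-src": "'none'",  # no plugins
        "base-uri": "'self'",
    }
    return "; ".join(f"{name} {value}" for name, value in directives.items())


# Example: attach the policy to a response's headers.
headers = {"Content-Security-Policy": build_csp()}
```

With everything self-hosted, the policy only ever mentions `'self'`, so there is no third-party asset domain left for the page to depend on.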


I'm always surprised by the gall of Google and other companies that decide for others if websites are suspicious. I'm always sure to disable all those garbage warnings, together with email spam "features".


For a SaaS, CDN's are of limited utility as you have many returning visitors who have cached these assets already. Of course, YMMV, but for us, it was easier to host almost all static assets locally.


Isn't this way to get hurt by a Google's bot a brand new discovery as of 2008 or so? And the bottom line of "letting users upload things is dangerous" is no newer?


We all let it come to this. We are all lazy as f and only care about convenience and short term benefit.

That is why we have the big 5 now that basically are too powerful now to turn away from.


How long until antivirus and safe browsing start marking websites that are "hate sites" as harmful and start, essentially, censoring the internet?


isn't the problem here keeping the cloudfront hostname, vs. setting up a CNAME from your own domain to point at the distribution?


Not really, we own the entire Cloudfront subdomain, and Google is wise enough to not ban cloudfront.net entirely (now that would be an interesting day on the internet!).

Having a CNAME in front wouldn't have made any difference.


Does anyone know what happens if you include a resource from a banned domain? Is the resource blocked, or will the user get the red screen too?


root.cern was affected by this in the fall, apparently due to a false positive in the windows installer. It was resolved relatively quickly (a day or so?) but hugely inconvenient for e.g. documentation, and of course the particle physics community has connections. root.cern.ch worked but the internal links were all over the place.


Thank you for sharing this. I wonder if having a ton of subdomains might also flag Google to blacklist the parent domain...


"...And that's reason number 3955430, ladies and gentlemen, why monopolies are bad and MUST be dismantled."


Is this not libelous? If the site is neither deceptive nor malware-hosting, and Google are telling people that it is?


BTW, did using another giant's (Amazon) services (like CloudFront) make the problem better or worse?


> A lot of the cases of blacklisting that I found while researching this issue were caused by SaaS customers unknowingly uploading malicious files onto servers.

This is terrifying - what business is it of Google’s what party A uploads to MY servers? And how are they getting that information without dramatically violating the privacy of their users?


If party A uploads something to your servers and the stuff isn't publicly accessible, Google doesn't do anything about it. But if that content is accessible by the public, Google feels a need to protect the public.


Easily solved using the anti-trust act. Time to break up Google and perhaps a few others.


Will anybody here stop using safe browsing though? Or Google products for that matter?


This is really terrible, I sure hope the EU causes a stink about this


A new method of DDos: send the domain to GSB blacklist!


Could the re-use of IP addresses be the problem here?


I say it's time we get rid of these monopolies?


Cool thread. I have archived this on my tidbits feed.


Soon enough this will be used to block other kinds of "unsafe" sites containing dangerous things like "hate speech".


Can we have an ant army already!


Sue them for libel.


posted on medium which decided to paywall after years of being publicly available.


Author here - I haven't signed up for Medium's "pay the author" thing, which I think should make my content free to read and paywall free, is that not the case for you?


A bit of deception on how their site ended up on the block list. They strangely block out a part of their response, but we can see "was cleared", which sounds a lot like "the malware some nefarious agent put on my site was removed".

How sites end up on the block list-

-they host malware, either intentionally or because they were hacked.

-they host a phishing site, either intentionally or because they were hacked.

Protecting users is a monumentally more critical task than your concerns.

And this system is incredibly valuable. When I get a text to a phishing site, I immediately report it to the safe browsing list. I also notify the nameserver, the hosting agent, and if applicable the SSL cert provider. Bit.ly if in the chain, though they never do anything [fun fact, even -- phishers and malware authors love putting bit.ly in the chain because they're paying subscribers, and as domains are taken down they can just change the destination. Bit.ly exists on the backs of scumbags, and itself should be on the safe browsing exclusion list]

Usually the safe browsing list addition happens within an hour, saving many people from being exploited. The nameserver and host -- DAYS. Namecheap takes an eternity to do anything, even for outrageously blatant phishing sites. GoDaddy - an eternity. SSL providers seem to act quickly, but propagation delays make that negligible.

EDIT: 11 days ago I reported the scn- prefixed netflix.com to all of the above. This is a blatant phishing site, and was mass texted to Canadians. It was blacklisted by safe browsing within an hour, likely saving a lot of people grief.

Namecheap, who I informed by their email and by their garbage ticket system, still host the nameserver and physical hosting for this site. 11 days later. Grossly negligent behavior, and there needs to be some window of responsiveness because these players are just grotesque at this point.


Author here. I blocked the message in the screenshot because I narrated the first incident, but took screenshots during the second one, so the redacted part was referencing the first one in which, as described, our domain was cleared without actually doing anything.

Protecting end users from nothing at all (like I said, there is no offending URL) is not more important than making sure Google doesn't literally gatekeep the entire Internet, IMO.


I guess. Odds are that there was something, and you have every reason to state otherwise. You're really focused on the URL, but a whole domain will be tagged when random queries are met with content dispositions with malware, which can be automatically flagged by the search engine.

As an aside, your commentary about Google alerting to phishing emails seems like you're misunderstanding and trying to use this to further your "it's all random!" claims. They aren't flagging it because of the sender, but instead because the contents included a URL on the blacklist. Google re-scans and when they find URLs that are now blacklisted, they warn about Phishing. This isn't new and they've done it for years, and it seems pretty obvious and logical.

e.g. "That email you got a while back that claimed it's from the Netflix billing problem website is actually phishing. If you gave them details, that's a problem".

"Protecting end users from nothing at all (like I said, there is no offending URL) is not more important than making sure Google doesn't literally gatekeep the entire Internet"

This system protects countless people from malware and phishing daily. I have no reason to believe your particular claims about this (though I'm skeptical given that you are blocking details that would allow others -- such as Google -- to repudiate your claims. Why block the subdomain? If it hosts static resources, what's the concern?).


I am not misunderstanding anything, the fact that Google's own legitimate emails are flagged as phishing by their own filters is pretty telling about the reliability of the whole thing. The fact that you can come up with a plausible explanation to why it happened doesn't make it any less damning.

But of course, they don't flag google.com as a spammy domain and stop all emails coming from it, right?

PS: I'm not sure exactly what you are disputing. Are you suggesting their report pointed to a smoking gun on my site, and I'm lying? My experience is not unique. There are plenty of instances of the same type of issue affecting other people in the very comments you are reading.


"the fact that Google's own legitimate emails are flagged as phishing by their own filters is pretty telling about the reliability of the whole thing"

It detects blacklisted URLs in emails and sends warnings, retroactively given that sites are caught some indeterminate time after they might have been communicated (flagging if you have interacted with the email and thus might have been compromised). It seems like it was perfectly reliable.

That isn't damning at all, and it should embarrass you that you cited that, seemingly confused about the reason.

"I'm not sure exactly what you are disputing"

I'm saying that we have zero reason to believe you (but reasons to not believe you given that you're redacting things that don't need to be redacted). People caught in the nets of things like this -- through their malice, carelessness, incompetence, etc -- always claim innocence.


If Google flagging its own e-mails is your idea of a perfectly reliable phishing detection system, I don't think we are going to find much common ground.

For what it's worth, it's all true :) Good luck to you.


Sure it is. Keep on harping about a warning system working perfectly (because, again, you clearly fail to understand it)...it makes a really good case for your screed.


> When I get a text to a phishing site, I immediately report it to the safe browsing list.

Please, don't do that. You're just giving more power to a private company (Google). It's so deceiving, I know: reporting/blocking malware sites is a good thing, but doing so via Google diminishes the returns so greatly that it's no longer worth it.


As opposed to what alternative? Google's safe browsing list is used by everyone, and is currently the gold standard. There exists no alternative. NextDNS uses it. Safari uses it. Firefox uses it.

Yeah, I'm not feeling guilty about this, and I'll do it every time.

Note that the list isn't like a spam list or something where bad actors can just flag something and get them blacklisted. When you report to the safe browsing list it is actually verified, and when it's a fake bank/netflix/Amazon/etc login, it's pretty easy for them.


That's right, there is no good alternative at the moment.


Could this be the basis for a class action lawsuit?


Most of Google's "safety" features are somewhat evil in some way. I don't want any of them, but some of them can't be disabled (like the one that can lock you out of your account even if you have the correct password).


Which one is that?


Sometimes Google doesn't recognize your device and then your password is not enough... even if you have second-factor authentication disabled. So if you don't have a second form of contact like another phone number or another email for recovery, then you are fucked. Sometimes they even ask you for a previous password for recovery, so if you use a password manager that doesn't keep history, you might also be fucked.


Is this only when using MFA? Sometimes, without MFA enabled, if you just change the user-agent header they send an email saying they have detected a "new device". What if you just exported all mail each day? Maybe this could be automated; then in the event of a lockout at least you have all of the stored mail.


I don't use MFA.

Also, I have my emails backed up, but that doesn't help for authentication/recovery with other services/external accounts that were created using that Gmail account... Maybe I need to host my own but that comes with a plethora of other problems.


Just sue them for damages. It's libel.


Why can't companies like Google just have a warning and review period before taking actions like this?


It's absurd. This thing happens on the Play Store. I've seen it happen multiple times due to pure mistakes. It takes an appeal and time to resolve the issue; in the meantime you are stuck.

Their appeals form only lets you submit 1,000 characters, no images or attachments. So in many cases, it's hard to even provide proof of the mistake. For example, if they wrongly take down your app for trademark infringement, but you have priority rights in a country or a registered mark, how are you supposed to effectively prove that in 1,000 characters with no images? In one case, we had a decision from the trademark office in our favor, but we were unable to attach it in any way and had to try and summarize it in like 300 characters.

There is no reason in most cases to not provide a warning period and the opportunity to provide evidence and exhibits.

They act so much like a monopoly in this case that they are stupidly making things harder for themselves. Sundar and Google's legal team should take all the PMs aside and tell them they are going to start losing antitrust cases left and right if they can't provide more due process for decisions.


I have no extra knowledge on the subject, but if the flagged website was indeed serving malicious content, the brakes would have to come down pretty hard. If you have a review period you can end up serving malware to hundreds or thousands of people. I don't know how often this happens, though, or what the false positive rate is. It'd be interesting to see.


Reviews would have to be done by humans and humans doing things themselves is bad for the bottom line.


They don't even validate that blacklist entries actually contain an offending URL in the report. That's how much they care.


Because malware sites are practically ephemeral, pop up and disappear on short time frames. A review period wouldn’t do much except let them game the system even better.


That would probably cut a lot into their profits. Automating these tasks, even if some people get cancelled wrongly, is way cheaper than hiring people for reviews. They are so big that losing a few customers doesn’t mean much to them.

I am waiting for the day when this happens to a large company. My company has more and more stuff on AWS. If Amazon cuts us off by accident the damage will quickly go into the billions.


1) Google doesn't want any humans in the loop. Humans are expensive. Would sending a warning first result in more humans involved or less? More. So not gonna happen.

2) Google claims any information given to exploiters of its rules and systems aids the next attempt. So they don't like to give out any information about what AI rule you tripped to get banned.


“We don’t care, we don’t have to.”


Google doesn’t need to do it, so they won’t spend the effort to do it.


This happens when the ticket for breaking anti-monopoly laws is magnitudes cheaper than the profit you rake in by breaking them.


Wow. I wonder how long it will be before the Big Tech oligarchy starts blocking websites for “misinformation”.

Insane world we’re heading towards.


Wait until this is also applied to a list of domains from the SPLC and other groups to further censor “hate speech” on the internet.


Imagine a future where multiple big tech companies share “blacklists” of individuals and applications that should be banned across their networks. Your entire business and digital life could be snuffed out in an instant. Already seen it happen, now it just has to scale.


I wonder if a blockchain/BitTorrent-style decentralized option could exist to replace Google.

Most people don't have billions lying around to compete, but you could reward people who rent out space for the indexing data, and have advertisements baked in that could maybe still use some retargeting but without tracking any personally identifiable data about a person.

Nodes could double as AI/CPU processing for algorithms related to search and storage. Computation and storage could each have their own payout, per action or per time on storage.

Most people have their computers on all the time anyway, so the machines could work in the background to earn them some side income while helping create a better internet.

It would need some centralization, I'd imagine, though. I think the problem with decentralization is that the goal is ALL or nothing.

Like one or two big servers that maybe tie everything to the rest, and push 'updates' on algorithms, contracts, etc. to end users. Maybe a segregation index, knowing all airplane-related searches are indexed on cluster c, which has nodes 1-8, so you know where to go to get the info being searched.

I'm mainly a full-stack but 'dumb' developer, not an algorithms wiz, mostly focused on CRUD apps. But this would be fun to build.
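
The "segregation index" idea above could be sketched roughly like this: a small central table maps topic keywords to clusters, and a stable hash spreads queries across a cluster's nodes. Everything here (`Cluster`, `TopicIndex`, `pick_node`, the single-keyword matching) is a hypothetical illustration, not a description of any existing system.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    nodes: tuple  # node ids that hold this cluster's share of the index


class TopicIndex:
    """Central lookup table: topic keyword -> cluster that indexes it."""

    def __init__(self):
        self._clusters = {}

    def assign(self, topic, cluster):
        self._clusters[topic.lower()] = cluster

    def route(self, query):
        # Return the cluster for the first query word with a known topic.
        for word in query.lower().split():
            cluster = self._clusters.get(word)
            if cluster is not None:
                return cluster
        return None  # no cluster claims this query


def pick_node(cluster, query):
    # Stable hash so the same query always lands on the same node.
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    return cluster.nodes[int.from_bytes(digest[:4], "big") % len(cluster.nodes)]


index = TopicIndex()
index.assign("airplane", Cluster("c", tuple(range(1, 9))))  # nodes 1-8
target = index.route("airplane seat maps")
```

Here `target` is cluster c, and `pick_node(target, "airplane seat maps")` picks one of its eight nodes. A real design would need far more than exact keyword matching, plus replication for nodes that go offline.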



