Ask HN: Google spam filter getting worse?
158 points by jgwil2 on Jan 20, 2023 | 91 comments
I have noticed an uptick in uncaught phishing messages in the past few months, and talked about it to a friend who observed the same. Anyone else?

For months now, emails with subjects like "MCAfeeconfirmati0n--#21845315" and "confirmation#4073301981" have been hitting my inbox. These are such obvious spam emails that I'm unsure how the spam filters aren't catching them. Reporting them as spam hasn't done anything to catch them.

I have this same problem with Outlook. Starting probably 2-3 months ago I began receiving somewhere from 5-10 spam emails with titles like this a day directly into my inbox. Reporting them as spam helped a little and brought it down to maybe 1-5. But they’re obviously spam with subjects like Norton Confirmation, OuOrtIBGGvGIO, Life Insurance Offer, etc. with weird fonts and other stuff.

As a side note, a lot of these spam emails I get are from Gmail.

Judging from my own spam label on gmail, those messages are part of the torrent of junk that is pouring out of Microsoft's "hybrid on-premises exchange" egress VIPs. Basically some clown who pays Microsoft for quasi-hosted Exchange has a virus that sends spam, and Microsoft blesses it with the reputation of the customer egress addresses. Eventually, this will stop working for Microsoft but at this time it's like waiting for Greenland to melt: inevitable, but takes a long time.

Also worth noting, if you are trying to evaluate Gmail's classification performance: the vast majority of what they consider spam never shows up under your spam label, because it was stopped with a 4xx error code at SMTP time. So you don't really have a way to know the denominator.

Ironically Microsoft are the only major MX that won't accept email from my server.

And good luck getting off that list if you're on a hosted VPS... they're about impossible... I can get through to hotmail and o365, but not the outlook.com block. (shrug)

I'm relaying through SendGrid (there is a free tier), as I just don't have enough email coming from or through my server to justify even the lowest paid level or to have to worry about it...

I've been considering setting up a higher end server (compared to the $20/mo vps I'd been using) at a data center and seeing what I can manage as a direct mail host without the relay. But 10x-ing my costs just doesn't feel right for something that will take more time, won't generate revenue, and that I'm not that passionate about.

For those curious, been looking at WildDuck mail which seems like an interesting structure and the features are cool, just not sure I want to go through it all. I've been using Mailu via docker-compose on DigitalOcean for a couple years for all my lesser used domains/addresses, relaying through SendGrid. It works but kind of annoying going through setting up each domain added through the relay.

My dedicated server is also blocked by MS. https://sendersupport.olc.protection.outlook.com/snds/ is supposed to help me resolve it, but it says my IP address has no reported incident.

Ironically, SendGrid is the main source of spam passing through my spam filters; but I can't block it because about 1/4th of emails I get from them are not spam

Funny. I'm on Outlook and mine is (sort of) the opposite, most of the spam that comes through is @gmail.com these days. Seems like spammers are taking advantage of known trusted relationships between services to increase delivery rates to specific domains.

Seeing the same. Someone from Google please fix this. I've gone from one spam a month to several a day. I've been using Gmail since the beta.

They're multipart messages, which seems to trip up Gmail: it looks like one part is scanned and another is displayed. Base64 decode the source parts and add a keyword filter for the "non-spam" text, as it's usually pretty static.
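For anyone who wants to try the decode-and-match idea on raw message source, here is a minimal sketch using only Python's standard `email` package. The `KEYWORDS` list and both function names are made up for illustration; swap in whatever static text shows up in your own samples. It says nothing about how Gmail itself scans parts.

```python
import email
from email import policy

# Hypothetical phrases; pick ones that recur in the parts of your own spam.
KEYWORDS = ["mcafee", "confirmation#"]

def decoded_text_parts(raw_bytes):
    """Yield the decoded text of every text/* part, displayed or not."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        # Multipart containers have maintype "multipart" and are skipped.
        if part.get_content_maintype() == "text":
            # get_content() undoes base64 / quoted-printable and the charset.
            yield part.get_content()

def looks_like_spam(raw_bytes):
    """True if any decoded text part contains one of the keywords."""
    return any(
        keyword in text.lower()
        for text in decoded_text_parts(raw_bytes)
        for keyword in KEYWORDS
    )
```

The point of walking every part is that a keyword hidden in a base64-encoded HTML alternative is matched even when the plain-text part looks harmless.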

Yeah, it's been happening to me for about a year now. I went as far as to make another email just to avoid it. Made me sad. I had that email address since 2008 or so.

I had exactly this yesterday, only the email address was my own Gmail with a dot at the end so when I opened the email the name was "McAfeeSecurity" with my own email address and profile picture.

I reported it as spam and Gmail helpfully asked if I'm sure, because I communicate with this person a lot, and when I confirmed it said it will block the sender. Unsure if this will have any impact on the emails I send out myself now.

Rather worrying that Gmail addresses can be spoofed.

Same here, it's so bad.

Google probably lets some amount of known-spam emails through for data gathering. See this quote from Google's "Rules of Machine Learning" [1] (A great resource by the way)

> Rule #34: In binary classification for filtering (such as spam detection or determining interesting emails), make small short-term sacrifices in performance for very clean data.

> In a filtering task, examples which are marked as negative are not shown to the user. Suppose you have a filter that blocks 75% of the negative examples at serving. You might be tempted to draw additional training data from the instances shown to users. For example, if a user marks an email as spam that your filter let through, you might want to learn from that.

> But this approach introduces sampling bias. You can gather cleaner data if instead during serving you label 1% of all traffic as "held out", and send all held out examples to the user. Now your filter is blocking at least 74% of the negative examples. These held out examples can become your training data.

> Note that if your filter is blocking 95% of the negative examples or more, this approach becomes less viable. Even so, if you wish to measure serving performance, you can make an even tinier sample (say 0.1% or 0.001%). Ten thousand examples is enough to estimate performance quite accurately.

[1] https://developers.google.com/machine-learning/guides/rules-...
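The numbers in the quoted rule can be checked with a few lines. This is only a sketch of the arithmetic, with function names of my own choosing: it assumes the held-out slice is drawn uniformly at random from all traffic and that samples are independent. Blocking 75% on the remaining 99% of traffic gives 0.75 × 0.99 = 0.7425, which matches "at least 74%", and ten thousand samples put the standard error of a 75% estimate at roughly 0.4 percentage points.

```python
import math

def effective_block_rate(block_rate, holdout_fraction):
    """Share of spam still blocked when a random slice of traffic bypasses the filter."""
    return block_rate * (1.0 - holdout_fraction)

def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)

rate = effective_block_rate(0.75, 0.01)  # 0.75 * 0.99
error = standard_error(0.75, 10_000)     # sqrt(0.1875 / 10000)
```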

I don't think that explains the very obvious crap that gets through, for instance several near duplicate spams in a row, each of which I manually reported.

Gmail is the prime target for all spammers. I see regular reports, also for Google Search results. Nobody has an answer really.

3 days ago "Tell HN: Gmail's spam filters have gone bonkers" https://news.ycombinator.com/item?id=34411009

1 month ago "Ask HN: Do you all get spam in Gmail daily?" https://news.ycombinator.com/item?id=34093812

4 months ago "Ask HN: What's happening with Gmail spam filtering?" https://news.ycombinator.com/item?id=32923098

"Ask HN: Is Gmail spam out of control for everyone else too?" https://news.ycombinator.com/item?id=30315116

These dupes are getting tedious, largely the same comments as the one 3 days ago (94 comments, 127pts). People agree, other mail hosters say most of their spam comes from gmail and outlook, various folk point out they've switched to competitors and it's much better (for now)

Thanks for posting. I didn't realize that this had been such a hot topic recently.

I run my own mail server + spam filter, so I'll chime in. I have seen a high uptick in spam making it to my inbox in the last two weeks. I primarily rely on Spamhaus blocklists + a Bayesian filter trained on old spam.

The uptick I have seen is going from 0-2 spams making it to my inbox to 10-20 spams making it to my inbox. When this has happened in the past, I have assumed it is spammers bypassing blocklists by finding new hosts, or by spammers finding a clever way to beat the filter. Usually after these big upticks, they drop off again suddenly, which makes me believe that it was a blocklist bypass and not a filter bypass (my filter is pretty weak and hasn't been retrained/updated in many years.)
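For readers who have not seen one, the Bayesian half of a setup like this can be sketched in a few lines. This is a toy naive Bayes scorer with add-one smoothing, not the commenter's actual filter. The class name, the tokenizer, and the training sentences in the usage note are all illustrative assumptions, and the blocklist lookup is left out because it needs network access.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word-ish tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9$]+", text.lower())

class TinyBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_log_odds(self, text):
        """Positive means 'more like the spam seen so far', negative means ham."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            for tok in tokens(text):
                # Add-one smoothing; the extra +1 keeps the denominator nonzero.
                likelihood = (self.counts[label][tok] + 1) / (total + vocab + 1)
                score += sign * math.log(likelihood)
        return score
```

Usage would be something like training on a folder of old spam and a folder of kept mail, then treating messages with a clearly positive score as spam. A filter like this only knows the vocabulary it was trained on, which fits the observation that an old, never-retrained filter is weak against new campaigns.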

Given all the news about hacks with self-hosted Exchange, more likely they're relaying through hosts with a built up trust... As good as Exchange + Outlook are as a user, it is pretty painful to see exploits in the wild like this.

The whole system just sucks as a whole, and feels too entrenched to come up with something better. Even a notify+pull system wouldn't fix these kinds of exploits, even if they would correct end-user breaches.

I use rspamd for my self-hosted mail and I still don't really see any spam at all. I've spent quite a bit of time tuning it (ensuring that domains I expect mail from are trusted, mostly) but I can't believe how GOOD it is.

Spam is not countable

Yes, it's been measurably worse for somewhere on the order of months to years now.

I'm not sure what they've changed internally, because if they have talked about their engineering strategy for spam detection (which I doubt, since it's probably asymmetric information), no one has shared writings about it.

Nevertheless, I get obvious spam in my inbox now, and important email occasionally goes straight to my spam filter now.

People here on HN have been speculating that they moved to some sort of machine learning model, probably because employees were incentivized to pervert the existing product for promotion purposes by gaming internal metrics to prove they've had an impact.

Another anecdotal datapoint, but - I haven't noticed an uptick in actual spam making it to my primary inbox. I can't give solid numbers, but it's not been bad.

This includes a marked increase in crypto spam/phishing emails due to the cointracker email list breach - those have pretty much exclusively gone straight to Spam (including those using Google Sheets so it has an official Google sender email).

Again, just an anecdote, and I don't doubt that you and anyone else reporting an increase is experiencing it.

I have a month old business email for my new company setup with GSuite and Google's own on-boarding emails went directly to spam in that inbox. I haven't marked any emails as spam with this new account yet.

I have been getting tons of PDFs which in the previews shows pictures of women. The subject and body of the emails just seems to be random words like in a seed phrase, and with some random single digit numbers. The email is sent from office, hotmail or gmail accounts and verifies. The TO field is also filled with other emails. I have been getting this for like 3 or 4 months, and report as spam does not work. In all the years I have had a gmail account it has never really been a problem.

Microsoft has the problem as well, it's not just Google. Do they not filter outgoing?

  Message ID <9UOejz_TlFksgoyXm9GI5Q@notifications.google.com>
  Created at: Fri, Jan 20, 2023 at 9:14 AM (Delivered after 0 seconds)
  From: "Girl Shows Girl cast a lookSTART JOIN Muriel (Classroom)" <no-reply@classroom.google.com>
  Subject: Class invitation: "Check Join now View gambling Babe amidcustity"
  SPF: PASS with IP Learn more
  DKIM: 'PASS' with domain google.com Learn more
  DMARC: 'PASS' Learn more

  Message ID <DM6PR18MB3569050DD20FD0372DA98C9DCEC59@DM6PR18MB3569.namprd18.prod.outlook.com>
  Created at: Fri, Jan 20, 2023 at 4:50 AM (Delivered after 3 seconds)
  From: hoven patroo <hovenpatrool@hotmail.com>
  Subject: 名梦 t94396350
  SPF: PASS with IP Learn more
  DKIM: 'PASS' with domain hotmail.com Learn more
  DMARC: 'PASS' Learn more

You would think they'd do some basic Bayesian filtering. This was stuff we fought in 2002.

The first one is generated by apparent user actions from paid organizations. Although it's clearly spam, you can see how this is difficult for a provider to tackle, because all of the superficial signals are good: authenticated user, paid account, using official APIs. Obviously they need to step up their defenses against abuses like sharing from docs, calendar, etc to stop bad actors from laundering their spam through Google's highest-reputation internal senders.

When I worked in this area of Gmail we called this the "Russian urologist" problem. How do you correctly classify traffic like this when hypothetically some of your customers want to send and receive messages about Viagra in Russian? Casual observers will say that is spam, but not the Russian urologist.

I bet I'd get flagged if I tried to email 100 of my customers from my gmail account in the same hour.

Probably, but what if you uploaded a PDF to Drive and shared it to a giant mailing list?

We have the opposite problem. We send lots of newsletters (no spam, mostly government) and have an excellent IP reputation (Senderscore, Microsoft), except for Gmail, where our IP reputation has declined in recent months, for no apparent reason.

I hope they are just tweaking things!

I'll try to address the specific question that seems to have been asked, which is about phishing. Phishing and spam are two different classes. Spam is largely classified based on metadata about the transaction and only to a lesser extent the body of the message. Phishing, on the other hand, is almost purely based on the content, because it revolves around signals such as whether the message seems to attempt to confuse the recipient about the sender's identity, includes URLs that appear to be intentionally confusing, or uses domain names that seem to have been intentionally formed to mimic your organization's domains (for Workspace customers). So you are going to see very different outcomes for spam and for phishing, and quite different outcomes for gmail.com accounts vs. Workspace accounts.

Yes. Google has loosened their spam filters. I have noticed.

My educated guess on why? Lawsuits from political parties, notification of class action litigation against Google and others, union notifications, insurance notifications, and similar emails ending up being caught by spam filters.

The lawsuits are piling up.

I think they have a target for "% of good emails filtered as spam" and their classifiers need to choose a lower recall operating point to hit that target, because the spam has gotten harder to detect.
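To make the operating-point argument concrete, here is a small sketch. The function name and the scores in the usage note are invented; nothing here reflects Gmail's real classifier. Given scored examples and a cap on the share of good mail that may be filtered, it picks the lowest threshold that respects the cap and reports the recall you get at that point.

```python
def pick_threshold(scored, max_false_positive_rate):
    """scored: list of (spam_score, is_spam) pairs.

    Returns (threshold, recall) for the lowest threshold whose false positive
    rate on good mail stays within the target, or None if no threshold does.
    """
    ham = [score for score, is_spam in scored if not is_spam]
    spam = [score for score, is_spam in scored if is_spam]
    best = None
    # Walk thresholds from strict to lenient; both rates only grow as we go.
    for threshold in sorted({score for score, _ in scored}, reverse=True):
        false_positive_rate = sum(s >= threshold for s in ham) / len(ham)
        if false_positive_rate > max_false_positive_rate:
            break
        recall = sum(s >= threshold for s in spam) / len(spam)
        best = (threshold, recall)
    return best
```

With well-separated scores (ham at 0.1 to 0.4, spam at 0.6 to 0.9) and a zero false positive target, this returns a threshold of 0.6 and recall 1.0. If one good email scores 0.65 and some spam drops to 0.5 and 0.6, the same target forces the threshold up to 0.7 and recall falls to 0.5: same target, harder spam, more spam in the inbox.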

on this particular subject, on my personal account I receive very little real email any more, everybody switched to IMing. I wonder if that's taxing their system by not giving gmail enough "good stuff" to chew on, to learn the difference? but that would mean they're personalizing it for me somehow which I doubt. maybe everybody's real personal email quotient has gone down.

sort of interesting, the real competitive enemy of gmail was not icloud or outlook, it was iphone.

Not exactly spam, but quite often mail is badly sorted and promotional mail gets into the main inbox. One of the main offenders is AliExpress. Every day they send mail from various addresses: buyer01.m@mail.aliexpress.com, services01@aliexpress.com, exclusive01@mail.aliexpress.com, ae.like18@mail.aliexpress.com, buyer-info18.m@mail.aliexpress.com

And every month or so they vary the numbers and I have to tell the filters to route them appropriately to the junk folder. (And I have to do it one mail at a time, because if you select multiple mails with different addresses the filter doesn't offer to add them to the filter list.)
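If you run your own filtering script, a pattern that ignores the rotating digits saves re-adding addresses every month. A minimal sketch follows, assuming the local part always looks like the five addresses listed above (letters, dots and hyphens, then digits, then an optional suffix such as ".m"). The pattern and function name are my own, and whether your mail client's filter rules accept regular expressions is something to check separately.

```python
import re

# Local part: letters/dots/hyphens, a run of digits that rotates, optional ".m"-style suffix.
# Domain: aliexpress.com or any subdomain of it.
ROTATING_SENDER = re.compile(
    r"^[a-z][a-z.\-]*\d+(\.[a-z]+)?@([a-z0-9\-]+\.)*aliexpress\.com$",
    re.IGNORECASE,
)

def is_rotating_sender(address):
    """True if the address fits the numbered-sender pattern on aliexpress.com."""
    return ROTATING_SENDER.match(address.strip()) is not None
```

Anchoring the domain with `$` matters: without it, an address at a lookalike such as `aliexpress.com.evil.example` would also match.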

I've used Gmail since 2003 and consequently was (un)lucky enough to get my $FIRSTNAME@gmail.com - it's certainly handy but boy do I get a lot of spam - 300-400 a day, I expect.

I've definitely noticed an uptick recently and what is most perplexing is that some seem like they'd be easy to catch - in fact, I set up some Gmail filters to do so and they seem to be working 100%.

I can only imagine, mine is a pretty popular name as well, I see quite a few entries where I get mail obviously to other people... It gets kind of annoying to say the least...

Examples like, someone put me on their Farm Equipment account, so I was getting receipts and marketing... Also got on someone's college application, so funding notifications etc.

I have sometimes made an effort to contact the org or person in question... When I couldn't reach the person, I did manage to change someone's password for a dating site, and changed their profile to "I don't know how email works" etc.

Heh- I get quite a lot with content like:

"Hi - Grandma here - so nice to see to see you last week and glad your studies are going well. I hope you liked the sweater I sent over.

Lots of love."

Still not sure if they're real and granny assumes @gmail.com must be the right email for her grandson's name or it's some sneaky phish.

I get a lot of personal info too like paystubs and legal docs for other people with my first name. I usually reply with "you have the wrong person - you might want to check" and receive a surprising amount of replies like "how dare you read private email - delete immediately or we will take legal action... " Oh well.

Could you share those filters?

I don't think there is a way to export filters from Gmail but they're just a collection of simple rules like:

"Congrats CALLU !" in subject goes to spam

Like I alluded to in my post, it feels like these would be easy for Gmail to catch directly.

It absolutely is. A few weeks ago I decided to create what are now dozens of rules to manually filter out spam and it has been extremely effective for me (20+ a day were hitting my work inbox).

My best filters target the “opt out / unsubscribe” language people put in their footers. I iterate a few times a week as things sneak in. I’ll never get 100% but the results have been very positive.

I get dozens of terribly-formatted spam emails per day. I get about 2x-3x as many "spam" emails as I do real/marketing emails. "Spam" is in quotes because the emails make no sense...I can't see anything they're even selling. There will just be one non-sensical link by itself that I never click.


from: runnerup info_GBAQBHFLXV@news.ukgkkwwumjhqu.edu via netorg12672764.onmicrosoft.com

subject: --confrmtion-70346102

content: a link labeled: "runnerup-N0tificati0n"

I can't tell if Gmail can't figure out these are all spam, or if they purposely send me 1-2 dozen every day because I religiously label them all as spam and no one else does. Maybe that's because it's super hard to label an email as "spam" using their mobile app! My wife has also noted a large uptick in similar spam emails over the past 6 months.

I was just about to ask this on here! I regularly check junk mail just in case and it’s been crickets for a long time, but in the last couple months seem to get like 3-4 spam emails in there a day, and regularly into my inbox, usually a Geek Squad or McAfee “purchase” receipt. Very clearly spam.

Don't forget to check your spam folder for ham as well!

Yeah. I have an account that bounces through gmail as part of a forwarding chain. About 25% of its non-spam messages were being silently stuck there (and not forwarded) until I disabled the gmail spam filter. The downstream account (fastmail) spam filter works fine.

I noticed this as well, and I'm switching to a relatively new service called Tutanota, as I haven’t heard great things about Fastmail and Proton Mail when it comes to spam and I'm looking for something using open source tooling. We’ll see how it goes.

Yes, I got the blatantly obvious but still scary coinbase one:

The subject says "paid" -- "Reminder - You have paid an invoice"

but the email says to pay it.

"Please pay your invoice

Coinbase would like to remind you to pay invoice xyz.

Amount due: $599.00 USD"

With sender email being paypal.

I think it's a combo of two things:

1) To get the best training data, you sometimes need to let things you've classified as spam into the inbox to verify that the user marks it as spam. It's pretty standard for training a classification system to occasionally pass negative samples to verify their negativity.

2) The spam filter itself almost certainly has a latency budget, and if it can't respond in time, the message is passed unfiltered. In other words I think the spam filter fails open. It's probably just been down more lately.
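The fail-open guess in point 2 is easy to picture in code. This is a sketch of the general pattern only, under the assumption that the filter is called with a deadline and mail is delivered when no verdict arrives in time. The names, the thread pool, and the "inbox" verdict string are all hypothetical and not a description of Gmail's architecture.

```python
import concurrent.futures

# Shared worker pool for classifier calls (size chosen arbitrarily for the sketch).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def classify_fail_open(classifier, message, budget_seconds):
    """Return the classifier's verdict, or "inbox" if none arrives within the budget."""
    future = _pool.submit(classifier, message)
    try:
        return future.result(timeout=budget_seconds)
    except concurrent.futures.TimeoutError:
        # Fail open: deliver unfiltered rather than delay or drop the message.
        return "inbox"
```

The trade-off is the usual one: failing open never loses legitimate mail during an outage, but every slow or unavailable classifier call turns into an unfiltered delivery.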

Gmail's spam filter is still a dream compared to Outlook (I use both regularly). Absolute lazy garbage gets through to my inbox and has for years [0]. I used to complain about it to their support but nothing has changed. I certainly wouldn't recommend Outlook to a non-tech-savvy person.

[0] https://twitter.com/JiriPospisil/status/1108355909099667462

Haha, I just checked Outlook and received this. You can't make this up...


I think there's something bigger going on. At work we have non-Google accounts and it's gotten worse there as well. Like all of a sudden about 3 or 4 months ago, I just started getting bunches of very very obvious spam that wasn't caught by our previously pretty good filters. (Subjects like "Meet Russian Brides" and From fields of "Foo Print Advertisement", etc.) I wonder if someone has figured out how to game the current system or something?

I get 10-20 a day.

Lately it's been Google classroom invitations from sex bots. Along with the random crap that doesn't make any sense, and the McAfee/Yeti Cooler junk.

In your Inbox or your Spam folder? I think OP was about emails that end up in your Inbox.

In my inbox of course.

Wow! This is a lot!

Spam filtering is a cat and mouse game. The moment you think you have the "perfect" set of rules, scammers will figure out how to game them. Then you'll have to make changes to handle the additional cases. Rinse and repeat.

I have anecdotally seen slightly more types of scam/phishing messages slip through the filter in recent weeks, but I assume it'll go away in the next round of updates from Google's side.

I understand the cat and mouse nature of trying to control the unruly, and recognize the truth in what you are saying, but I got this one today, how is

    From: ConGraTuLaTioNs!! <info_rGoJOQymodm@appeme.website>
    Subj: *robert; $2,613,527 WiNNer ANNouNcement!*

not already detectable by the spam filter? I get them pretty much every day.

I get a significant amount of recruiter spam: Every day I get 2-4 cold emails from random recruiters that are generally poor matches. (I.e., the recruiter never read my resume, is probably sending email to thousands of people, etc.)

I always mark these as SPAM, and the next day more recruiter spam comes in. There's no way to unsubscribe because they always are from some random independent company.

Yes, I've been seeing more in Gmail, but that's nothing compared to Google Photos spam and Google Calendar spam, which I get hit with every other day.

To maximize the ridiculousness, Google sends me an email thanking me for each image abuse report or chat abuse report done in Photos -- but they don't seem to be actually /doing/ anything about it.

Has Google ever publicly talked about their spam filter performance over time? For me this past year I get obvious spam messages in my inbox every week. Is it that they can no longer filter at the required scale? It seems hard to believe these messages could evade even the most rudimentary filters, so I assume they're not being filtered at all.

The spam arms race continues to escalate. Broad availability of tools like ChatGPT has probably helped spammers in the short term.

If any good can come of this long term, it would be the ability for me to charge people to get an email into my inbox. This has been proposed multiple times over the decades, but has never been more needed or feasible than now.

Anecdotally all the spam I get is exactly the type of spam I got 20 years ago on the same email address. I don’t think it’s chatgpt but either a loosening of filters or something else.

I'm having the opposite problem. Sometimes even my replies to someone with a Gmail address go to their SPAM box. What kind of a filter decides you don't want to see a message from someone you messaged first?

FWIW I have my own domain and switched to Google as backend long ago, and yet I still occasionally have this problem.

I've also noticed this a little bit, virtually identical situation. Searched around for a bit, narrowed it down to the reason being that I had a UPS tracking link in the email... which was the point of the email, to send that link.

There are several Google Groups that I subscribe to and this regularly happens:

A real person who I know in real life, whose messages I care about, posts to a Google Group from a Gmail account, and the message ends up in my own Gmail spam folder.

Like - the message didn't even leave the Google infrastructure and it got tagged as spam?!

I've had email messages from Google about Google products for which I have an active account using my Google email address get marked as spam by Google spam filters.

That settles it then. They truly have lost their edge.

Even spam can originate in Gmail. There’s no reason not to scan everything.

It was not spam though, and you'd think it'd be fairly easy for Google to figure that out when they control the whole round trip.

Try to treat it like weather. Some times things are clear for weeks, then you get hit with storms. My wife and I both have had Gmail accounts forever and we never see the onrushes of spam at the same time. So I think it's the noise of two algorithms fighting. We should all get used to it.

I've been getting phishingesque emails from domains that are just random characters .ml (free domains)

I've experienced kind of the inverse of that lately -- using Workspace (and their domains) for email, and regular outgoing emails are ending up in the receiver's spam box. SPF/DKIM/DMARC/etc. are all set up correctly and tested (and have been working fine for many years).

I'm determining how to migrate off of gmail. My inbox has been destroyed and I can no longer use it reliably, it's impacting my personal life. Spam comes in every day and no amount of "mark as spam" can save me apparently.

Phishing emails in particular seem to go right through most spam filters. It seems like email providers should be focusing on these (spam emails don't annoy me as much as an email carefully crafted to steal my identity!)

I absolutely had this issue most of last year, but the Gmail spam filter seems to be catching things more effectively for me in the last two months.

There has definitely been some more getting through the filters the past few weeks for me. Maybe January is a special month for spam or something.

The main issue is so many scammers are signing messages with DKIM/SPF too.

Yeah, close to the November election, it felt like Google stopped filtering election spam email suddenly.

In the past few months I've been getting much more fake order / payment spam.

Google needs to do something about google forms/drive spam and other bypasses

Dear god I thought I was the only one! For a while I was getting multiple drive spam requests a day. Then they stopped. Yesterday I got my first one again in months and it just is not okay. They are all coming from clearly fake emails and sending Russian bitcoin shit. I know Google has OCR, maybe use it when someone invites 50k people to a drive and there are no social graphs related.

Yes, it's getting worse. I get and mark as spam the same email pattern over and over.

TL;DR: almost all the people who care about quality at Google are gone or not in a position to improve the product

don't use your main email to register in random websites

that's as simple as that

How about reframing to "Is spam getting harder to filter?"

No one at google wants spam.

>"Is spam getting harder to filter?"

I look at it a different way: spam and filters are locked in an evolutionary arms race, and at the moment spammers have found an adaptation that gives them an advantage. In due time the anti-spam filters will adapt as well. It has always been a difficult problem.

Customers don't care. Due to Google's behaviors I assume some or many people do not agree with the sentiment "no one at Google wants spam."

What benefit does Google have for letting spam through?

Probably just insufficient incentive to stop it

I suspect they spend several million dollars a year and have at least 20-30 people on it, if not more, and I think you don't have any idea of how hard the problem is and how it's getting harder all the time.

It's going to get even harder as spammers use ChatGPT-like tech to write individual spam messages for each person

Why do you keep telling others they have no idea how hard it is?

My point from the higher level comment is that the customer does not care how hard it is. If chatgpt makes it harder there is nothing stopping Google from innovating and improving their detections. The comments are calling out that they seem to be falling behind the curve as more dangerous phishing and spam/fraud emails slip through.

I for one have no sympathy. Google did the same as other giants and gobbled up as much tech talent as they could only to lay off thousands later. If you are telling me I need to feel empathetic for the company reaping trillions from invasive data harvesting and monopolizing the most used digital services on the planet, I shall play the smallest violin I can find.

SpamAssassin picks up mail that Gmail can't, and it doesn't have a billion dollar corporation behind it.
