Alarming number of spam false positives in Gmail (dedasys.com)
54 points by davidw on Mar 12, 2015 | 65 comments

Google's spam filtering in general is abominable. For years they even managed to flag mail from some of their own services, like Analytics, as spam. Way too many false positives for mainstream services that send perfectly ordinary, decent emails.

On the flip side, the spam filter seems to be very US-centric, letting through a lot of spam that my local ISP's spam filter does catch.

Finally, the lack of control over the spam filtering is ridiculous if you compare it to what many ordinary ISPs offer.

I can actually see the point of the analytics messages being labelled as spam. That's precisely what it is usually.

But it's not technically spam, as you have actually signed up for the analytics account. Perhaps this illustrates the reason why it is tagged as spam: a lot of people just hit 'mark as spam'.

The irony is that most of the spam that gets through my spam filter comes from gmail.

I think gmail spam filters were OK at the beginning, because spam was so blatant at the time. Nowadays valid email makes more use of commercial services, and spam has become more sophisticated. As a result, whatever they used to do is just not enough to flag email with high certainty.

I'm beginning to think that we should have two different spam filters.

One for mail that is spam with p > .995: all the viagra pills and lasik and everything.

Another should be for the emails that are probably spam but not certainly.
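One way to read a cutoff like p > .995 is as a statement about relative costs. A minimal sketch, assuming a filter that outputs a spam probability p and a user who can put a number on how much worse a lost real email is than one spam in the inbox. Flagging costs (1 - p) * cost_false_positive in expectation, delivering costs p * cost_false_negative, so flagging wins once p exceeds the ratio below. The function name and the 199:1 cost ratio are made up for illustration.

```python
def spam_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Return the spam probability above which flagging has the lower expected cost.

    Flag when (1 - p) * cost_false_positive < p * cost_false_negative,
    which rearranges to p > cost_false_positive / (cost_false_positive + cost_false_negative).
    """
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Hypothetical costs: losing one real email is as bad as seeing 199 spam messages.
threshold = spam_threshold(cost_false_positive=199, cost_false_negative=1)
print(f"flag as certain spam when p > {threshold}")
```

A "probably spam" tier would just use a second, lower cost ratio and therefore a lower cutoff.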

For the record, I just went over the 151 spam messages I've received over the past 2 weeks and had 0 false positives. That's pretty good.

I regularly get numerous false positives every month. Have been for over a year, so I don't see this as an extremely recent phenomenon.

These false positives include messages from Google services and from my work email (which is in my contacts).

It's a little strange. Some false positives are very understandable. A lot are just ridiculous. I'd gladly lower the sensitivity of the filter if Gmail allowed it.

And I went over the 300 spam messages from the last month and found 6 false positives. None were important, but one could have been.
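For what it's worth, counts like 0 of 151 and 6 of 300 are small samples, so it may help to put an uncertainty range on them before comparing. A rough sketch using the Wilson score interval at roughly 95% confidence (z = 1.96). It treats each message in the spam folder as an independent draw, which is an assumption, and the quantity estimated is the share of the spam folder that is legitimate mail, not a false positive rate over all legitimate mail.

```python
import math


def wilson_interval(errors: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for the proportion errors / total."""
    if total <= 0 or not 0 <= errors <= total:
        raise ValueError("need 0 <= errors <= total and total > 0")
    p = errors / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Counts reported in the two comments above.
samples = {"0 of 151": (0, 151), "6 of 300": (6, 300)}
for label, (errors, total) in samples.items():
    low, high = wilson_interval(errors, total)
    print(f"{label}: {low:.1%} to {high:.1%}")
```

Under those assumptions the two ranges overlap, so the two reports are not necessarily describing different filter behavior.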

I go in every month or so to retrieve a handful of legit messages and mark them as not spam. Today I had about 50 things in my spam, about ten of which didn't belong. Most were things I don't really care about, like LinkedIn invites to some email address other than my main one, and some mailing lists I've subscribed to, but still this is bad and getting worse.

On a side note, about 20% of my actual spam over the last few months purports to be from young women named "Jessica," though it's not clear that I'm supposed to believe that it's coming from the same person. Is this just a go-to name for spam pretending to be from friendly (young female) strangers?

Using female names (especially "attractive" names like Jessica, Tiffany, etc.) is an old trick. Males will generally be more receptive to opening an email with a name like this in the from field.

Yeah, I get that. They wouldn't do it if it didn't work, though I'm not quite sure who falls for this. In any case, I'm just amused that mine are so heavy on the Jessica.

Kind of talking to myself here, but thought it was worth noting that "Jessica" was the top female baby name of the decade for both the 80s and 90s: http://www.ssa.gov/oact/babynames/decades/

True, or just have a setting in Gmail to choose how aggressive you want the spam filter to be.

I don't use gmail, and that's how I've set up procmail custom filters with spamassassin for nearly the past decade.

The nearly certain spam is delivered to /dev/null. The very likely spam is delivered to Junk.
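The two-tier routing described here can be sketched in a few lines. This is not the commenter's procmail recipe, just an illustration of the idea in Python. It assumes a SpamAssassin-style numeric score where higher means more spammy. 5.0 is SpamAssassin's stock required_score, while the discard cutoff of 15.0 and all the names are hypothetical.

```python
from enum import Enum


class Destination(Enum):
    INBOX = "inbox"
    JUNK = "Junk"
    DISCARD = "/dev/null"


def route(score: float, junk_at: float = 5.0, discard_at: float = 15.0) -> Destination:
    """Pick a destination for a message from its spam score.

    junk_at mirrors SpamAssassin's default required_score; discard_at is a
    hypothetical, much stricter cutoff for "nearly certain" spam.
    """
    if discard_at <= junk_at:
        raise ValueError("discard_at must be above junk_at")
    if score >= discard_at:
        return Destination.DISCARD
    if score >= junk_at:
        return Destination.JUNK
    return Destination.INBOX


for score in (1.2, 7.5, 22.0):
    print(score, "->", route(score).value)
```

The point of the middle tier is that only the Junk folder ever needs a human look; how wide to make it is a matter of taste.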

Two folders?

Spam For Sure

Spam Maybe

I regularly get emails (from Apache mailing lists if you care) that go to spam. No amount of flagging fixes it. The way to deal with it is to regularly check the spam folder, train the filter every time it makes a mistake, occasionally sacrifice a goat and pray to Google that they'll fix it.

Then again, you could say it's just a cost of using GMail, and it's offset by all the correctly identified spam emails.

You can set a filter on any keyword / subject / sender / receiver and mark it as "never send to spam". That might help you.

Thanks very much, never saw that!

This has happened to me too, several times with increasing frequency over the last two years. My messages end up being labelled as 'spam' even with people that I have long standing relations with and that blindly rely on gmail to do its bit. Extremely frustrating because even though I'm not a gmail user this still affects me.

This might be pretty obvious, but if you forward personal email to a google apps account a lot of stuff like this can happen.

The other way, too. If you forward email from a google apps account to gmail (or another google apps account), funky stuff happens.

Colin from customer.io wrote about this once... http://iamnotaprogrammer.com/Dont-Forward-Google-Apps-to-Gma...

Noticed a lot of false positives too since the new year.

After investigation, it appeared that the ahbl.org RBL was now wildcarding everything as spam: http://ahbl.org/node

In my particular case I use my own servers, which flag mails before sending them to google. I doubt that google relies on ahbl, but most servers with an old version of spamassassin do.

IMHO those false positives by gmail might be due to a large number of false positives forwarded to them by servers that rely on spamassassin.

I kinda wonder if a significant portion of the issue is people who sign up for stuff, then rather than unsub, just tell gmail it's spam?

Due to the simplicity of my email address, I get emails almost every day from stuff other people have signed up for and mistakenly put in my email address thinking it was theirs. Sometimes if it's in a foreign language I can't tell if it's spam or just normal stuff that came to the wrong address.

I have had moments of frustration where I've selected a whole group of these emails and marked them as spam in the hopes that they would stop showing up in my inbox.

It turns out this is a serious problem with using a gmail account. I receive email all the time that was sent to some other unrelated person with a similar name. This happens to lots of people that I know. I decided it is better to stop making gmail my main email, and treat it just as a junk mail destination.

If I receive a newsletter and there is no way to unsubscribe without logging in and/or going through hoops, I mark it as spam.

If I sign up on some service with email+service@gmail.com and I get a newsletter from some other unrelated company, I mark it as spam.
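For anyone wanting to automate that check: Gmail delivers user+tag@gmail.com to user@gmail.com, so the tag records which service was given the address. A small sketch, with hypothetical function names, that splits out the tag and flags mail whose sender domain doesn't mention it. The substring match is deliberately crude and will misfire for services that send from a differently named domain.

```python
from typing import Optional, Tuple


def split_plus_address(address: str) -> Tuple[str, Optional[str]]:
    """Split 'user+tag@domain' into ('user@domain', 'tag'); tag is None if absent."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    base, plus, tag = local.partition("+")
    return f"{base}@{domain}", (tag if plus else None)


def looks_leaked(recipient: str, sender_domain: str) -> bool:
    """True when the recipient carries a tag that the sender's domain does not contain."""
    _, tag = split_plus_address(recipient)
    return tag is not None and tag.lower() not in sender_domain.lower()


print(split_plus_address("email+service@gmail.com"))
print(looks_leaked("email+shop@gmail.com", "mail.otherco.example"))
```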

I could see that being a problem for some people, but I know better than to do that. I'm very careful about what I put in spam myself, and careful to mark as 'not spam' all the things that aren't.

What's really amazing is all of the emails I've fished out of there that are pretty much exactly like tons of other emails I've been receiving for years, often from the same address.

I mean, an Erlang-related email from erlang-questions (which I've been subscribed to with Gmail for 6 years) with "gen_server" in it... that's just not spam.

As someone who maintains email campaign servers and mailing lists, most people who report as spam do so as a last resort. Many have tried to unsubscribe, and often their email isn't in our system; usually they have a mail list forward it to them, and they don't know which mail list.

I've actually been getting a lot of false negatives in Gmail lately. It's funny that it's much more surprising to see spam in my Gmail account now just because it was so rare before.

Motivated by this thread, I just checked my personal account (gmail, email address hardly publicized at all) and my business account (GAE business, email address widely publicized). In the interest of balance, my anecdata is that the gmail anti-spam filters work very well for my two accounts.

No false positive spam at all in the personal account, and only a few dozen spam over the last several weeks.

One false positive spam in the business account out of the several hundred I bothered to check before I got bored (I've close to 800 spam over the last few weeks) - and that false positive is something I could easily have ignored.

Works for me, FWIW, YMMV and likely does.

The worst one I got was a legitimate email from Origin telling me that my password was changed... someone broke into my account and changed everything they could. It went to Gmail's spam, because whoever broke into my account changed the language to Russian, so the email from Origin arrived in Russian and automatically went into spam, even though it was completely legitimate.

People who are capable of running their own mail server should really consider doing so. It's super easy with projects like sovereign (https://github.com/al3x/sovereign).

More privacy and more power over spam filtering.

I used to run my own server, but like I wrote, Google has for a while pretty much been the best in the business at this. Whatever hacked up spamassassin type thing you run on your own server doesn't compete with the resources that they bring to bear on the problem. Also, my time has been better invested in working on other things, like my side projects or open source, rather than playing sysadmin.

But something has gone seriously awry :-/

I run my own mail server. I recently started finding out that my friend's gmail account has been classifying my messages as spam. I suspect the general trend is going to be increasing hostility to emails sent by small servers.

I found that DKIM and SPF greatly decrease Gmail's hostility. Not so difficult to implement. Overall, I fully agree with your suspicion.
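A quick way to see whether SPF and DKIM are actually passing is to send a test message to an account you control and read the Authentication-Results header the receiving server adds (in Gmail it is visible under "Show original"). A minimal sketch using only the Python standard library. The sample message, the domains in it, and the function name are made up, and real headers carry more detail than this regex extracts.

```python
import re
from email import message_from_string


def auth_results(raw_message: str) -> dict:
    """Collect the first spf/dkim/dmarc verdict found in Authentication-Results headers."""
    msg = message_from_string(raw_message)
    results = {}
    for header in msg.get_all("Authentication-Results", []):
        for method, verdict in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header):
            results.setdefault(method, verdict)
    return results


# Hypothetical message as it might look after delivery; the header is folded over three lines.
SAMPLE = (
    "Authentication-Results: mx.example.net;\n"
    " spf=pass smtp.mailfrom=example.org;\n"
    " dkim=fail header.d=example.org\n"
    "From: someone@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(auth_results(SAMPLE))
```

Anything other than "pass" for either method is worth fixing before blaming the spam filter.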

We recently ran into the issue of Google sending all of our new client emails to spam. At first I chalked it up to Google launching a competing service but we switched to Amazon's SES service to send email and it entirely fixed the issue.

Just out of curiosity, are free mail providers like Gmail and Yahoo! mail using postfix and Dovecot to manage their mail services? Also to filter spam, do they rely on spamassassin or other free software with some customizations?

No, they rely on in-house solutions.

Google is struggling to identify and sort the blizzard of emails I get from Paypal and Ebay.

Some stuff I want in my primary account. Some stuff I want in the commercial tab. A lot of it I don't want at all and I'm really happy if Google marks it as spam.

I don't blame Google at all for this. The blame lies squarely with fucking arseholes who have no concept of me not wanting their shitty fucking email. "We get more conversion if we poke people in the eye with this pointy stick!" Maybe, but you're an arsehole if you do and you should feel bad for doing it and you should stop doing it.

"Ironically, I am a paying customer of Google as of a few days ago, in order to have extra storage space."

I don't believe paying for more space changes your free Gmail in any way. I too used free Gmail and paid for Drive space but it's not comparable to paying for Google Apps.

I've received 3 emails from Apple in the past requesting promo artwork for my app for a possible feature, and every one of them landed in spam. The first time it happened, I found it only 2 days before the request expired. Ever since then, I check my Spam every day.

Might be a good idea to make a folder and a filter for Apple mail :)

"I live in email… and if I can’t trust it, I’m in big trouble."

Sorry, who told you that you could trust email? It has been broken ever since we decided an arms race was the best way to fight spamming.

No, I don't have a better solution. Doesn't stop email being broken.

I run a small server that's never been an open relay and never spammed, yet deliverability to Gmail has long been poor. Very frustrating and from what I've read all too common.

Once you start paying for something, many web services suddenly get worse. For instance, get linkedin premium and you find yourself sandboxed, and the moment stumbleupon realizes you are a mark who will pay for traffic, the organic traffic stops.

It's happened to me too, and to a friend of mine as well. Funny story with that friend: he missed summer job emails because of Google, and cursed them a lot :-). But he later ended up at Google for his summer.

... for this one particular person.

Well, yeah, and since that one person is me, I'm very, very unhappy with it.

Could you post a few example emails, with all the headers?

Most of it's private stuff, either work, or board [at] apache.org or personal emails. I'd be happy to share with someone at Google, but I'm not going to put them in public.

What about just the headers, no body?

Here's one. This was in spam too: http://code.activestate.com/lists/tcl-core/14455/

"Be careful with this message. Many people marked similar messages as spam." is what Gmail reports. I've been subscribed to tcl-core for years, too.

Can you post the headers?

I'd be happy to share a few of these from the open source lists privately - you can write me, hope it doesn't end up in spam, and I'll send you the complete headers.

But your headline reads as if it's a general problem.

It is general enough that it affects me too. Also, if with a service the size of gmail someone has a problem you can bet that that problem affects at a minimum thousands of users.

It affects me too; I manually clean out my spam filter daily now just to keep the volume 'low'. I easily get at least one false positive daily.

Can you explain what you're talking about? Do you mean that if a program exhibits bugs only under a certain condition, it's bug-free? And even assuming you know the spam's distribution, the error function needs to heavily penalize any false positives. So even one occurrence is a major event. Your comment reeks of blind faith, which is not appropriate for a forum like this.

It is not blind faith, it is the f*ing scale. It is hard to argue that one occurrence is a major event when you are at the size of gmail.

It's never one occurrence.

The thing that annoys me most is that now GMail by default shows images which can allow spammers and marketers to detect open rates and validate active email addresses.

This is made worse by the fact that marking messages as spam requires you to open the email first. By the time you open it, the spammer knows your email address is valid and active and even if you mark it as spam, your email will probably have made it to some other list of 'validated gmail users'.

The images are precached by Google so that this doesn't happen. The requested image doesn't come from its originating source.

For example, here is Basecamp's logo as it's displayed by Gmail:


Google acts as a sort of proxy for those images, so by setting certain http headers, the marketers can make the proxy access the original image every time the email is being opened. So they can track the time you open their email.

"The thing that annoys me most is that now GMail by default shows images which can allow spammers and marketers to detect open rates and validate active email addresses."

These images are proxied, so the sender can't tell if you actually viewed them. Only Google can. http://gmailblog.blogspot.com/2013/12/images-now-showing.htm...

You can also disable the "auto-viewing" of images, as described in the link above.

"This is made worse by the fact that marking messages as spam requires you to open the email first."

You can still check the little box next to the email (in the inbox view) and then click "spam". This allows you to mark an email as spam without viewing it.

GMail is not displaying the images from remote hosts by default. They've recently started to proxy the images on their servers, so they can check them before showing them to you. This also does the opposite of what you're describing: marketers/spammers get no open info, since only GMail fetches the image resource.

You are missing that when Google fetches a given image, it means the email was most likely not labeled as spam.

I'm not sure where this is coming from as I can mark messages as spam without opening them using both the android app and browser.

Also, in the settings you can set it to ask before displaying images.
