Hacker News
How to get gmail.com banned (2011) (mailinator.blogspot.com)
327 points by makmanalp on Aug 4, 2015 | 60 comments

Wow. Do my daily HN scan for the day and find an article you wrote ~4 years ago at #1.

I hadn't read that in many years, and what fun to do a re-read.

Thanks Internet - don't stop being you.

I was sad to see that the link to the domain generator was broken. The new one on the home page is a div that's generated server-side.

I hope you don't mind that I wrote a quick one-liner to see if you're still detecting bots...


Yup :)

I didn't see any "evil" insertions, though...

I tried something similar and got a similar pattern. After a few changes, I just started seeing mailinator and mailinator2 over and over.

It looks like an exponential backoff except that it starts to favor the generic domains instead of refusing connections.

I saw the same, but just by refreshing a bunch of times.

A few years back we came into work one morning to find that some bot was scanning our site so hard that it seemed the lights nearly dimmed. Some detective work suggested that it was a service performed on behalf of a competitor, to get our price list (bear in mind that our catalog has a few hundred thousand products).

We were really annoyed that rather than just ask us, they had launched what amounted to a DDOS attack. So we thought about how we might exact vengeance...

After a few hours we figured out a pattern to the rogue requests that allowed us to filter them, despite their efforts at stealth (like, they cycle through a list of various user agent strings to make it look like there are multiple different users). We toyed with the idea of, rather than outright banning them, making our pages sensitive to their presence, so that when we detected them, we'd display a false price, defeating their whole operation.

We finally just decided to take the high road, temporarily banning any rogue IP addresses we detected (we couldn't make it permanent because many of the requests came from the Amazon cloud, from which we also receive some legitimate requests)

EDIT: you wouldn't think that requests for a few hundred thousand products would amount to a DDOS, but the bot was rather poorly written and grossly inefficient in the way it walked through the list.
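For anyone wanting to try the "temporary ban" approach described above, here is a minimal sketch in Python. It is not the poster's code: the class name, the one-hour default, and the injectable clock are all assumptions made for illustration, and the detection of rogue requests is left out entirely.

```python
import time


class TemporaryBanList:
    """Tracks banned IPs whose bans lapse automatically after ban_seconds.

    Hypothetical helper; deciding which IPs are "rogue" is out of scope here.
    """

    def __init__(self, ban_seconds=3600.0, clock=time.monotonic):
        self.ban_seconds = ban_seconds
        self.clock = clock
        self._expires = {}  # ip -> clock value at which the ban lapses

    def ban(self, ip):
        self._expires[ip] = self.clock() + self.ban_seconds

    def is_banned(self, ip):
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[ip]  # ban has lapsed; forget the address
            return False
        return True
```

Letting bans expire is what keeps shared cloud address ranges usable for legitimate clients once the scraper moves on.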

I built a system called caltrops that did almost exactly that. As a given session's requests grew more and more suspicious, their data would skew from reality further and further. A real user on the line would notice immediately (and the more real-looking the user interactions, the more it would reduce suspicion), but competitors scraping our data would get pretty deliciously bunk data.
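A rough sketch of the idea, not the actual caltrops code (which isn't shown here): assume some hypothetical suspicion score between 0 and 1 for the session, and let the distortion applied to a price grow with it. The function name, the `max_skew` parameter, and the uniform noise are all illustrative assumptions.

```python
import random


def skewed_price(true_price, suspicion, max_skew=0.5, rng=random):
    """Return true_price distorted by up to +/- (suspicion * max_skew).

    suspicion is a hypothetical score clamped to [0, 1]; at 0 the real
    price comes back unchanged, at 1 it may be off by up to max_skew.
    """
    suspicion = min(max(suspicion, 0.0), 1.0)
    spread = suspicion * max_skew
    factor = 1.0 + rng.uniform(-spread, spread)
    return round(true_price * factor, 2)
```

A session that keeps behaving like a real user would see its score, and therefore the distortion, fall back toward zero.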

To deal with similar problems, Kickstarter built a pretty useful tool called rack-attack: https://github.com/kickstarter/rack-attack

This is a most excellent idea!

Btw, did you actually return incorrect price data, or did you just insert random bytes, etc.?

"Thousands of people use Mailinator everyday, so clearly, its a useful tool that many sites accept"

How many of you would have an outright revolt on your hands from your QA/QE folks if you banned mailinator? I think every place I've worked would experience this same issue if we did this.

Could use + in the first part of email such as: youremail+blahblah@example.com to create throwaways. Most sites consider those to be different email addresses than youremail@example.com for account purposes, but email services that respect the RFC will treat them as the same.

Many sites won't accept email addresses with + in them, because many devs have extremely wrongheaded ideas about validation.

I used to have a first.m.last@university.edu address and that one was touch-and-go as well due to the fact that the mailbox had two .'s in it. I actually had to file a support request to get Amazon Student to accept it, even. Nobody from a university with that scheme ever registered before?

For the record, the gold standard for email validation is "send a confirmation link and see if they click it". Don't try and get fancy.

One other trick is that Gmail ignores .'s in addresses entirely. first.last@gmail.com is the same as firstlast or f.irstlast.
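If a site wants to spot that several Gmail variants are really one mailbox, something like the following sketch would do it. It assumes only the behavior mentioned in this thread (dots ignored, everything after + ignored) plus case-insensitive local parts, and it only touches gmail.com; the function name is made up.

```python
def canonical_gmail(address):
    """Collapse Gmail dot and plus variants into one canonical address.

    Addresses on other domains are returned with only the domain
    lowercased, since other providers may treat dots and + differently.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # not an address at all; leave it alone
    domain = domain.lower()
    if domain == "gmail.com":
        local = local.split("+", 1)[0]  # drop the +tag
        local = local.replace(".", "")  # Gmail ignores dots
        local = local.lower()
    return f"{local}@{domain}"
```

This is for de-duplicating accounts, not for validation: the original address the user typed is still the one to send mail to.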

What's worse: some sites accept email addresses with + in them and then years later, they stop working, and you can't log in to fix your email address.

Ask me how I know this.

Hah, this happened to me once. Turns out the email validation was occurring client-side though... so a quick edit later and the server still gladly accepted my '+'-enabled email address. :-)

>For the record, the gold standard for email validation is "send a confirmation link and see if they click it". Don't try and get fancy.

But make sure you have some sort of rate limiting set up, so malicious users can't take advantage to spam someone's mailbox (and get your server blacklisted).
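A minimal sketch of that kind of rate limiting, assuming an in-memory, per-recipient sliding window. The class name and the limits (3 sends per hour) are arbitrary illustrations; a real deployment would probably also limit per requesting IP and persist the counters.

```python
import time
from collections import defaultdict, deque


class ConfirmationThrottle:
    """Allow at most max_sends confirmation emails per address per window."""

    def __init__(self, max_sends=3, window_seconds=3600.0, clock=time.monotonic):
        self.max_sends = max_sends
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = defaultdict(deque)  # address -> timestamps of recent sends

    def allow(self, address):
        now = self.clock()
        recent = self._sent[address.lower()]
        # Forget sends that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_sends:
            return False
        recent.append(now)
        return True
```

Call `allow(address)` before sending each confirmation mail and skip the send when it returns False.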

My favorite is e-mails with three dots in them. Which is actually not a valid address - the RFC specifies that you must have a valid textual character between dots[1]. However, because of poor decisions by Japanese telecoms, a substantial chunk of their users have 'e-mails' associated with their mobile phones with three dots, breaking goddamn every sensible validation script.

[1] https://tools.ietf.org/html/rfc2822#section-3.2.4
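To make the rule concrete, here is a sketch of the dot-atom form from that section of RFC 2822 as a Python regex: one or more `atext` characters, optionally followed by dot-separated runs of `atext`. It deliberately ignores the quoted-string form of local parts, and the names `ATEXT`, `DOT_ATOM`, and `is_dot_atom` are made up for this example.

```python
import re

# atext per RFC 2822 section 3.2.4: letters, digits, and these specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text: runs of atext separated by single dots.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom(local_part):
    """True if local_part is a dot-atom: no leading, trailing, or doubled dots."""
    return DOT_ATOM.fullmatch(local_part) is not None
```

So `first.m.last` and `you+tag` pass, while anything with consecutive dots fails, which is exactly why those carrier addresses trip strict validators.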

> Which is actually not a valid address

That means validation was working correctly. The email addresses are invalid even if the server will accept email for them.

There's only one sensible validation script: "Send an email with a confirm link".

Taking action with side-effects on entirely unvalidated user input is usually a pretty bad idea.

Sadly, if you're sending e-mail sanely, your mail provider likely validates recipients, and will be annoyed at you if you send them recipients they think are bogus.

The relevant RFC (on mobile; don't remember which) specifically states that intermediate servers must not validate mailboxes (local parts). And honestly the domain should be "validated" by the server doing an MX lookup; let DNS handle it.

So wait, what now? So you can have an email address like first.last@isp..co.jp? Can you give me a generic example?

Yeah, I'm with the other guy, regardless of whether or not it's a good idea to do validation (it's not), that's not an address that should pass validation because it's not a valid domain or hostname.

I could see it being less of a big deal in the mailbox portion given that it's now kinda kosher to ignore dots there.

Yep, for some reason my wife's emails to her sister[1] always bounce unless she uses the desktop, it's madness.

[1] docomo customer with two dots in email

This is why we allow subdomain magic at FastMail, so that if you wanted to use myname+foo@fastmail.com you can use foo@myname.fastmail.com as well, and it works fine. Everyone accepts that form.
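Purely as an illustration of the mapping described above (this is a string rewrite, not anything from FastMail's implementation, and the function name is made up):

```python
def plus_to_subdomain(address):
    """Rewrite user+detail@domain as detail@user.domain.

    Addresses without a non-empty user and detail are returned unchanged.
    """
    local, at, domain = address.rpartition("@")
    user, plus, detail = local.partition("+")
    if not at or not plus or not user or not detail:
        return address
    return f"{detail}@{user}.{domain}"
```

Whether mail to the rewritten address is actually delivered depends entirely on the provider supporting that form.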

I had a first.m.last@university.edu (though I have a really common name so it was actually first.m.last.3@university.edu), but fortunately they also gave us 8 character usernames (which were also our login to our shared hosting on the Sun E6500 machine), but I never used the long form since it was rejected nearly everywhere.

My university address at the moment causes similar issues, it follows the first.m.last@group.uni.edu.au form which throws off lots of input validators.

All this time I've been telling people my email has a . in it, I had no idea I could remove it and still log in.

In the early days, you had to use the username as it was registered to log in.

> Could use + in the first part of email such as: youremail+blahblah@example.com to create throwaways. Most sites consider those to be different email addresses than youremail@example.com for account purposes, but email services that respect the RFC will treat them as the same.

There is no RFC that requires this behavior. Subaddressing within the local part is recognized as a common practice (e.g., in RFC 5233), but nothing requires a system to support subaddressing, or requires a system that does to support a particular separator character or character sequence (e.g., "+") for subaddressing. Email systems are free to implement or not implement subaddressing, and to use any character sequence they want as the separator.

I've always found this to be such an odd option. If the concern is people spamming you or selling your email address, how does this really help? Anyone intentionally doing nefarious things can just add a simple regex to strip the garbage.

The parent mentioned QA, it would work for that. For antispam you could add a filter for emails to the direct address without suffix. Personally I prefer aliases, but I think they are very rare on freemailers.

> Could use + in the first part of email [...] email services that respect the RFC will treat them as the same.

I'm not aware of any RFC that says that mail sent to a+foo@example.com should go to the same mailbox as mail sent to a+bar@example.com (nor am I aware of any RFC that forbids this). I thought that GMail made up that feature and other vendors followed suit since users find it handy.

    > Subaddressing is the practice of augmenting the local-part of an
    > [RFC2822] address with some 'detail' information in order to give
    > some extra meaning to that address.  One common way of encoding
    > 'detail' information into the local-part is to add a 'separator
    > character sequence', such as "+", to form a boundary between the
    > 'user' (original local-part) and 'detail' sub-parts of the address,
    > much like the "@" character forms the boundary between the local-part
    > and domain.
(Highlighting by me)

The RFC even gives an example using the hash:

    > o  A message addressed to "5551212#123@example.com" is delivered to
    > the voice mailbox number "123" at phone number "5551212".
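Since the separator is a local convention rather than a fixed character, code that splits an address into 'user' and 'detail' probably wants it as a parameter. A small sketch, with a hypothetical function name and "+" as an assumed default:

```python
def split_subaddress(address, separator="+"):
    """Split an address into (user, detail, domain).

    detail is None when the local part does not contain the separator.
    Only the first occurrence of the separator is treated as the boundary.
    """
    local, _, domain = address.rpartition("@")
    user, sep, detail = local.partition(separator)
    return user, (detail if sep else None), domain
```

With `separator="#"` the RFC's voice mailbox example splits into `5551212` and `123`; with the default it handles the `youremail+blahblah` style discussed upthread.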

This is how we do it, with a predefined string after the first plus. Makes it easy to prune test accounts from the members db.

One way to get around a domain blacklist is to point your own domain to Mailinator. Heck, since last year you can even get your own private Mailinator...


This reminds me of the sites that discouraged hotlinking by examining Referer and then sending Goatse.

I'm glad I was brave enough to click. b^)

That was a fun read.

It took me a bit to get my head around the use cases. It's sometimes amazing how many different ways you can twist a simple (complex really) thing like email into a product/idea.

This is great, but it seems like he got rid of the separate page now (the link in the article 404s) and the text is just inline again.

I love mailinator!

However, tricking site scrapers may not work perfectly if the scrapers maintain a whitelist of websites. Say I am scraping mailinator.com for domain names: if I see gmail.com or yahoo.com, I might just not put them in my database because they are in my whitelist.
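On the scraper's side, that defense could be as small as the sketch below. The allowlist contents and the names are hypothetical; a real scraper would presumably carry a much longer list of mainstream providers.

```python
# Hypothetical allowlist of mainstream providers a scraper refuses to block.
WELL_KNOWN_PROVIDERS = {"gmail.com", "yahoo.com"}


def filter_scraped_domains(scraped, allowlist=WELL_KNOWN_PROVIDERS):
    """Drop scraped domains that appear in the allowlist (case-insensitive)."""
    allowed = {domain.lower() for domain in allowlist}
    return [domain for domain in scraped if domain.lower() not in allowed]
```

This only helps against the domains the scraper already thought of, so a planted domain that isn't on the list would still get through.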

I've used Mailinator for years and it's always interesting to read what this dude has to say.

Mailinator seems to have added some other anti-scraping detection.

Unfortunately it does not work very well, as I was not scraping mailinator but still somehow got IP banned. Fortunately my IP has changed. But they definitely have some strange and overzealous method now.

Here's a list of disposable email domains if you'd really like to block them: https://github.com/lavab/disposable

I would go one step further and look for {spam_words} in "username+{text}@{googledomain}.com", where spam_words can be "junk", "spam", etc. This is like a very narrow edge case, but still might catch something. Again, if you're into that kind of thing; I'm quite skeptical that it brings any value.
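A sketch of that check, looking only at the part after the + so that a username which merely contains one of the words is not flagged. The word list and function name are illustrative, and the domain restriction from the pattern above is left out for brevity.

```python
SPAM_WORDS = ("junk", "spam")  # illustrative list only


def has_spammy_tag(address, spam_words=SPAM_WORDS):
    """True if the +tag portion of the local part contains a spam word."""
    local, _, _domain = address.rpartition("@")
    _user, plus, tag = local.partition("+")
    if not plus:
        return False  # no +tag at all, so nothing to inspect
    tag = tag.lower()
    return any(word in tag for word in spam_words)
```

Substring matching inside the tag can still misfire on innocent tags, so this is best treated as a signal rather than a reason to reject a signup.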

Until you have the German guy, Joseph Unker, with junker89@gmail, and your validation prevents him from signing up :)

That's not at all what I meant. Gmail redirects emails to "youremail+whateveryouwanthere@gmail.com" to "youremail@gmail.com". Some people filter emails this way, hence my suggestion to check for this edge case.


PS: The downvoters are downright silly on this website lately.

Or someone from Scunthorpe (cf the "Scunthorpe Problem").

Or Dick van Dyke.

Great story! Before deciding to use blacklists or lockouts (on anything), know that it can and will be used against you.

So much fun to read. Great post, thank you.

Most of the comments here about '+' parts are rendered completely irrelevant by single-user domains. Not to mention "one email ~= one person" schemes.

Why not encode the domain strings into an image?

OCR requires a lot more programming effort compared to a text-based content scraper.

FTA: "Could I make it harder to scrape? Well, I could, but wouldn't really slow anyone down much."

I think that's the basic idea. He could spend his time making it harder to scrape, like the bar across the steering wheel. Some people would be deterred, others wouldn't, and time would be wasted all around.

It's unfair to blind people.

I'm not sure his method would prevent a headless scraper like CasperJS or PhantomJS from doing the dirty work, but nice technique nonetheless.

At least at the time of writing, if you had enough foresight and engineering time to set something like that up, you had enough foresight and engineering time to not make your system treat email addresses as meaningful identities.

Perhaps I'm missing something, but an extremely high percentage of the sites I have accounts on use my email address for authentication. Those that don't often suffer from username squatting. Maybe most sites are just doing it wrong, but what's the prevailing alternative?

Your email address isn't your identity. It's a name associated with your identity, but the identity itself is your account. Or put another way, not all valid email addresses are valid identities for these websites.

If the website is doing things right, they have other means (like a CAPTCHA at the least, or phone verification, or you buying an item from them) before deciding that an email address really is an identity.

I guess I still fail to see the distinction. CAPTCHAs really only keep out bots . . . they do nothing for keeping out Mailinator abuse. Throwaway phone numbers are easily obtainable. They might not be as cheap as Mailinator, but the point is Mailinator made it faster and cheaper for people. Buying an item doesn't really work out when the expectation is you offer a free trial and that's where the bulk of abuse occurs.

I realize this was a non-comprehensive list and I'm not trying to just attack it. I think I agree with the core assessment around what constitutes an identity. But short of some really draconian methods, I think you're basically trading off one insufficient method for another. And at that point, you may as well focus on making things easy for people, which typically means just working with email verification.

FWIW, when faced with Mailinator abuse I resorted to requiring a credit card number to sign up for a trial of my SaaS product. The abuse stopped immediately. But there were other impacts to the business as a result. I still debate the wisdom of it and how much of this should have been foresight. As a bootstrapped company, dealing with abuse was just a resource drain and forced me to focus my efforts on dealing with a segment of the population that was never going to give me money. Suffice to say, it was all very disheartening.

Anyway, thanks for sharing your thoughts on the matter.
