
How I sent 300k emails through Github's API in a matter of minutes - badlogic
http://www.badlogicgames.com/wordpress/?p=3176
======
badlogic
That took my server down i'm afraid. Good day today. Here's a "cached"
version.

To all watchers of the libgdx repository: i’m terribly sorry and hope i didn’t
interfere with your work in any way.

This is meant as a cautionary tale about using Github’s API on a repository
with quite a few watchers (460 in this case).

Earlier this year we migrated our code from Google Code to Github. We didn’t
have a good migration plan for the 1200 or so issues back then, so we kept
them on Google Code. We now have about 1700 issues on the tracker.

Today i finally wanted to tackle the issue tracker migration, using a Python
script [1] i found on Github. The script requires one to specify a Github user
account that owns the repository the issues will get migrated to. I did a dry
run on a fork of the main repo using my Github account, fixed up some issues
in the script, and validated things to the best of my abilities. Things looked
good.

Then i ran it on the main repository. Luckily i was watching our IRC channel.
After about 4 minutes, people started to scream. They each received 789
e-mails from Github. Every single issue i migrated, and every single comment
of each issue triggered an e-mail notification to all watchers of the main
repository.

This wasn’t apparent to me during the dry runs, as i used my own Github
account. The script posts all issues/comments with the user account i
supplied, so naturally, i did not get any notification mails.

I stopped the script after 130 issues (4 minutes), and immediately started
sending out apologies and a mail to Github support, to which i haven’t
received an answer yet. I sent roughly 300k mails through their servers in a
matter of minutes. If i hadn’t watched IRC, i’d have sent out about 4 million
mails to 460 people within an hour.
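
A back-of-the-envelope check of those figures, as a minimal Python sketch. It assumes every watcher received the same number of mails and that the remaining issues would have averaged as many notifications as the first 130; the variable names are made up for illustration.

```python
# Figures quoted in the post above.
watchers = 460           # watchers of the main repository
mails_per_watcher = 789  # mails each watcher received before the script was stopped
issues_done = 130        # issues migrated when the script was stopped
issues_total = 1700      # approximate number of issues on the tracker

# Mails actually sent: one copy of every notification per watcher.
sent = watchers * mails_per_watcher

# Assumption: later issues carry as many comments on average as the first 130.
notifications_per_issue = mails_per_watcher / issues_done
projected = watchers * notifications_per_issue * issues_total

print(f"sent so far: {sent:,}")
print(f"projected for the full migration: {projected:,.0f}")
```

Under those assumptions the sent count lands in the same range as the roughly 300k quoted above, and the projection lands in the low millions.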

Let me assure you that i’m extremely sorry about this incident. I know that
things like this can interrupt daily workflows quite a bit, even if getting
rid of those mails is not a Herculean task. I’d be rather upset if a repo
maintainer pulled something like this on me. Please accept my deepest
apologies.

The lesson for Github API users: think hard about the implications of
automating tasks through the Github API if you have more than a few watchers.

The lesson for Github/API designers: consider safe-guarding against such
issues in your API, in case other idiots like me pull off something similar in
the future.

[1] [https://github.com/tgoyne/google-code-issues-migrator](https://github.com/tgoyne/google-code-issues-migrator)

~~~
driverdan
Install W3TC[1], enable all the caches. Test to make sure it doesn't screw up
your CSS and JS. Problem solved.

[1]: [http://wordpress.org/plugins/w3-total-cache/](http://wordpress.org/plugins/w3-total-cache/)

~~~
spion
Why don't they make this the default? Almost every time I see this happening
to a blog, it's a WordPress instance...

------
chrisacky
It's not your fault.

It's actually a PITA to overcome issues like this on a technical level because
you have to run something akin to a buffer queue that works similarly to how
"debouncing" works.

The best approach as I have found is to...

\- You rate limit events as they happen... So you might let 5 events through
(within 10 minutes), and then start to rate limit them by adding each item to
a queue which you merge down every 10 minutes (but which backs off
exponentially/incrementally each time you exceed the 5 items, so the
next queue takes 30 minutes before it's popped, and then 90 minutes... etc)

\- So for example, you might have an instant pop from queue where less than 10
events have been triggered within 10 minutes.

\- Then if more than 10 events have been triggered, you add each item to a
queue, and after X minutes, you pop each item off and send a bulk email.
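
The scheme in the list above could be sketched roughly like this in Python. This is only an illustration, not how GitHub handles notifications: the class name `NotificationThrottle`, its parameters and the tripling backoff (10, 30, 90 minutes) are assumptions taken from the numbers in the bullets, and timestamps are passed in explicitly so no real clock or mailer is involved.

```python
from collections import deque


class NotificationThrottle:
    """Let a small burst through, then batch the rest into digests."""

    def __init__(self, burst_limit=5, window=600.0, backoff=3.0):
        self.burst_limit = burst_limit  # events sent immediately per window
        self.window = window            # window length in seconds (10 minutes)
        self.backoff = backoff          # digest delay grows 10 -> 30 -> 90 minutes
        self.recent = deque()           # timestamps of immediate sends
        self.pending = []               # events held back for the next digest
        self.delay = window             # current digest delay in seconds
        self.flush_at = None            # time the next digest is due, if any

    def record(self, event, now):
        """Return [event] to send right away, or [] if it was queued."""
        # Forget immediate sends that have fallen out of the window.
        while self.recent and now - self.recent[0] >= self.window:
            self.recent.popleft()
        if self.flush_at is None and len(self.recent) < self.burst_limit:
            self.recent.append(now)
            return [event]
        if self.flush_at is None:
            self.flush_at = now + self.delay  # start the first digest timer
        self.pending.append(event)
        return []

    def flush(self, now):
        """Return the queued events as one digest once it is due, else []."""
        if self.flush_at is None or now < self.flush_at:
            return []
        digest, self.pending = self.pending, []
        if digest:
            self.delay *= self.backoff        # still noisy: wait longer next time
            self.flush_at = now + self.delay
        else:
            self.delay = self.window          # quiet period: back to normal
            self.flush_at = None
        return digest


throttle = NotificationThrottle()
immediate = [e for i in range(8) for e in throttle.record(i, now=float(i))]
first_digest = throttle.flush(now=605.0)
```

In this run the first five events go out individually and the remaining three come back as a single digest once the ten-minute timer expires, so a watcher would get six mails instead of eight; with hundreds of events the saving grows accordingly.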

\----

It's a real pain to manage such a system, because your "typical" job server,
such as Gearman, doesn't let you add a "delay" on jobs...

Ideally, you'd want to make sure that you ignore any new events for at least X
minutes... So you are left with the only option of running _another_ pseudo-
queue system just to catalogue all of your throttled events.
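
One way to picture that pseudo-queue is a small in-process delay queue built on a heap. A minimal sketch, assuming jobs can live in memory and a worker polls `pop_due` periodically; `DelayQueue` and its methods are hypothetical names, not part of Gearman or any other job server.

```python
import heapq
import itertools


class DelayQueue:
    """Hold jobs until their scheduled time, then hand them out in order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: keeps insertion order

    def put(self, job, run_at):
        """Schedule a job to become available at time run_at."""
        heapq.heappush(self._heap, (run_at, next(self._counter), job))

    def pop_due(self, now):
        """Remove and return every job whose run_at is at or before now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


queue = DelayQueue()
queue.put("digest for repo A", run_at=600)
queue.put("digest for repo B", run_at=300)
ready = queue.pop_due(now=450)
```

A worker would call `pop_due` on a timer and pass whatever comes back to the regular job server, which keeps the delay logic out of the job server itself.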

Let's talk strategy. How else do you guys handle instant email
notifications?... without this spamming issue. PS. I'm referring to GitHub
implementing this strategy, not the OP in case there was any confusion.

~~~
seiji
The problem is github makes all notifications STD-level contagious. They broke
out "watching" and "starring" a while ago, but they are still _very_
overzealous with notifying everybody about everything.

The poster didn't use a "send email" API, he was just automatically importing
things, and every import triggered emails nonsensically.

~~~
giovannibajo1
The problem is that GitHub doesn't really have an import/export API; when you
migrate from google code or sourceforge, all issues will appear created and
commented from your own account, and at the current date/time.

I guess there's a commercial reason for this:
[http://giovanni.bajo.it/post/60836467126/github-is-missing-i...](http://giovanni.bajo.it/post/60836467126/github-is-missing-import-export-for-commercial-reasons)

I was going to make the same mistake myself (thanks OP!). Is there a
workaround?

~~~
plorkyeran
> Is there a workaround?

Not that I know of. I added a note to the README about it sending lots of
emails to help others avoid accidentally doing this, though.

~~~
badlogic
I was about to send a PR for the README, will save a few folks some headaches.

------
shortstuffsushi
I caused a similar issue while running edits on a Confluence Wiki instance as
an intern. I was helping our publications department add some macro to every
single page of the site, which I found out they were doing by hand! A bit
shocked, I told them I could write something to automate that in a matter of
minutes.

Sure enough, several minutes later all of the pages were updated. All 50 or so
pages in each of the 15 spaces. And everyone who had ever touched one of those
pages got an email for that page.

The nice thing about the Confluence API is that you can specify "minor"
updates to prevent exactly this scenario from happening.

I guess since GitHub is built on the git foundation, adding some sort of
"silent" flag might not be as easily possible, but certainly it's desirable.

------
adamnemecek
I'm surprised that Github has not implemented a feature for repo migration
from google code and sourceforge.

~~~
1qaz2wsx3edc
I don't think they view that as a significant growth vector. Skimming an
existing market is rarely a good tactic.

Plus it's up to the developer. We can't have one-click-do-all buttons for
everything.

~~~
plorkyeran
Even if there's no one-click option, it'd be nice if the API exposed a much
more flexible set of issue editing options to make it possible for us to write
better importers ourselves.

~~~
asveikau
If only they had access to some kind of command line tool which lets you bulk
pull and push and merge histories. Do you think they have something like that
at github? :-)

~~~
recursive
There is a command line tool, unfortunately it's not very usable.

~~~
claudius
I don’t think the GP was talking about ed scripts.

~~~
asveikau
They say you shouldn't explain your own jokes. My comment was that github is
built around a tool that makes it really easy and efficient to import and
merge histories. Namely git. It's amusing that the value-add portion of
github, the web bits, suffers from problems like this, that a tool like git
could conceivably be applied to.

~~~
claudius
Indeed they do say that. My – admittedly rather poor attempt at a – joke was
that the ‘command line tool’ to which you were referring is indeed quite
usable and user-friendly; as opposed to, say, ed scripts.

------
gexla
Following just one busy repo can take over your inbox (ahem, Docker.) So, I'm
sure your people have good filters in place so that they aren't too distracted
from a flood of messages from Github.

~~~
nickstinemates
Take a look at
[https://github.com/jpetazzo/gunsub](https://github.com/jpetazzo/gunsub) to
get the notifications under control.

------
lnanek2
Not his fault and pretty cool he was in touch with users of the library enough
to catch it and stop it.

Always figured I'd do Cocos2d-x or Unity for any serious game I do next. I've
used Cocos2d before and written Unity plugins. I even have a contractor
working on a Unity project right now. Will have to give libgdx a few extra
points when deciding in the future, though, for having a caring maintainer.

I actually wrote an OpenGL game engine for Android back before any of the
later things came out like Replica Island, AndEngine, the Cocos2D port, etc..
Almost makes me wish I'd open sourced it. It did have some awesome stuff like
batching all the sprites with similar draw states together into one draw call.

~~~
badlogic
I hope there's plenty of other reasons to give gdx a few extra points, at
least compared to Cocos2d-x. Being able to write your game for desktop, iOS
and Android with Scala or better performance than Cocos2d-x for example. SCNR

Glad to see other Android "old-timers", would love to see what you came up
with back then.

------
russell_h
Do people actually let GitHub emails go to their inbox? After they got really
aggressive about signing me up as a "watcher" to repos I found it necessary to
just route all GitHub email to a dedicated folder that I never read, then
only whitelist repositories I actually care about.

~~~
holman
> After they got really aggressive about signing me up as a "watcher" to
> repos.

You may want to turn off auto-subscriptions to repositories you have push
access to: [https://github.com/watching](https://github.com/watching)

~~~
nemothekid
Wow finding that page is a nightmare. I was recently trying to find out how I
can stop auto-watching repos I'm given access to (like if someone creates a
repo under an organization) and I searched all over the settings for this.

Thank you.

------
TallboyOne
Well, that is a big yikes, but crisis (mostly) averted.

------
andrewljohnson
I did this with a company repo, when I wrote a script to migrate issues
from a spreadsheet into GitHub. I only sent 50 issues * 15 people though.

------
shitlord
Anyone have a mirror or cached copy of this page? This site hit the front page
of hn/proggit twice this week, and it was down both times.

~~~
Amfy
Yes, pretty annoying. We should have something that takes a copy of a page
before it will be shown here.

------
scottcanoni
Site is majorly foobared

------
heeton
Recently: And a bottle of rum ([http://www.amazon.co.uk/And-Bottle-Rum-History-Cocktails/dp/...](http://www.amazon.co.uk/And-Bottle-Rum-History-Cocktails/dp/0307338622))

I loved it. A history of rum, including all of the politics around it (like
the role it played in the slave trade and American independence), great read
:)

~~~
darkstar999
Ehh? Is this spam or what?

note: the site is down so I don't know if this is a reference to the original
article

~~~
alecsmart1
The author has posted the article in the comments. Read the third or fourth
comment.

