

Github is down - jayniz
https://status.github.com

======
drostie
I've noticed that these are not immediately marked [dead] by the Hacker News
admins like some other spammy topics, and I guess this is a sort of gray area.
On the one hand, it is in some sense a "new thing" and it affects the
community; on the other hand it does not particularly gratify anyone's
intellectual curiosity to be linked to that status page. The later explanation
and diagnosis of what went wrong, if it's made public, might be interesting:
just knowing "you can't use this now" isn't of that caliber though.

It's also puzzlingly dynamic: In other circumstances the "current status"
pages linked during outages have not been blog-type, but have instead just
literally reported the current status, leading to links which say "X is down"
-- only for you to click it and see "X is functioning normally." This has
already happened for the folks who linked the GitHub main page, which loads
normally. (And won't it also prevent the same URLs from being submitted at a
future date?)

~~~
TazeTSchnitzel
Also, many of these things, by the time some people (such as myself) see them
on the front page, are already fixed.

------
hinathan
This feels like a pretty standard pattern for a lot of services — fail, come
back up on backup DB, fail again when backup proves to not be capable of
handling the surge of load, then eventually come back up on the primary DB
once people have gotten bored and stopped hitting 'refresh'.

Is that a function of not prewarming failover DBs, or is there something
pathological about the primary-secondary pattern?

~~~
minikites
Maybe we just don't hear about situations when the backup/secondary server
succeeds in picking up the slack because it would be transparent to the end
user?

~~~
hinathan
Selection bias, good point.

~~~
mechanical_fish
Yes, and the selection bias goes even deeper: The simplest possible failover
logic is "if something is wrong with our ability to talk to the database, try
the secondary database". But, in that case, you almost never see a broken
website running on its primary database. Inevitably, the site has already
tried failing over to the secondary before it gives up and yells for help.

On the one hand, this is not a good thing, because if you've got a problem
that's unrelated to the database (e.g. too much traffic is choking up your
supply of DB connections) and then you do a failover, now you have two
problems - or, at least, more moving parts to sort out before the situation is
resolved. So it's tempting to design a more clever failover scheme. But, on
the other hand, cleverness is itself a risk: Not only might your clever
algorithm have an even-more-clever pathological failure mode, but it's harder
to understand in an emergency. When your stuff is broken, simplicity is your
friend. All else being equal, you don't want your front-line emergency
responder to have to understand complex failover logic. There is nobody more
frustrated than an ops engineer who can't make the system use the primary
database because some stupid bot keeps forcing the use of the secondary, or
vice versa. In the heat of battle, they're liable to comment out your clever
bot and replace it with a one-line shell script.

Engineering is a difficult balancing act.
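
For illustration only, the "simplest possible failover logic" quoted above could be sketched roughly like this. The names `query_with_failover`, `primary`, `secondary`, and `DatabaseUnavailable` are hypothetical, and the two databases are assumed to be plain callables that raise `ConnectionError` when unreachable. This is not anyone's production code.

```python
class DatabaseUnavailable(Exception):
    """Raised when neither database could be reached (hypothetical name)."""


def query_with_failover(sql, primary, secondary):
    """Run sql on the primary; on a connection error, try the secondary once.

    primary and secondary are assumed to be callables taking a SQL string
    and raising ConnectionError when the database cannot be reached.
    """
    try:
        return primary(sql)
    except ConnectionError:
        # Any trouble talking to the primary triggers the failover, even when
        # the real cause (e.g. exhausted connections) is not the database.
        try:
            return secondary(sql)
        except ConnectionError as exc:
            # Only now does the site give up and yell for help.
            raise DatabaseUnavailable("primary and secondary both failed") from exc
```

Note that by the time `DatabaseUnavailable` surfaces, the secondary has always been tried already, which is the selection bias described in the comment.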

------
Swizec
If only there was some way of using git locally without needing github!

Oh wait, this isn't svn.

~~~
nirvdrum
Aside from the fact that this argument falls apart when you use submodules or
third-party repos, a lot of us use GitHub for the services they offer beyond
just git.

~~~
tow21
If only there were some way to mirror submodules or other third-party repos to
your own servers without needing github.

Oh wait, this isn't svn.

~~~
detst
I think he's also referring to website hosting, issues, wiki, etc.

------
Peroni
Frontpaging on HN will certainly help.

~~~
Andrex
The purpose of a "status" page is ostensibly so that it can get the most
visibility when the service is down, so that people aren't constantly
refreshing the main site.

~~~
patrickaljord
Also the status page is usually hosted on a different server than the service
it reports on. Otherwise it would go down with the main service and be pretty
useless.

------
orangethirty
Now that is a good service status page. It made me go from "Github sucks" to
"Go github." It is well pieced, informative, and upfront about the whole deal.
Their auto refresher seals the whole deal. It shows they are confident in
their skills to get the problem fixed. Bravo github. Well done. Now hurry up
and finish so you can answer the support email I sent this morning. :)

------
anupj
I can see a github post-downtime analysis blog post coming up (I hope). :)

------
peterwwillis
Question for the Github people: Why not keep serving non-stale cached data
while your databases are down?

You can do this with proxies or by modifying your code to always serve out of
cache, and the db updates the cache, so if the db is down, the cache is your
temporary failover while you fail over to the secondary db. ('cache' is
anything memcached-like that's separate from your db)
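
A minimal sketch of the "always serve out of cache, and the db updates the cache" idea, under stated assumptions: `CacheFirstStore` and `db_write` are hypothetical names, a plain dict stands in for something memcached-like, and a database outage is assumed to show up as `ConnectionError`.

```python
class CacheFirstStore:
    """Reads never touch the database; successful writes refresh the cache."""

    def __init__(self, db_write):
        # db_write is a hypothetical callable (key, value) that persists to
        # the database and raises ConnectionError when the database is down.
        self.db_write = db_write
        self.cache = {}  # stand-in for a memcached-like store

    def write(self, key, value):
        # Persist first; the cache is updated only if the database accepted
        # the write, so the cache never runs ahead of committed data.
        self.db_write(key, value)
        self.cache[key] = value

    def read(self, key):
        # Served purely from cache, so reads keep working during an outage.
        return self.cache.get(key)
```

With this shape, writes fail while the database is down but reads of anything already cached continue to be served. Anything never written through the store simply is not there, which matches the reply below about not having every detail cached.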

~~~
jeremymcanally
We do in a lot of cases. We just don't have every detail in the whole app
cached. :)

~~~
benatkin
You don't have some things cached that should be obvious to cache, like the
HTML for the most popular repos. Loading the commit messages from JSON for a
page that's being accessed tens of thousands of times a day is less than ideal.

It isn't just github, it seems like a lot of web apps don't use the lessons
learned for web sites.

~~~
peterwwillis
I think there's a culture with modern developers that says using older, less
sexy technology isn't going to work as well as newer, sexier technology.
"Cache HTML?! That's so inefficient!" Yeah, it also just works, too. When all
your databases, content engines, storage services, deployment tools, etc. all
take a crap, your clunky little web proxy cache keeps right on humming and
your customers get at least a half-functioning site, if they notice at all.

------
mylittlepony
If only there was a better git hosting provider. Oh wait, I use bitbucket.org!

------
binarydreams
Yay!! We made it, frontpage on HN!

------
lazyjones
Cool, an auto-refreshing status page that will mostly be looked at when the
servers (possibly the network) are already stressed.

~~~
tomschlick
That page is hosted on separate infrastructure from their production app.

