
I get that outages are a way of life for a widely used web application, but GitHub have really dropped the ball lately. This is one of many recent service outages, and as a paying customer I find it disheartening and worrying: I use GitHub in my day-to-day workflow, and I and many others have come to rely on it. Don't get me wrong, I love GitHub and couldn't live without it, but they really need to sort out these problems, and it's not as if they lack the funds to do so. My knowledge of distributed computing is somewhat limited, but I would have thought they'd just spin up a few extra virtual machines to handle the database spike (maybe it's not that simple with GitHub's setup, I'm not sure).

Databases typically don't scale horizontally that easily. Scaling write-heavy data stores is not a trivial problem. But I feel your pain.
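
To make the write-scaling point a bit more concrete, here is a minimal sketch. It assumes a naive scheme where each key is assigned to a node by hash modulo the node count. That scheme, the `repo-N` key names, and the helper functions are all hypothetical and for illustration only; nothing here describes GitHub's actual setup. It estimates how many keys would have to move to a different node when one node is added.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical key names, used only to have something to hash.
keys = [f"repo-{i}" for i in range(10_000)]
moved = fraction_moved(keys, 4, 5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 nodes")
```

With this naive scheme, roughly four out of five keys land on a different node after adding the fifth one, so the data has to be migrated while writes keep arriving. Schemes such as consistent hashing reduce the reshuffling, but the migration and consistency work is still a large part of why "just add machines" is harder for a database than for stateless web servers.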

GitHub have their own datacenter and hardware; they are not relying on any cloud provider. That makes it harder to handle load spikes, but they run an I/O-intensive service, which justifies that choice.

My favourite line from that linked blog post from 2009 is this: "We're aware of the current stability and performance issues, and we want to let you know what we're doing about it." The issues they were having nearly 4 years ago are still happening, unless the problems they've faced lately are completely different.

The move really did fix their problems back then.

There was a period when they were having really visible scaling problems, with response times often getting painfully slow, apparently mostly due to slow I/O. These problems completely disappeared after their move to Rackspace's managed hosting.

Now it looks like they may be hitting a new barrier, one that requires architectural changes to overcome.

I understand when my non-technical clients ask questions like, "I thought we fixed X?" or "shouldn't Y have been fixed by now?", but I would have hoped that fellow developers would cut each other some more slack, you know?

There are an infinite number of reasons an app could fail at any given moment. Just because we see the same "something went wrong" page 4 years later doesn't mean we're seeing the same problem, or even the same class of problem. It just means that the sun still rose in the east this morning, my stupid car will probably have something new wrong with it when I leave my apartment in a few hours, and software is still really, really complex.

The GitHub team is a smart bunch. I highly doubt they're still dealing with the same class of issues that plagued them four years ago, and it just seems kinda strange to even bring it up.
