Hacker News
GitHub is down (githubstatus.com)
107 points by romellem 4 months ago | 102 comments

Hearts go out to all the GitHub SREs who get to fix this, and all the SREs at GitHub’s customers who get to spend the next week convincing management and their peers that moving git in-house won’t result in fewer outages.

Our on-prem GH enterprise hasn't gone down in the last year.

Additionally, any potentially disruptive work, like upgrades, can be scheduled outside business hours so that it won't impact users.

In general, paying GH tends to be much cheaper than employing a team of SREs that'll keep your on-prem Git instance up and running 24/7 (esp. including things like backups, disaster recovery, and similar good practices necessary for any enterprise-ready SaaS).

Something tells me that if you can schedule upgrades "outside of business hours", your scale is small (because in a global company, it's always "business hours" somewhere), which makes the cost argument even more prominent.

You don't employ a team of SREs whose sole day job is git maintenance, though; your existing SRE/sysadmin team manages it.

And any large business will already be doing Backups, DR etc for their existing systems.

> any large business will already be doing Backups, DR etc for their existing systems

Any? Are you sure?

Lol, don't crush their optimism; the real world will do it for you in due time.

Been doing it for 20 years so far.

If you work in a large business that needs git but doesn't have backups and has no disaster recovery plan at all you're going to have much bigger things to worry about than a git repo being offline.

I self-host Subversion and haven’t had an unplanned outage in more than a decade :) (almost two decades?)

How’s your experience with Subversion? What kind of content do you store in SVN, BTW?

I really like it. I keep source code, 3rd party libraries, images, binary build output and installers. I can easily go back and debug any crash a customer ever sees because I have all the binaries and debugging files.

It depends on your needs and team size. For a few tens of devs who need a versioning system with some sugar, a self-hosted GitLab can achieve better uptime.

> a self-hosted GitLab can achieve better uptime.

Gonna call BS on that.

It can achieve better uptime, if you ignore the downtime for upgrades, the downtime for configuration errors, the downtime when the disk fills up...

> It can achieve better uptime, if you ignore the downtime for upgrades, the downtime for configuration errors, the downtime when the disk fills up...

Except for the last one, you can schedule those for when developers are not actually working on something, or give a heads-up so developers can be prepared in case of errors.

In the case of GitHub, Microsoft deploys changes whenever they want, whether you want it or not.

> In the case of GitHub, Microsoft deploys changes whenever they want, whether you want it or not.

That's because GitHub is used by millions of people every day all over the world. Business hours for them are around the clock. No matter when they update, if it breaks something, many people will hear about it regardless of when the break was deployed.

Yeah, this is exactly my point: their needs don't overlap with your needs, so sometimes it's better to self-host, because then it's 100% your needs that get taken care of.

Good planning and monitoring will also prevent the last point from becoming a problem.

As I said, it depends on your needs. I don't know if you have ever managed a small gitlab self-hosted installation, but it's really trivial: you set it up the first time and forget about it. Also, the upgrade process takes a few minutes to complete.

Btw, you cannot compare a scheduled maintenance window with unplanned downtime, maybe in the middle of an important deployment.

> I don't know if you have ever managed a small gitlab self-hosted installation

Bitbucket Server

Most of the upgrades happen outside core work hours, which is easy to manage for smaller companies.

> It can achieve better uptime, if you ignore the downtime for upgrades, the downtime for configuration errors, the downtime when the disk fills up

So 10 minutes a month for upgrades, zero time for the rest because you are managing it just fine.
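For a sense of scale, here is that figure as an availability percentage, assuming a 30-day month (30 × 24 × 60 = 43,200 minutes) and that the upgrade window is the only downtime:

```latex
\[
A = 1 - \frac{t_{\text{down}}}{t_{\text{month}}}
  = 1 - \frac{10\ \text{min}}{43\,200\ \text{min}}
  \approx 1 - 0.00023
  \approx 0.99977 \quad (\approx 99.98\%)
\]
```

On the same assumptions, a single 27-minute outage in a month works out to 1 − 27/43,200 ≈ 99.94%.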

so true. At this moment everyone's like, well! This is UNACCEPTABLE. We'll just, we'll just... what, what are you going to do better than github?

>what are you going to do better than github?

Not add new features every week? Seriously, you cannot tell me that you cannot run a minimal version of a git repo service much more stable than that. I blame the need for new shiny stuff.

> I blame the need for new shiny stuff.

Y'all make me laugh. Don't add any features and it's "This thing is dead, nothing new in years, find an alternative." But when a company actively adds new features its customers are asking for, you're upset that they're adding new features.

Pick a lane.

There's a middle ground between "nothing new in years" and "new feature or changes every week".

Curiosity: not everyone on Hacker News has the same opinions.

Yet in every thread on HN when a service has an outage, the same opinion is spewed.

I know the comments of these posts before I even open them. HN is an echo chamber of predictability.

That you keep opening the discussion page and even participate might say more about you than HN :)

We self-host GitLab and haven't had an unplanned outage in the year since I joined.

I always wondered if some % of GitHub orgs move to Gitlab when this happens.

I'd be interested to see if there is any correlation there.

Yeah, because Gitlab would never go down.


GitLab can be self-hosted

And we all know self-hosted software never goes down.

Although I guess the benefit is that the CTO gets the psychological lift of looking over Jeremy from IT's shoulder while he tries to fix the issue, rather than being at the mercy of some engineers the company can't fire.

People like to make it sound like running your own server/service is impossible - completely forgetting that's how everything was done not that long ago.

Some whiz-bang new graduates calling themselves Engineers probably haven't a clue, sure. But everyone else probably still employs folks who did exactly that at a professional level.

It's not hard, and it's often less expensive, particularly with a turnkey paid product like GitHub Enterprise or GitLab.

> And we all know self-hosted software never goes down.

I've been running self-hosted software, and GitLab in particular, for over a decade at various jobs.

It almost never goes down. In fact, it's up way, way, way more often than GitHub!

there's always gitlab

The fact that GitHub being down can have such an impact means that Git’s decentralised features don’t go far enough. Why aren’t issues, PRs and all these other little things we depend on also stored in a git repo? Or something similar that can withstand GitHub being down.

I set aside time today to go through my notifications and I can’t because of this. Notifications are trickier to decentralise but still, ugh.

"Everything goes in the repo" is one of the main features of the Fossil [0] SCM which is built and used by the SQLite team. It's a neat idea and I wish something like that had caught on.

[0] https://www.fossil-scm.org/

It's not too late! At one point, git was the new kid on the block while everyone else was using something else (or copy-pasting directories/zip files for versioning).

All it takes is more people using it, and people here on HN are better placed than most to start playing around with it and building their projects on it.

There's so much activity around SQLite nowadays - surely some of it must be applicable to dialing up the coolness of Fossil.

Those who use it by emailing patches around, à la the Linux kernel, aren't affected.

They aren’t affected until the mailing list servers go down.

Then you can send PRs straight to the maintainers' inboxes; that would still work.
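A minimal sketch of that patch-by-email workflow: `git format-patch` exports commits as mail-ready files and `git am` applies them on the receiving end, with no hosted forge involved. It assumes two local repositories stand in for contributor and maintainer, and it leaves out the actual mailing step (e.g. `git send-email`); all paths and identities are made up for illustration.

```shell
#!/bin/sh
# Sketch: contributor exports a commit as a patch file, maintainer applies it.
# Assumption: two throwaway local repositories stand in for two people;
# mailing the file (e.g. with git send-email) is deliberately not done here.
set -eu

tmp=$(mktemp -d)

# Maintainer's repository with one initial commit.
git init --quiet "$tmp/maintainer"
cd "$tmp/maintainer"
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "v1" > file.txt
git add file.txt
git commit --quiet -m "Initial commit"

# Contributor clones it and commits a change.
git clone --quiet "$tmp/maintainer" "$tmp/contributor"
cd "$tmp/contributor"
git config user.name "Contributor"
git config user.email "contributor@example.com"
echo "v2" > file.txt
git commit --quiet -am "Update file.txt"

# Export the last commit as a numbered .patch file (mbox format) in the "outbox".
git format-patch --quiet -1 HEAD -o "$tmp/outbox"

# Maintainer applies whatever arrived, keeping the original author and message.
cd "$tmp/maintainer"
git am --quiet "$tmp/outbox"/*.patch
```

Once applied, the commit still records the contributor as its author.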

Couldn't find any stats online about it, but how often do the servers hosting the Linux kernel mailing lists go down?

What do you even mean? GitHub made a decentralized service a centralized one by encapsulating it. This “impact” isn’t the fault of Git at all.

That's why he said that Git's feature set doesn't go far enough. Most projects need an issue tracker and a tool for discussing proposed changes. These things are out of scope for Git, but since people needed them, they congregated around a centralised service that builds around it. Of course, decentralised alternatives for issue tracking and pull requests exist, too.

This is why Linus uses email. It’s the Unix way: Git is just another tool. It doesn’t need to be a website, and an issue tracker, and a change proposal database, and version control software, and a social media website, and a pull request tracker. It just needs to be version control software with some decentralization.

If today's developers were tasked with building the World Wide Web again, we would all log into some OAuth portal.

It is the fault of Git because it doesn't take it the whole way. If the things I need to work are centralised, then Git is not doing everything it should.

For comparison, check Fossil - https://www.fossil-scm.org/

Everything you described are features of GitHub not Git. They are not the same... despite the name.

With that in mind, your comment is similar to complaining about Jira being down, or Jenkins... they are different products.

> The fact that GitHub being down can have such an impact means that Git’s decentralised features don’t go far enough

GitHub was literally created to bring a central server to a decentralized system. Yes, when that central server goes down, that's a problem for its users. It has nothing to do with Git.

You can, if you'd like, send Git patches in emails, or via snail mail, and you won't be affected by any outages other than those of individuals' laptops (which will affect only those individuals). Also, good luck managing your 250 microservices that way.

heh, this was my argument back in like 2009 when everyone was all, "DECENTRALIZED!! We have to take months to migrate from SVN to git cause DECENTRALIZED!! You don't want a single point of failure, do you?!"

all I could say was, uhhhh github is a single point of failure. Unless you're mailing diffs around like Linus (which nobody I knew did), it's 6 of one, half-dozen of the other.

there are lots of great things about git and especially git over e.g. svn (especially merges) but I never bought the decentralization argument the way it was spearheading the conversation back then.

With git I can create, edit and merge branches locally. If GitHub is down, I can push to another remote or send diffs. I can calmly carry on working until GitHub is back. If GitHub is down too much, I can migrate to GitLab or Gitea.

I would say the decentralised features of git work for me. And they are not overblown.
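A rough sketch of that fallback: branch, commit and merge entirely offline, then publish to a second remote. It assumes a throwaway bare repository stands in for the backup host (a GitLab or Gitea instance, a LAN server); the remote name `backup` and all paths are hypothetical.

```shell
#!/bin/sh
# Sketch: keep working while the primary host is down, then push to a second remote.
# Assumption: a throwaway bare repository stands in for the backup host.
set -eu

tmp=$(mktemp -d)
backup="$tmp/backup.git"   # hypothetical backup remote
work="$tmp/work"

git init --quiet --bare "$backup"
git init --quiet "$work"
cd "$work"
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Committing, branching and merging are purely local; no server is contacted.
echo "hello" > README
git add README
git commit --quiet -m "Initial commit"
git branch -M main
git checkout --quiet -b feature
echo "more" >> README
git commit --quiet -am "Work done while GitHub is down"
git checkout --quiet main
git merge --quiet feature

# Register the second remote and publish there instead.
git remote add backup "$backup"
git push --quiet backup main
```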

Sadly, people seem to push their entire development flows into GitHub Actions, so things like deployments happen on GitHub's servers, and then their servers go down and you suddenly can't deploy anymore... Or they write their documentation in GitHub issues/PR descriptions/comments, and now you cannot access that either...

If your SVN server went down, you couldn't even commit. At least with git, you can do everything as normal except push and pull, and that's only if you don't have multiple remotes (which, admittedly, very few GitHub users do).
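For those who do want multiple remotes, one option is to give a single remote two push URLs, so every routine `git push` also updates a mirror. A sketch, assuming two throwaway bare repositories stand in for the real hosts (say, GitHub plus a self-hosted GitLab); all names and paths are hypothetical.

```shell
#!/bin/sh
# Sketch: one remote ("origin") that pushes to two hosts at once.
# Assumption: two throwaway bare repositories stand in for the real hosts.
set -eu

tmp=$(mktemp -d)
primary="$tmp/primary.git"   # hypothetical primary host
mirror="$tmp/mirror.git"     # hypothetical mirror host
git init --quiet --bare "$primary"
git init --quiet --bare "$mirror"

git init --quiet "$tmp/work"
cd "$tmp/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "hello" > README
git add README
git commit --quiet -m "Initial commit"
git branch -M main

git remote add origin "$primary"
# Once any push URL is configured, pushes go only to the push URLs,
# so the primary must be listed explicitly alongside the mirror.
git remote set-url --add --push origin "$primary"
git remote set-url --add --push origin "$mirror"

# A single push now updates both repositories.
git push --quiet origin main
```

Fetches still come from the original URL; git's documentation expects push and fetch URLs to refer to the same place, so the second URL is best treated strictly as a mirror.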

> The fact that GitHub being down can have such an impact means that Git’s decentralised features don’t go far enough.

This is like blaming Bitcoin for your exchange collapsing. :p

The point has been missed hardcore.

I fail to see how this is git's problem. Who was forced to use GitHub because git doesn't provide issue tracking?

That being said, Fossil has what you want. Are you willing to switch?

Mine looks like (on git push):

  remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
  remote: fatal error in commit_refs
  To $REPO
   ! [remote rejected] main -> main (failure)
  error: failed to push some refs to '$REPO'

Status Page: https://www.githubstatus.com/ -- actually shows red -- it was almost updated as fast as HN.

> Status Page: https://www.githubstatus.com/ -- actually shows red -- it was almost updated as fast as HN.

I think we should applaud this when it actually happens. Far too many services are terrible for this. While I'd rather the service not be down, it makes me feel a bit better if they don't lie about it.

In a world of green, an accurate status page is a radical act.

I'd buy this t-shirt

It's funny when a company like Amazon, which many would think is on the cutting edge of automation, implements their status page as a manually updated static page.

It's not by accident. There are SLAs and other things on the line when the status page officially recognizes an outage.

Wonder why mine[1] got marked as a duplicate when it was posted first?

1. https://news.ycombinator.com/item?id=35817640

Maybe because they linked to the incident instead of just the domain homepage that other people tried to post.

so unfair. This happens to me too sometimes. It's like not a perfect system.

While I understand where you as a person are coming from, and my thought below is a tangent, I think this sentiment is like the origin story of editorialized headlines.

If one gets dopamine/validation/click-through revenue from others consuming/clicking on media one shared, and further feels "unfairly gamed" when someone else gets that payout, it's a short time before one starts subconsciously appealing to the human lizard brain and sensationalizing media.

well put!

Tough making a living on imaginary internet points.

hey, it's a livin'. McConaughey agrees. I just keep on getting points.

Happens on my GitHub issues too :/

Everything is down. How can this happen? Many of these services are unrelated, like even the brand-new Codespaces and Copilot are completely down. Do they all have a single point of failure? (Presumably the git server)

DNS. It's always DNS

When in doubt and the outage is widespread, it's either DNS or BGP, most likely DNS. My money is also on DNS in this case.

I suspect you'd lose that money. Many of the recent GitHub outages were caused by database issues. Examples: https://github.blog/2023-05-03-github-availability-report-ap.... In fact, last March they specifically mentioned “Database stability”: https://github.blog/2022-04-06-github-availability-report-ma...

It's not DNS

There's no way it's DNS

It was DNS

DNS, BGP, PKI, probably other TLAs.

Yeah but they probably all share DNS or something related to networking

My bet is on cache servers having a bad release

>This incident has been resolved. Posted 1 minute ago. May 04, 2023 - 16:23 UTC

Everything resolved at the same time. DNS?

I'm not really sure GitHub prioritizes high availability anymore. These outages might be a way to motivate customers to switch to GitHub's self-hosted enterprise solution, which is GitHub's revenue stream. If they're not losing users because of these outages (I'm sure they can find that out), these outages really make them more money in the long run.

As a previous GitHub Enterprise Server customer on a site with hundreds of users, who's spoken with a number of GitHub employees about this, I do not think that is true. They would far rather you were on GitHub Enterprise Cloud. They view Server instances as a necessary evil for people who have compliance or security requirements they can't handle with the Cloud version, but it is fairly obviously a pain in the neck, both for support and for attaining feature parity. They have been trying to bring their Server offering closer in architecture to the Cloud version since forever (e.g. moving to Nomad a few years ago to orchestrate it all as containers, so you can shard the services out over multiple hosts similar to github.com, etc.).

Last time GitHub went down was less than a month ago: [0]

At this point maybe it is time to self-host rather than to continue to tolerate this since it's guaranteed to go down every month. [1]

Even open-source projects like RedoxOS, GNOME, KDE, ReactOS, etc. are doing just fine without worrying about GitHub's unreliability, as predicted years ago when people warned against centralizing everything on GitHub. [2]

[0] https://news.ycombinator.com/item?id=35611739

[1] https://news.ycombinator.com/item?id=35611862

[2] https://news.ycombinator.com/item?id=22867803

> At this point maybe it is time to self-host rather than to continue to tolerate this since it's guaranteed to go down every month.

Not a single one of these outages has caused me enough pain to even consider moving to another hosted provider let alone self-hosting and the burden that goes into managing all that stuff.

Can we stop with this whole "omg a major platform that gets used by millions of people a day can't guarantee 100% uptime, we need to all manage our own infra!!" mentality?

Seriously, I've been in GitHub all morning and decided to take a break and read HN, and I was JUST learning about this now... It was resolved before I saw this thread, too.

I sense a disturbance in the force...

you mean... I sense a disturbance in the source...

The Source is what gives a hacker his power. It's an energy field created by all coded things. It surrounds us and penetrates us. It binds the galaxy together.

on May 4th too. Hmmmmm

Out of an abundance of caution...

Quick, activate the Copilot-based sysadmin!

To be fair they fixed in under 20 minutes

Fixed it*

Hmm, will it turn out to be DNS?

I'm guessing so, given the breadth of the outage.

(It's always dns.)

In the various tools for determining blame:

  nc bofh.jeffballard.us 666

I am surprised this is still working given its age; the CGI version at https://pages.cs.wisc.edu/~ballard/bofh/ doesn't work anymore (but the page is still up).

DNS...Developers Need Solutions

I wonder if they are migrating most of their platform to Azure.

Huh, it is working for me. Is it region specific?

Status updated 1 minute ago saying resolved so maybe they fixed it?

27 minutes to resolve - relatively fast

According to the same status page, this is Resolved as of May 04, 2023 - 16:23 UTC.

Their Twitter account states that all is back to normal.

I see GitHub is working hard to achieve nine fives.
