GitHub was down again
110 points by pupdogg 42 days ago | 76 comments
GitHub is down once again!



While I use (and mostly like!) GitHub, this is yet another reminder that centralizing so much infrastructure—from package managers to CI pipelines to static websites and more—around one company is a very bad idea, and will likely bite us in the end.


Is that actually bad? A chain is only as strong as its weakest link, so if my system depends on many sites then even one outage can take me down. Centralizing like this can actually reduce risk.


The chain is still the same length when centralised, and it may be weaker. Separate services would use separate databases, but once centralised it makes sense to consolidate those resources.

If you had separate services, the CI service database going down would not affect anything else. A centralised database for all the services will take all of them down when it borks.

If your priority is profit or cost management, then duplicated infrastructure will make no sense.


It seems to me that centralizing means that if the service goes down, your entire chain is down.

Decentralizing on the other hand makes it possible to replace individual links.

If GitHub goes down you can switch to Gitlab. But you can't do that if you're using GitHub for git and issues and CI and static sites, etc.


> A chain is only as strong as its weakest link

Wow, when I read this, it is actually much cleaner than what I would have used in the past: the overall system strength is min {i in 0...m} subsystem[i], where m is the total number of subsystems.


Rather, the reliability of a sum is the product of the reliabilities of its constituents (unless redundancy was applied), where each reliability is in the [0, 1.0] range.


This is extreme nitpicking but I'd recommend seeing it as a reliability of a product of constituents.

A sum of constituents is more likely to be a situation where you're fine if at least one of the options works (e.g. a cluster with redundancy).

This will actually get you a semiring (easy to check), although whether that is really useful remains to be seen. It's nice to have anyway.
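To make the three models in this subthread concrete, here is a minimal Python sketch. The function names and the reliability figures are made up for illustration; it only assumes independent failures, which real services rarely have.

```python
from math import prod


def weakest_link(reliabilities):
    """The 'chain' view: the system is as good as its worst part."""
    return min(reliabilities)


def series(reliabilities):
    """Every part must work, so the individual reliabilities multiply."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Redundancy: the system fails only if every part fails at once."""
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical reliabilities for three dependencies (not measured values).
parts = [0.99, 0.99, 0.95]

print("weakest link:", weakest_link(parts))
print("series      :", series(parts))
print("parallel    :", parallel(parts))
```

With these numbers the series result is lower than the weakest link, which is the point of the correction above: every extra dependency that must be up costs a little more, while redundant options push the result toward 1.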


Depends on how it's done: if it's one authority with many mirrors, it can be more reliable. Take `apt` in Debian, for example; you can always find a mirror online.

With git's distributed nature, this would be very much possible.


Will you also mirror CI infrastructure, issues, pages, documentation, API-consulting scripts, etc?

Also, pushing to a mirror is not exactly the thing you want.

What makes apt easier is that pretty much everybody is just downloading from the source or from a mirror. When the source stops, mirrors don't get updated and everything still works. That's very different from the usage model that people have with Github.

GitHub/GitLab have extended the Git usage model very much; they are not just git. You can't easily migrate away from them to another git offering (and not even very easily between the two).


Agreed, that would be even better. I wonder what happened to GitHub that their entire HA system didn't work.


Meh, as long as all dependencies are built on standards and I can quickly switch to other services I don't mind centralization.


I always wondered about Rust packages - why GitHub? Were any alternatives ever discussed or proposed?


I would really like to see a postmortem for the recent outages. While I doubt it's all the same root cause, it would be nice for some messaging around improving resiliency.

I think the outages could be related to:

1. GitHub mobile just went public, so they may have changed the scaling params to keep up with the expected increase in traffic.
2. The new notification system seems to be a lot heavier, and they could still be catching up to the changes.
3. They were somewhat recently acquired by Microsoft, so maybe they're migrating to Azure to reduce expenses.

Whatever the cause, 11 days with outages out of 90 is pretty rough when you rely on Github for project management, a central hub for viewing CI, and all VC concerns. Feel bad for the smaller companies that wholly adopted GitOps and are blocked deploying hotfixes during these outages.


> I would really like to see a postmortem for the recent outages. While I doubt it's all the same root cause, it would be nice for some messaging around improving resiliency.

They recently posted a "post-incident analysis" [0] about several "service disruptions" in February that were all a result of database issues.

Obviously, I have no idea if today's outage is related.

---

[0]: https://github.blog/2020-03-26-february-service-disruptions-...


Geez, so many outages/impacts (12+) in the last 90 days.

The status page looks like a Pez dispenser.

https://www.githubstatus.com/

Usually companies put in place a release freeze or "Code Purple" when there are such demonstrated problems with releasing stable code.


Fortunately, Github is built on a distributed system that doesn't require us to stop coding just because we can't access a central server.

Distributed systems for the win!


except that it also serves as a communication platform, like issues, PRs, code review, CI.

the fact that git is distributed is but a small part of the whole.


Which is why it gets targeted by "state actors". I'm not saying that's what this is, but it's happened before.


What is distributed here? If GitHub is down, no one can see what you changed, not even the build servers. Git is distributed in the sense that every client has the complete history, but not in the sense that my laptop can act as a replacement for GitHub (like a torrent).


Git supports lots of protocols for sharing your changes - even email!


Diffs can be shared over mail, right. Can I do a `git pull` via mail?


Yes:

https://git-send-email.io

You can also do a "request pull" over email; if that sounds confusing, it's because GitHub wanted it to:

https://www.git-scm.com/docs/git-request-pull


You can't over mail. But each of you could run your own Git server if you wanted to be able to pull.


Git is distributed in the sense that you have a copy of the source code and can make commits. Compare this to what would happen if an SVN server went down.


Even in SVN there is a copy of the current code everywhere. It just does not support having multiple repos and so on, which I don't think many people use in git anyway. Most orgs only have one remote repo set up (I would love to be wrong about this).


I wish I could agree with you - still, these outages just reveal that GitHub is much more important to software development than pushing changes into a git repository.


Why is this post flagged? Seems like an appropriate HN topic


The more repetitive a topic gets, the more users flag it. 'Is down' posts are repetitive as a category to begin with, and 'X is down again' posts are repetitive along two axes. There was one of these a few hours ago, before it got flagged:

https://news.ycombinator.com/item?id=22955595

... which was likely because users regarded it as a quasidupe of

https://news.ycombinator.com/item?id=22935941.

By 'quasidupe' I mean it's not strictly the same story, but the difference ('down again') isn't significant enough for the thread to be substantially different.

(The submitted title on this one was "GitHub: Here We Go Again".)


I understand your argument, but the number of comments on this post shows people still want to discuss it.

Is it possible to run an experiment? Remove the flag option but highlight the hide option to users. Users who are not interested in the topic can remove it from view without killing the discussion for those who are interested in it. Then see if the quality of posts/discussions declines.


Every "GitHub is down [again]" post leads to the same discussion. With the number of recent GitHub "service disruptions", it starts to get a bit old.


Agreed.


It almost feels like HoTMaiL all over again.

For those too young to remember, Microsoft bought Hotmail (forget the silly CamelCasing, we know that HoTMaiL was HTML + mail by now) which was based on FreeBSD+Apache and was champing at the bit to use it to demonstrate the scalability and stability of their then relatively new NT operating system and the IIS web server. Let's just say that... things did not go the way Microsoft would have wanted and the demonstration more or less achieved the opposite of what they intended. It took them a long time to move the frontend to Windows and an even longer time to do the same to the backend.

They won't make that mistake again but they might succumb to featuritis or wrongfooted attempts to steer Github-users further and further into the Microsoft world.


I remember signing up for Hotmail and abandoning my isp email as if it was some sort of internet activism ;)


GitHub Enterprise has a quarterly uptime SLA of 99.95%. It's probably worth checking to see if they've violated it this quarter. The status page says that their uptime is 99.92% YTD, but their support page says that the status page is not connected to their internal metrics. [0]

[0] https://help.github.com/en/github/site-policy/github-enterpr...
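For a sense of scale, here is a rough Python sketch of what a percentage like that allows. It assumes a 90-day quarter and counts every minute equally; the actual SLA terms may define the period and what counts as downtime differently, so treat the helper names and the result as illustrative only.

```python
def downtime_budget_minutes(sla_percent, days):
    """Minutes of downtime permitted by an uptime percentage over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def uptime_percent(downtime_minutes, days):
    """Uptime percentage implied by a given amount of downtime over `days` days."""
    total_minutes = days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


# Assumption: a quarter is treated as exactly 90 days.
budget = downtime_budget_minutes(99.95, 90)
print(f"99.95% over 90 days allows about {budget:.1f} minutes of downtime")
```

Under those assumptions the budget works out to roughly 65 minutes per quarter, so a single outage of a couple of hours would be enough to go over it.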


Be glad you're not on Bitbucket though. Issues happen so often people stopped posting it to HN.


I feel Bitbucket has been remarkably stable this year. The difference between the Github Statuspage and Bitbucket's is night and day. Of the two recent issues I can see in the last month they look like minor glitches and not outages. Bitbucket had an unfortunate series of interruptions for a couple of days in October last year, but Github has been having major outages every few days for 3 months now.


Single Point of Failure As A Service


Just switched back to using GitHub again and the timing couldn't be worse. Hope things are going okay for them, definitely some stressful weeks.

On a side note, the status.github.com page is quite delayed. Following live updates from Twitter has been a better strategy for confirming that GitHub is having issues: https://twitter.com/search?q=github&src=typed_query&f=live



> Currently observing issues affecting GitHub.com.

Yes.


Experienced this last week, too. We worked hard to complete a feature, and we were ready and excited to merge and push the update to production; then GitHub was suddenly down. We didn't sleep well that night. We could have done some workarounds, but we were just too frustrated to do so.


Seriously considering moving our projects off of GitHub at this point.

What other services are good? I loved using Phabricator at Facebook, but its cloud option is pricy.


If you have multiple remotes, then downtime quickly becomes a non-issue. As long as at least one remote is up, you can keep pushing and fetching.

I wrote about mirroring in another comment [2]:

> I put together a script [0] to automate the process to set the primary remote to Sourcehut (git.sr.ht) and mirror to GitLab and GitHub. And yes, that script is in a repo that's also mirrored to GitHub and GitLab (check the README). It combines well with my `git pushall` alias [1].

> [0]: https://git.sr.ht/~seirdy/dotfiles/tree/master/Executables/s...

> [1]: https://git.sr.ht/~seirdy/dotfiles/tree/master/.config/git/c...

Bonus points for having one remote being a tiny self-hosted instance, like plain git+ssh. I don't think I've had any reliability issues with plain git+ssh on an rbpi.

Furthermore, using mailing lists or something like git-bug instead of a vendor-locked-in issue-tracker keeps issues decentralized and eliminates downtime.

[2]: https://news.ycombinator.com/item?id=22764055
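For anyone who wants the gist without reading the linked dotfiles, here is a small Python sketch of one common way to get a single `git push` to hit several hosts. It is not the script linked above; the function name and the example URLs are hypothetical, and it only builds the command lines rather than running them.

```python
import shlex


def multi_push_commands(remote, urls):
    """Build git commands so that `git push <remote>` pushes to every URL.

    The first URL is also the fetch URL. It is re-added as a push URL,
    because once any explicit push URL exists git stops pushing to the
    fetch URL by default.
    """
    if not urls:
        raise ValueError("need at least one URL")
    primary = urls[0]
    commands = [["git", "remote", "add", remote, primary]]
    for url in urls:
        commands.append(
            ["git", "remote", "set-url", "--add", "--push", remote, url]
        )
    return commands


# Hypothetical hosts; substitute your own remotes.
example_urls = [
    "git@primary.example.org:me/project.git",
    "git@mirror-one.example.com:me/project.git",
    "git@mirror-two.example.net:me/project.git",
]

for command in multi_push_commands("origin", example_urls):
    print(shlex.join(command))
```

Running the printed commands inside a repository should leave one remote with three push URLs, so a plain `git push origin` updates all of them, and fetching still comes from the first.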


There are tons of options. The most stable, well-run, fully featured of them is gitlab.com. GitHub recently changed its pricing model for private teams to try to compete with gitlab.com. GitLab's feature set is absolutely fantastic, and the site is run by an independent and exemplary company.

I've also been experimenting with sircmpwn's git.sr.ht, if you are a minimalist like me and would like that as well.


Consider mine: https://sourcehut.org

It's a different workflow, but we've had zero unplanned outages in 2020 and are the highest performance software forge in general: https://forgeperf.org


In my experience, it's not really git itself that is the problem with downtime, but rather things like CI and recently, package registry. Last week I thought I was isolated from the GitHub downtime because I was working on a project on GitLab. But turns out a bunch of the npm dependencies are hosted on GitHub. Normally that wouldn't be a huge problem, except I'm specifically working on CI pipelines at the moment so it's quite frustrating.


Came here because my Gitlab CI/CD was failing on `npm install` with a 500 from codeload.github.com.


And now they've bought npm as well! We're cornered.


I'm considering self-hosting gitlab.


Having been down that road, it's also a mess.

Best I can imagine now is using GitHub as a primary and other hosted services as a mirror. Mirroring the git repos will only get you so far though, you won't have the services built around them (e.g. issues, merge requests, comments, actions) or you may not be mirroring all git repos that you need access to. But a reduced service may be sufficient for short periods during an outage.


How so? We've been self-hosting Gitlab for 2.5 years now (maybe more?). Besides a regular update (through apt) we haven't had any trouble. Team of 5 developers w/ a dozen or so repos and basic CI/CD builds. I highly recommend it.


We didn't have a better SLA than hosted GitHub when self-hosting GitLab, and then there's the extra cost of maintaining the instance, the CI runners, the DB, the networking, configuration and so on. It wasn't worth it for us after using self-hosted for 2+ years with a tech team ranging between 10 and 25 people. On top of that, most of the team thinks GitHub has a better UX and enjoys using it more.


Can you share your experience?


Host it locally or on your own server?


I'm so used to github being reliable that I was first checking my own machine when composer wouldn't install something.


Same here! I'm using WSL2 on Win10 and started getting "fatal: Could not read from remote repository" errors and thought it's definitely something to do with my WSL legacy to WSL2 upgrade.


Same. I'm of the opinion that GitHub's edge over GitLab is their uptime guarantee. But they've had issues for the past 3 days now.


Haha, I was trying to install a brew cask package and wondering why it wasn't working...


Have they provided any insight as to the cause of the recent spike of downtime? Any post-mortems or anything like that?



I have seen nothing so far. I am also very interested in what is going on as it's very unlike github to be down this much.


Looks like it's affecting git operations for me.

Yikes.


Consider self-hosting via gogs.io or gitea, and use them to mirror your key repo dependencies ...
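If you only need the git data and not the web UI, plain git can keep such a mirror as well. A rough Python sketch follows; the helper name, the example URL and the `/srv/mirrors` directory are placeholders, and this is not how Gogs or Gitea implement their own mirroring. It only builds the commands.

```python
from pathlib import PurePosixPath


def mirror_commands(upstream_url, mirror_root):
    """Build the git commands to create and later refresh a bare mirror."""
    # Derive a directory name such as "lib.git" from the last URL segment.
    name = upstream_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    target = str(PurePosixPath(mirror_root) / name)
    return {
        # One-time: copy every ref into a bare repository.
        "create": ["git", "clone", "--mirror", upstream_url, target],
        # Periodic (e.g. from cron): fetch updates and drop deleted refs.
        "refresh": ["git", "-C", target, "remote", "update", "--prune"],
    }


# Hypothetical dependency URL and mirror location.
commands = mirror_commands("https://git.example.com/org/lib.git", "/srv/mirrors")
print(" ".join(commands["create"]))
print(" ".join(commands["refresh"]))
```

During an upstream outage, builds can then be pointed at the local copy, although anything that lives outside the repository itself (issues, release downloads, CI) is not covered.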


>use them to mirror your key repo dependencies ...

Is this possible with a self-hosted gitlab?

Trying to get that to push to a cloud repo (GCP/Azure/whatever) as backup. Managed to selfhost gitlab but at the edge of my technical git knowledge here


I am often a bit sad that we don't have all the fancy pants GitHub features. But right now I'm quite happy I can still use my git, the CI runs quite well, and everything else is quite dandy overall =)


What was that we were all saying about Microsoft being on an upward trend?

Years and years of stability, to the point where it starts to be taken for granted. About a year after the Microsoft purchase, here we are.


Exactly.


They must have been anticipating this happening more often, given the replacement of the raging unicorn with a sad squidcat (or is it a catsquid?)


i think they shuffle it up algorithmically. I am getting the unicorn, for example.


Yup same here


What are we at, 5 downtimes this week?


and the new notifications they rolled out are worse in some significant ways :(


Not sure if they've fixed anything, but this blog post describes some of the issues: https://drewdevault.com/2020/03/13/GitHub-notifications.html


i reported a lot of this to them during beta, too.

thank god they added back a button to group by repository. it was literally unusable otherwise - just a stream of disorganized crap.

the repo name is still uselessly duplicated in every issue line despite the common heading when grouped. you can no longer "mark all as read" in a repo without going elsewhere to see the hidden notifs, pressing "select all" and clicking "done"...so next time you refresh the page without doing that, it all floats to the surface again. etc etc.

ugh.


:( a lot of colleagues are now sad... :(


tell me about it! I got an early start today wanting to push deploys before the customer opened... well, now I can't!



