
GitHub was down - arparthasarathi
https://www.githubstatus.com/incidents/q07bfjh7jf1t
======
natfriedman
Sincere apologies to all GitHub users for the downtime this morning, and the
brief outages last week as well. We take reliability very seriously, and will
publish a full RCA in the near future.

~~~
_bxg1
Well Bitbucket lost all of my repositories this morning so you've got a long
way to fall

~~~
101404
What do you mean they "lost your repositories"?

~~~
_bxg1
When I tried to log in I was prompted to "upgrade my account" to a "Bitbucket
Cloud" account. After doing so, all of my repositories were gone. It seems
that my repositories remained on my "Bitbucket "Regular"" account but that
my email address was no longer associated with it, giving me no way of logging
in to it. I emailed support 6 hours ago and have yet to get a response.

~~~
MyelinatedT
Wow Atlassian... That is quite horrifying! Glad I stuck with GitHub through
the MS acquisition. Bitbucket was probably my main alternative.

~~~
zenexer
For the record, Azure DevOps did the same thing to me when we switched over to
Azure AD. My account and repositories ended up in an entirely corrupt state.
Support was eventually able to resolve most of it, but I’m still discovering
problems.

~~~
_bxg1
I shudder at the thought of GitHub one day trying to integrate with people's
Microsoft accounts.

------
kevinmannix
This seems to be the third or so day in the past week I've had issues with
GitHub around this time in the morning. They've typically been really good.
I'm a bit surprised there hasn't been more talk about it on HN.

~~~
giancarlostoro
They seem to be doing heavy work on it. Now on Mobile you can't see repos in
"Desktop Mode" which is unfortunate. I have to tell my browser to pretend to
be in desktop mode. Plus the regex post from the other day seems to imply they
are working on new things when somebody from GH replied in said thread. I
don't mind improvements, but don't break production guys...

~~~
djsumdog
They also changed the "Group Membership" dialog to be paginated when you add a
new person to an organization. We have over 200 groups so now I have to page
through for every new hire we add. There's not even a search option.

I'm sure the pagination might be better for performance, but it's terrible UI.

~~~
masklinn
They may have missed that one, because at the same time they introduced both
pagination _and search_ to the repository membership page, and boy did that
help us on one of our repos with a few hundred direct collaborators, by the
end we could only manage access through the API because the page didn't even
load most of the time.

~~~
londons_explore
I see why web pages need pagination so the server or browser doesn't OOM, but
there really ought to be 10000 entries per page, not 25 that most sites seem
to like.

Ctrl+F on a list of 10000 entries is far easier than clicking through 400
ajaxy pages and trying to figure out some custom and buggy filtering system
that probably doesn't allow regex.

Past 10000 records most sites probably ought to just let you export in
something bigquery compatible anyway - Regular Joe isn't going to have more
than 10000 of anything, and anyone who does can learn how to use proper data
tools.
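
The "400 ajaxy pages" figure is just the page-count arithmetic for 10000 entries at 25 per page. A minimal Python sketch, where `page_count` is a hypothetical helper name and not anything from GitHub:

```python
def page_count(total_entries: int, per_page: int) -> int:
    """Number of pages needed to list total_entries at per_page each."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Integer ceiling division, so a partial last page still counts.
    return (total_entries + per_page - 1) // per_page


print(page_count(10_000, 25))      # 400
print(page_count(10_000, 10_000))  # 1
```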

~~~
masklinn
> there really ought to be 10000 entries per page

Did you miss the part where I noted Github’s lists fail to load (let alone
render) long before that point?

------
waynenilsen
Don't forget to check your SLAs

Enterprise = 99.95% (quarterly)

[https://help.github.com/en/github/site-policy/github-
enterpr...](https://help.github.com/en/github/site-policy/github-enterprise-
cloud-addendum#enterprise-cloud-uptime-sla)

They're having a bad February but January was good. We will see what March has
in store

~~~
cle
> How do we calculate Uptime?

> Our Uptime calculation is based on the percentage of successful requests we
> serve through our web, API, and Git client interfaces.

Just curious, how do they measure this? What is the actual calculation?
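
One plausible reading of the quoted definition, sketched in Python: uptime as successful requests divided by total requests over the period. The function name, the counters, and the sample numbers are all hypothetical; the quoted text doesn't say which responses count as failures or where the requests are counted.

```python
def request_uptime(successful: int, total: int) -> float:
    """Uptime as the percentage of requests that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successful <= total:
        raise ValueError("successful must be between 0 and total")
    return 100.0 * successful / total


# Hypothetical quarter: 1,000,000 requests, 400 of which failed.
uptime = request_uptime(999_600, 1_000_000)
meets_sla = uptime >= 99.95  # 99.95% is the Enterprise figure cited upthread

print(f"{uptime:.2f}%", meets_sla)
```

Note that a request that never reaches whatever is doing the counting isn't in `total` at all, so a figure like this can only be as good as the place it's measured.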

~~~
Benjammer
What do you mean? All modern enterprise analytics/monitoring solutions are
going to be able to give you _some_ kind of top-level "request success rate"
metric. I assume they mostly just lean into whatever monitoring tooling they
have set up. What kind of "calculation" are you imagining here? Like a very
specific SRE formula for availability windows or something?

~~~
bradstewart
Depending on what part of the system is down, how do you know you even got a
request to mark as failed?

~~~
MetalMatze
You want to start measuring as close to your users as possible. In most cases
that would be some sort of load balancer. I don't think there's much we can do
without going to the client side.

------
savrajsingh
No matter how many talented engineers you have on staff, your entire service
can still go down. Let's pause and reflect on that. ;)

~~~
amelius
The interesting thing is that Git is entirely non-centralized, so in theory
they could simply redirect to servers onto which the data has been mirrored.

~~~
Cthulhu_
Git is, but the APIs and all the services they provide around it aren't.

That said, I think it's a bit weird that they don't store the data of the
services around the code itself in git, like they do with e.g. sites. That way
you'd have an `issues` branch that you could still access if github is down.

But that would probably pave the way for easy migrations away from Github.

~~~
chrisweekly
>"But that would probably pave the way for easy migrations away from Github."

bingo

~~~
amelius
At least they could have used the concepts in Git's design. But it seems they
didn't learn much from the tool they based their service on.

~~~
frenchy
I don't think learned is the right word to use here. Github's centralized
design and vendor lock-in is quite intentional.

------
anonsivalley652
When Microsoft buys companies, they tend to progressively decay as the
original architects leave, the morale of remaining employees grinds down from
the stress and they bring in cheaper contractors to duct tape the bits
together and plug the holes in levee with their fingers. I've BTDTBTTS.
_cough_ LinkExchange, WebTV, Hotmail, Skype, Softricity, Nokia, LinkedIn,
Danger/Sidekick _cough_ GH maybe next. ¯\_(ツ)_/¯

~~~
erikbye
IME GitHub has not had increased downtime after Microsoft's acquisition.

~~~
anonsivalley652
Well, some other people in the comments disagree. And it hasn't happened
_yet,_ but it's the way they don't manage / integrate acquisitions very well
unless they're wowie complementary products like Visio. Danger dropped off a
cliff and Softricity was absolutely amazing but shelved, so friends of mine
basically repeated the theme for VMware View and were acquihired by VMware.
Time will tell where GH goes.

~~~
bbrree66
There are so many logical flaws here.

People's comments are meaningless, you can look at historical GitHub up-time
and see that it hasn't changed meaningfully.

"And it hasn't happened yet"

Ah yes, now you have to backtrack from: it happened! to... no wait I promise
it will happen! Based on... what? The fact that some acquisitions don't go
well?

This is all pure speculation with no substantiation.

I recommend learning about confirmation bias.

~~~
erikbye
I agree, but I just now took a look here:
[https://www.githubstatus.com/uptime?page=7](https://www.githubstatus.com/uptime?page=7)

I went back from the time of Microsoft's acquisition, and that status seems
heavily underreported. At least when I checked now, it was all green, green.
That does not reflect my experience.

------
Jaygles
Have they written any postmortems regarding their last couple of degradations?
I tried searching their blog but the only ones that popped up were over a year
old.

~~~
SuperSandro2000
I would also be really interested in them but didn't find anything yet. Maybe
their new notification system has something to do with it?

------
pcr910303
HN already has a 'Github downtime' post as soon as I find Github weird and
check HN if it's only me. How is everybody so fast? :-)

~~~
chimprich
If Github is down, thousands of programmers suddenly have nothing better to
do.

~~~
ithkuil
I rolled my eyes twice, then merged a few PRs manually and moved on with my
day. (i.e. the git server itself and all the API required to interact with the
CI automation _appears_ to work just fine)

~~~
umanwizard
The actual server was broken for me:

    
    
      $ git push
      Enumerating objects: 26, done.
      Counting objects: 100% (26/26), done.
      Delta compression using up to 8 threads
      Compressing objects: 100% (15/15), done.
      Writing objects: 100% (15/15), 1.49 KiB | 1.49 MiB/s, done.
      Total 15 (delta 12), reused 0 (delta 0)
      remote: Resolving deltas: 100% (12/12), completed with 10 local objects.
      remote: Internal Server Error
      To git+ssh://github.com/<redacted>/<redacted>
       ! [remote failure]    wip -> wip (remote failed to report status)
      error: failed to push some refs to 'git+ssh://git@github.com/<redacted>/<redacted>'

------
bflesch
Are they migrating to azure?

~~~
donkeydoug
I had the same guess... migrating from aws to azure & hitting some bumps. Have
to assume they won't be very forthcoming about it if that is the reason.

~~~
masklinn
> migrating from aws to azure & hitting some bumps

Could be the IO? I remember colleagues working on getting stuff running on
azure and they experienced horrible IO latency, as well as very low throughput
for lots of small IO (aka unix-style software).

Was a few years back so it might have improved since, but if those things are
still non-optimal and github is built with a unix-style vision of tons of
small IO access…

This specific outage taught me that github apparently stores git repos on-disk
which I was not expecting though (because the API access complained it could
not delete repos until they'd fully backed to disk or something).

------
krallja
Fourth time this month.

~~~
charrondev
It’s quite unfortunate. I’m hoping they post some kind of RCA/analysis.

I know I would if we had a month with only 2 9s of uptime.
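
For a sense of scale, a rough Python sketch of the downtime budget behind figures like "2 9s" (99%) or the 99.95% Enterprise number cited elsewhere in the thread. It treats availability as wall-clock time, which is an assumption: the SLA text quoted in this thread counts successful requests instead. `downtime_budget_hours` is a hypothetical helper.

```python
def downtime_budget_hours(availability_pct: float, period_days: float) -> float:
    """Hours of downtime allowed over period_days at availability_pct."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_days * 24 * (100 - availability_pct) / 100


# February 2020 had 29 days.
print(round(downtime_budget_hours(99.0, 29), 2))   # 6.96
# A 90-day quarter at 99.95%.
print(round(downtime_budget_hours(99.95, 90), 2))  # 1.08
```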

~~~
nalllar
they're doing the usual corporate status page uptime lying:

[https://i.imgur.com/vy7onDT.png](https://i.imgur.com/vy7onDT.png)
[https://www.githubstatus.com/uptime](https://www.githubstatus.com/uptime)

~~~
rohansingh
I don't get it. Their own incidents tab on the same site shows four incidents
this month:
[https://www.githubstatus.com/history](https://www.githubstatus.com/history)

I'm not sure how that results in 99.98% uptime on the other tab.

~~~
nalllar
turns out it's bad/misleading UI, there's a dropdown for which type of
downtime which defaults to 'GIT Operations'

~~~
rohansingh
Ah, got it. It's still incorrect though. With the incident a couple days ago,
I couldn't `git push` for a couple hours.

------
tomphoolery
I've been noticing the issue they're describing for a few days now, with
errors in GitHub Actions requiring rebuilds and webhooks not seeming to fire
which caused Jira to go out of sync.

~~~
mirekrusin
Mssql only once guaranteed delivery in action?

------
aalleavitch
Centralization is great, isn't it?

~~~
dana321
I think in a few years time we will look back and think

"Waa, that was so archaic the way we used to do things!"

~~~
eeZah7Ux
It would be nice if someone wrote a decentralized VCS...

~~~
mirekrusin
With issues built in like fossil, oh wait...

------
thereyougo
I appreciate the way they keep everyone informed in the downtime. It tells a
lot about the company

------
arnonejoe
[https://downdetector.com/status/github/](https://downdetector.com/status/github/)

------
dvdhnt
Well, given the timing of GitHub reliability issues over the last few days, I
think we can all agree it has everything to do with dates and timezones?

/s

Appreciate the work.

------
zeisss
That explains why my test suite suddenly takes >1h and after I canceled it,
the status was green O.o

------
pcr910303
Looks like they are fine now. Yay!

~~~
glenneroo
> We continue to investigate the issues with GitHub services and will shift to
> a slower update cadence to provide more meaningful updates going forward.
> Posted 18 minutes ago. Feb 27, 2020 - 16:12 UTC

------
vivan
What do you guys recommend as a good way to continue work undisrupted when
GitHub goes down? A second remote mirror?

~~~
roblabla
A second mirror doesn't really help - when github goes down, the code should
still be available locally on your computer. The things that become truly
unavailable when github dies are all the non-git features: issues, PRs, etc...

There are several ways to work around this, but none are really satisfying.
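
For the git side at least, one such workaround is to give a remote two push URLs so every `git push` also lands on a mirror. A minimal Python sketch that builds the `git remote set-url --add --push` commands; the helper names and both URLs are hypothetical placeholders, and it does nothing for issues, PRs, or CI.

```python
import subprocess


def dual_push_commands(remote: str, primary_url: str, mirror_url: str) -> list[list[str]]:
    """Build git commands that make `git push <remote>` push to both URLs."""
    # Once any push URL is set explicitly, git pushes only to the listed
    # push URLs, so the primary has to be added alongside the mirror.
    return [
        ["git", "remote", "set-url", "--add", "--push", remote, url]
        for url in (primary_url, mirror_url)
    ]


def apply(commands: list[list[str]], repo_dir: str) -> None:
    """Run the commands inside an existing clone (requires git on PATH)."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Hypothetical URLs; replace them with your own before calling apply().
commands = dual_push_commands(
    "origin",
    "git@github.com:example/project.git",
    "git@mirror.example.com:example/project.git",
)
for cmd in commands:
    print(" ".join(cmd))
```

Calling `apply(commands, "/path/to/clone")` would change the clone's config; the sketch only prints the commands so they can be checked first.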

------
ArchReaper
>We continue to investigate the issues with GitHub services and will shift to
a slower update cadence to provide more meaningful updates going forward.

Translation: our shitty software update practices are now affecting Github,
not just Windows!

If anyone from Microsoft is reading this, why is your company so incompetent
at software updates in the past few years?

~~~
salmon30salmon
You honestly believe that a company the size of GitHub has had their software
update practices appreciably change since the acquisition? Relax, GitHub had
update issues before and they will have them again.

~~~
MrMorden
All of this has happened before and all of this will happen again. So say we
all.

