
GitHub was down - Retr0spectrum
https://status.github.com/?dupe=no
======
hetman
Everyone is saying: just work on your local repo. But GitHub is way more than
just git. There's bug tracking, code review, continuous integration, etc etc.

Making your organisation too dependent on a remote service can indeed be a
scary prospect and I'm not sure what GitHub offers to mitigate this.

~~~
jlebar
> Making your organisation too dependent on a remote service can indeed be a
> scary prospect

At every company I've worked for, internal services have been less reliable
than github. Certainly way less reliable than gmail.

I get that it's scary, in that it feels like you're giving up control over
something important to your business. But I'll posit that you never actually
had control, only the illusion of control.

~~~
Spivak
> But I'll posit that you never actually had control, only the illusion of
> control.

That's a weird way of phrasing it. When you run your own services and they
break you have total control and _do_ have the power to fix it.

When you buy SaaS you are relying on someone else. You may very well have more
reliability and uptime but you are nonetheless giving up control.

~~~
trustfundbaby
That's exactly the point. Sure you have "control", but what good is all that
control if it takes you 4 hours to track down the source of a problem and fix
it (for example)? GitHub has hundreds of engineers (with heavily specialized
knowledge that you don't have, btw) working to fix any problems. I'd bet on
them over myself and maybe a handful of engineers every time ... and I've run
my own subversion/git servers before.

Some folks just hate the feeling of not knowing what's happening and how long
it's going to take to fix, vs having direct access to work on problems
themselves, even though it's not necessarily "better" in any sense of the word
... hence the illusion.

~~~
forgottenpass
Your post gets to the false tradeoff that turned "devops" from a push for
better interdisciplinary collaboration into a "2 jobs, 1 paycheck" role.

Software teams are often pathologically unable to create a deliverable that is
scrutable and manageable by an operations audience. Building a product that
can stand on its own two feet without constant minding is work.

For all the testing dogma that flies around, continuous delivery has resulted
in systems that are more brittle than ever. If someone with all the
institutional knowledge is always there to catch the system when it fails, why
bother making it easy to investigate failures? Why bother with analysis and
building in fault tolerance when someone can worry about a long-term solution
for that failure mode when it causes them to get paged at 3 am?

So it becomes easy to say minders are necessary, and they must be developers.
Business incentives mean that answer isn't always scrutinized as heavily as it
should be, because it means spreading maintenance costs over time instead of
making an upfront investment in resilience and maintainability. Refusing to
provide a toolset and manual for maintenance means some nobody with access to
Google can't fix 99% of the problems that could occur. That way the
magic-black-box creators can ensure they're the ones getting paid to do it.

We don't have or need Windows engineers, kernel engineers, or Cisco engineers
on call. Nor do we have nginx, postgres, cpython, exim, apache, php, mysql,
Active Directory, Exchange or Office engineers on call. We use weird
enterprise software from companies that have gone out of business, and we
don't have high-priority support contracts with many of the rest.

Software that needs minding from the developers is just bad software. That's
technical debt they took on to get the product out the door.

------
anonu
What do people do to get around this? Run my own git server like the good ol'
days? GitHub has become a central point of failure for us now...

~~~
jonaf
Your question seems to indicate that you have a runtime dependency on GitHub.
If true, this is a problem no matter where your git repo resides. You need to
architect a solution wherein you do not have a runtime dependency, especially
if it is a single point of failure, or accept that your availability will
never be better than the product of your dependencies' availabilities; e.g.,
two serial dependencies that are each 99% available compose to at most
0.99 × 0.99 ≈ 98% (there was an article on the "calculus of availability" or
something like that recently).

What I personally do in AWS is bake my artifacts or other git-sourced data
into AMIs. If you want a middle ground, you can instead push your artifacts
to an S3 target -- S3 has a better reliability track record than GitHub for
this purpose.
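
A minimal sketch of that middle ground, assuming the aws CLI is configured;
the bucket and artifact names here are made up:

    # build time: publish the artifact to S3 instead of serving it from GitHub
    aws s3 cp build/myapp-1.2.3.tar.gz s3://my-deploy-artifacts/myapp-1.2.3.tar.gz

    # deploy time: instances pull from S3, so a GitHub outage can't block a deploy
    aws s3 cp s3://my-deploy-artifacts/myapp-1.2.3.tar.gz /tmp/myapp.tar.gz
    tar -xzf /tmp/myapp.tar.gz -C /opt/myapp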

~~~
todd3834
Is it still considered a runtime dependency if all you do is merge code and
deploy while relying on GitHub? I assume most apps will not go down if GitHub
goes down, but their ability to move code to production gets stalled.

~~~
tux1968
Git is inherently distributed. There's really no reason that Github has to
stand between development and production.

~~~
gtirloni
There's the convenience reason. Teams making that trade-off (not having a
local mirror) need to understand they made GitHub itself a runtime dependency.

~~~
bananaboy
But in theory in a pinch you could spin up a git server somewhere publicly
accessible and switch your site to pull from there, and push to there
yourself.
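
A minimal sketch of that, with the host and paths made up:

    # stand up a bare repo on any box you control
    ssh backup.example.com 'git init --bare /srv/git/myapp.git'

    # push your local copy there and repoint deploys at the new remote
    git remote add emergency ssh://backup.example.com/srv/git/myapp.git
    git push emergency master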

~~~
calpaterson
Not if you're using more than git from github, for example downloading
releases as zip files for deployment.

There are probably a lot of other tempting features to put on your deployment
critical path but I'm not in an environment that uses github so I've
forgotten.

~~~
WorldMaker
Webhooks are a big one: rather than polling a git server for changes, it is
very easy for CI/CD systems to rely on GitHub's HTTP webhooks in their
critical path to kick things off on push/PR/branch/merge. Especially because
GitHub does a great job of populating its webhooks with tons of useful
information about the event that's tough to replicate with plain git
post-receive hooks.
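
A crude polling fallback is easy to sketch, though it loses all of that rich
event payload (branch name and trigger script here are hypothetical):

    # poll the remote head instead of waiting for a webhook
    LAST=""
    while true; do
      HEAD=$(git ls-remote origin refs/heads/master | cut -f1)
      if [ -n "$HEAD" ] && [ "$HEAD" != "$LAST" ]; then
        LAST="$HEAD"
        ./run-ci.sh "$HEAD"   # hypothetical stand-in for the CI trigger
      fi
      sleep 60
    done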

------
sebringj
Blip, it's back up. Nah, too much effort to host my own, and I do need a
coffee break from time to time.

Ignore this if you are on a big team of course.

------
acchow
if only we had a decentralized version control system...

~~~
ben174
Exactly this. It's painful to realize that Git was invented to solve this very
problem, yet so many use GitHub as a centralized source control server, just
like CVS and all the other old source control systems.

~~~
EduardoBautista
Then what's the alternative? Push and pull directly from other users who have
access to the repository? What if that user is currently offline? There are
huge benefits to having a main centralized repo. If you need redundancy for
some reason, it's fairly trivial to mirror the repository somewhere else.

~~~
Yetanfou
Push and pull to your own server running Gitea/Gogs or Gitlab, and mirror to
Github if you feel the urge? Or, if you insist on having your workflow around
Github, do it the other way around: have your local server mirror your Github
repositories. That way, if Github goes belly-up you can just switch to your
local server for the time being.
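
Either direction is only a couple of git commands; a minimal sketch, with the
local hostname made up:

    # one-time: create a full mirror of the GitHub repo on your own box
    git clone --mirror git@github.com:myorg/myapp.git

    # from cron: keep the mirror fresh
    cd myapp.git && git remote update

    # if GitHub goes belly-up, developers just switch remotes
    git remote set-url origin git@git.internal.example:myorg/myapp.git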

------
prh8
This is the 3rd major outage in a week, wow.

~~~
sinaa
App server availability is now 98.3% over the past month, which seems pretty
bad!
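
Back-of-the-envelope, 98.3% over a 30-day month works out to roughly 12 hours
of downtime:

    $ python3 -c "print(round((1 - 0.983) * 30 * 24, 1))"
    12.2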

~~~
travisjungroth
Good ol' one 9 uptime.

~~~
ki85squared
"One nine up-time" rolls right off the tongue! Must be a PR play!

------
ram_rar
Such incidents remind me of 2 things:

1. No matter what 99.99999999% availability a service provides, it's utterly
useless if the time it takes to get back up is unacceptable.

2. Do not let remote services that you cannot fully control be part of your
runtime deployments.

~~~
acchow
Uptime % includes time to recovery. Since you're not up if you're still
recovering...

~~~
ram_rar
But that still does not provide much insight into recovery time. It's good to
know the median recovery time.

~~~
solatic
The only time that uptime would not give insight into recovery time is if the
service is not being regularly updated (remember, zero-downtime deployments
should never affect your uptime numbers). Then the uptime numbers improve over
time to something really impressive, but go in the toilet once the shit hits
the fan.

I hardly think it's reasonable to consider services like GitHub or
infrastructure like AWS as being irregularly updated...

------
lessclue
If you want to host your own repos with user and access management, bug,
issue, and task tracking, and tons more, take a look at Facebook's Phabricator
[https://www.phacility.com](https://www.phacility.com) (the self-hosted
version, of course).

------
praneshp
Tangential: Slack was down for a bit earlier today as well.

~~~
Empact
I'm seeing errors on Segment.com as well.

~~~
praneshp
Ha ha, nice!

------
grzm
[https://status.github.com](https://status.github.com) shows

> _18:19 CDT Major service outage._

~~~
jwilk
What's CDT?

~~~
dingo_bat
Central Daylight Time

------
Woofles
Seems like they're saying it's maintenance now?

    $ gl
    fatal: remote error:
      GitHub is offline for maintenance. See http://status.github.com for more info.

~~~
jdelsman
That is likely just their standard downtime message.

~~~
Woofles
Only reason I brought it up is that it was originally showing the
all-too-familiar unicorn on the website; then it changed to also say
maintenance.

~~~
beager
This seems like a pretty common response to a breaking incident for an app at
scale. Requests flow through to a failing system and trigger HTTP 500s. Those
requests may pachinko through the stack, making a variety of calls that can
compound the degradation of a system weathering an unplanned failure state.

Engineers stop the bleeding by 503'ing requests at the perimeter or putting up
a static maintenance page. This allows things like caches or DBs or app
servers to cool off while a rollback or a revert goes out. Then, when the
system is stable again, let requests flow through again (slowly, of course).
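
For what it's worth, the perimeter 503 is only a few lines in most edge
proxies; a purely illustrative nginx sketch (paths made up):

    # answer every request with a static maintenance page and a 503 status
    server {
        listen 80 default_server;
        root /var/www/maintenance;            # contains maintenance.html
        error_page 503 /maintenance.html;
        location = /maintenance.html { internal; }
        location / { return 503; }
    }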

------
morgzilla
And now the office goes to the keg.

------
runn1ng
Same as last time, pull/push access to repos works fine for me; only the web
service doesn't work.

edit: confirming, push doesn't work either

~~~
joshuahhh
Pushes are failing for me. (To one repo, at least.)

~~~
zackify
same

------
badpizza
Gone now!

18:41 CDT Everything operating normally.

------
NicoJuicy
It's sometimes up (perhaps bad to say this --> everyone will refresh)...

------
Animats
This is different from yesterday's GitHub Major Service Outage.[1]

[1] [https://status.github.com/messages](https://status.github.com/messages)

------
myrandomcomment
I would love to see the outage postmortem. To be honest, designing a
completely redundant service today is just not that hard compared to 10 years
ago. The ability to load balance, route, use VMs/containers and move loads
makes it fairly simple. In 1998, when I was building a backbone and ISP, it
was much harder. You had a ton of single points of failure by the nature of
the hardware and software at the time. We purchased systems from Sun because
of the quality of the HW and the stability of Solaris. Now, with a stack of
cheap Linux boxes and the right design (IP CLOS, BGP fabric, load balancing
and containers), who cares if a switch fails or a rack goes out? Most issues
today are caused by humans and by bugs introduced in software upgrades.

~~~
brianwawok
Kind of?

How many servers did the average app depend on in 1998? 1? Get dual HD and you
were in decent shape.

Compare to a modern microservice app, that maybe depends on 100 internal
services and 4-5 external services. A lot of things need to go right or mostly
right for things to function.

~~~
ProAm
Sounds like an argument against micro services?

~~~
stouset
Only if you don't stop to consider the alternative.

~~~
ProAm
"The Wheel of Time turns, and Ages come and pass, leaving memories that become
legend. Legend fades to myth, and even myth is long forgotten when the Age
that gave it birth comes again."

------
workerIbe
Oh man, I was just browsing a 3-month-old branch for a feature that was just
revived today, and did not clone... Please come back!

------
1zael
Run your own git server. Version-control locally, and build automation that
pushes to a backup remote, triggered by uptime status.
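
Or skip the uptime trigger entirely and push to both remotes all the time; a
minimal sketch, with the backup host made up:

    # keep GitHub as a push target, then add the backup as a second one
    git remote set-url --add --push origin git@github.com:myorg/myapp.git
    git remote set-url --add --push origin git@backup.example.com:myorg/myapp.git

    # from now on, a plain push updates both servers
    git push origin master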

------
unpwn
And it's back!

------
pirocks
It seems to have been fixed now.

------
Retr0spectrum
> 00:41 BST Everything operating normally.

------
zebraflask
95.8%. What would AWS think about that?

------
Cofike
Does this mean we get the day off?

~~~
nicklaw5
I'm down for that :D

------
Robin_H
Ugh, time to go to bed then.

------
dewiz
Anybody know if www.atom.io would be affected by this?

------
zakshay
Not sure yet, but it seems to be due to AWS (S3 likely).

------
ben174
Rough year for GitHub. GitLab is licking their chops right about now.

~~~
obilgic
at least they didn't delete the db.

~~~
ben174
Ohhh yea, shows how good my memory is.

------
carsongross
Wow.

Once is an accident. Twice is a coincidence. Three times is an enemy action.

~~~
zackify
Both days gave a different error on pushing, which made me think it wasn't
down and it was my fault!

