GitHub Outage (github.com)
420 points by philip1209 on Jan 28, 2016 | 281 comments



Sorry, I think I caused this. =[

    bower jquery#1.11.3                       not-cached git://github.com/jquery/jquery-dist.git#1.11.3
    bower jquery#1.11.3                          resolve git://github.com/jquery/jquery-dist.git#1.11.3
    bower foundation#~5.5.2                       cached git://github.com/zurb/bower-foundation.git#5.5.3
    bower foundation#~5.5.2                     validate 5.5.3 against git://github.com/zurb/bower-foundation.git#~5.5.2
    bower ember#^2.3.0                           ECMDERR Failed to execute "git ls-remote --tags --heads git://github.com/components/ember.git", exit code of #128 
    fatal: remote error:
Mid bower install. Rly srry guys!!! =[


So... you were doing a production redeploy, and it crashed?

The URL "git://github.com/components/ember.git" [1] suggests this is an internal GitHub Bower build log, but your post history doesn't mention anything about GitHub (let alone whether you work there), so I'm not 100.00% sure.

[1] https://webcache.googleusercontent.com/search?q=cache:3e00jl...

Assuming this is, in fact, a GH Bower log, the first thing that came to mind was that this architecture isn't (and possibly should be) using a dual-silo approach: when you upgrade, the upgrade gets loaded into a new blank namespace/environment and tested; if it works (passes CI test coverage or something like that), the main entry point is switched to the new environment (maybe with a web server restart or config rehash) and the old environment gets purged (possibly after a trial period). The current stack looks quite akin to "click this button to flash the new firmware and DO NOT UNPLUG your device or you'll brick it."
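
Something like this, roughly (a minimal sketch; the paths, repo URL, and test script here are all invented, and `mv -T` assumes GNU coreutils):

    # hypothetical blue/green-style cutover
    NEW=/srv/app/releases/$(date +%s)
    git clone --depth 1 https://github.com/example/app.git "$NEW"
    (cd "$NEW" && ./run_tests.sh) || { rm -rf "$NEW"; exit 1; }  # old env untouched on failure
    ln -s "$NEW" /srv/app/current.tmp
    mv -T /srv/app/current.tmp /srv/app/current  # near-atomic symlink swap
    systemctl reload myapp                       # assuming the service reads /srv/app/current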

But then I realized... wait. You guys have like... isn't it like, a few dozen RoR worker boxes? Was this crash on the inbound router or something? xD

It's all good though; consider this "curious criticism" - like constructive criticism, but with extra sympathy. And hey, GH's never broken on my watch before (not that I need it atm... hopefully); this is interesting =P


Hate to be the bearer of bad news, but ikawe was joking about the bower thing.

Source: I work at GitHub.


> Source: I work at GitHub.

So, you're saying you broke it.

Everybody, over here, gang up on this guy.


Shit happens sometimes. Who cares who broke it; what matters is that they're fixing it. I hope GitHub does make public an official reason tomorrow. Not looking to blame anyone; I just want to know what they're doing internally to make sure it does not happen again... adaptation. Even if it's from some external hack or DDoS attack, how do they plan on building redundancies in? I am pretty sure GitHub's dev team is talented; I'm being such an anti-troll.


My comment was quite joking in nature.

> I hope github does make public a official reason tomorrow

I do too, but, mostly I find post-mortems quite interesting to read.

If your services are at some 95% uptime or lower, you're doing something (several things) very wrong, and it's probably not that interesting to me.

Getting from 95% to 99%, you probably did some interesting things there.

Going into multiple .9s beyond that, you're likely doing quite a bit of interesting stuff, but what I find more interesting is where you went wrong. Figuring out not just where you went wrong but what your incorrect assumptions were and WHY they were wrong: "We believed X could never fail because of Y, and even if X did fail, it would not cause production impact because of Z!"


Ah, glad I didn't presume it was a valid message! I've heard people joke about this kind of thing exactly this way in the past.

But now my "technical breakdown info" box has no tidbits in it. :P

I'm glad you're back up now (sortakinda - what sort of traffic are you sustaining right now? :D), but a rough idea of what asploded would be really cool to know about.

Speaking of which, I'd like to take a moment to make a strong point: disaster-recovery situations don't get blogged about enough. Vague "we fixed it" datapoints get buried in status update logs as if they're something to hide and hope nobody brings up.

In situations like these, the only constructive perspective is for everyone to accept that something went horribly wrong and not make a fuss about it. If such a mentality can be established, it creates an environment in which we can share technical breakdowns of "we found ourselves in XYZ position and then we did these thirty highly specific things in heroically record time to be up and running again". I think sharing this type of info would potentially be more educational than setup tutorials or the "we switched to X and it improved Y by 1400%" pieces the Net's full of. Sure, you'd have to generalize and probably give a lot of backstory about infrastructure, but it's becoming trendy (in a sense) for companies to describe their operations in precisely this way, so it's not completely nonviable.

(Note my use of the angle of "we found ourselves in XYZ position" - maybe a small highlight of what led up to the disaster would be included (worth considering if the information would be educational), maybe not. In a blog context, moderating comments to keep the discussion on-track and constructive may be necessary, but IMO would be worth it.)


I missed the joke. Either way- hope you get everything up and running soon!


Perhaps you missed the joke?


That's what you get for using bower and jquery. The react/webpack gods are angry.


or the react/webpack gods are laughing...


I once ran rm -rf in the production mysql data directory.

Shit happens.


I did that too. Destroyed our Zabbix database. Neither that Zabbix server, nor the other one monitoring the server I destroyed, could alert us that anything had gone wrong for over an hour. I finally realized it when I couldn't log in...

I was able to painstakingly rebuild the server after 9 hours without anyone noticing. To this day, it's one of my biggest fuck-ups and proudest accomplishments.


Are you saying that once you break something, you should break your monitoring as well, but do it very quickly since it may be too late? :)


Yeah, if you're going to break your monitoring solution, you'd better shoot it in the head and vaporize the body.


When I worked at an ecommerce company, the boss (who was also a programmer) did that too, on our master engine: not just website orders, but eBay, Amazon, everything historical, inventory, RMAs... etc. OOPS indeed. Yep, shit happens. We recovered from it; it was a hassle and shut everything down, but it was one of those whoops moments. Happens to the best of us.


Been there. Done that. No backups.

Recovered all the data by using open file handles in /proc/.

Not a fun two hours.

Shit happens. Live and learn.


Have you written a blog post about that /proc/ trick? Sounds like an interesting read.


Here's an article that explains it: http://archive09.linux.com/feature/58142


I don't blog, but what teraflop posted is basically correct.


I'm going to put that on a T-shirt



You are the human manifestation of "the chaos monkey"


For anyone who feels they might fall into the same category (I do!), represent: https://teespring.com/human-chaos-monkey


Oh! I see! So you're the guy who DDoSed GitHub by downloading Ember!


I thought the point of heavy client-side frameworks was to take load off the server.


Can someone explain this to people who don't use bower?


He was just cloning stuff and it failed mid-way.


Maybe by the time you switch to npm, github will come back up?


Cool that you have time to update us. :)


Relevant: GitTorrent: A Decentralized GitHub (http://blog.printf.net/articles/2015/05/29/announcing-gittor...)

The repo is at ... https://github.com/cjb/GitTorrent, so just clone that and ... oh ...


Fixed that for you!

http://gittorrent.org/ and `git clone git://gittorrent.org/gittorrent`


From the site:

  First we connect to GitHub to find out what the latest 
  revision for this repository is, so that we know what we
  want to get. GitHub tells us it’s 5fbfea8de... Then we 
  go out to the GitTorrent network.
So yeah, wouldn't have saved you.


The article continues and covers solutions to that.


Every time this happens people make clever remarks about how Git is distributed but we're all depending on GitHub for so much that we defeat the purpose. But once GitHub comes back up, everyone just gets back to work, trusting and relying on it as much as ever. Eventually it goes down again, and we come back to complain. Convenience is the only thing that we seem to value. (I'm no different, which makes my comment completely hypocritical.)


> clever remarks about how Git is distributed but we're all depending on GitHub for so much that we defeat the purpose

Yeah, but they'd be wrong about that purpose.

Distributed systems used by people (eg. email, BitTorrent) always have their major hubs (Gmail, The Pirate Bay). That's understandable: no product reaches critical mass without a main stream. The strength of a distributed system isn't that it has no points of failure: it's that, in the event of a significant failure in an established node (eg. TPB's downtime at the end of 2014), the community can retarget around a new solution (eg. KickassTorrents) at the point in which the inconvenience of the downtime outweighs the inconvenience of switching habits, without a significant dip in service associated with the switch.

In contrast, a truly centralized system like BitKeeper would just outright block progress if the central node were to deny access (which was, of course, what led devs like Linus Torvalds to change their focus for a few months, so they could get back to full kernel production as they constructed a workable alternative, in Git).


The bigger problem here is the number of build tools relying on GitHub alone as a source of truth. We need to find an abstraction to distribute storage so that there is no single point of failure.


Exactly.

So I had to install gems from RubyGems (not that big of a deal), and I had to look them up on RubyGems since Google gives me GitHub first, but that's OK too. But then all the documentation seems to be on... GitHub. Except not; it's also on RDoc. (Though RDoc kinda sucks compared to GitHub...)

Pretty big win imho, Rails got a few more points with that. :D

Now I'm wondering how NodeJS is faring at this...


We have npm as our single source of truth, although you could host your own npm or use a tool like sinopia [0], which makes pretty reasonable tradeoffs while being usable. Instead of asking you to replicate all of npm, it just keeps local copies of your packages, and if a package isn't found it'll hit npm.

[0] https://github.com/rlidwka/sinopia
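
For the curious, the setup is tiny. A rough sketch from memory, assuming sinopia's defaults:

    npm install -g sinopia
    sinopia &                                 # listens on localhost:4873 by default
    npm set registry http://localhost:4873/   # point npm at the local cache
    npm install express                       # proxied to npmjs.org on first fetch, cached after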


Companies/people don't learn from their mistakes. Almost everything is on GitHub nowadays, which makes it a SPOF even though Git itself is distributed. More companies should host their projects on premises. There are good open source alternatives to GitHub: Apache Allura, Fossil, GitBucket, GitLab, Phabricator, and Redmine.


... GitHub Enterprise ...


It's not open source and it's expensive.


BitBucket


Yes, but at least you can continue to work while it is down. You'll be missing out on the collaboration part (and bug trackers, and stuff).

If you are desperate, you can also call another developer and add another remote pointing somewhere else. So Git is a win anyway.
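
Something like this, say (hostname and path invented for the example):

    # works best if project.git on their box is a bare repo
    git remote add pat ssh://pat@dev-box.example/home/pat/project.git
    git fetch pat
    git merge pat/master   # grab their work
    git push pat master    # or they pull from you the same way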

Try that with a "SubversionHub"...


Even if convenience is not a priority for you, it almost certainly will be for people you want/need to interact with.


Only that I don't expect data loss, since it is distributed; even if GitHub gets wiped clean, I can just happily push up all my stuff from the local repositories and get on with my life.


If you listen closely you can hear the sound of continuous integration builds around the world breaking


If you listen closely you can hear the sound of all bower-dependent builds on earth failing in synchrony


why only bower-dependent?


I'm guessing the parent doesn't mean only bower, but bower uses GitHub as the source of packages/code (I don't use bower, so I'm only going from memory). On a build that downloads everything 'fresh', it's not going to be able to get sources that are only on GitHub.

Bower certainly isn't the only thing that does it. It's a trend that's becoming more and more common.


I was in a go tutorial a couple of years ago where the presenter was including directly from github.

And sitting in a ruby talk when the rubygems compromise happened just after everyone had been told to go upgrade a gem because of flaws in a specific gem.

It's not unique, but it is terrible.


Unlike most other package managers, bower doesn't actually host packages. It just finds them at Git URLs, which are almost invariably on GitHub.
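
You can see this in the registry lookup itself; from memory (output format approximate), it just hands back a GitHub URL:

    $ bower lookup jquery
    jquery git://github.com/jquery/jquery-dist.git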


As GitHub queues up webhooks, we're bracing for impact at buddybuild! :)


You shouldn't notice a webhook deluge because the site isn't generating events. I'm watching our webhook services though and will let you know if that changes.


Hi Kyle!

It looks like webhooks are wedged... no?


Everything should be A-OK now. If not, hit up github.com/contact :)


But no one can do anything to generate new webhooks.


pushes have been working for the past 10-15 minutes... pulls for 5-10 minutes.


Which is, in a sense, a bit disturbing.


Yup, a coworker was wondering why a build failed, a minute later I saw this.


Well, it's not like we invented binary and source packages two decades ago so that people don't need to re-download and re-build their build-time dependencies every time they start compilation.

Oh wait, we did.


Yep, but nobody ever includes them. Nice stuff to automate; gotta check if there is already DevOpsy stuff to fix that, and if there isn't, got me a new weekend project...

It is obviously gonna be hosted on GitHub. Oh, the irony...


I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were suddenly silenced. I fear something terrible has happened.


Nomen est omen Mr. Chewbacha?


I was going to stay up for a middle-of-the-night release (DB migrations, bleh). Instead, posted https://status.github.com/ into slack and am packing up and going to the bar.


Can you imagine what would happen if the same happened to Slack? No communication in thousands of companies.


This is new. Looks like github is still "working" when it's down.

  $ git push origin master
  Counting objects: 5, done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (5/5), done.
  Writing objects: 100% (5/5), 433 bytes | 0 bytes/s, done.
  Total 5 (delta 4), reused 0 (delta 0)
  remote: Unexpected system error after push was received.
  remote: These changes may not be reflected on github.com!
  remote: Your unique error code: 4fce1b2367b5304dd3761538b8fd0c23
  To git@github.com:myrepo/myrepo.git
     a62b7f1..e88431a  master -> master
  $ git push origin master
  Everything up-to-date
Note: Values are fake, but message is real.


The Git backend is different from the GitHub frontend, unsurprisingly.


I suppose not. Glad to see something like this still work:

  $ git clone git@github.com:influxdata/influxdb-ios.git
  Cloning into 'influxdb-ios'...
  remote: Counting objects: 10, done.
  remote: Compressing objects: 100% (8/8), done.
  remote: Total 10 (delta 1), reused 10 (delta 1), pack-reused 0
  Receiving objects: 100% (10/10), done.
  Resolving deltas: 100% (1/1), done.
  Checking connectivity... done.
Even though this doesn't:

  $ wget https://github.com/influxdata/influxdb-ios
  --2016-01-27 20:03:39--  https://github.com/influxdata/influxdb-ios
  Resolving github.com... 192.30.252.131
  Connecting to github.com|192.30.252.131|:443... connected.
  HTTP request sent, awaiting response... 503 Service Unavailable
  2016-01-27 20:03:39 ERROR 503: Service Unavailable.


You can also git clone over ssh. That's kinda equivalent to what the git@github.com form does.

    $ git clone ssh://git@github.com/influxdata/influxdb-ios
    Cloning into 'influxdb-ios'...
    remote: Counting objects: 10, done.
    remote: Compressing objects: 100% (8/8), done.
    remote: Total 10 (delta 1), reused 10 (delta 1), pack-reused 0
    Receiving objects: 100% (10/10), done.
    Resolving deltas: 100% (1/1), done.
    Checking connectivity... done.


It's down now.

  jim% git clone git@github.com:pfsense/pfsense.git
  Cloning into 'pfsense'...
  fatal: remote error: 
    GitHub is offline for maintenance. See     
  http://status.github.com for more info.


I was trying to copy a library down that's not available via composer, and found that this worked:

    git clone git://github.com/MunGell/Codeigniter-TwitterOAuth.git .

where git clone https://... failed. YMMV, of course.


If you're talking about PHP's Composer, you can include arbitrary repos with it; you don't have to depend on Packagist at all.
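
A minimal sketch, assuming a reasonably recent Composer (the repo key and package name here are hypothetical):

    # register an arbitrary Git repo as a package source, then require it
    composer config repositories.twitteroauth vcs https://github.com/MunGell/Codeigniter-TwitterOAuth
    composer require mungell/codeigniter-twitteroauth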


Good point, I always forget about that. Thanks.


Yeah, got that too. Maybe time to go to bed.


Developers: "I can't get any work done because GitHub is down!"

Linus Torvalds: [facepalm]


I honestly can't get much work done now, I've been looking for RabbitMQ auth plugin sample code and every link I'm clicking now shows me a unicorn.


Google the examples, then use google cache


Thank God we're on Bitbucket! Right, guys? Anyone?


Ha. We're on BB too. Lucky for us, there hasn't been a problem with BB service for, oh, about 6 hours [1]. :)

1. https://bitbucket.statuspage.io/


One of my clients is on BB, they have frequent service issues. I need to switch them over to github.


Regardless, most dependencies are on GitHub which breaks bower install for most people. It's crazy how much infrastructure relies on this single point of failure.


Someone should make a distributed version control system.


What would that look like? Can you describe it? It's not very useful to say stuff without providing some sort of useful idea. You can't just say "someone should write some sort of stupid content tracker, and give it some random three-letter combination that is pronounceable," or whatever.


Completely agree. But whoever makes it, they should make it free and open source and designed to handle everything from small to very large projects with speed and efficiency.

Additionally, it should be easy to learn and have a tiny footprint with lightning fast performance. It should outclass SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.


> it should be easy to learn

That would be a significant upgrade.


Can't tell if you're serious or not, but I suppose that would be a properly implemented git.


It's sarcasm: $ man git "git - the stupid content tracker"


You mean like IPFS? [1] People are working on it.

[1] https://ipfs.io


Their source code pointing to Github is symbolic of exactly how this problem hasn't been solved yet.


I didn't say it was solved, but they are very close. IPFS is essentially a global p2p git repo, with a cryptographically controlled branch namespace.


That sounds very cool. I was just pointing out how divorced these ideas still are from the way we actually program and implement things on the web.

In an ideal world, a link to a source code repository (or a link to anything for that matter...) would never fail because there would be automatic mirrors to at least provide read-only access to it.

It sort of forces one to ask whether this is the result of fundamental mistakes within HTTP / DNS itself. It's not realistic for a web designer to put time into making external links fault tolerant by running some query to switch the routing link to an available node.


Content-addressed links at least make this possible. With HTTP you have to reach a particular server; if that server is down, the link is broken. With IPFS, all you need is at least one machine on the entire network serving a file and the link will work. As an added bonus, you can verify the hash of what you get, so a MITM attack on that link is impossible.


Very interesting. Do you think ipfs could be used to mimic a server to host a static website that could be loaded by a browser?


You mean like this?

https://ipfs.io/ipfs/QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5...

:-) IPFS includes an optional http gateway to give the traditional web access to IPFS before browsers support the IPFS protocol natively.


Would this be useful?

    # pre-push (there is no post-push)
    # Update in several places
    git push bitbucket master
    git push gitlab master
    # ...


Or a local cache of dependencies


What a useless comment.


Aren't npm packages actually hosted on npm? This shouldn't affect npm installs.


That's right. Except I of course got unlucky.

    npm i semantic-ui
    npm ERR! fetch failed https://github.com/derekslife/wrench-js/tarball/156eaceed68ed31ffe2a3ecfbcb2be6ed1417fb2
    npm WARN retry will retry, error on last attempt: Error: fetch failed with status code 503


So, in this case, it looks like GitHub isn't acting as a version control server but rather as a static file server... It turns out that GitHub isn't just a single point of failure for some application, but rather multiple points of failure for slightly different reasons...


It is possible to install npm packages that are hosted on GitHub: http://stackoverflow.com/a/17509764/889864
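
In short, both of these resolve against GitHub rather than the npm registry, so they'd fail right now too (syntax from memory):

    npm install derekslife/wrench-js                              # user/repo shorthand
    npm install git+https://github.com/derekslife/wrench-js.git   # explicit git URL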


Oh wow, color me surprised. I edited my comment.


I changed jobs recently; previously used GitHub, now on BB. Glad I'm not at my old job - I'm sure several people's phones are blowing up from auto-emergency fail measures.


Yeah, our time to shine has finally come!

We use Bitbucket in our ~30 member academic robotics research lab, it's wonderful because they are nice enough to supply us with as much as we need (private repos, teams, etc.) for free!


Whoo hoo! I agree!


Damn right :-)


If the internet hired people to advertise for it, it would bring marketing to the next level. Take, for example, if BitBucket hired someone to go online and compare the two, GitHub and BitBucket. Of course the comparison would have to disclose that they work for BitBucket, but it would be a very thoughtful response.

We would get these detailed-as-fuck responses on why one product (BitBucket) is worth more to the developer than GitHub, instead of these little jabs at why it is better. I want thoughtful responses.

If there was a duplicate response, the employee who responds to various threads online could link them to their answers.

This isn't even directed at you, @ntaylor. I think it's an exploitable marketing strategy. A clear example of this is Katie from PornHub. /u/Katie_PornHub (or whatever the user name is) posts on reddit, which in turn gets more interest in PornHub. PornHub is basically using reddit for free advertising.

Basically, I want to know WHY something is better than something else, and that WHY should come with as much detail as possible and a lot of thought put into it. Give me a pros and cons list between the two. Anything but two throwaway compliments.

- - -

Holy fuck, this would just add a human element to advertising. You hire humans to serve ads to people. Whenever "GitHub" is mentioned and you have a competing, better service, their only job is to advertise your service to anyone having problems with the other one.

In cases like these, it always seems to happen naturally. Humans are willingly advertising for free when they could be paid for it.

The downfall of the internet:

  > We make up the ad network
  > It carries us far;
  > however---giants must fall.
Me.


Thinking about this more, this could be automated to some degree. You could have a bot that listens and responds when keywords appear somewhere online. You could use Google's API for when its bot finds something of interest to you, "as it happens".

This would only get you so far. What if a user has a question? How will you help them out if it's a bot? The solution?

Write APIs for every site that you want to use. You can either scrape every site for responses that fit your criteria, or let Google do the digging for you with the "as it happens" notifications, which would indicate that you have to scrape that site for the information.

If the information you get from the site is something you want to respond to, you can.

- - -

Holy fucking shit.

I just described a system that would allow for you to use all websites as inboxes for chat messages.

- - -

Refine:

With this early model, we would have one program on our machine that would understand X websites. It can parse user comments directed at us from a website (A). It can send responses back to A, and so on.

Now, users on Program X can chat with theoretically any person on any website (A, B, ...), but users of website A can only chat with other people on website A OR people using Program X.

Why doesn't everyone use program X? Because it does not exist yet.


"We're investigating a significant network disruption effecting all http://github.com services." - https://twitter.com/githubstatus/status/692508939792039936


> effecting all http://github.com services

Guess they're too busy working on the issue to notice the misuse of "effecting" instead of "affecting."


Never too busy to correct grammar.

http://i.imgur.com/iUbz1Lh.png


Troubleshooting checklist:

1. Commit fails

2. Try again, see if it fails twice

3. Check internet connectivity

4. Try github in browser

5. Try github in a different browser

6. Go to HN to see if it's down for everyone

7. Write snarky comment

8. Go try again...


Why would your commit fail?


First noticed it when making a wiki commit, actually. Edited the story slightly for artistic license. :)


I think he meant that the commit should work fine, but the push would fail.


Luckily, git is a distributed version control system and we all remembered to:

    git remote add backup <bitbucket or gitlab url>
    git push backup


I'd actually like a gitolite-like system that takes my pushes and replicates them among GitLab/GitHub/Bitbucket/repo.or.cz. I'm sure it's possible with hooks, but every time I get around to looking into it, GitHub is back up.
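
In the meantime, there's a hook-free approximation (remote URLs here are placeholders): give origin several push URLs so a single `git push` fans out. Note that once any push URL is set, the fetch URL is no longer pushed to, so GitHub has to be re-added explicitly:

    git remote set-url --add --push origin git@github.com:me/repo.git
    git remote set-url --add --push origin git@gitlab.com:me/repo.git
    git remote set-url --add --push origin git@bitbucket.org:me/repo.git
    git push origin master   # goes to all three push URLs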


Now that GitHub is back up again, it looks like a cron version of https://help.github.com/articles/duplicating-a-repository/ would get me some of the way there.
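
Roughly this, I think (paths, names, and schedule invented; same mirror trick as the article):

    # one-time setup: a bare mirror clone, push URL aimed at the backup host
    git clone --mirror git@github.com:me/repo.git /srv/mirrors/repo.git
    cd /srv/mirrors/repo.git
    git remote set-url --push origin git@gitlab.com:me/repo-mirror.git

    # crontab entry: refresh from GitHub and re-push nightly at 03:00
    0 3 * * * cd /srv/mirrors/repo.git && git fetch -p origin && git push --mirror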


Fortunately git is distributed and lets you work offline. https://git-scm.com/about/distributed


I'm trying to read some webpages hosted on GitHub.


Try adding www instead of hitting the apex. If not, github.io URLs should still work.


For the impatient. I would make a gist but...

  while true; do curl -s https://status.github.com/api/status.json | grep good && tput bel || echo -n .; sleep 1; done


Here is a version which uses OSX's say command for a spoken announcement of "github is up again". It also checks once every minute instead of once a second.

    while true; do curl -s https://status.github.com/api/status.json | egrep 'good|minor' && say -r 160 "github is up again" || echo -n .; sleep 60; done
Edit: accept a status of minor as an indication of up-ness.


say "github" does not get the pronunciation right. say "git hub" is the way to go!


I'm not sure how ddos'ing github would help with the outage.

runs script


Judging by the fact that their status page is still up I'd say that the service is pretty independent of the rest of the site. If my one-liner gets popular enough to do damage to Github when China can't then I will be extremely proud of myself. :D


Ah, I like yours better for GitHub specifically, but for any website, my .bashrc got this as its latest addition tonight:

  # watch for a website to come back online
  # example: github down? do `mashf5 github.com`
  function mashf5() {
      watch -d -n 5 "curl --head --silent --location $1 | grep '^HTTP/'"
  }


The status page update seems like some ironic machine rant: "The status is still red at the beginning of the day" (http://take.ms/50Tox)

Hubot has assumed control...the singularity is upon us!!!


Yeah, I am confused by the existence of a status report for a time 3.5hrs in the future. Guess they're just trying to get ahead of the game and set some realistic expectations.


Maybe it's GMT mistakenly labeled as PST?


Weird. It was labeled EST for me.


It is.


Status page seems to be back: https://status.github.com/

No word on twitter yet: https://twitter.com/githubstatus



went to push, didn't work, tried the webpage, saw the outage, now I'm on HN


same here


same


bundle install here


sadly, same


ok this is actually really annoying now


What `zsh` working with `thefuck` thinks of your comment on OS X:

    git:(master) ok this is actually really annoying now
    zsh: command not found: ok
    git:(master) fuck
    look this is actually really annoying now [enter/↑/↓/ctrl+c]
    look: is: No such file or directory
    git:(master) fuck
    No fucks given

That's kind of perfect.


Nice! And in this case the perfect is not the enemy of the good!


And 100,000 people's work grinds to a halt.


As a workaround, I can add all of you to my company's GitHub Enterprise install. It's only ~$2,500/year per 10 users, so I can just expense it.


Github is down. Post github.com on HackerNews, that'll help them. =P


To be fair, GitHub has scaled to the point where traffic probably isn't their concern.


This is reminding me of last year's major Facebook outage. If I recall correctly, that outage was a bug in service discovery that took down all data centers (a CLI accepted a negative value while the ZooKeeper variant treated it as an unsigned int, and then all service discovery went down). I feel like service discovery is the biggest point of failure at large companies, and it would explain why services across so many different domains and systems went down.


Except that when Facebook is down, productivity increases, and when GitHub is down, productivity decreases :)


sigh Many moons ago when I was a starry-eyed lad just learning to use Git, I remember all the cheerful sentences in the Git book like "Unlike SVN, with Git you can work even when the server is down!"

The more things change, the more they stay the same.


Well… you can still work with Git when the server is down. Just not GitHub.


That was never the main design consideration of Git, but you nonetheless can make commits and branches when the server is down. Go try it.


Git is not Github. If hotmail.com goes down, it doesn't take down email worldwide with it.


You can continue to work. For example I skipped pushing to Github but I can push to Heroku. No problem. I'll push my code when it is up again.


git != github. If you're worried, you can always push to another remote in the meantime. Does svn do that?


I see you got your centralization in my decentralized protocol.


I'm still working over here. I mean I took a quick break to read HN. But no productivity problems. Why can't you keep working?


Fiddling around with several deployment and configuration management options, I've come to the conclusion that there is too much code between me and the systems.

I would like to have one bash script that I could start in such worst-case scenarios, one that tests everything and tells me exactly what the problem is.

Just imagine everything is down and you cannot trust your dashboards.

What would you put into such a script? What would you test for?
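
A starting point might be to probe each layer from the bottom up. A minimal sketch, nowhere near "tests everything" (host and ports are just examples):

    #!/bin/bash
    # crude layer-by-layer probe for one host; extend per service
    host=github.com
    [ -n "$(dig +short "$host")" ]       || echo "DNS: no answer for $host"
    ping -c1 -W2 "$host" >/dev/null 2>&1 || echo "ICMP: no reply (may be filtered)"
    nc -z -w2 "$host" 443                || echo "TCP: port 443 unreachable"
    code=$(curl -s -o /dev/null -w '%{http_code}' "https://$host/")
    echo "HTTP: got $code"   # 5xx here while TCP works = their problem, not yours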


Does this have something to do with the constant DoS attacks they've been getting from a certain country for hosting certain open source projects that defy the censorship authority of said certain country?


Russia? China? Is it forbidden to state the country?


You have been banned from /r/Pyongyang.


Exactly why I host all my repos on my own dedicated server, with my own backups, and then mirror/sync to GitHub.

It's also why I include my dependencies in my projects.

Having a build tool fetch dependencies dynamically from GitHub or wherever is not a convenience; it's a PITA.

Just sayin': even extremely reliable hosted services can fail, so do own your repositories. Going back to writing some code ;)


Status page is not responding either:

https://status.github.com/


Seemed to work for me, although it took a while to load and said "All systems operational"

edit: Now says "19:32 Eastern Standard Time: We're investigating connectivity problems on github.com."

edit2: "19:47 Eastern Standard Time: We're investigating a significant network disruption affecting all github.com services."


Yeah, me too. The status page says it's OK, but the website looks like this:

https://www.dropbox.com/s/ap1nqogmqza2e12/Screenshot%202016-...

And:

    [mason@IT-PC-MACPRO ~]$ cd Code/rollerball
    [mason@IT-PC-MACPRO rollerball (master)]$ git pull
    remote: Internal Server Error.
    remote: 
    fatal: unable to access 'https://masonmark@github.com/RobertReidInc/rollerball.git/': The requested URL returned error: 500
    [mason@IT-PC-MACPRO rollerball (master)]$


Yeah, you guys were way too fast. The status page needs some minutes to update. :)


They also send out update on their @githubstatus twitter: https://twitter.com/githubstatus

Nothing for this yet, though.

EDIT: There it is: https://twitter.com/githubstatus/status/692505376554618883


Luckily that came back up 5 or 10 minutes after the outage began. And they updated it with a down notice.

A side project of mine called StatusGator [1] monitors status pages and alerts you when something is posted on them. I built it for the case where you're trying to diagnose an issue, finally remember to check the status pages of your dependencies, and notice that they are already working on it.

It's pretty useful for those services that keep their pages up to date. But not very useful in cases like this when it's not updated.

1. https://statusgator.io


Apparently they can see into the future and know for a fact that they'll still be having issues in a few hours.

  January 28, 2016
  00:00 EST The status is still red at the beginning of the day

  January 27, 2016
  20:02 EST We're continuing to investigate a significant network disruption affecting all github.com services.


... or you're in a different timezone. It's 11am on the 28th where I am.


It says EST there. It's wrong. Looks like it's reporting times in UTC but marking them EST (for me at least).


Under/over on how many man-hours will be spent by engineering teams worldwide discussing their GitHub single-point-of-failure issue?


Just the other day there was a discussion about "go get" and what would happen in the case of a Github outage. Sigh...


scary...

    remote: Unexpected system error after push was received.
    remote: These changes may not be reflected on github.com!
    remote: Your unique error code: 7527a6e1bbc9fe126d51c97feac7b4e3
    remote: Unexpected system error after push was received.
    remote: These changes may not be reflected on github.com!
    remote: Your unique error code: 7527a6e1bbc9fe126d51c97feac7b4e3

It seems pushes are going into oblivion, but the client side gets a return code saying the push was successful.

Yeah, I got this as well, and no changes on github.com. So much for creating this release tonight. :/ I hope they fix this; otherwise this push went into oblivion and the client thinks all is well.


I was getting plain-old 500 errors, which is what brought me here


Github's security team should clarify what's going on.


Two little birdies have tweeted that there was an electrical fire at a key datacenter.


We had networking issues on Rackspace earlier today...


They're also on Prolexic right now, so...big DDoS, likely state-sponsored?


How can you spot Prolexic? I just checked their DNS/IPv4 and it seems to be pointing at regular hosting.


    $ traceroute github.com
    traceroute to github.com (192.30.252.130), 30 hops max, 60 byte packets
    ...
     9  level3-pni.iad1.us.voxel.net (4.53.116.2)  17.609 ms  15.057 ms  10.113 ms
    10  unknown.prolexic.com (209.200.144.192)  9.186 ms  9.462 ms  9.315 ms
    11  unknown.prolexic.com (209.200.144.197)  17.753 ms  17.767 ms  18.851 ms
    12  unknown.prolexic.com (209.200.169.98)  9.922 ms  9.542 ms unknown.prolexic.com (209.200.169.96)  11.471 ms
    13  192.30.252.215 (192.30.252.215)  13.569 ms 192.30.252.207 (192.30.252.207)  9.660 ms 192.30.252.215 (192.30.252.215)  13.150 ms
    14  github.com (192.30.252.130)  9.051 ms  8.833 ms *


traceroute github.com


Reminds me of the talk where a GitHub guy says, basically: we just push, push, push, and if something breaks we hear about it on Twitter.


That was Zach Holman. He called it TDD - twitter driven development.


And their status page says 100% operational (as updated 5 minutes ago).


Help! The site is back, but I just pushed a new branch that isn't showing on the site, meaning I can't create a pull request.

    circuitry git:(feature/middleware) git push origin feature/middleware
    Counting objects: 18, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (18/18), done.
    Writing objects: 100% (18/18), 5.01 KiB | 0 bytes/s, done.
    Total 18 (delta 9), reused 0 (delta 0)
    To git@github.com:kapost/circuitry
     * [new branch]      feature/middleware -> feature/middleware
This branch is not appearing in the repo: https://github.com/kapost/circuitry/branches


We're still recovering; please give the site a bit of time to come back! Not everything can be expected to work until we've gone green on status.github.com. Thanks


Got it, thanks!


Does anyone know if an extremely controversial new piece of software was recently pushed?


No, but there's recently been a lot of talk about the Hidden Tear ransomware source being taken off of GitHub soon. Given that the author has already been blackmailed and all the drama surrounding that, this is a possibility.


ionic threw an error because of GitHub's error, but I cannot create a new issue :P

    Downloading: https://github.com/driftyco/ionic-app-base/archive/master.zip
    x Invalid response status: https://github.com/driftyco/ionic-app-base/archive/master.zip (503)
    Error Initializing app: [object Object]
    errorHandler had an error [TypeError: Cannot read property 'error' of undefined]
    TypeError: Cannot read property 'error' of undefined
    ...


The US west side on http://map.norsecorp.com/ looks like fireworks. I wonder if that's where the GitHub servers are.


This was basically my train of thought:

"Damn! I'm tired of fighting installing cordova/ionic. WTF is happening now? Oh Github is down... okay that's new"


Why do they have an angry unicorn on their outage page?


They use unicorn[1] as a HTTP server

[1] http://unicorn.bogomips.org/


Huh, TIL. I always assumed that they meant that the errors themselves are unicorns because they rarely happen and even when they do, you usually don't see them.


First time I saw that logo was on this Android ROM, I wonder which used it first?

http://aokp.co/

(they're using a different but similar image now, but they used to use that exact same logo)


Ha, thanks :)


Maybe they are using gunicorn?


That's a Python server, based on the Ruby server "Unicorn".


Ahh, didn't know of Unicorn. Today I learned, thanks!


they want to be a 'unicorn' in VC speak


If people are going to depend on GitHub being up for their CI workflows, etc., serious effort should be expended at the network and cache layer to make it suitably reliable. It's probably fine to not be able to do developer-level actions for hours, but even a 5-minute outage in deploying other systems is unacceptable for most businesses.


Initially I was Ctrl-R spamming; now I'm just pondering what a complete apocalypse would look like.


Time to go to the park!


Some "Github Pages" are down


Interestingly, our (minimal) traffic to GitHub shifted transit carriers when this outage happened.

Between 2016-01-28 00:39:47+00 and 2016-01-28 00:43:26+00 there was a flurry of BGP updates that caused that.

I'm not sure on the exact timing of the outage, this could either be a symptom or a cause.


Static HTML pages are loading. Example, my personal website: http://peterburk.github.io

But my website loads its content from .md files via raw.githubusercontent.com, and those appear to be down.



Actually, I'm pretty dumb. Can't move the repo to bitbucket since Github is down.


If you have a local copy of the repo, you definitely could.


True. If it's not back in a few hours, I'll have to resort to that.


Well that's the last time I ever write a deployment that assumes Github is working...


I heavily rely on Github # tags to document my codebase. Failures like this make me wonder if there is any way to decouple my codebase from Github while preserving all the comments and issues on commits I've built up.
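
The code side is just another `git remote`, but for the issues and comments the API is probably your only out. A hedged sketch (owner/repo are placeholders; auth and pagination omitted):

    # snapshot issues and comments as JSON via the GitHub API
    curl -s "https://api.github.com/repos/OWNER/REPO/issues?state=all&per_page=100" > issues.json
    curl -s "https://api.github.com/repos/OWNER/REPO/issues/comments?per_page=100" > comments.json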



Good thing we've got mirrors, and don't depend on GitHub for releases. Still, we interact with our customers through GitHub and use it heavily in our workflow. Productivity is impacted if not stopped.


If you were stuck like me looking up an OSS github project, http://archive.org/web/ has github pages.


Github is back online! Github is back online! Github is back online! Github is back online! Github is back online! Github is back online! Github is back online! Github is back online!


I wonder how many origin2->bitbucket are being created right now.


Doesn't look like DDoS, maybe they deployed some dodgy code?


...graph is shocking, it's like somebody pulled the plug out.


likely a data center problem


I think I'm setting some kind of Ctrl-R land speed record.


Heh, talk about a negative feedback loop. DDOS propagating more DDOS....


That would mean a positive feedback loop. https://en.wikipedia.org/wiki/Positive_feedback_loop

(Positive here has a meaning similar to behavioral psychology's terminology for operant conditioning, where "positive" means adding a stimulus, such as punishment - both of which come off as odd given our common use of positive as good and beneficial :/ https://en.wikipedia.org/wiki/Operant_conditioning)

...Carrying on with your analysis: since uninterrupted positive feedback loops result in explosion, I guess that would mean (for this outage) either GitHub's fallback status page would go down, or the parts of the network carrying that traffic, unable to handle the increased load, would go down.

Probably better that we rework these control mechanisms and reroute/rewire/entirely change these processes.


Oops!

    [Oh My Zsh] Would you like to check for updates? [Y/n]: y
    Updating Oh My Zsh
    Username for 'https://github.com':


> January 28, 2016

> 00:00 EST The status is still red at the beginning of the day

They seem to have some time issues as well since it's not yet January 28th in EST.


Perhaps it's in a different timezone to EST


That would be odd since all other times are (correctly) reported in EST.


Always funny when a DVCS is used more like a CVS, as with GitHub. It goes down and you have all the same failings as if you were just using cvs/svn.


Seems like as good a time as any to head home then.



Please create a new branch before emergency commits.


Weird. GitHub is down, but I was able to successfully pull new code from GitHub that didn't exist in my local repo...


I wish they'd say more than

> We're investigating a significant network disruption affecting all github.com services.


I would prefer that they focus on getting back up and publish a post-mortem after the fact.


The best approach to this was shown recently by Slack, where they had a huge team of people on social media while their engineers figured out the outage.


It's not like GitHub has only two engineers.


OMG! PONIES!!!


It looks like they stopped updating the availability percentages on the status page? ¯\_(ツ)_/¯


This is why I run my own GitLab server as a mirror! Or just in case GitHub gets hit by a bus.


If this happens often enough maybe my company will start using their Enterprise install


GitHub is down in San Francisco


Even getting read-only access would be nice. I just want to look at some source.


On the upside, they gained lots of Twitter followers...


Welp. Remember to vendor your Go dependencies, folks.
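
For anyone who hasn't: a rough sketch with godep, one of the Go vendoring tools of the era (commands from memory):

    go get github.com/tools/godep
    godep save ./...   # copies your build deps into the repo, so builds stop hitting GitHub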


Too bad I'm not teaching someone to use git and/or github right now. Perfect opportunity for a practical joke. "Oh, great, look... you broke GitHub with that last command."


Google Code was turned off, and now GitHub?


Can't update to a newer Go devel version; can't fetch Go packages, sigh ...


Why can't you update to a new Go devel version? The code on github is just a mirror of https://go.googlesource.com/go


Because this is my workflow for it:

    $ brew reinstall go --head
    ==> Reinstalling go
    ==> Cloning https://github.com/golang/go.git
    Updating /Library/Caches/Homebrew/go--git
    Username for 'https://github.com': ^C


Ugh. It has also killed my productivity this last hour, because an API I was supposed to be writing against is only hosted on GitHub.


DELETED because this joke has been made


Are they associating unicorns with crashes on purpose? Is the same illuminati that controls our financial markets taking control of Github?


Does this look like a DDoS attack?


Github Pages still working though.


Back online as of 2:28AM UTC


They just now managed to get "effecting" changed to "affecting", so there's hope!


The grammar police are on the job; everyone can calm down... Unless the grammar police are the REASON for the outage. Mother, should I trust the government?


Is this China again? Ughhh.


"significant network disruption" == DDoS :( Come back, Github, please.


@githubstatus updated


I just really hope they honor their code of conduct while they're investigating this.


That doesn't even make sense.


I'm kind of tired of paying for a service that has so many outages.


What's your service level agreement?


A bitbucket license is $10. Paying $50 a month is redonk.


Whats your service level agreement?


I liked that question better with the apostrophe.


What?s your service level agreement'


Free for small teams. Even closed-source projects.


Go to bitbucket, or self-host, or find some other shiny new competitor. The market speaks with its feet.


http://git.oschina.net/ is a good choice, and its private repos are also free.


You can try https://coding.net if you know Chinese.


We're already planning a bitbucket move. Unfortunately it's not planned for another week or two.


Having used both, I find github's tools (mostly) better. And bitbucket isn't 100% reliable either.


Self-hosted is.


Self-hosted isn't 100% reliable either. Nobody is that good. I've done serious high availability (financial system that would land on the front page of the New York Times the way Github lands on HN), and it wasn't 100%. And the overhead of high availability is incredible.

No way would I do self-hosted version control. I have better things to do than babysit servers for commodity services.


Assuming you pay someone to maintain it, maybe. Which costs a lot more than a github/bitbucket contract.


"Fool me once, shame on you. Fool me twice, shame on me."

I can learn from my mistakes and improve progressively.

If I rely on another instead, I cannot; nor can I (necessarily and confidently) see to it that they learn from their mistakes and improve.

By relying on another (at least an unreliable/uncommunicative/uncooperative one), I cannot improve the chances of them not making those same mistakes again and taking me down with them.



