GitHub is degraded/down (githubstatus.com)
263 points by juancampa 46 days ago | 162 comments

Is it just me, or has Github's quality of service been continually degrading over the past several months? What is going on internally? Is this because of the Microsoft acquisition? Increased usage? An internal transition to Azure?

...is it time to move away from Github?

It might also be covid related. People are working from home, people responsible for system upkeep might not be immediately responsive, more demand on the servers for whatever reason, etc.

I would agree that the coronavirus could be a factor here. At the same time, I've been noticing issues since probably December or January (before the coronavirus started being a real problem), which makes it seem like maybe there are multiple issues.

Of course, I'm not actually internal to Microsoft or Github, so I have no idea and it's all opaque to me.

Alternative explanation - they've been deploying big new features with some regular cadence lately. New features carry risk and I think we're seeing that.

Same here - seems like these issues have been going on since last year, especially with Actions.

Maybe it's the demand side? With remote work, the intensity of usage went up at least in our company as we rely more on written communication. At the same time, some people will use the time to start side projects or get into programming.

On the other hand, shouldn't there be a productivity drop with so many people working from home while their kids also aren't in school? I'd expect that to offset any increase in demand—after all, Git isn't Slack; remote work shouldn't cause people to push all that much more often, right?

Depends... I'm getting about 40% more done, with fewer interruptions, and not having a couple hours of commute and lunch driving.

Personnel may be partially unavailable, but - home or office - developers are doing the same job as they always did. At least from the user's point of view, it didn't change that much.

Cloud servers everywhere are also under much heavier load. They may have moved much of it into Azure, which is famously overloaded right now.

My guess is opening their paid offering for free, and also GitHub Actions, which is CPU-intensive and does much more than the traditional CI/CD tools.

Ideally GitHub Actions would be completely independent of GitHub's core services/servers (e.g., how Travis CI, Circle CI, etc. work), but that seems like it may not be the case.

Also, I'm still anticipating a fuller report on the database issues they mentioned have been the root cause of many outages over the past few months.

I'd rather pay $8 per month for a stable service any time than $4 per month for a service that fails during a critical build, as just happened to me this time.

Not just you.

This page https://web.archive.org/web/20190801000000*/https://github.c... has been[1] “500 internal server error” since late-December (globally it seems). Nobody cares (“not a priority” (c) support), nothing on githubstatus.

[1] blue circles on web.archive are errors too

If you view the historical uptime, it does seem like there have been more incidents in the past three months, but otherwise the waters look calm (as reported at least): https://www.githubstatus.com/uptime?page=1

Yes, and this is actually a thing that bugs me, because I use GitHub every day. Over the past several months, there have been numerous days in which I've had problems (`You can not comment at this time`, 500s, etc.) and no corresponding status report.

It seems like the historical uptime page paints a far rosier picture than I am actually experiencing.

I wonder how companies like GitHub determine this when outages are geo-specific. Do they wait until an outage affects 50% of a geographic region before it's reported as a partial outage?

If you do nothing, it lands by default on the "git operations" view, which is by far the most stable, since, well, it consists of executing the battle-tested git program.

If you want to see the state of GitHub's "extras", you'd need to select "GitHub Actions" or "webhooks", which have a fair amount of downtime (about once a week or so, which seems about right).

Interesting how the most stable component of the company is, of course, the open source one ^^

It's almost as if keeping a complex service like Github online and available to millions of users is hard.

Building skyscrapers is hard. Does that make it OK for them to fall down regularly?

GitHub hasn't collapsed killing thousands or needing to be completely rebuilt though, so that analogy doesn't work. This is more like there's a flood in the lobby so maintenance has closed the front door for a bit.

I didn't mean for the point to be about the consequences of the failure. What I was trying to argue against was the notion that it's fine for things to fail, just by virtue of them being hard. There are a lot of complicated systems in the world that work extremely reliably.

Planes sometimes crash without killing people or needing to be completely rebuilt; that doesn't mean crashing isn't clearly undesirable.

It's not a good thing that Github is down. It's an inevitable thing that comes from complexity at scale though. Hard things are hard, whether that's planes, buildings, or web apps.

I wonder if it has something to do with the past three months being coronavirus-affected: higher internet usage?

They've been pushing lots of features in the platform so not surprising that it's a bit unstable now.

In an ideal world, pushing new features would have no impact on stable mature features like browsing files, comment threads, etc

Internally at Amazon, we consider that about 80% of issues/outages/etc. are due to changes. This may sound like a "duh", but it's based on over 10,000 investigations.

Much of the work is just minimizing the impact of these changes by finding them before customers do.

This includes things like unit and integration testing, canaries, cellular/zonal/regional deploys, auto rollbacks, multi-hour bakes, automated load tests, and much, much more monitoring. Not to mention cross-team code reviews, game days, and ops reviews.

Ideally, I agree ... and yet the real world is exactly the opposite ;)

That’s part of the micro service promised land, right?

No, I think it's more part of how to run a complex system with a lot of people changing stuff at once. Having good monitoring, kill switches, staged rollout, continuous deployment, and so on are all things that contribute more to making a reliable service than how microserviced it is.

Too bad this is the real world.

Which world is that?

If you're looking for somewhere else, SourceHut has had no unplanned outages in 2020, despite being kept online by an army of one. The software and infrastructure are just more resilient and better maintained. Our ops guide is available here:


It's also the highest performance software forge by objective measures:


Full disclosure: I am the founder of SourceHut.

It's unfair to GitHub to make the claim that your infra is more resilient and better maintained. Their load is orders of magnitude greater than yours. My driveway doesn't have potholes either; that doesn't mean it's more resilient than the freeway.

I don't think so. I backed up the claim here:


SourceHut is at least 10x lighter weight and has a distributed, fault tolerant design which would allow you to continue being productive even in the event of a total outage of all SourceHut services.
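Since git itself is distributed, keeping a local mirror is one concrete way to stay productive through a forge outage. A minimal sketch (all paths here are local stand-ins so it runs anywhere; in practice the source would be a hosted URL):

```shell
# Create a throwaway source repo standing in for a hosted one.
git init -q src
git -C src -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "initial commit"

# Mirror it; a bare --mirror clone copies all refs, suitable as a backup.
git clone -q --mirror src backup.git

# Refresh the mirror later to pick up new commits.
git -C backup.git remote update
```

With a mirror like this on hand, clones, diffs, and log spelunking keep working even when the hosted remote is unreachable.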

Sidebar, I just want to say, you are one of the few people I’ve observed doing actual “modern” web development.

When most people talk about “modern web” or modern anything in software they think it means “using all the latest tools”.

That often means things like ES6 and Webpack, which have nice surfaces, but which create nightmares under the hood.

That’s the opposite of what modern architecture was. It was about embracing the constraints of materials. Given the properties of concrete, what is the limit of what you can do with it. Go there, and no further. And don’t cover it up, just finish the dang slab and get on with the rest of the house.

ES6 means transpiling, which means webpack, which means a massive machine of hidden complexity, which if you’re lucky exposes a nice smooth surface where everything is arrow functions and named exports. And if you’re unlucky is a flimsy piece of cardboard over the nightmare underneath.

You (SourceHut) seem to be building a UI that actually takes note of how the browser is. And you are trying to push the big numbers... how reliable your service can be, how many endpoints can one person maintain, while letting the materials of the web (forms, urls) dictate the details.

That’s true modernism.

So, bravo. I’m glad to see you out in the world. It takes courage to step outside of the norm and I’m rooting for you.

Just wanted to interject that browsers (other than IE 11) have over 98% coverage for ES6 without transpiling.

Care to expand on this some more? Perhaps you have some other examples of good front-end and server-side, modern-day web development?

I'm sure you've done a great job building up your infrastructure, but if you have the level of traffic Github has, what would your uptime be?

Who's to say? It's not GitHub scale, and even if everyone in this thread moved to SourceHut, it still wouldn't be GitHub scale, but it would be serving your needs just fine. I feel totally comfortable recommending SourceHut over GitHub as a service which can be expected to have better uptime and performance, because it is a fact - even if we operate at different scales.

And I believe sr.ht would beat out GitHub at their scale anyway. The services are an order of magnitude more lightweight. And the design is more fault tolerant: we use a distributed architecture, so one part of the system can go down without affecting anything else - as if GitHub's issues could go down without anything else being affected. And many of our tools are based on email, a global fault-tolerant system, which would allow you to get your work done more or less unaffected even if SourceHut was experiencing a total outage. We'd automatically get caught back up with what you were up to in the meanwhile once we're online, too.

I've spoken to GitHub engineers about some of the internal architectural design of GitHub, too, I'm confident that SourceHut's technical design beats out GitHub's in terms of scalability. And, despite already winning by a good margin, I'm still spending a lot of effort to push the envelope further on performance and scalability.

> Who's to say?

And then you go on to say it. I'm glad that SourceHut exists, and I like many of its principles, and it's probably better designed too, but walking into a thread where someone is having an outage and then claiming that you'd do much better is in poor taste no matter how good you are or how many of your services work offline.

I responded directly to someone who said they were considering alternatives, and wouldn't've otherwise.

Right, and I think it is great to bring up how your service can handle outages better than GitHub would due to it being decentralized. The part I have issue with is saying that you'd do better than GitHub about keeping your site up, pointing to the issue that they are in the middle of resolving–that just seems like kicking them while they're down, especially since you haven't actually shown that you can do better. (Yes, you have good uptime in the past, but I don't see what's stopping the power going out to some of your servers, or you pushing a bug into production, or any number of other things that shouldn't go wrong but often do, especially as the number of users increases.)

>what's stopping the power going out to some of your servers

Redundant power supplies

>pushing a bug into production

Nothing, but again, SourceHut is demonstrably better in this regard: because it's distributed, a bug in production would only affect a small subset of our system, and the system knows how to repair itself once the bug is fixed.

And I don't think I need to apologise for kicking Goliath while he's down. Someone said they want alternatives, so I pitched mine with specific details of how it's better in this situation, and that doesn't seem wrong to me. I would invite my competitors to do the same to me. We should be fostering a culture of reliability and good engineering - and if I didn't hold my competitors accountable, who will? "Here's an alternative" has more teeth than "I wish this was better."


I'm referring to the comment to which I initially replied:


"...is it time to move away from Github?"

Yeah, I reread your comment, but you responded before I deleted apparently :-) My mistake.

> SourceHut over GitHub as a service which can be expected to have better uptime and performance, because it is a fact

Most of us could throw any of the open source solutions on a $20 Linode instance and probably have excellent uptime. How many active repos do you host, and on how many servers?

About 18K git & hg repositories, for about 13.5K users. We also run about 5,000 CI jobs per week, including for some large projects like Nim and Zig, Neovim, OpenSMTPD, etc. We have 10 dedicated servers at the moment. And I didn't throw an open source solution on these servers - I built these open source services from the ground up.

So you're comparing your scalability with a company with over 40m users and 100m repos.

Can you talk about the geographic distribution of your 10 servers?

I would like to remind you of my earlier point:

SourceHut is not the same scale as GitHub. This does not change the fact that SourceHut is faster and more reliable. We have an advantage - fewer users and repos - but still, that doesn't change the fact that we're faster and more reliable.

This has been objectively demonstrated as a numerical fact:


And yes, 9 of those servers are in Philadelphia (the other is in San Francisco, but it's for backups, not distribution). That doesn't change the fact that, despite being more distant from many users, our pages load faster. In this respect, we have a disadvantage from GitHub, but we're still faster.

GitHub and Sourcehut are working at different scales. That doesn't change the fact that SourceHut is faster.

I was considering your claim:

> we use a distributed architecture

> SourceHut is faster

I wasn't questioning that some of the web features are fast. I'm sure when Github was 10 servers their pages were fast too. I suspect if I threw Gitlab on a 9-server cluster on AWS they'd also be quick.

Not geographically distributed, but distributed in the sense that different responsibilities of the overall application are distributed among different servers, which can fail independently without affecting the rest. Additionally, the mail system on which many parts of SourceHut rely is distributed in the geographical sense, among the hundreds of thousands of mail servers around the world, which have standard and 50-year-battle-tested queueing and redelivery mechanisms built in.

And yes, throwing GitLab on a 9 server cluster on AWS might be fast. But, I'm ready to bet you that SourceHut will be faster than it still, and I have a ready-to-roll performance test suite to prove it. And I know that SourceHut is faster than GitLab.com and GitHub.com, and every other major host, and you don't have to go through the trouble of provisioning your own servers to take advantage of SourceHut's superior performance.

> This has been objectively demonstrated as a numerical fact:

While your tests are indeed objective, I don't think they're very useful. For example, why does your performance test ignore caching?

GitHub's summary page loads 27KiB of data for me unauthenticated, which is about 6% of the 452KiB you're displaying in your first table. The vast majority of developers who browse GitHub will not be loading 452KiB of static assets every single page load.

Anecdotally, GitHub's "pjax" navigation feels about as fast as SourceHut on my aging hardware.

Even with caching, SourceHut is a lot smaller than that. SourceHut benefits from caching, too - the repo summary page comes from 2 requests and 29.5K to 1 request and 5.7K with a warm cache. And in many cases, the cache isn't the bottleneck, either - dig into the Lighthouse results for specific pages to see a more detailed breakdown.

Thanks for being so transparent about your operations.

Maybe this is somewhere in the manual and I missed it, but do you have some way of automating the configuration of your hosts and VMs? For example, do you use something like Ansible?

No, I provision them manually. Being based on Alpine Linux makes this less time-consuming and more deterministic. At some point I might invest in something completely automated, but right now the manual approach is simpler - and if it's not broke, don't fix it.

Ah, OK. Also, have you written anywhere about why you chose to use colocation rather than VPS (or "cloud") hosting, or leased dedicated hosting for the CI system? If you could use someone else's hardware rather than having to select, buy, and set up your own, then at least in theory, you could spend more time on other things. But I'm sure you have your reasons for making the choice that you did. I'm just curious about what those reasons are, if you're inclined to share.

There are lots of reasons, but the most obvious one is cost. All of SourceHut's servers are purpose-built for a particular role, and their hardware is tuned to that. The server that git.sr.ht runs on is pretty beefy: it cost me $5.5K to build. I paid that once and now the server belongs to us forever. I ran the same specs through the AWS price estimator, and it would have cost ten grand per month.

A little bit off topic, but I was just wondering: does it support git write access over HTTPS? Not just read-only.

No, write access is only supported over SSH, for security reasons. SSH key authentication is stronger than password authentication, and git.sr.ht doesn't have access to your password hash to check anyway.
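For a clone made over HTTPS, switching pushes to SSH is a one-line remote change. A sketch (the `~user/repo` path is a placeholder for your own repository; the demo repo is created here only so the commands run anywhere):

```shell
# Set up a demo repo; in practice you'd run set-url inside your existing clone.
git init -q demo
cd demo
git remote add origin https://git.sr.ht/~user/repo

# Switch the remote to SSH so pushes authenticate with your key:
git remote set-url origin git@git.sr.ht:~user/repo
git remote -v
```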

Yeah, I'm sure there are no possible scaling issues between your service (in how many people use it / how many repos are active) vs. GitHub or GitLab...

Your $2/month pricing: is it $2/person/month?


I set up a self-hosted Gitea this year and moved my repos over and couldn't be happier with it. It's faster than GitHub, clones the GitHub design/UI so that everything's where I expect it to be, has a dark mode, and supports U2F. It's easy to deploy, back up, and maintain, the Gitea devs have done a great job.

It's much less complicated (both from an admin standpoint, as well as a UI standpoint) than GitLab. I paired it with a Drone installation (also self-hosted) for CI and (sometimes) CD.

It all works great, and is way easier than I thought. If there's downtime, I'm (usually) in control of when or how long, as I have root on the box.

I'm also not giving my money to a giant military contractor (Microsoft, the owners of GitHub) any longer, which is a huge deal for me from a personal moral standpoint (YMMV).

A positive side of such downtime: it turns out people gather at your landing/home page just to see if the service is up. Could the cost of downtime be offset by grabbing a few customers with a new feature just published on your homepage?

You could move to GitLab, but from what I'm hearing the pricing is higher than GitHub's (is this still true?)

Barring that, you always have the tried and true (and, for some reason, abhorred by start-ups) option of running your own Gitea or GitLab instance. It's not hard, and most of this stuff can be done in dockerless containers if you want.

If cloud servers are getting "overloaded", as some commenters say, you could even buy a few racks or Us of colo somewhere, or use a cloud provider that isn't the most popular meme on YC. Vultr and RamNode are both good options, and you'd be supporting a small business, not Bezos's next giga-yacht.

Github actually recently reduced their prices to match some of Gitlab's offerings.


>Vultr and ramnode are both good options and youd be supporting a small business, not Bezos next giga-yacht.

Vultr is considered a small business now? Crunchbase lists them as having 50-100 employees, and they seem to be owned by Choopa, LLC, which some sources list as having 150 employees.

GitLab community advocate here, just wanted to share the most up to date GitLab pricing information: https://about.gitlab.com/pricing/ Thanks!

It has been since the Microsoft acquisition.

I chalk most of the early ones up to moving services over to Azure.

Lately though, I don't know. Azure is running pretty close to capacity, so maybe it's part of the problem

Maybe it has more to do with their changes in pricing and a surge in customer uptake?

That would point to Azure hosting. Anyone notice a similar pattern?

Github is still hosted on AWS though afaik.

As of the end of 2017, they were using their own datacenters.


As many readers are stating, there seem to be larger, internet-wide issues today in the US.

For hosting your own repos AWS CodeCommit works very well.

1) Microsoft took over. 2) M$ migrates some ADO (Azure DevOps) features to GitHub (e.g., GitHub Actions). 3) If GitHub was not on Azure before M$ bought it (very likely, but needs citation), they will probably migrate to Azure at some point.

I'm pretty sure Github Actions work predates the MS acquisition... I'm also pretty sure that they are trying to align the backend systems more to Azure, but have no insight into how much of that took place.

The fact that you used "M$" indicates that you are predisposed to blame Microsoft for actions that are likely not from the parent, and discount any changes from the top down that have occurred within MS. And while I have a lot of issues with MS and Windows in particular, MS today is not the same as MS even a decade ago.

I would guess that since they introduced free private repos, usage has increased a lot. E.g., I used to use Bitbucket but switched over to GitHub when they did that, because the GitHub Desktop program is nice and works a lot more smoothly with GitHub than with Bitbucket.

I don't think it's just GitHub. I noticed everything seemed slow but thought it was just my ISP. Turns out from looking at downdetector.com there seems to be a large spike in reported problems across many sites and providers, all occurring at the same time.

Yeah apparently Centurylink and Comcast are having issues as well. Customers have been complaining about slow load times with our products today. Coworkers complaining about slow load times in general and especially with Salesforce. I’m being rerouted from my local CloudFlare data center to Chicago as well.

Not entirely sure what’s up, but this doesn’t seem like an issue limited to only GitHub.

I thought this at first too but downdetector sorts all the problem sites to the top which makes it look like the internet is melting when you first look at it. It actually seems reasonable for this many sites to be having issues at any one given time.

Maybe the cloud providers are running out of capacity?

Maybe The Cloud is full.

Yes, I saw a similar notice from Bitbucket.

GitHub has been having major availability issues these past three months: four incidents in February, four in March, and six in April so far.

Source: https://www.githubstatus.com/history

I look forward to reading the root cause analysis promised by Nat in February: https://twitter.com/natfriedman/status/1233079491204804608

Azure is also refusing to allocate me capacity - I'm wondering if this is a general MSFT outage?

Here in Minnesota, several coworkers and I are having trouble with our ISP. And a website we host is (apparently) being DDoSed. And now GitHub's down, and you're reporting some Azure issue. Is something going on...?

No, I don't think so (and I'm also in Minnesota). I'm going to guess that the increased load is just pushing services over the edge.

Edit: Also, interestingly enough, I am now reliably hitting Cloudflare's ORD (Chicago) datacenter instead of MSP. If you visit https://snazz.xyz/cdn-cgi/trace (or any other Cloudflare-backed website), what comes after the COLO= for you?
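The same check can be scripted from the command line; the trace endpoint returns `key=value` lines, and a small helper (my own naming) pulls out the `colo` field:

```shell
# Extract the serving datacenter from a /cdn-cgi/trace response.
colo_of() { awk -F= '$1 == "colo" { print $2 }'; }

# Against any Cloudflare-backed site, e.g.:
#   curl -s https://snazz.xyz/cdn-cgi/trace | colo_of
```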

Getting rerouted to ORD instead of DEN as well. Coworkers have been complaining about slow internet all day, there’s been customer complaints about slow load times with our products as well.

You can test Cloudflare at https://cloudflare-test.judge.sh/#snazz.xyz

So ORD is probably just higher-capacity, which is why free plan users like me are getting routed to it?

Most likely, the other/normal DC is likely either overloaded or has part of its hardware under maintenance.

STL! Ha. Wonder what's going on. I'm on CenturyLink here in the Twin Cities.

Yeah, that is strange. I’m on Comcast. They’re probably just attempting to reroute traffic to higher-capacity links at the expense of latency.

Also having erratic few-minute outages from my ISP in the UK, which hasn't happened for years.

Global network issue somewhere?

Also having issues with CenturyLink fiber at home in MN. Stuff is very slow today.

They posted a warning that the uptick in Teams usage and general cloud use has meant they have much less spare capacity - see https://azure.microsoft.com/en-us/blog/update-2-on-microsoft...

Sure, just anecdotally this is my first time running into this so it might be that capacity issues are particularly acute right now.

Even that article states that there hadn't been any service disruptions in the US up to now, but I am certainly experiencing it now.

Or their data centers are just full. Teams had a tremendous growth, so did their O365 suite and probably also Azure itself. At some point even the largest vendors run out of servers.

I had never heard of Microsoft Teams until yesterday when I saw it on a TV commercial.

For companies that live on Office 365, it is very popular. With COVID-19, it became a very quick add-on for groups that were already in the O365 ecosystem to add group chat functionality, pretty much with a few clicks. My son’s school was using it before they went remote, but now their usage has increased dramatically.

I wonder how much the load will change once schools close for the summer. All of these attempts at remote teaching and learning will stop over a period of several weeks, and I imagine a large amount of capacity will be available again for a while?

In my experience, nothing is correlated more with downtime than code changes. Github has been pushing a lot more features since the Microsoft acquisition, and has felt down a lot more often since then.

There has been one major outage a month the last several months, with sprinklings of little outages (I recall webhooks down quite a lot). The cadence of these outages is rapidly changing my perception of Github as a reliable service.

At my organization, we saw an uptick in timeouts spanning vendors that started at the same time - 6:55am PST. Makes me think there's an internet wide event occurring right now

Quite a few issues being reported here: https://www.thousandeyes.com/outages

Which providers are you seeing issues with? I'm curious if we can corroborate.

I was referring to WorldPay and Adyen (payment providers), but also saw issues directly with GitHub.

We were experiencing issues with Netlify at 6:15am PST.

CenturyLink ISP is having issues: https://downdetector.com/status/centurylink/

Today marks the 3rd time I've broached the topic w/ management of getting the self-host enterprise option... Compared to (public) GitHub's problems this year so far, our AWS EC2 instances are orders of magnitude more reliable. Sure, the internet can still go down, but my VPN into us-east-1 from Texas has been unbroken for weeks now.

At this point I'd almost prefer to pull it all in-house and manage it myself so the entire team doesn't have to lose a whole day of productivity over all this. I am so glad we moved away from using GH Actions for builds because we would be absolutely hosed right now on supporting our customers.

I'm not sure the irony is lost on you that people generally prefer services like GitHub (and, similarly, EC2) precisely _because_ they're not on-prem, and that if it's down there are hundreds of talented engineers working to resolve those issues.

Unfortunately, anecdotal experience here is spotty. Services I run for my team have several more 9's of actual availability (note: I did not say "uptime"); contrarily, other internal services at my company are many times less reliable than services like GitHub.

Given that people prefer external hosting for the reasons I mentioned, for many I think pulling it in-house is unappealing.

Does the GH Enterprise option not provide some degree of support with initial setup and configuration? Is the enterprise support more or less responsive than the public channels? I would expect those same engineers are also responsible for maintaining the enterprise offering.

Also, does isolation of a private GH instance from the public instances not provide some degree of added reliability considering the potential for DDOS or simply extreme load?

I absolutely grant you the IT infrastructure concerns. BUT Amazon is our vendor on that. GitHub provides the software. It's not like I'd be standing up a new series of physical hosts to run an on-prem GitHub built from source and managing all of the hell around that. This would be simply putting a GH-provided image on a EC2 instance and making sure we have frequent snapshots.

My typical response from GitHub Enterprise support has been fantastic over the years. I'll also note that after GitHub Actions/Packages came out on Github.com the lead time for support response did increase substantially for non critical tickets due the increased support burden for those new services. My most recent tickets have been answered promptly so they must have figured out the staffing issues.

You should be well aware that the actual architecture of GitHub Enterprise isn't truly highly available (1 active and 1 or more standby instances) unless you are a large enough customer to get clustering mode. Which means you likely need to take downtime to implement upgrades, since they typically require rebooting the VM.

I work for a fortune 100 company and GHE has been a nightmare. Not only do you need a team to maintain it, but then that team needs to be equipped enough to help support any other internal services (CI/CD pipelines) that integrate with it. Github.com is just an infinitely more enjoyable experience.

There are larger problems I think... our call center is having trouble with Five9 in San Francisco.

I also tried setting up an EC2 instance in Northern California, and just pinging it from Kansas City via Google Fiber I experienced 37% packet loss.
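For anyone reproducing this, ping's summary line already reports the loss figure; a small parsing helper (my own sketch) makes it easy to log:

```shell
# Pull the "% packet loss" figure out of ping's summary output.
loss_pct() { grep -o '[0-9.]*% packet loss' | head -1; }

# Typical use (host is a placeholder):
#   ping -c 100 ec2-host.example.com | loss_pct
```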

Same with Five9 from the KC metro, and also accessing Azure resources. No issues on my Google Fiber. But we have agents all over the Midwest, all having issues.

Created https://gitbackup.org for this very reason.

Where do you store all that data?

Should only be 2-3 PB. Storing it all on https://tardigrade.io

Only 3 petabytes? At $0.01/GB/Month, it seems that works out to about $360,000/year in storage costs alone. Cheers, but you're paying for this just as a public service?
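The arithmetic behind that estimate, as a quick sketch (using the $0.01/GB/month figure and the 3 PB upper bound):

```shell
# 3 PB = 3,000,000 GB; at $0.01/GB/month that's $30,000/month.
gb=3000000
per_month=$((gb / 100))         # dollars per month at 1 cent per GB
per_year=$((per_month * 12))    # dollars per year
echo "\$${per_month}/month, \$${per_year}/year"
```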

There is discount pricing over 100 TB. It's a tech demo, so Tardigrade.io sponsors it. It's a pretty powerful demo of easily storing large amounts of data, especially when GitHub goes down.

The site says "Storing 542 TB" though :D

We have been increasing sync speed over time. It's about double where it was a few weeks ago. It definitely takes some time to upload so much data.

Out of curiosity, how do you, or how do you plan to, monetize this? Seems like it would get expensive really quickly.

It is a tech demo of https://tardigrade.io, because easily storing all of GitHub certainly gets people's attention.

It's only expensive because cloud providers overcharge. Tardigrade's listed prices start at half of Amazon S3's, and there are significant discounts if you're storing more than 100 TB.

Why not make two versions of GitHub, one free and one paid with professional-level uptime and support? Is that the idea behind GitHub Enterprise (https://github.com/enterprise)?

Yes. Enterprise has an SLA (when GitHub hosts it) and an option to self-host.

Oh okay, I didn't know GitHub had an option to host Enterprise. (It's way over budget for me right now, but just curious.)

Who would have guessed that handing control of a company to Microsoft would make it less reliable.

Git pushes are going really slow for me, but they do work after a few tries. If you normally use a credential cache and it's asking you for your password again, hit control-C and retry the operation. That seems to have gotten my last few pushes to work.
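If pushes keep flaking, a small POSIX-sh retry helper saves some typing (the `git push origin main` at the end is just an example invocation, not anything GitHub-specific):

```shell
# Retry a command up to N times, pausing briefly between attempts.
retry() {
    n=$1; shift
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" && return 0
        echo "attempt $i failed, retrying..." >&2
        sleep 2
        i=$((i + 1))
    done
    return 1
}

# e.g.: retry 5 git push origin main
```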

Got my push to go through; unfortunately, downloading packages from https://codeload.github.com was timing out.

CircleCI's having a lot of trouble, too. Intermittent 500s, and even the images on the 500 page don't load.

Seems like something bigger might be going on.

What usually happens with CircleCI after a GitHub outage is that they're hit with a flood of GitHub webhooks when GitHub comes back online. Then Circle starts to slow down under the load of all of the new jobs that have been queued.

One usually causes the other, so we shouldn't infer anything larger from that datapoint.

In this case, CircleCI and Github were down at the same time, and seemed to come out of it around the same time too. I don't know what to infer from that either.

I noticed it being more of a CenturyLink issue.

I think my biggest nightmare is SO, GH, and HN all being down at the same time.

Edit: A nightmare is an exaggeration. But it would slow down work.

Reddit and Rocket League servers were degraded simultaneously yesterday, and if I recall correctly they both host on Google Cloud (I may be mistaken there). My paranoia about a high-impact state-sponsored cyber attack has been running high!

Yup. Google Cloud seems to be having frequent problems lately.

However, I feel like GitHub being down AND Stack Overflow would slow down work. And HN is a good place to know what's going on.

Oh for sure. Reddit and Rocket League being down just mangled my relaxation time, GH and SO would have much worse material consequences.

I think AWS was having issues yesterday and today as well. Do you think someone is targeting cloud providers?

Could always just RTFM if SO goes down

I knew this would be here lol

As a huge fan of JetBrains products (I use WebStorm, PyCharm, DataGrip in roughly equal parts) I am considering:


Anybody else got experience with that and/or TeamCity?

We use TeamCity and I can't say I have any complaints. We utilize the tagging feature pretty heavily in our deployments and source all artifacts from TeamCity builds. The builds also post success or failure to Gerrit.

I feel compelled to repeat my earlier sarcasm:

If only there was some kind of distributed version control system. /s

I don't feel bad being an "old fogie" who demands local copies of my dependencies.

Github is as much or more a collaboration tool as a Git upstream. The Git part is easy to distribute. The collaboration tool, not so much.

And to anticipate the likely next argument - no, mailing lists are not a better tool for collaboration. They are distributed, sure. In so saying, I have exhausted the list of their virtues.

The "we deploy production directly from Github" pathology is the one I'm talking about. Not being able to "collaborate" for a brief outage is fine. Not being able to deploy code isn't.

You gotta CI/CD from somewhere. Whatever that somewhere is, it can go down.

You can, of course, override your CI/CD and do a manual deploy. It's not a matter of "can." It's a matter of not fully understanding all the checks the CI/CD system does to the code, and all the build steps for prod builds, and therefore not having the confidence to deploy to prod without CI/CD holding your hand. (Which I don't at-all blame devs in bigcorps for not knowing; release management is its whole own thing, and division of labor means that at sufficient scale it doesn't make sense to learn about it.)

Well, the place I run my CI tests is on a VMware virtual machine, e.g.

  #!/bin/bash -e
  # clone a clean copy of the repo and run the test suite
  git clone https://git.example.com/myrepo
  cd myrepo
  sh do.tests
If the tests are automated enough to run with every Git checkout, they can easily enough be run by hand too.

The real world script I use is a little more complicated, because the code base in question is two decades old so I need to make some changes to how the code looks to the tests so that the tests can run.

I once worked for a company which had a series of tests which took eight to ten hours to run. Running those tests with every single Git checkin was out of the question; we instead used cron to run the tests every night.
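The nightly setup was nothing fancy; a crontab entry along these lines (paths hypothetical) is enough:

```crontab
# m h dom mon dow  command
# Kick off the long-running suite at 01:00 every night and keep a log.
0 1 * * * /home/ci/run-nightly-tests.sh >> /var/log/nightly-tests.log 2>&1
```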

Part of the point of having release infrastructure is so that every engineer doesn't have to know how to manage a deployment, even. Having worked both with and without it, I have to say I favor the former. The occasional loss of productivity due to a service provider outage, or even due to whatever the hell's going on today, is vastly outweighed by the day-to-day baseline requiring next to no time spent on the mechanics of taking code from review approval to live in prod.

As a side note, gotta say, scare-quoting "collaborate" seems like a pretty weird flex in a time when everybody's stuck working from home whether they prefer it that way or not. Maybe you prefer to work entirely within a silo, rarely dealing with colleagues in the course of your day-to-day. Most don't, nor should they. Most kinds of work, even the kinds of work that we do, really aren't better done that way.

I worked in a silo for years, not so much by choice as by virtue of the fact that my company at that time could not afford another engineer of similar capability. I got by well enough, thanks in no small part to a youthful talent for improvisation, and the fact that things were simpler then. I wouldn't go back to that life if you paid me. But even if I would, I couldn't. Times change, and the time of the engineer as hermit has largely passed. There are things about it I miss, sure. But I don't miss them so much that I want to imperil my ability to keep earning a living by ignoring that the world has moved on.

edit: Of course, this ignores the difference between modern dev work, and modern sysadmin (as opposed to devops) work. It probably doesn't make much sense to analyze the latter as though it were the former. Or vice versa.

In my experience, a lot of developers get really uncomfortable once more developers start working on their code. There's a lot of effort involved in documenting the code: either explaining how the code runs to the other engineers on the team, or writing documentation showing how the code is arranged and runs, something many engineers are not very good at.

It takes a lot of talent to do collaboration well, because it means having to let go of the code being “my code”, and it means having a lot of people skills, such as the ability to empathize with how someone not familiar with the code will see things, and how to write documentation about the code structure so that things can be in maintainable “boxes”.

I mean, that's fair, but growing past that is also part of learning to be an effective engineer. It's not always a comfortable process - it sure wasn't at first for me! - but that's often true of any kind of growth.

I think it's better described as a skill than as a talent, too. The former is learned, the latter innate, and as someone with absolutely none of the latter, I can attest that the former is attainable nonetheless.

You read a lot more into that than I intended.

I'm suffering some fatigue w/ the word "collaborate" being thrown around in marketing collateral right now. You used the word several times, and I engaged my "marketing avoidance" mode.

That's fair. My circle includes several acquaintances who prefer talking loudly about how engineering should be, to accomplishing much of anything in the realm. So, similarly overfitting, I engaged a mode of my own.

Not sure why you're getting downvoted, but yes, there are plenty of businesses that can't afford an outage in the push-to-prod path.

But it might be worth keeping in mind that there are plenty of businesses that can afford delays, especially if they can save money in exchange for that higher risk.

As always, engineering isn't about the perfect solution for everybody, but the right tradeoffs for your particular use case.

All those deployments typically rely on:

1. Scripts available in the git repository.

2. Parameters that are either stored (possibly encrypted) in a filesystem, on a parameter server (like AWS SSM key manager), or in 1Password/LastPass.

It's only a pathology if a team couldn't, as a worst case, coordinate a deployment from a single machine over the phone.
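That worst case can be surprisingly small. A sketch, with every path and variable name made up, and a local env file standing in for SSM/1Password:

```shell
#!/bin/sh
# Worst-case manual deploy from a single machine: the script lives in
# the repo, and secrets come from a parameter source (a local file
# here, standing in for AWS SSM or a password manager; all names are
# hypothetical).
PARAM_FILE=${PARAM_FILE:-/etc/myapp/params.env}
if [ -f "$PARAM_FILE" ]; then . "$PARAM_FILE"; fi   # expects e.g. DB_HOST=...
echo "deploying myapp to ${DB_HOST:-unknown-host}"
# ...followed by whatever build/rsync/restart steps the repo scripts define
```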

Our nonprofit offers both privacy-friendly cloud storage and Git hosting with GitLab. We also offer GitLab hosting for private instances: https://git.stealthdrop.cloud and https://my.stealthdrop.cloud. Our services are free, and we are a 501(c)(3), so donations are tax-deductible.
