...is it time to move away from Github?
Of course, I'm not actually internal to Microsoft or Github, so I have no idea and it's all opaque to me.
Also, I'm still anticipating a fuller report on the database issues they've cited as the root cause of many outages over the past few months.
This page https://web.archive.org/web/20190801000000*/https://github.c... has been returning “500 internal server error” since late December (globally, it seems). Nobody cares (“not a priority,” according to support), and there's nothing on githubstatus.
The blue circles on web.archive.org are errors too.
It seems like the historical uptime page paints a far rosier picture than I am actually experiencing.
If you want to see more of the state of GitHub "extras", you'd need to select "GitHub Actions" or "webhooks", which have a fair amount of downtime (roughly once a week or so, which seems about right).
Interesting how the most stable component of the company is, of course, the open source one ^^
Much of the work is just minimizing the impact of these changes by finding them before customers do.
This includes things like unit and integration testing, canaries, cellular/zonal/regional deploys, auto rollbacks, multi-hour bakes, automated load tests, and lots and lots of monitoring. Not to mention cross-team code reviews, game days, and ops reviews.
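To make the canary/auto-rollback idea from the list above concrete, here's a minimal sketch in Python. The get_error_rate and rollback functions are hypothetical stand-ins for real monitoring and deploy tooling, not any particular company's stack:

    import random, time

    def get_error_rate(cell: str) -> float:
        # Stand-in for a real metrics query against a monitoring system.
        return random.uniform(0.0, 0.02)

    def rollback(deployment: str) -> None:
        print(f"rolling back {deployment}")

    def bake_and_watch(deployment: str, bake_seconds: int = 5) -> bool:
        # Deploy to a canary cell first; roll back automatically if its
        # error rate diverges from the baseline during the bake period.
        deadline = time.time() + bake_seconds
        while time.time() < deadline:
            if get_error_rate("canary") > get_error_rate("baseline") + 0.005:
                rollback(deployment)
                return False
            time.sleep(1)
        return True  # safe to widen the zonal/regional rollout

    print(bake_and_watch("release-42"))

The point is the shape, not the thresholds: the bug still ships, but it only ever touches the canary cell before the system pulls it back.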
It's also the highest-performance software forge by objective measures:
Full disclosure: I am the founder of SourceHut.
SourceHut is at least 10x lighter weight and has a distributed, fault tolerant design which would allow you to continue being productive even in the event of a total outage of all SourceHut services.
When most people talk about “modern web” or modern anything in software they think it means “using all the latest tools”.
That often means things like ES6 and Webpack, which have nice surfaces, but which create nightmares under the hood.
That’s the opposite of what modern architecture was. It was about embracing the constraints of materials. Given the properties of concrete, what is the limit of what you can do with it? Go there, and no further. And don’t cover it up; just finish the dang slab and get on with the rest of the house.
ES6 means transpiling, which means webpack, which means a massive machine of hidden complexity, which if you’re lucky exposes a nice smooth surface where everything is arrow functions and named exports. And if you’re unlucky is a flimsy piece of cardboard over the nightmare underneath.
You (SourceHut) seem to be building a UI that actually takes note of what the browser is. And you are trying to push the big numbers... how reliable your service can be, how many endpoints one person can maintain, while letting the materials of the web (forms, URLs) dictate the details.
That’s true modernism.
So, bravo. I’m glad to see you out in the world. It takes courage to step outside of the norm and I’m rooting for you.
And I believe sr.ht would beat out GitHub at their scale anyway. The services are an order of magnitude more lightweight. And the design is more fault tolerant: we use a distributed architecture, so one part of the system can go down without affecting anything else - as if GitHub's issues could go down without anything else being affected. And many of our tools are based on email, a global, fault-tolerant system, which would allow you to get your work done more or less unaffected even if SourceHut were experiencing a total outage. We'd automatically catch back up with what you were doing in the meantime once we're back online, too.
I've spoken to GitHub engineers about some of GitHub's internal architectural design, too, and I'm confident that SourceHut's technical design beats out GitHub's in terms of scalability. And, despite already winning by a good margin, I'm still spending a lot of effort to push the envelope further on performance and scalability.
And then you go on to say it. I'm glad that SourceHut exists, and I like many of its principles, and it's probably better designed too, but walking into a thread where someone is having an outage and then claiming that you'd do much better is in poor taste no matter how good you are or how many of your services work offline.
Redundant power supplies
> pushing a bug into production
Nothing, but again, SourceHut is demonstrably better in this regard: because it's distributed, a bug in production would only affect a small subset of our system, and the system knows how to repair itself once the bug is fixed.
And I don't think I need to apologise for kicking Goliath while he's down. Someone said they want alternatives, so I pitched mine with specific details of how it's better in this situation, and that doesn't seem wrong to me. I would invite my competitors to do the same to me. We should be fostering a culture of reliability and good engineering - and if I don't hold my competitors accountable, who will? "Here's an alternative" has more teeth than "I wish this was better."
"...is it time to move away from Github?"
Most of us could throw any of the open source solutions on a $20 Linode instance and probably have excellent uptime. How many active repos do you host, and on how many servers?
Can you talk about the geographic distribution of your 10 servers?
SourceHut is not the same scale as GitHub. This does not change the fact that SourceHut is faster and more reliable. We have an advantage - fewer users and repos - but still, that doesn't change the fact that we're faster and more reliable.
This has been objectively demonstrated as a numerical fact:
And yes, 9 of those servers are in Philadelphia (the other is in San Francisco, but it's for backups, not distribution). That doesn't change the fact that, despite being more distant from many users, our pages load faster. In this respect we're at a disadvantage compared to GitHub, but we're still faster.
GitHub and Sourcehut are working at different scales. That doesn't change the fact that SourceHut is faster.
> we use a distributed architecture
> SourceHut is faster
I wasn't questioning that some of the web features are fast. I'm sure when Github was 10 servers their pages were fast too. I suspect if I threw Gitlab on a 9-server cluster on AWS they'd also be quick.
And yes, throwing GitLab on a 9-server cluster on AWS might be fast. But I'm ready to bet you that SourceHut will still be faster, and I have a ready-to-roll performance test suite to prove it. I know that SourceHut is faster than GitLab.com and GitHub.com, and every other major host, and you don't have to go through the trouble of provisioning your own servers to take advantage of SourceHut's superior performance.
While your tests are indeed objective, I don't think they're very useful. For example, why does your performance test ignore caching?
GitHub's summary page loads 27KiB of data for me unauthenticated, which is about 6% of the 452KiB you're displaying in your first table. The vast majority of developers who browse GitHub will not be loading 452KiB of static assets every single page load.
Anecdotally, GitHub's "pjax" navigation feels about as fast as SourceHut on my aging hardware.
Maybe this is somewhere in the manual and I missed it, but do you have some way of automating the configuration of your hosts and VMs? For example, do you use something like Ansible?
It's much less complicated (both from an admin standpoint, as well as a UI standpoint) than GitLab. I paired it with a Drone installation (also self-hosted) for CI and (sometimes) CD.
It all works great, and is way easier than I thought. If there's downtime, I'm (usually) in control of when or how long, as I have root on the box.
I'm also not giving my money to a giant military contractor (Microsoft, the owners of GitHub) any longer, which is a huge deal for me from a personal moral standpoint (YMMV).
Barring that, you always have the tried and true (and for some reason abhorred by start-ups) option of running your own Gitea or GitLab instance. It's not hard, and most of this stuff can be done in dockerless containers if you want.
If cloud servers are getting "overloaded" as some commenters say, you could even buy a few racks or U's of colo somewhere, or use a cloud provider that isn't the most popular meme on YC. Vultr and RamNode are both good options, and you'd be supporting a small business, not Bezos' next giga-yacht.
Vultr is considered a small business now? Crunchbase lists them as having 50-100 employees, and they seem to be owned by Choopa, LLC, which some sources list as having 150 employees.
I chalk most of the early ones up to moving services over to Azure.
Lately, though, I don't know. Azure is running pretty close to capacity, so maybe it's part of the problem.
The fact that you used "M$" indicates that you are predisposed to blame Microsoft for actions that likely aren't the parent company's doing, and to discount any changes from the top down that have occurred within MS. And while I have a lot of issues with MS, and Windows in particular, the MS of today is not the same as the MS of even a decade ago.
Not entirely sure what's up, but this doesn't seem like an issue limited to only GitHub.
I look forward to reading the root cause analysis Nat promised in February: https://twitter.com/natfriedman/status/1233079491204804608
Edit: Also, interestingly enough, I am now reliably hitting Cloudflare's ORD (Chicago) datacenter instead of MSP. If you visit https://snazz.xyz/cdn-cgi/trace (or any other Cloudflare-backed website), what comes after the COLO= for you?
Global network issue somewhere?
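If you'd rather script that check than eyeball the page, here's a quick sketch. It assumes the standard /cdn-cgi/trace endpoint mentioned above, which returns plain key=value lines; the hostname is just the one from the comment:

    import urllib.request

    def cloudflare_colo(host: str) -> str:
        # "colo" in the trace output is the code of the Cloudflare
        # datacenter that served this particular request.
        with urllib.request.urlopen(f"https://{host}/cdn-cgi/trace") as resp:
            body = resp.read().decode()
        fields = dict(line.split("=", 1) for line in body.splitlines() if "=" in line)
        return fields.get("colo", "unknown")

    print(cloudflare_colo("snazz.xyz"))  # e.g. "ORD" or "MSP"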
Even that article states that there hadn't been any service disruptions in the US up to now, but I am certainly experiencing it now.
There has been about one major outage a month for the last several months, with a sprinkling of smaller outages (I recall webhooks being down quite a lot). The cadence of these outages is rapidly changing my perception of GitHub as a reliable service.
At this point I'd almost prefer to pull it all in-house and manage it myself so the entire team doesn't have to lose a whole day of productivity over all this. I am so glad we moved away from using GH Actions for builds because we would be absolutely hosed right now on supporting our customers.
Unfortunately, anecdotal experience here is spotty. Services I run for my team are several 9's higher in terms of actual availability (note: I did not say "uptime"); conversely, other internal services at my company are many times less reliable than services like GitHub.
Given that people prefer external hosting for the reasons I mentioned, for many I think pulling it in-house is unappealing.
Also, does isolation of a private GH instance from the public instance not provide some degree of added reliability, considering the potential for DDoS or simply extreme load?
I absolutely grant you the IT infrastructure concerns. BUT Amazon is our vendor on that. GitHub provides the software. It's not like I'd be standing up a new series of physical hosts to run an on-prem GitHub built from source and managing all of the hell around that. This would be simply putting a GH-provided image on an EC2 instance and making sure we have frequent snapshots.
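Automating those snapshots is likewise only a few lines; a sketch, assuming boto3 with AWS credentials configured, and a hypothetical volume ID:

    import boto3

    ec2 = boto3.client("ec2")
    # Snapshot the instance's data volume on a schedule (cron, Lambda, etc.).
    ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",  # hypothetical volume ID
        Description="nightly GitHub Enterprise backup",
    )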
You should be well aware that the actual architecture of GitHub Enterprise isn't truly highly available (one active and one or more standby instances) unless you're a big enough customer to get clustering mode. Which means you likely need to take downtime to implement upgrades, since they typically require rebooting the VM.
I also tried setting up an EC2 instance in Northern California, and just pinging it from Kansas City via Google Fiber I experienced 37% packet loss.
It's only expensive because cloud providers overcharge. The listed prices on Tardigrade start at half of Amazon S3's, and there are significant discounts if you are storing more than 100 TB.
Seems like something bigger might be going on.
One usually causes the other, so we shouldn't infer anything larger from that datapoint.
Edit: A nightmare is an exaggeration. But it would slow down work.
However, I feel like GitHub AND Stack Overflow being down would slow down work. And HN is a good place to find out what's going on.
Anybody else got experience with that and/or TeamCity?
If only there was some kind of distributed version control system. /s
I don't feel bad being an "old fogie" who demands local copies of my dependencies.
And to anticipate the likely next argument - no, mailing lists are not a better tool for collaboration. They are distributed, sure. In so saying, I have exhausted the list of their virtues.
You can, of course, override your CI/CD and do a manual deploy. It's not a matter of "can." It's a matter of not fully understanding all the checks the CI/CD system runs on the code, and all the build steps for prod builds, and therefore not having the confidence to deploy to prod without CI/CD holding your hand. (Which I don't at all blame devs in bigcorps for not knowing; release management is its whole own thing, and division of labor means that at sufficient scale it doesn't make sense to learn about it.)
git clone https://git.example.com/myrepo
The real-world script I use is a little more complicated, because the code base in question is two decades old, so I need to massage the code a bit before the tests can run.
I once worked for a company which had a series of tests which took eight to ten hours to run. Running those tests with every single Git checkin was out of the question; we instead used cron to run the tests every night.
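Roughly the shape of that nightly setup, as a hedged sketch: the repo URL (carried over from the example above) and the test command are placeholders, not the actual script.

    # Run from cron, e.g.:  0 2 * * *  /usr/bin/python3 /opt/nightly_tests.py
    import subprocess, sys, tempfile

    REPO = "https://git.example.com/myrepo"  # placeholder, as above
    TEST_CMD = ["make", "test"]              # assumed test entry point

    def main() -> int:
        with tempfile.TemporaryDirectory() as workdir:
            subprocess.run(["git", "clone", "--depth=1", REPO, workdir], check=True)
            # Let the multi-hour suite run; the exit code tells cron's
            # mail (or a wrapper script) whether to notify anyone.
            return subprocess.run(TEST_CMD, cwd=workdir).returncode

    if __name__ == "__main__":
        sys.exit(main())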
I worked in a silo for years, not so much by choice as by virtue of the fact that my company at that time could not afford another engineer of similar capability. I got by well enough, thanks in no small part to a youthful talent for improvisation, and the fact that things were simpler then. I wouldn't go back to that life if you paid me. But even if I would, I couldn't. Times change, and the time of the engineer as hermit has largely passed. There are things about it I miss, sure. But I don't miss them so much that I want to imperil my ability to keep earning a living by ignoring that the world has moved on.
edit: Of course, this ignores the difference between modern dev work, and modern sysadmin (as opposed to devops) work. It probably doesn't make much sense to analyze the latter as though it were the former. Or vice versa.
It takes a lot of talent to do collaboration well, because it means having to let go of the code being “my code”, and it means having a lot of people skills, such as the ability to empathize with how someone not familiar with the code will see things, and how to write documentation about the code structure so that things can be in maintainable “boxes”.
I think it's better described as a skill than as a talent, too. The former is learned, the latter innate, and as someone with absolutely none of the latter, I can attest that the former is attainable nonetheless.
I'm suffering some fatigue w/ the word "collaborate" being thrown around in marketing collateral right now. You used the word several times, and I engaged my "marketing avoidance" mode.
But it might be worth keeping in mind that there are plenty of businesses that can afford delays, especially if they can save money in exchange for that higher risk.
As always, engineering isn't about the perfect solution for everybody, but the right tradeoffs for your particular use case.
1. Scripts available in the git repository.
2. Parameters that are either stored (possibly encrypted) on a filesystem, in a parameter store (like AWS SSM Parameter Store), or in 1Password/LastPass. (A minimal sketch of the SSM option follows.)
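For option 2, fetching a deploy parameter from SSM is only a few lines. A sketch: the parameter name is hypothetical, and boto3 plus configured AWS credentials are assumed.

    import boto3

    ssm = boto3.client("ssm")
    # WithDecryption handles SecureString parameters transparently.
    resp = ssm.get_parameter(Name="/myapp/prod/db_password", WithDecryption=True)
    db_password = resp["Parameter"]["Value"]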
It's only a pathology if a team couldn't, as a worst case, coordinate a deployment from a single machine over the phone.