A fork on GitHub is no fork (niels-ole.com)
490 points by nielsole on March 16, 2018 | 229 comments

As someone who runs a bunch of private repos on Github, and hires freelancers and vendors to work on them, I like the way Github manages forks of private repos. When I end a relationship, and remove that person from my private repo, I want their fork of my code to go away too.

Why? Because it's cleaner. It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts. I set up a private repo because I want to control access to my code. That's why a private repo is private.

It also makes it harder for people to misrepresent their relationship to me later on. Sure, anyone with read access to one of my repos can pull it down and save a local copy. But that is different from logging into Github and having a fork of my repo. A Github fork looks and works differently from a local copy.

This whole situation seems weird because the idea of a "private" repo is not inherent to how git was intended to work; it's something Github invented to make money. So I'm not surprised that it violates people's expectations sometimes.

That makes no sense to me. You're relying on people not making their forks out of github. If that strategy works at all, it's incidental and I wouldn't rely on it for anything that matters.

> You're relying on people not making their forks out of github.

I'm not. Did you read my 3rd paragraph?

This is not a security strategy, it is a strategy for managing relationships--which is the purpose of using Github in the first place. There are far more secure ways to manage a git repo than Github, if that is the goal.

> it is a strategy for managing relationships

Not really. That means that you've worked with benevolent actors up until now. It's not that hard to keep a clone of the repo and upload later after the termination of the business rapport, either as a private or a public one.

The people who have access to my private repos may or may not be benevolent, but they are certainly under contract, and that's the relationship that Github is helping me manage.

The contract says that the codebase is mine, and the contractor doesn't have rights to use it for their own purpose. The Github private repo rules just make that real in a practical sense. It means the default way of working comports with what the contract says. It makes it easy for everyone to do the right thing, and harder to do the wrong thing.

Sure, a rogue contractor could re-upload my codebase to a new repo in their account later, but that is a pretty obvious violation of the contract. If I found out, I'd feel pretty confident suing them (if it came to that). There's nothing they can point to that says I authorized them to do that.

In contrast, if the default was to leave an official fork of my code in their Github account forever, then it becomes easier for them to do the wrong thing, and harder for me to say I didn't want them to do it. Some future employee could even do the wrong thing accidentally.

To reiterate, Github private repos help me manage my relationships. They don't enforce the terms of my relationships.

You know how git works, right? It is decentralised! I don't see any added value! When I work on a project I always have the master, develop and feature branch(es) pulled from the repo on my machine. I could just create a new private repo, or even a public one, and push my local branches.
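A minimal sketch of that last step, written as a small Python helper that only builds the git command lines (it does not run them). The remote name `backup` and the URL are placeholders invented for illustration.

```python
import shlex


def republish_commands(new_url, remote="backup"):
    """Return the git commands that copy every local branch and tag to new_url."""
    return [
        # Register the new, unrelated repository as an extra remote.
        ["git", "remote", "add", remote, new_url],
        # Push every local branch.
        ["git", "push", remote, "--all"],
        # Tags are not covered by --all, so push them separately.
        ["git", "push", remote, "--tags"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in republish_commands("git@example.com:me/copy.git"):
        print(shlex.join(cmd))
```

Run inside an existing clone, the printed commands would copy its branches and tags to the new repository. Nothing about the result is marked as a fork of the original.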

But unless you are giving your contractors a locked-down internet-disconnected laptop and physically locking them in an empty room, you need to trust that they aren't going to steal your code like that anyway.

Certainly. I (and some others, seemingly) was just saying that what Github does is not that much of a help (which is what seems to be meant at the thread root). The keyword here is contractor, as it should be the contract that protects you from theft etc. Though I'm not very knowledgeable about business, so maybe I'm failing to see how that's useful.


I think he was very clear that his approach had nothing to do with the security of the actual code.

Yes, and nobody is disputing that.

> It's not that hard to keep a clone of the repo and upload later after the termination of the business rapport, either as a private or a public one.

that's not a github fork; and as they said, it's not really a 'security strategy' (mostly it's a bit of accounting).

Sure, but it won't say "forked from" on it.

Shifting goalposts here- thread-parent said "I want their fork of my code to go away too," not "I want their fork of my code to no longer say 'forked from' on it".

If removing the relationship tag was the only purpose of this, GitHub could just remove the relationship tag, not delete the whole repository (throwing out the baby with the bathwater).

This doesn't look like shifting goalposts to me — it looks like you misunderstood where the goalposts were to begin with. Read the original comment charitably rather than with an eye for holes and I think you'll see that it wasn't about persistent malicious actors, just about what they see as sane defaults for people's relationship with your codebase. Once you're no longer associated with their private codebase, they think it makes sense for you to no longer have an official copy of it in your Github account.

This is what I meant, in fewer words. Thanks.

> Shifting goalposts here- thread-parent said "I want their fork of my code to go away too," not "I want their fork of my code to no longer say 'forked from' on it".

in this context, those mean the same thing. if you copy the code it's not a "fork" in github terms.

> Shifting goalposts

Yup. So I'm not the only one seeing it then.

This is not what people expect, so that's one obvious reason it's very likely the wrong decision. I would have never known this. If you are relying on a public external product on github, you have to go through several painful steps to make it 'safe', you have to re-upload it, and there's no good reason to make people do it. It's a giant disincentive to fork things on github. To be sure of finding things later, there will probably be another git repo website that will agree to "host" your forked projects safely, so that you can access them.

I understand on the other side, there's value in being able to stop people who forked your code. But it's a very weak stop-ability, because they could just keep the code locally and re-upload.

What does this have to do with public projects? This is entirely about private repos.

My understanding from reading that was that even on a "public" project the owner can delete it, change its status, and it disappears from availability.

Fortunately not. As noted in the article[1] linked in the OP:

> Deleting a public repository

> When you delete a public repository, one of the existing public forks is chosen to be the new parent repository. All other repositories are forked off of this new parent and subsequent pull requests go to this new parent.

> Changing a public repository to a private repository

> If a public repository is made private, its public forks are split off into a new network. As with deleting a public repository, one of the existing public forks is chosen to be the new parent repository and all other repositories are forked off of this new parent. Subsequent pull requests go to this new parent.

[1]: https://help.github.com/articles/what-happens-to-forks-when-...

Thanks for the update. I can't add a post-comment to my question because it's been too long, but I was wrong.

> You're relying on people not making their forks out of github.

That makes perfect sense to me; the fork ON github is what lends the fork legitimacy versus "some dude found code."

But they can still download the source and re-upload it to GitHub.

It's really a false sense of control.

They could do that no matter what. However, they're not able to have an officially backed association with the original.

But it won't say "forked from..."

And how does it matter when it's a private repo?

Imagine you run a website, foobar.com. You have foobar.com github org and foobar private repo (and also some public ones). It's the canonical org/repo. ninja123 has a github fork of it and it relays that from the github ui. "forked from foobar/foobar.com". rockstar123 also has a "fork" but it's just code sitting out there at rockstar123/foobar.com. There is no link to the canonical repo. If you fire ninja123 the link is severed. It just allows a way to show active, approved engagement and also to keep track of approved collaborators.

What if rockstar123 gets arrested for ICO fraud. Do you want others to see he has an official upstream link to your repo even after you have severed the relationship?

What if rockstar123 makes his fork public? It's just messy to keep the link.

I do not want to believe Github allows making a privately-forked repository public. Other than that, I agree that the link is better removed, but what I don't understand is how that helps with keeping the code itself private, which is what I thought was meant at the root of the thread.

Op is not claiming it keeps it private. It's just used as a tool to signal and track approved, active collaborators to the official upstream. That's beneficial to control.

> I don't understand is how that helps with keeping the code itself private, which is what I thought was meant at the root of the thread.

out of curiosity, what made you think that?

I think you're asking about the part after the comma, so here is the part from the comment that started this thread that makes me think so:

> Why? Because it's cleaner. It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts. I set up a private repo because I want to control access to my code. That's why a private repo is private.


You have the foobar private repo. A bad actor makes a public foobar repo and edits the readme to say "public edition of the foobar repo". People say, "ooh, that's the one I want!"

Why would they think that? It won't say "forked from foobar/foobar.com" and it won't be under the foobar org.

That's like me making a random repo and adding a readme that says "official public windows 10 source repo"

The op doesn't want the code to be on github, and his problem was about his code still being on someone else's account as a private fork. It sounded more like "if they get hacked my code isn't there"

Being legit doesn't seem to have any meaning for him or the author of the article. Also the code is the only thing needed for legitimacy. The iphone bootloader code in some repo is still the legit iphone bootloader (as an example)

> That makes perfect sense to me; the fork ON github is what lends the fork legitimacy versus "some dude found code."

No, he made two points. The first one is that this way you don't have your code in random freelancer's accounts, which I'm saying is wrong: a freelancer can take his code, and upload it back to his account.

So if this mechanism makes sense to you as a way of guaranteeing that your code won't be in other people's accounts, you're mistaken too.

In fact, if you don't believe me, give me access to your private repos and let's see what happens.

You're misunderstanding the intent I think. This isn't a guarantee that the code is kept private. It's a guarantee that the other copies don't have a Github "fork" relationship to the true repo.

Nobody is disputing that access to the private repo allows for a copy to be kept. But it's an "island", not a fork.

> So if this mechanism makes sense to you as a way of guaranteeing that your code won't be in other people's accounts, you're mistaken too.

Come on, no one is arguing that, and I'm sure you know that.

He said it's "cleaner", not more secure. Yes, anybody can re-upload his code to a random github account, but then it's easier to see it's stolen code, in contrast to having a github fork of the private repository.

While yes, you can simply create a new repo on GitHub with the local copy and name it similarly, it will not be linked to the original as a fork on GitHub.

"So if this mechanism makes sense to you as a way of guaranteeing that your code won't be in other people's accounts"

Nobody but you is claiming this.

Me and the person I was responding to.

> It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts.

I would be willing to bet that it would be fairly trivial for Github to identify fraudulent repos like that.

It is a way of avoiding that all too common situation:

- collaborator gets a copy of your code, works a bit on it;

- collaboration stops;

- a few months pass and you start collaborating with the same person again;

- collaborator then pushes some changes based on outdated information and breaks everything

> collaborator then pushes some changes based on outdated information and breaks everything

github does have better ways to prevent that. that's the point of using git, right?

But nothing is stopping them from cloning and pushing up to BitBucket or a private git hosting service. You merely have the appearance of cleanliness with none of the security.

The only way to really protect yourself is with a contract that stipulates the repo and any clones a contractor may have access to must be destroyed upon termination of the contract.

This has nothing to do with security. Would you upload secrets into a private Github repo? I wouldn't. Private repos are not more secure than public repos.

Github is a tool for collaboration. The point of a private repo is to manage who you are collaborating with. Once I don't want to collaborate with someone anymore, there is no reason for them to have a Github fork of my code anymore. As I said, it's cleaner.

I'm fully aware that they can save a local copy (did you read my 3rd paragraph above?), and of the importance of contracts.

Access control is 100% security. If you have a private GitHub repo, you have uploaded secrets. All that code is your secret. You pay GitHub to protect all the copies they know about, but you are fooling yourself if you rely on that after you've given access to that private repo away. You fundamentally can't revoke knowledge, just future access.

No, secrets are pieces of information that you could use to compromise my production environment--like passwords or private keys. Those should never be uploaded to Github, period.

You're talking about credentials. Secrets are often credentials, but more generally refer to anything confidential.

Code doesn’t necessarily have to be confidential. “Secret recipe” algorithms, maybe.

It’s about trade-offs. If you must have “trade secrets” amongst your code, perhaps you can keep that code available only to whom you trust will follow your wishes with it, and have separate repos to share wherein you do not care as much about collaborators “stealing” the code.

As soon as you let someone into your codebase, almost anything can happen, right?

There are cases where it would make sense to sacrifice risks related to code-copying in order to have faster/better development.

Also, what’s stopping anyone working at ANY company from taking the company code and doing whatever they want?

Code is seldom everything.

as far as i can tell, they didn't speak to "revoking knowledge", and did speak to "future access"... so your comment reads strangely to me.

"Nothing"? Obviously that's untrue. There are many things stopping them. Professionalism, for example.

There's "nothing" stopping someone at a job from printing every single one of their corporate emails and putting them into a binder to save forever, even after they've left the position. Nothing except professionalism, a sense of business ethics, the law, and the fact that it's a huge hassle.

This seems like a problem that's not a problem. If you are working in a private repo then there should be some formalized arrangement on who owns the code and who doesn't, and that would be the originator of the private repo. If, for some reason, that relationship is less clear, then as you say other contributors can just create their own clones of the repo elsewhere. If the obvious relationship is one of collaborative ownership, then a public repo is the obvious choice.

Heck, nothing is stopping them from cloning and pushing to _github_ as an independent repo.

Except if that were as easy and obvious as forking, the OP wouldn't be complaining about it, heh.

> But nothing is stopping them from cloning and pushing up to BitBucket or a private git hosting service.

...which is not a github fork, and not what they're talking about.

For that matter, what's stopping them from pushing a clone to Github instead of forking through Github's web interface?

That's naive wishful thinking at best. The cons far outweigh the pros on this feature.

They're one button away from cloning the entire repo out of Github and bypassing all these fancy access control mechanisms without you even knowing about it.

This is one feature that causes a lot more pain than actual utility.

It doesn't cause pain for my contractors because a) they know how Github private repos work, and b) they don't want a copy of my code in their Github account once our relationship has ended. Killing forks upon removal makes life easier for everyone.

I don't have any particular issue with the way that Github do this, but you're exaggerating the effect of the github hosted fork. Somebody using a private fork to misrepresent their relationship to you seems like a stretch, as all it shows is that you gave them access to your repo, which you in fact did do. Also, not only "could" they clone a local copy, they almost certainly have done so as that's how 99% of work flows go.

Also, git does inherently allow for private repos, otherwise it would have been licenced differently, and it inherently allows for access control (ie repos don't have to be readable without authentication).

Like I said though, github's stance of disabling a private fork of a private repo when the permission is revoked doesn't seem unreasonable to me. I think the key thought missed by the OP was that they were both private repos. Also, they could deal with it differently when it's non-payment that caused the shutdown rather than a deletion/revoked permission

The explicit problem here though is not that the friend disabled the repo and all forks of the private repo got disabled, it's that github disabled the friend's private repo and this caused forks to be disabled.

First one understandable, I fork a private repo I can't expect it to stay forked if for some reason person who owns it doesn't want me to.

Second one is screw github.

> But that is different from logging into Github and having a fork of my repo.

How is it really different? When you push it back up to github it is exactly as if you'd have forked it. The only difference is that it isn't marked as a fork and in this respect not shown when you look at the original repo graph.

It's nice that it works for your use case, but since it's not a real protection against anything and only looks like a safety measure against abandoned accounts that still have a copy of your code, it shouldn't be a feature that's on by default. (Imho)

> The only difference is that it isn't marked as a fork and in this respect not shown when you look at the original repo graph.

That's the whole point. Of course you can't stop someone from pushing a repo anywhere they want. But imagine you run a website, foobar.com. You have foobar.com github org and foobar private repo. It's the canonical repo. ninja123 has a github fork of it and it relays that from the github ui. "forked from foobar/foobar.com". rockstar123 also has a "fork" but it's just code sitting out there at rockstar123/foobar.com. There is no link to the canonical repo. If you fire ninja123 the link is severed. It just allows a way to show active, approved engagement and also to keep track of approved collaborators.

What if rockstar123 gets arrested for ICO fraud. Do you want others to see he has an official upstream link to your repo even after you have severed the relationship?

I don't think the existence of a "github fork" shows any active approval. The default behavior for repos is to allow anyone to fork them without requesting approval.

I think a lot of people don't understand your "I want no link on GitHub" argument and try to explain that you can't keep the code.

I'm coming from a different angle. I read the article as "wasn't able to access his fork anymore". If that is true, I think that is bullshit.

Severing the ties to the private repo? Fine, I get the (your?) point. But what I forked should still be there. Previously as "forked from...", now without the link?

Did I misunderstand the article?

What you're looking for is perhaps not a decentralized version control system.

While git is a decentralized version control system, I would argue that using Github is not really decentralized. Even when everyone has a full local copy to work on, the norm is for everyone to set the Github repo as the remote origin and push/pull from there, rather than from each other.

This is in contrast to how the Linux kernel development works, for instance.

Decentralized doesn't necessarily mean peer to peer. In the case of git, it simply means users work with a local repository that can easily be synchronized with another repository. A team contributing to the kernel could very well use github to do so, change their mind and switch their remote to bitbucket and continue the work there. If bitbucket exploded, they could set up their own remote, push any of the local copies and use that. If their internet connections exploded they could print patches and mail them to each other. Then when they're back online and done with their change, they can create a patch file and mail that to Torvalds.

But that's all beside the point. The point is that if you don't want your code lying around where you have no control over it and don't trust your contract and the law to be enough of a deterrent, you shouldn't give people copies of it and probably reconsider what kind of contractor you're willing to work with, rather than loading the term "fork" with an additional meaning.

I mean, you could even do git peer-to-peer via ssh. Each person sets up sshd and a bare git repository, everyone adds the others' repositories as remotes, and now they can push to each other's "ssh" repositories and then fetch from them when wanted.
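A rough sketch of what that setup could look like, as a Python helper that only assembles the git command lines. The host name, paths, peer name and branch name are all hypothetical, and sshd is assumed to be already running on each machine.

```python
import shlex


def peer_setup_commands(bare_path):
    """Commands each person runs once, on their own machine."""
    # A bare repository has no working tree, so others can push to it safely.
    return [["git", "init", "--bare", bare_path]]


def peer_sync_commands(peer_name, ssh_url, branch):
    """Commands run from a normal clone to exchange work with one peer."""
    return [
        # Register the peer's bare repository under a short name.
        ["git", "remote", "add", peer_name, ssh_url],
        # Publish a local branch into the peer's repository.
        ["git", "push", peer_name, branch],
        # Pick up whatever the peer's repository currently holds.
        ["git", "fetch", peer_name],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    url = "ssh://alice@alice-laptop.example/srv/git/project.git"
    for cmd in peer_setup_commands("/srv/git/project.git"):
        print(shlex.join(cmd))
    for cmd in peer_sync_commands("alice", url, "feature-x"):
        print(shlex.join(cmd))
```

No hosting service appears anywhere in this; each peer's bare repository plays the role that a Github remote normally would.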

Sure, or with patches broadcasted on radio, scribbled on napkins etc.

I don't think you intended to counter my argument but in case anyone is interested in elaboration: my point about centralized/decentralize is that using github doesn't necessarily centralize your work. It's only central in the sense that any other remote repository is: if you can't access it you can't. If github dies, copies of the repository may still exist on a million other computers and any of those can be used as a basis for further collaboration by a wide variety of means. From a peer to peer perspective, github is just a peer like any other.

Even if it's not decentralized I can still clone all the code in the centralized version control system.

Another solution would be to create an org, give access to people you want as collaborators and forbid forking. They can work in branches in the main repository, and you can use protected branches to alleviate the risk of incorrect pushes.

When they clone to the machine on which they're going to work, they get a copy of the repo. That's part of the point of git.

Git, by design, is a platform for the sharing of managed text.

But there is no github.com upstream relationship. Sure they can upload it, but it will be obvious it's a rogue copy since there is no "forked from.." messaging to the canonical repo.

Why not allow both mechanisms? Real forks and shadow forks that will become disabled when the parent becomes disabled?

The author states that he didn't bother with support (luckily he had a local copy), so here are my 2 cents.

I've been in the exact same spot under very similar circumstances. What I can add is that the github support was great, we got it resolved within the working day.

The repo was "deleted" to free up the private repos of my colleague, but we didn't know that the fork on my account would go down with it. It had probably been a couple of months before we got back to it and noticed the issue. They managed to recover the repo and unblock my fork. I just did a fresh repo and moved everything over.

While it is odd behavior, if it's really bad they will probably manage to get you back on your feet.

Also, I'm not affiliated with github in any way, I just have some repos from student days over there. If it matters, I'm on gitlab and google repos.

So they keep copy of code after it's deleted?

In their business-continuity backups, you'd have to assume yes.

However, I would certainly be surprised if a Github support staff member went and fished out the off-site backup and mounted it to recover one missing repo. You'd hope that would require quite a few staff members to orchestrate.

But in general, yeah. Most engineers don't think about deletion very hard when they start designing a system, and it's also convenient (although with GDPR possibly soon illegal, situation dependent) for auditing and research to merely flag things as deleted and not show them to the customer. As a result, true data deletion generally gets justified away as a bit inconvenient and not really desirable anyway, and almost no software company really deletes anything on demand. Best you can hope for is semi-annual data purges to reclaim disk space.

If this bothers you, consider joining us in the world of self-hosting your services. Gitlab is in my opinion considerably better than Github anyway. Certainly it's worth spinning up a docker container and taking a peek around.

> this taught me not to rely on Github as only storage for code

If you didn't already know this, then... good lesson to learn.

Just like backups: don't rely on a single source of anything.

I say this every chance I get, but Github can and will hold your code ransom if your premium account lapses while you still have private repos. They do not give you the option to set those repos to public, they require you to pay money to regain access.

Maybe you can have them switch the repos to public if you go through their support, but Github doesn't offer that in their ransom note, and that would be an unacceptable solution, anyway.

Whether or not this is a risk to a given developer (or their company) is irrelevant to me, because they still have policy that allows them to hold code hostage for ransom, and that should make Github a complete non-starter when deciding on an SCM host.

Calling it a ransom note is a tad dramatic. You pay them for the service of private repos. If you stop paying, you stop getting the service.

Just like every other business ever.

Generally you are correct: service is paid for -> stop paying -> stop service. However, Github has a free tier for public repositories. Why can one not convert the repository to public then?

I like how Dropbox handles this. If your paid membership is cancelled and you are using more space than the free tier, they don't delete your files, just make them read-only. You can view or download them, but if you want to upload or update something, you have to buy premium or bring your space use within free tier limitations.

People may put all manner of things in private repos that they don't want to be made public, so github shouldn't just expose them to the outside world.

That doesn't mean they couldn't just offer a button to make them public after the fact, I don't think anyone was suggesting to make them public automatically...

GitHub shouldn't make them public, for sure. But people should.

Because github can’t make that decision. It’s your responsibility to convert it to public. There could be sensitive information in the private repos.

...hence why they should give you that option if your paid account expires. It feels like people are being intentionally obtuse in this thread

Nope, that is exactly what it is. There is no reason not to give read-only or export-only access to private repos if your account is locked. Otherwise just delete the whole thing.

If anyone has ever cloned the project, Github cannot hold your code to ransom because you already have a copy of it and the whole history including commit messages. That is the beauty of git.

Github do have your issues and un-merged pull requests "to ransom", though. Make sure to back those up if they are important.
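One way such a backup could be sketched in Python, assuming the GitHub REST API's repository issues endpoint. The helper names are invented for illustration, and `fetch_all` needs network access, so it is defined here but not called. This covers issue and pull request records only, not their comment threads.

```python
import json
import urllib.request

API = "https://api.github.com"


def issues_url(owner, repo, page=1):
    """URL for one page of issues in every state (open and closed)."""
    return f"{API}/repos/{owner}/{repo}/issues?state=all&per_page=100&page={page}"


def split_issues_and_pulls(items):
    """The issues endpoint also lists pull requests; those carry a 'pull_request' key."""
    issues = [item for item in items if "pull_request" not in item]
    pulls = [item for item in items if "pull_request" in item]
    return issues, pulls


def fetch_all(owner, repo, token=None):
    """Page through the endpoint until an empty page comes back."""
    items, page = [], 1
    while True:
        request = urllib.request.Request(
            issues_url(owner, repo, page),
            headers={"Accept": "application/vnd.github+json"},
        )
        if token:  # a token is needed for private repositories
            request.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            return items
        items.extend(batch)
        page += 1
```

Writing the result of `fetch_all` to disk with `json.dump` would give a plain-file copy that lives next to the cloned repository.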

How can they hold your local clone of the code hostage?

A sane option is to deny GIT access while still allowing the primary account to download a copy of the repo for a set amount of days.

Instead they play "data roach motel" and hold your data hostage. Very uncool.

Pay your bills. I cannot take the idea that they're holding anything "hostage" when the whole thing is that you're not paying your bill. If you don't pay your car bill, they'll take it back.

7 bucks a month is hardly ransom. It would be better than automatically making private repos public when you stop paying.

> 7 bucks a month is hardly ransom.

You do not know other peoples' situations. And strictly speaking, demanding money to access your data is ransom.

> it would be better than automatically making private repos public when you stop paying

False equivalency. I already said that "disabling repo and allowing primary user to download" was acceptable. I said nothing about private->public.

This whole situation only reaffirms my distrust in all cloud services.

EDIT: Seriously, -1 ? The user's data is sacrosanct. You disable most functionality, but you never, I repeat never delete data or make it irretrievable within an envelope of time for recovery. The only exception to that is if the user explicitly requests a permanent deletion - then you do so after appropriate warnings.

You're being downvoted for the ridiculous hyperbole of using the terms "ransom" and "hostage" for the situation, when it's just that you didn't pay your bill, so you don't get access to the service. Any web host would do the same, as would any other business. You don't pay your bill, you don't get the service. Why is that so hard to comprehend?

I think that's because the service is hosting the data, not the data itself. Holding onto the data which you (expect to) own is pretty much akin to a person taking care of a pet for money and when the money is not paid, keeps the cat. It's the obvious solution - the cat-sitter doesn't have any other leverage here - but Github does have other options than holding onto the cat.

A better parallel would be storage lockers. And there, if you stop paying eventually they cut the lock off and sell your stuff.

Has anyone created a service that you auth in through GitHub and it backs up all your repos to S3 or Dropbox or something? I'd pay a few bucks for something like that.

Edit: I found backhub.co. Unfortunately their pricing is per-repo and my usage model involves lots of tiny repos.

Set multiple urls for one remote using `--push`


and you are about done.
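The exact command is not shown in the comment above. One common way to do this, sketched as a Python helper that only builds the command lines, is `git remote set-url --add --push`. The remote name `origin` and both URLs are assumptions for illustration.

```python
import shlex


def multi_push_commands(remote, urls):
    """Build one `git remote set-url --add --push` call per destination URL.

    Once any push URL is set, git stops pushing to the fetch URL by default,
    so the primary URL has to be listed here as well.
    """
    return [
        ["git", "remote", "set-url", "--add", "--push", remote, url]
        for url in urls
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    destinations = [
        "git@github.com:me/project.git",         # primary (hypothetical)
        "git@backup.example.com:me/project.git",  # mirror (hypothetical)
    ]
    for cmd in multi_push_commands("origin", destinations):
        print(shlex.join(cmd))
```

After those commands, a plain `git push origin` would send the same refs to both URLs, while `git fetch` keeps using the original fetch URL.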

This is only a partial solution, right? Services such as Backhub.co also backup your issues: https://backhub.co/features/backup-github-issues/

(we have 1000+ closed issues in our project, nice to preserve)

Right. The service is certainly expected to do more. Was mostly addressing the issue of mirroring the data to dropbox. But even then my "solution" only saves whatever you do locally. So if you merge PRs through github, it wouldn't get mirrored.

If you have self-deployed Gogs (which is a smallish OSS GitHub clone) you can easily set it to mirror other repositories periodically. Very useful for backups.

Also simple to self-deploy since it's essentially a single service written in Go.

Check out Gitea, which is a community managed fork of Gogs.

If you're writing research software you can backup to zenodo (i.e. CERN) for free.

>Has anyone created a service that you auth in through GitHub and it backs up all your repos to S3 or Dropbox or something? I'd pay a few bucks for something like that.

So you'd pay money for a tool that copies your data, but not simply pay the money to access said data?

This doesn't make any sense at all.

I am a beginner so bear with me. I learnt git 2 years back and always end up using the same set of commands and never anything new which lets me work very smoothly on projects. I read a lot of blogs to understand how git works and how useful git reflog is. I was/am often confused about what is offered solely by git and github as these two are two different things. Is fork/PR/Issues native to git? I still don't have an answer.

Git is to GitHub what e-mail is to Gmail. GitHub is a specific service that offers git hosting and some extra goodies.

As for what is part of what, anything you do with the ‘git’ command is part of git. Anything you have to do on github.com is not.

>Anything you have to do on github.com is not

Just to be clear we can create a branch on github as well as on git though I am not forced to do so on github. This may be confusing for a beginner as both provide branching.

Yeah, I nearly said “anything you do on github.com,” but then remembered there are things you can do both ways.

Unless you install GitHub's hub and alias `git` to `hub` as suggested in the install doc—then some of what you can do in Git is GitHub specific

Forks are clones that live on GitHub, with some access control.

PRs are native to GitHub, but I can send you an email Requesting that you Pull from my remote. The Linux kernel team still emails patches I think.

Issues are a GitHub thing.
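To make the "email a patch" route concrete, here is a rough sketch using only plain git, with no GitHub involved. The repository names (`upstream`, `contributor`) and the `outbox` directory are invented for the demo, and actually mailing the file is left out; it only shows `git format-patch` producing a patch and `git am` applying it.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "upstream" plays the maintainer's repository.
git init -q "$work/upstream"
cd "$work/upstream"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# A contributor clones it and commits a change locally.
git clone -q "$work/upstream" "$work/contributor"
cd "$work/contributor"
echo "a fix" >> README
git commit -q -am "README: add a fix"

# Export the newest commit as a patch file that could be attached to an email.
patch=$(git format-patch -o "$work/outbox" -1 HEAD)

# The maintainer applies the patch; author and commit message are carried along.
cd "$work/upstream"
git am -q "$patch"
git log --oneline
```

There is also `git request-pull`, which generates the text of a "please pull from my remote" message instead of shipping the changes themselves.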

Thanks. This answers it. It looks like whenever git is introduced to someone is always through github. I think this is why most people assume a different picture of git than what it actually is. I think this demarcation between git and github must be clearly stated.

Github is just the end point where the program git pushes your code for storage and viewing.

There are plenty of other alternatives, including ones you can host yourself, the most popular that I've seen being GitLab and BitBucket.

None of those things are native to git. The command line git tool’s help options open up the git manual. It is pretty comprehensive; so if the topic is missing, it’s probably a github feature. (Also, things like ‘git checkout --help’ open up subsections of the manual)

Git is a decentralized source control system, also known as a DVCS (distributed version control system).

"GitHub" is essentially just "git" on some someone else's computer with a bunch of extra features.

>"GitHub" is essentially just "git" on some someone else's computer with a bunch of extra features.

The comment below you answered my question : PR/Issues are native to github.

I'm not sure what you mean by "PR/Issues are native to github", there are other choices for code review for self-hosted git:


Incidentally Gerrit predates github.

"GitHub PRs/issues are native to GitHub". If you switch to Gerrit, the code commits themselves follow, but the PRs and issues don't. They're only on GitHub.

I wouldn't point to gerrit as an example of self-hosting git. Gerrit's "patchsets" and "change-id"s concepts, as well as the weirdo remote push refs, really turns the git experience into something... very different. It's very far from both how "normal" git commits and tracking branches workflow work, and from github/gitlab/bitbucket's pull-request/merge-request style.

How is it not an "example of self-hosting git" when it's built on top of git?

I understand from your comment that it's not your preference but that doesn't mean it's not a self hosted git solution.

Because the gerrit workflow is extremely different from all other common git workflows (which is to say, you build a set of commits in your own branch or fork and ask the maintainer to pull it). Gerrit actively encourages you to rebase and rewrite (and lose) your development history in the pursuit of a "perfect" one-hot non-merge history patchset, which actively loses information about the context of where your changes were applied.

The problem with submitting a rebased branch is that while there are no apparent merge conflicts and your changes appear to have been developed on the tip of master, your changes may actually have been developed on a much older version of master. Builds may appear to compile fine but they might not make sense - when you discard and rebase the context of commits, you lose important information.

As usual, cloud services are other people's computers, but it seems devs now keep forgetting about it.

Yet, most users are content with GitHub being the sole owner of their issues, pull requests, code review, wiki...

GitHub wikis are just git repos themselves, and I clone mine. It's easier to edit them locally, too.

As for issues, I've suggested to GitHub that they make them available via git, and they said simply "I have passed your suggestion on to the team to consider adding the ability to clone Issues". I don't think they realize how nerve-wracking it is to put any data at all in GitHub issues. Even their recommended backup software is out of date and doesn't back up Issues completely.

This is typical. Most businesses don't duplicate their workflow stuff to a parallel system because database backups are sufficient.

GitHub, even with relatively recent issues and downtime, has better infrastructure and skilled personnel than most companies hosting their own instances of these services on some reclaimed box.

Even if they have great infrastructure, this article is evidence that they can still take you down at any given moment if they so much as feel like it.

You need those database backups.

Or you can pay them and not violate their terms. Outrage over GitHub admin actions is loud, but for the 99.999% of cases everything is fine.

> not violate their terms

I haven't checked GitHub specifically, but most of those terms include, "and we can ditch you any time for any reason, maybe with 30 days notice if you're paying"

Not to mention it's also common to include "we can change the terms at any time" - and of course, there's also the fact that this issue in specific has nothing to do with a ToS violation, but with permissions on the parent repository, by relying on them with no backups you're not only subject to their staff's whim, but to the whim of any bugs or "features" in their code.

If you want to take advantage of their infrastructure, that's fair, I understand, but at the very least, run some tooling to backup issues, pull requests, wiki pages, etc on a regular basis.

With 26 million users, five nines is not very reassuring.

Some of the strain here is around two very different meanings of the word "fork", but only a single feature with that name on GitHub:

1) When I want to contribute to a project, make some local changes and offer them back to the mainline, I fork. My fork is a pseudo-throwaway thing; it contains code that is hopefully destined either to make it back to mainline, or likely to be abandoned if mainline moves on in active development.

2) When I want to go my own way with a project, using it as a starting point for a new direction of development that may wander in a direction quite far from the original (it might get renamed, it might continue development if the original becomes abandoned, etc.), I also fork.

Numerically meaning 1 is surely far more common, so it makes sense for the GitHub features and permissions etc., to optimize for that case.

Meaning 2 is closer to the original idea of a fork, from open source vernacular long predating GitHub.

Naming is hard, and I don't blame them for picking a single name for these two technologically similar (but socially different) scenarios.

I don’t think GitHub actually labels #2 as a fork, but traditionally that’s been the meaning.

They just needed a word to show that the repo is not original and is tied to an upstream repo.

For example, when I wanted to “fork” a project[0] that was no longer maintained and not accepting PRs, I actually had to contact GitHub to “unlink” my repo so that my repo could be standalone and utilize all the features on GitHub (can’t remember which features off the top of my head).

[0]: https://github.com/styfle/geoslack

I think you're right. So my conclusion is....a fork on GitHub is no fork.

I don’t know much of the historical details, but whenever I think of a fork, I think of processes. I think that GitHub’s official definition is just a simplification of what happens as a side effect.

I forked Popcorn Time when it came out, just because I was impressed with the app and I wanted to look over (and possibly modify) the code. I only forked it though, I didn't clone it locally. A week later when I had some free time to do so, same issue, my personal fork disappeared. It was mildly aggravating, but also a soft reminder to question how much trust to put into third-parties.

I wouldn't call it a trust issue, but a due diligence one: making sure you do enough to ensure anything you care about is safe.

In this instance making sure that you understand the service you are using (including intricacies like this), or keeping an extra backup of your own (a locally cloned copy of the repo), or for proper paranoia, both of the above.

No matter how much you trust github and those in positions of responsibility for the projects you interact with, if you don't know the exact details of the feature-set you can come unstuck due to operating under false impressions. Github has done nothing wrong, as that is how the service is intended to work and it is documented as such (though it would appear some of the documentation is incorrect or at least inconsistent?), but you still get an inconvenient surprise because of a misunderstanding.

A little paranoia can be very useful!

I'd argue this is barely worth the time spent. You'd use at least a hundred services of this sort, with a flux of at least one a week. 2-3 hours reading up on intricacies, setting up backups for your data? It's a crazy amount of time to spend on something you don't care about - you just want it to work.

I'd argue that if you care enough about the content that you'll do more than say "oh well" to yourself if it becomes inaccessible, then researching this sort of thing or (probably easier but maybe more costly in terms of bandwidth use) rigging it up to your existing backup infrastructure is the minimum effort you should make. If you don't, you have no high ground from which to complain or blame others if its going away inconveniences you.

If you don't care about it that much, why do you keep it anyway?

Well, it is stated on the help page, regarding private repositories and their forks: https://help.github.com/articles/what-happens-to-forks-when-...

> Private forks inherit the permissions structure of the upstream or parent repository.

Ah, thanks for pointing me to it. I still find the behaviour unintuitive, but at least it is documented somewhere.

> Luckily I found an old local copy of my project, but this taught me not to rely on Github as only storage for code.

This is the distillation. If something matters to you, you should personally take responsibility for its long-term storage. Apparently for the author, finding a true duplicate was just luck. To minimize the risk, maintain multiple copies on multiple media. Github is only one medium.

The context is especially notable: this is source code and source code is (in a majority of cases) very small. Maintaining personal archives of project source is a comparatively simple and low-cost matter.

How reliant people are on Github is puzzling. If Github somehow failed one day, many businesses and OSS projects would just fail to operate. What surprises me is that software teams in companies use it too, when just sending commits/patches around is quite easy with any VCS.

Kind of like how that one NPM package took down like 14,000 other packages when someone took a DMCA request against an author's OTHER package and he just took them all down in protest--which took down their dependencies and their dependencies and ... "broke the world".


The code in question was like 11 lines of code (with curly and spaces!!) for calculating left-hand padding for a syntax highlighting code box.

Does this apply to public repos?

I have a close friend who passed away a few months ago, and wanted to try to find some way to preserve all of his GH repos for his wife and children. I've locally cloned all of them, but the prospect that GH might delete all of his stuff if he fails to log in for a certain amount of time or something is deeply concerning.

I don't suppose GH has some kind of memorial/preservation mode the way Facebook does?

"Inactive accounts may be renamed or removed by GitHub staff at their discretion."


There are a few blog posts of people who got GH support to release the names of inactive accounts, though it's not clear if those accounts had any repos at all.

> Every user account and organization on GitHub can have unlimited collaborators on any number of public repositories.

From: https://help.github.com/articles/github-s-billing-plans/

So if your friend's repositories are all public no payment is required.

You can politely ask for a fork to be unlinked. Done that a few times to re-enable search, etc.

Hey Niels!

I'm John, and I work at GitHub. Sorry to hear you're having difficulty.

From your post:

> I haven’t tried contacting customer support, but as this appears to be official policy I would not expect a change there.

I'd recommend you reach out to support@github.com, they're usually very helpful with things like this.

Hey John,

thanks for reaching out. The individual case was resolved (found a local copy). But I find the policy worrisome for future projects. If the customer support was happy to reenable access for me, what is the point of the current policy?

That's why I've been migrating my Github/lab projects to IPFS as well. I have a few cheap VPSes where I have my stuff pinned.

"Cloud" (read: someone else's server) is unreliable at best, or malicious at worst like this case. The people may not be bad actors, but the tech definitely is.

Would like to hear more about your git with IPFS workflow.

How are VPSs not on someone else's server too?

If people can get the fork back after reaching out to support@github.com, why would not GitHub resolve this issue once and for all, so that people will not need to reach out to support@github.com any more?

If Support just do this kind of thing why not put it in the UI. It's like unlinking a fork - GitHub Support just do it on asking - but why is GitHub spending unnecessary manpower when it could just be a button in the settings saying 'Unlink'.

I think the last sentence may be wrong

> I haven’t tried contacting customer support, but as this appears to be official policy I would not expect a change there.

I'm sure the support would be able to do something for you there, anyway it's always good to contact them, even if they can't do anything, before saying things like this online.

I disagree. People need to be aware of this before it becomes an issue for them. Git is distributed and yes, relying on one resource is a risk in itself, however - if a fork isn't a fork it should be made clear.

yes but the point is he didn't even ask the support so his claims are just wrong. Maybe the support will always replace your fork with a copy if you just ask.

I agree the UX is bad though and should be improved

My point is he shouldn't have to ask support to let him have access to something he took a fork of. Bad UX aside, a fork allows one to think of a permanent branch that is distinct from the original. This is not the case.

Related to that: if a private repo is forked, the target account doesn't need a paying subscription. In that sense, it's the root repository holder that is paying for the new private repo.

I'm not arguing if it's a good or bad thing, it just seems consistent in that regard.

At least on Bitbucket they're pretty explicit about the rules around forks. Creating a new private repository, you get the option to select [No Forks | Allow Forks | Allow only private forks]. I don't see any kind of configuration like that on github.

That is silly either way. Once you have a local copy you can just push that copy to any git hoster and thus make it public. Forking through their web ui is just a convenience feature.

Does that differentiation necessarily mean Bitbucket repos are immune to this problem?

No of course not. They're just a bit more explicit about the fact that permissions can persist across forks.

GitHub gives you the option to allow/disallow forking for private repos at both the repo and organization level.


Apparently there's no github web UI to actually copy a repo so it's not listed as a fork, but of course git makes it quite possible, and github even provides directions, using git directly.
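For reference, the git-only route is roughly a bare clone followed by a mirror push. Below is a sketch against two local bare repositories so it runs without network access; the names `old-repository.git` and `new-repository.git` are placeholders, and on GitHub they would be the URLs of the existing repo and of a new, empty repo you created yourself.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-ins for the two hosted repositories.
git init -q --bare "$work/old-repository.git"
git init -q --bare "$work/new-repository.git"

# Give the old repository a commit and a tag so there is something to copy.
git init -q "$work/seed"
cd "$work/seed"
git commit -q --allow-empty -m "first commit"
git tag v1
git push -q "$work/old-repository.git" HEAD refs/tags/v1

# Bare-clone the old repository, then mirror-push every ref to the new one.
git clone -q --bare "$work/old-repository.git" "$work/copy.git"
cd "$work/copy.git"
git push -q --mirror "$work/new-repository.git"

# The new repository now lists the same branches and tags as the old one.
git ls-remote "$work/new-repository.git"
```

The result is an independent repository with the full history but no fork link, so it also carries none of the issues or pull requests.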


I'm imagining a browser plugin that notices when a repo you are about to fork is private, and (or even without that) adds a 'copy' button which gives you text to copy-and-paste into your terminal to make a true copy.

On the one hand, it bugs me when I see a repo that was obviously cloned/forked from another repo and uploaded new without the github (fork) link to the original... but upon seeing this, I may have to reconsider doing that with a few repositories.

When I left a prior job, I forked and took over updates of a relatively popular open-source library under my own GH account... I'd hate to see it all disappear because someone from a former job decides to nuke the upstream repo.

In the alternative case, a company who terminates an engineer can't prevent them from accessing the code if they've forked it. This is a "problem" with git, which GitHub has tried to solve for its paying customers. Whether or not you agree... GitHub has worked like this for years.

If GitHub wants to implement DRM-like copy control, they should implement actual DRM-like copy control, and leave fork the heck alone.

Part of the problem is that instead of being explicit about how things work they're being implicit. Instead of users being told a fork is not a, you know, fork they're lied to only to have the fork vanish later.

Why is there no banner at the top of the repo warning you this is a fake repo? Why is there no ability for the original owner to delegate a full fork to the children? If they implemented this correctly you'd expect to have control over how it works (or turn it off).

I think a fork is just an internal branch on the original repository for the Github servers. From a data / sys admin point of view, it makes sense, since a hard-copy of a repository (and all the git objects with it) is just generally useless.

I would guess that it's stored and dealt with using a basic "copy on write" strategy. It's functionally a copy, which is all that usually matters. This may have been an oversight.

It's a bit more complex than that actually. https://www.youtube.com/watch?v=-ZNKR9wFe8o

Yeah, I wouldn't be surprised if this was an oversight as well. I can't imagine that GitHub would delete the git blobs from their servers, even if you stop paying (harddrives are cheap!).

A possible workaround?

Fork a private repo. Make it public, thus forcing a new repository network to be made. Then take it private again if you choose.

In the case of a public fork, it seems you're safe from it getting taken out from you. (relatively speaking)

The fact you have to do this to make a fork, fork, is problematic. You could also download the fork, and then upload it again as a new root, but in both cases it entirely defeats the point.

A lot of people are coming up with workarounds or blindly defending Github in this thread. I think I'm the only one who sees this decision was done for commercial reasons ($), and has nothing to do with people "maintaining control of their private repo."

Frankly if fork doesn't, you know, fork due to some quirk in some random article you aren't going to read or find that to me is a massive black mark against Github. That's all I care about, Github working like Git, in most major ways.

`I think I'm the only one who sees this decision was done for commercial reasons ($)...`

Is it fair for me to expand "for commercial reasons" as:

Github is a for-profit business, and their business model is to provide a service to users and that also requires them to be able to turn that service off when customers stop paying.

`... and has nothing to do with people "maintaining control of their private repo."`

I disagree here, I think you're using a notion of "control" that is too narrow. A notion that captures the scope better would be that Github is adding value by enabling project management and collaboration. The ability to do pull requests, manage issues, etc. is all tied to forking.

That's what makes their service worth paying for, and it's also what they have to turn off if you don't want to pay.

`That's all I care about, Github working like Git, in most major ways.`

What you're saying here is reasonable enough I expect you can find providers that are pretty close to that standard, and it's a fair point that Github's business model has some sharp edges. In this case, it's a consequence of "we turn the lights off if you don't pay," and an alternative has to deal with the same bottom line.

Forks of a private repo can't be made public through the GitHub web interface.

>GitHub has the right to suspend or terminate your access to all or any part of the Website at any time, with or without cause, with or without notice, effective immediately. GitHub reserves the right to refuse service to anyone for any reason at any time.

Companies write these "We can do whatever" clauses into license agreements so that they can point to them and make people go away.

IANAL, but EULA terms like this are typically in breach of consumer law protections against unfair contract terms. If you are a private person (consumer) paying for a service, this kind of term may be null and void.

So it sounds like active links between forks is intended to mirror the valid links between people/organizations that are doing business with one another. If people or organizations end their relationship, then there's a way to cutoff any secondary contributors to the codebase to have the ability to make pull requests from -and- receive code updates.

At first blush, I thought, "GitHub is mean and just locking-in their paying customers forever!!" But now after reading the comments here, I think this is a very pretty way to represent the nuances of a fork relationship.

If the original owners of a private codebase disappear, then it is reasonable and just that all of the forks to their code should as well. I believe it's reasonable and just because GitHub have no way to interpret _why_ the original owner dropped off and must respect their ownership by breaking the relationship.

You might use gitlab to clone your github repos. Web interface to clone them is super easy (obviously you can do it from the console also), there is no limit to private repos (their business model seems to be charging for CI time and selling on-premise licenses).

I'm not affiliated with either hosting services and I use both :)

Backups, backups, backups, BACKUPS.

Don't rely on GitHub being your only source location. Git is a DVCS, meaning you don't have to have a single upstream. Make sure you are pushing / mirroring to a third party (GitLab, Bitbucket, personal Git Server, etc)

Do you know a good way to automate that?

  $ vi /usr/bin/gpa
  #!/bin/sh
  # "$@" forwards any arguments (branch names, flags) to each push
  git push github "$@"
  git push bitbucket "$@"
  $ gpa

You could have a cron job automatically sync repos, but right now I have my own local copies and I use GitLab's "Repository mirroring" [1] which has worked really well for me!

  [1] https://docs.gitlab.com/ee/workflow/repository_mirroring.html

a crontab script that does `git fetch` would be enough
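Here is a sketch of what such a cron-driven script might look like. The `backup_repo` helper name is invented for illustration, and the demo runs against a throwaway local repository rather than a real host; the idea is a `git clone --mirror` on the first run and a `git fetch --prune` on every later run.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# backup_repo SOURCE DEST (hypothetical helper): create DEST as a mirror clone
# on the first run, refresh all of its refs on later runs.
backup_repo() {
    if [ -d "$2" ]; then
        git -C "$2" fetch -q --prune origin
    else
        git clone -q --mirror "$1" "$2"
    fi
}

# Demo against a local repository standing in for the hosted one.
work=$(mktemp -d)
git init -q "$work/source"
git -C "$work/source" commit -q --allow-empty -m "first commit"
backup_repo "$work/source" "$work/backup.git"   # first run: clones

git -C "$work/source" commit -q --allow-empty -m "second commit"
backup_repo "$work/source" "$work/backup.git"   # later runs: fetch only

git -C "$work/backup.git" log --oneline
```

Because of `--prune`, branches deleted upstream also disappear from the mirror on the next run; drop that flag if the backup should keep them.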

This really sucks... I know companies that made forks of public (open source) repos just to have a "backup" in case the person or organisation decides to take down the repo. This should really be fixed, as I don't see ANY benefit in the way GitHub handles this now. People who talk about giving contractors or other third parties access and later revoking it (after they are done), thinking they won't have a copy, are just plain wrong. Git is decentralised and the contractor/dev will always have the code on his machine unless HE decides to delete it.

Just yesterday on HN I was recommended git-ssb.

I think it's time.

Is fork part of git? I don't see it mentioned in the official documentation: git-scm.com

Fork seems like some feature cloud storage services invented.

The terminology is a GitHub-and-friends thing, but the idea is at the core of git. When you clone a repository, you're creating a "fork" of the repo on your own computer. Git is a DVCS (distributed VCS), where you can make many copies/forks of a repository and freely push/pull commits between them. (As opposed to something like SVN, where the repository lives on a single server, and all commits go directly to that server without a separate "push" step.)

The terminology is at least a couple decades older than GitHub has even existed, and was even mentioned somewhere by Linus in the creation of Git.

Fork used to have a very negative connotation, and that was related to the difficulty required to pull one off. When you forked a project, you had to set up your own hosting, web space, issue tracker, version control system (and you probably couldn't clone and start off the upstream's VCS), and so forth. The technical barrier made it so that you only forked a project when you have severe disagreements with upstream; some of the most famous in history include EGCS from GCC, and XEmacs from Emacs.

With DVCSes, including Git, much of that technical difficulty is automatically removed. Every clone can potentially be a fork (as in, a new independent project), and this was one of Linus's intentions. If forking is easy, it keeps upstream on their toes. After being adopted by DVCS hosting sites like GitHub, it turned around to being a positive term even.

I observe that people "fork" repositories (which is basically a cheap clone rather than a fork, bad terminology there) not for having a copy of their own to work on, but more like a glorified "like/+1/star". So even with public repos, it seems quite acceptable that Github tries to deduplicate aggressively.

This is also simply due to their per-private-repo pricing model. You can fork private repos with a free account. I'd just ask github support about this particular issue and I'm pretty sure they can either make the repo open or at least make the code available to you somehow.

This is no different than Google Docs and many other similar services. If you share a Google Doc with someone it will show up in their list of Docs. But it is just a reference to "their" doc. If they delete the doc there's no doc to reference and your "apparent" copy of the doc (which was really just a reference) disappears.

If you really wanted your own personal copy you need to make a duplicate. That duplicate will be yours not a reference to theirs.

It was strange the first time I ran into it. I deleted an account. I had shared the documents with another account of mine. Then I noticed when I deleted the first account I lost access to those docs. Now I know I hadn't copied the docs I'd only had references to them. Next time I'll know to make copies if I actually want copies and not just access to another account's shared docs.

This seems to be an edge case, affecting only private repositories. I can see how this can make sense in that context. Can we do without the pitchforks? (pun intended)

It still doesn't make sense in that use case. He could have cloned it to his local machine and still had access to the code. Had he done that he could have pushed it back to github as a completely new repository thereby creating a legitimate fork. It's a silly requirement from github.

Did you contact GitHub regarding this and ask them, before bashing them on social media? What was their response? They'll probably gladly remedy the situation.

> Luckily I found an old local copy of my project [...] I haven’t tried contacting customer support, but as this appears to be official policy I would not expect a change there.

The policy itself needs to be changed, the individual case is resolved.

Or the policy was misinterpreted. The piece of text that got quoted is present in every single EULA ever and does not at all indicate if this is actually intentional by GitHub.

Besides, you can request to disconnect your fork from an upstream project. I expect such a request will also resolve this problem.


No, that doesn’t make sense.

If you want a mirror of a repository, self hosted Gogs/Gitea is a great way to do this in a way that makes you unbeholden to the upstream.

Hum, I would think it’s to avoid the spreading of private company repositories when employees get terminated. It does make sense.

Git is a distributed version control system and Github is not in a privileged position. Don't treat it like it is. Github is a convenience. Nothing more.

"Forks" are really just branches and branches are really just references. Forking just adds a few more references to the same repository.

You realize that implementing forks as branches creates a LOT more work for Github to make it work? It makes absolutely no sense for them to do so.

A copy-on-write fork is most likely what is being implemented.

What do you mean? What's the difference between a fork and a branch? It's not copying anything.

I believe that GitHub stores all forks in a network with a single git object store.
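Plain git has a small-scale analogue of a shared object store, which may help picture this. The sketch below is only an analogy and makes no claim about how GitHub implements fork networks; it uses `git clone --shared`, which makes the new clone borrow objects from the source through an alternates file instead of copying them. The directory names are invented.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# --shared: the "fork" borrows upstream's objects rather than copying them.
git clone -q --shared "$work/upstream" "$work/fork"

# The borrowed object store is recorded in the alternates file.
cat "$work/fork/.git/objects/info/alternates"

# History is readable from the fork through the borrowed objects.
git -C "$work/fork" log --oneline

# Reports how many loose objects the fork holds on its own.
git -C "$work/fork" count-objects
```

The git documentation warns that a clone made this way can break if the borrowed objects are later removed from the source, which is the same flavour of dependency this thread is complaining about.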

And this is why I don't have a github account. One day, all those foolish users will be left hanging, crying for the dust which has replaced their code and CI. They deserve worse; they have enabled this central organization to subvert the heart of OSS. May github go down forever, along with google translate and all those other projects which have crippled our technical progress.

Git is a DVCS, this is only an issue if you use it as if it were a traditional VCS.

Which is how almost everybody uses it, so not such a helpful observation.

Unless people are doing shallow clones, almost everybody uses git as a DVCS.

There's a little more nuance to it than that. While git users generally keep a local copy of a repo, collaborating with others is pretty much always done through a central server. Git users could theoretically collaborate without a central server, but it is currently far too manual a process to be considered practical. There were a couple of attempts to implement decentralized collaboration in git under the name gittorrent, but those didn't go anywhere. Failure is understandable in this space, decentralized collaboration is a _hard_ problem and the tools to do it are still in their infancy. IPFS is probably the furthest along at this point and they're still a long ways away from having something production ready.

Sure, but it's decentralized in a sense that there's nothing special about any of the clones. If developers lose their shared clone of the repository, they can re-create it easily by setting up another shared clone that anyone can access and push there from their local repos. Nothing will be lost.

"Central" server is not special in any way, other than being readily accessible by everyone in the team.

Some services break this by piling other stuff on top of git that's not distributed (like issues, ci, etc.), but that's not a problem of git.

But in most people's workflows there is something special about the central server: it is the source of truth about the state of public branches. That is why losing the central server is so disruptive. It means you have to re-establish the source of truth somewhere else and get everyone to transition over to that new source.

With a truly decentralized system the source of truth would be content addressable. For example, it could be obtained via a public key rather than a server's domain name.
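Git's objects are already content addressable, which is the half of this that exists today. A minimal Python sketch, assuming a repository that uses git's default SHA-1 object format, of how a blob's id is derived purely from its content (the function name `git_blob_id` is made up for illustration):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Anyone holding the same bytes computes the same id, no matter which server they fetched them from. What is not content addressed is the mutable part, namely which commit a branch name currently points to, and that is the piece that still needs a trusted location.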

If that were true, the article's author would not have encountered this issue.

Author was presumably saved by this (old local copy).

A poor workman blames his tools

Well, I mean, what did you expect? You'd be able to cheat the system otherwise.

What if I just cloned the repo, created a repo of the same name under my own account and then pushed up my local clone to the new repo?

Nothing is stopping you from doing that. The preferred experience is not to do it though.
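For reference, a rough Python sketch of that manual route. It assumes an empty repo has already been created under your own account; the URLs and the `mirror_commands` and `run` helpers are hypothetical. It uses `git clone --mirror` and `git push --mirror` so that every branch and tag is copied, not just the checked-out branch.

```python
import subprocess


def mirror_commands(source_url, target_url, workdir):
    """Build the git commands that copy every ref from source_url to target_url."""
    return [
        # Bare clone containing all branches and tags of the source.
        ["git", "clone", "--mirror", source_url, workdir],
        # Push all of those refs to the new, independent repository.
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]


def run(commands):
    # Execute each command in order, stopping at the first failure.
    for argv in commands:
        subprocess.run(argv, check=True)


cmds = mirror_commands(
    "git@github.com:original-owner/project.git",  # hypothetical source URL
    "git@github.com:you/project.git",             # hypothetical target URL
    "project.git",
)
# run(cmds)  # uncomment to actually execute; requires git and access to both repos
```

A repo created this way is not registered as a fork in Github's interface, so you also give up the fork-specific features such as opening pull requests against the original from it.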

I should have completed my question in my earlier post. Specifically, would doing that instead of creating a fork through Github's interface prevent one from losing access to their project when the previous copy is deleted from Github?


In what way? Genuinely curious

For instance, you'd be able to create private forks in an organization without paying for the Github organization, then keep this private fork once the original membership is canceled.

Removing access to the fork in that instance would make sense, but that's not what happened here. The author didn't fork in an organisation, he forked to his own personal Premium account. Which he continues to pay for.

They could give you an option to make it public, if this is the concern.

OP states that he still has unlimited private repos (presumably on a paid account) so it wouldn't be cheating the private repo limit in this case. GitHub could simply count a forked private repo against the account's private repo limit if the upstream repo is disabled to avoid the type of cheating I'm thinking of.

I haven't paid for private repos (at least not for a few years) and yet I'm a member of several ... and I can fork them as needed without paying. In my case, I would expect to lose access to my fork if the account that owns the private repo were no longer a paid account.

Right, but I'm saying GitHub could allow you to keep accessing those cloned repos if the upstream repo was disabled as long as you switch to a paid plan. I.e. they could treat it the same way they treat your own private repos when you go from a paid to unpaid plan.

Yes ... it could be that if either of the private repos was paid, they would be available.

Not totally following you on this. Do you mean the system as in terms of Github's profit model?
