Why? Because it's cleaner. It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts. I set up a private repo because I want to control access to my code. That's why a private repo is private.
It also makes it harder for people to misrepresent their relationship to me later on. Sure, anyone with read access to one of my repos can pull it down and save a local copy. But that is different from logging into Github and having a fork of my repo. A Github fork looks and works differently from a local copy.
This whole situation seems weird because the idea of a "private" repo is not inherent to how git was intended to work; it's something Github invented to make money. So I'm not surprised that it violates people's expectations sometimes.
I'm not. Did you read my 3rd paragraph?
This is not a security strategy, it is a strategy for managing relationships--which is the purpose of using Github in the first place. There are far more secure ways to manage a git repo than Github, if that is the goal.
Not really. That means that you've worked with benevolent actors up until now. It's not that hard to keep a clone of the repo and upload later after the termination of the business rapport, either as a private or a public one.
The contract says that the codebase is mine, and the contractor doesn't have rights to use it for their own purpose. The Github private repo rules just make that real in a practical sense. It means the default way of working comports with what the contract says. It makes it easy for everyone to do the right thing, and harder to do the wrong thing.
Sure, a rogue contractor could re-upload my codebase to a new repo in their account later, but that is a pretty obvious violation of the contract. If I found out, I'd feel pretty confident suing them (if it came to that). There's nothing they can point to that says I authorized them to do that.
In contrast, if the default was to leave an official fork of my code in their Github account forever, then it becomes easier for them to do the wrong thing, and harder for me to say I didn't want them to do it. Some future employee could even do the wrong thing accidentally.
To reiterate, Github private repos help me manage my relationships. They don't enforce the terms of my relationships.
that's not a github fork; and as they said, it's not really a 'security strategy' (mostly it's a bit of accounting).
If removing the relationship tag was the only purpose of this, GitHub could just remove the relationship tag, not delete the whole repository (throwing out the baby with the bathwater).
in this context, those mean the same thing. if you copy the code it's not a "fork" in github terms.
Yup. So I'm not the only one seeing it then.
I understand on the other side, there's value in being able to stop people who forked your code. But it's a very weak stop-ability, because they could just keep the code locally and re-upload.
> Deleting a public repository
> When you delete a public repository, one of the existing public forks is chosen to be the new parent repository. All other repositories are forked off of this new parent and subsequent pull requests go to this new parent.
> Changing a public repository to a private repository
> If a public repository is made private, its public forks are split off into a new network. As with deleting a public repository, one of the existing public forks is chosen to be the new parent repository and all other repositories are forked off of this new parent. Subsequent pull requests go to this new parent.
That makes perfect sense to me; the fork ON github is what lends the fork legitimacy versus "some dude found code."
It's really a false sense of control.
What if rockstar123 gets arrested for ICO fraud. Do you want others to see he has an official upstream link to your repo even after you have severed the relationship?
What if rockstar123 makes his fork public? It's just messy to keep the link.
out of curiosity, what made you think that?
> Why? Because it's cleaner. It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts. I set up a private repo because I want to control access to my code. That's why a private repo is private.
That's like me making a random repo and adding a readme that says "official public windows 10 source repo"
Being legit doesn't seem to have any meaning for him or the author of the article. Also the code is the only thing needed for legitimacy. The iphone bootloader code in some repo is still the legit iphone bootloader (as an example)
No, he made two points. The first one is that this way you don't have your code in random freelancer's accounts, which I'm saying is wrong: a freelancer can take his code, and upload it back to his account.
So if this mechanism makes sense to you as a way of guaranteeing that your code won't be in other people's accounts, you're mistaken too.
In fact if you don't believe me, give me access to your private repos and let's see what happens.
Nobody is disputing that access to the private repo allows for a copy to be kept. But it's an "island", not a fork.
Come on, no one is arguing that, and I'm sure you know that.
He said it's "cleaner", not more secure. Yes, anybody can re-upload his code to a random github account, but then it's easier to see it's stolen code, in contrast to having a github fork of the private repository.
Nobody but you is claiming this.
> It means there are no abandoned copies of my codebase sitting around forever forgotten in random Github accounts.
- collaborator gets a copy of your code, works a bit on it;
- collaboration stops;
- a few months pass and you start collaborating with the same person again;
- collaborator then pushes some changes based on outdated information and breaks everything
github does have better ways to prevent that. that's the point of using git, right?
The only way to really protect yourself is with a contract that stipulates the repo and any clones a contractor may have access to must be destroyed upon termination of the contract.
Github is a tool for collaboration. The point of a private repo is to manage who you are collaborating with. Once I don't want to collaborate with someone anymore, there is no reason for them to have a Github fork of my code anymore. As I said, it's cleaner.
I'm fully aware that they can save a local copy (did you read my 3rd paragraph above?), and of the importance of contracts.
It’s about trade-offs. If you must have “trade secrets” amongst your code, perhaps you can keep that code available only to those you trust to follow your wishes with it, and have separate repos to share wherein you do not care as much about collaborators “stealing” the code.
As soon as you let someone into your codebase, almost anything can happen, right?
There are cases where it would make sense to accept risks related to code-copying in order to have faster/better development.
Also, what’s stopping anyone working at ANY company from taking the company code and doing whatever they want?
Code is seldom everything.
There's "nothing" stopping someone at a job from printing every single one of their corporate emails and putting them into a binder to save forever, even after they've left the position. Nothing except professionalism, a sense of business ethics, the law, and the fact that it's a huge hassle.
This seems like a problem that's not a problem. If you are working in a private repo then there should be some formalized arrangement on who owns the code and who doesn't, and that would be the originator of the private repo. If, for some reason, that relationship is less clear, then as you say other contributors can just create their own clones of the repo elsewhere. If the obvious relationship is one of collaborative ownership, then a public repo is the obvious choice.
Except if that were as easy and obvious as forking, the OP wouldn't be complaining about it, heh.
...which is not a github fork, and not what they're talking about.
They're one button away from cloning the entire repo out of Github and bypassing all these fancy access control mechanisms without you even knowing about it.
This is one feature that causes a lot more pain than actual utility.
Also, git does inherently allow for private repos, otherwise it would have been licenced differently, and it inherently allows for access control (ie repos don't have to be readable without authentication).
Like I said though, github's stance of disabling a private fork of a private repo when the permission is revoked doesn't seem unreasonable to me. I think the key thought missed by the OP was that they were both private repos. Also, they could deal with it differently when it's non-payment that caused the shutdown rather than a deletion/revoked permission
The first one is understandable: if I fork a private repo, I can't expect it to stay forked if for some reason the person who owns it doesn't want me to.
Second one is screw github.
How is it really different? When you push it back up to github it is exactly as if you'd have forked it. The only difference is that it isn't marked as a fork and in this respect not shown when you look at the original repo graph.
It's nice that it works for your use case, but since it's not a real protection against anything and only looks like a safety measure against abandoned accounts that still have a copy of your code, it shouldn't be a feature that's on by default. (Imho)
That's the whole point. Of course you can't stop someone from pushing a repo anywhere they want. But imagine you run a website, foobar.com. You have a foobar.com github org and a foobar private repo. It's the canonical repo. ninja123 has a github fork of it, and the github UI says so: "forked from foobar/foobar.com". rockstar123 also has a "fork", but it's just code sitting out there at rockstar123/foobar.com. There is no link to the canonical repo. If you fire rockstar123 the link is severed. It just allows a way to show active, approved engagement and also to keep track of approved collaborators.
I'm coming from a different angle. I read the article as "wasn't able to access his fork anymore". If that is true, I think that is bullshit.
Severing the ties to the private repo? Fine, I get the (your?) point. But what I forked should still be there. Previously as "forked from...", now without the link?
Did I misunderstand the article?
This is in contrast to how the Linux kernel development works, for instance.
But that's all beside the point. The point is that if you don't want your code lying around where you have no control over it and don't trust your contract and the law to be enough of a deterrent, you shouldn't give people copies of it, and you should probably reconsider what kind of contractor you're willing to work with, rather than loading the term "fork" with an additional meaning.
I don't think you intended to counter my argument, but in case anyone is interested in elaboration: my point about centralized/decentralized is that using github doesn't necessarily centralize your work. It's only central in the sense that any other remote repository is: if you can't access it you can't. If github dies, copies of the repository may still exist on a million other computers and any of those can be used as a basis for further collaboration by a wide variety of means. From a peer to peer perspective, github is just a peer like any other.
Git, by design, is a platform for the sharing of managed text.
I've been in the exact same spot under very similar circumstances. What I can add is that the github support was great, we got it resolved within the working day.
The repo was "deleted" to free up the private repos of my colleague, but we didn't know that the fork on my account would go down with it. It was probably a couple of months before we got back to it and noticed the issue. They managed to recover the repo and unblock my fork. I just made a fresh repo and moved everything over.
While it is odd behavior, if it's really bad they will probably manage to get you back on your feet.
Also, I'm not affiliated with github in any way; I just have some repos from student days over there. If it matters, I'm on gitlab and google repos.
However, I would certainly be surprised if a Github support staff member went and fished out the off-site backup and mounted it to recover one missing repo. You'd hope that would require quite a few staff members to orchestrate.
But in general, yeah. Most engineers don't think about deletion very hard when they start designing a system, and it's also convenient (although with GDPR possibly soon illegal, situation dependent) for auditing and research to merely flag things as deleted and not show them to the customer. As a result, true data deletion generally gets justified away as a bit inconvenient and not really desirable anyway, and almost no software company really deletes anything on demand. Best you can hope for is semi-annual data purges to reclaim disk space.
If this bothers you, consider joining us in the world of self-hosting your services. Gitlab is in my opinion considerably better than Github anyway. Certainly it's worth spinning up a docker container and taking a peek around.
If you didn't already know this, then... good lesson to learn.
Just like backups: don't rely on a single source of anything.
Maybe you can have them switch the repos to public if you go through their support, but Github doesn't offer that in their ransom note, and that would be an unacceptable solution, anyway.
Whether or not this is a risk to a given developer (or their company) is irrelevant to me, because they still have policy that allows them to hold code hostage for ransom, and that should make Github a complete non-starter when deciding on an SCM host.
Just like every other business ever.
Github do have your issues and un-merged pull requests "to ransom", though. Make sure to back those up if they are important.
Instead they play "data roach motel" and hold your data hostage. Very uncool.
You do not know other people's situations. And strictly speaking, demanding money to access your data is ransom.
> it would be better than automatically making private repo's public when you stop paying
False equivalency. I already said that "disabling repo and allowing primary user to download" was acceptable. I said nothing about private->public.
This whole situation only reaffirms my distrust in all cloud services.
EDIT: Seriously, -1 ? The user's data is sacrosanct. You disable most functionality, but you never, I repeat never delete data or make it irretrievable within an envelope of time for recovery. The only exception to that is if the user explicitly requests a permanent deletion - then you do so after appropriate warnings.
Edit: I found backhub.co. Unfortunately their pricing is per-repo and my usage model involves lots of tiny repos.
and you are about done.
(we have 1000+ closed issues in our project, nice to preserve)
Also simple to self-deploy since it's essentially a single service written in Go.
So you'd pay money for a tool that copies your data, but not simply pay the money to access said data?
This doesn't make any sense at all.
As for what is part of what, anything you do with the ‘git’ command is part of git. Anything you have to do on github.com is not.
Just to be clear, we can create a branch on github as well as in git, though I am not forced to do so on github. This may be confusing for a beginner, as both provide branching.
PRs are native to GitHub, but I can send you an email Requesting that you Pull from my remote. The Linux kernel team still emails patches I think.
Issues are a GitHub thing.
There are plenty of other alternatives, including ones you can host yourself, the most popular that I've seen being GitLab and Bitbucket.
"GitHub" is essentially just "git" on someone else's computer with a bunch of extra features.
The comment below you answered my question: PRs/Issues are native to github.
Incidentally Gerrit predates github.
I understand from your comment that it's not your preference, but that doesn't mean it's not a self-hosted git solution.
The problem with submitting a rebased branch is that while there are no apparent merge conflicts and your changes appear to have been developed on the tip of master, your changes may actually have been developed on a much older version of master. Builds may appear to compile fine but they might not make sense - when you discard and rebase the context of commits, you lose important information.
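One way to hedge against that, as a rough sketch: have git run a build or test command after every commit it replays, so a commit that only made sense on the old base is caught during the rebase instead of later. `git rebase --exec` is a real option; the function name and the `make test` command in the usage line are placeholders, not anything from this thread.

```shell
# rebase_with_checks <upstream> <command>
# Replays the current branch onto <upstream> and runs <command> after each
# replayed commit. The rebase stops at the first commit where it fails.
rebase_with_checks() {
    upstream=$1
    check=$2
    git rebase --exec "$check" "$upstream"
}

# Hypothetical usage, from a feature branch:
#   rebase_with_checks master "make test"
```

This only proves that each commit still builds or passes tests on the new base. It does not bring back the context the commits were originally written in.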
As for issues, I've suggested to GitHub that they make them available via git, and they said simply "I have passed your suggestion on to the team to consider adding the ability to clone Issues". I don't think they realize how nerve-wracking it is to put any data at all in GitHub issues. Even their recommended backup software is out of date and doesn't back up Issues completely.
GitHub, even with relatively recent issues and downtime, has better infrastructure and skilled personnel than most companies hosting their own instances of these services on some reclaimed box.
You need those database backups.
I haven't checked GitHub specifically, but most of those terms include, "and we can ditch you any time for any reason, maybe with 30 days notice if you're paying"
If you want to take advantage of their infrastructure, that's fair, I understand, but at the very least, run some tooling to backup issues, pull requests, wiki pages, etc on a regular basis.
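A minimal sketch of that kind of tooling, assuming `curl` is installed and a token with read access is exported as `GITHUB_TOKEN`. The function names and the output layout are made up for illustration. It pages through the REST API's issue listing (which also returns pull requests) and saves each page as JSON.

```shell
# issues_url <owner> <repo> <page>
# Builds the listing URL; state=all includes closed issues.
issues_url() {
    printf 'https://api.github.com/repos/%s/%s/issues?state=all&per_page=100&page=%s\n' "$1" "$2" "$3"
}

# backup_issues <owner> <repo> <output-dir>
# Requires GITHUB_TOKEN in the environment (an assumption of this sketch).
backup_issues() {
    owner=$1
    repo=$2
    out=$3
    mkdir -p "$out" || return 1
    page=1
    while :; do
        file="$out/issues-$page.json"
        curl --fail --silent --show-error \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer $GITHUB_TOKEN" \
            -o "$file" \
            "$(issues_url "$owner" "$repo" "$page")" || return 1
        # An empty JSON array means we have run past the last page.
        if [ "$(tr -d '[:space:]' < "$file")" = "[]" ]; then
            rm -f "$file"
            break
        fi
        page=$((page + 1))
    done
}

# Hypothetical usage:
#   backup_issues some-owner some-repo ./issue-backup
```

Issue comments are a separate listing (`/repos/OWNER/REPO/issues/comments`), and a wiki is its own git repository (`REPO.wiki.git`), so that part can be cloned like any other repo.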
1) When I want to contribute to a project, make some local changes and offer them back to the mainline, I fork. My fork is a pseudo-throwaway thing; it contains code that is hopefully destined either to make it back to mainline, or likely to be abandoned if mainline moves on in active development.
2) When I want to go my own way with a project, using it as a starting point for a new direction of development that may wander in a direction quite far from the original (it might get renamed, it might continue development if the original becomes abandoned, etc.), I also fork.
Numerically meaning 1 is surely far more common, so it makes sense for the GitHub features and permissions etc., to optimize for that case.
Meaning 2 is closer to the original idea of a fork, from open source vernacular long predating GitHub.
Naming is hard, and I don't blame them for picking a single name for these two technologically similar (but socially different) scenarios.
They just needed a word to show that the repo is not original and is tied to an upstream repo.
For example, when I wanted to “fork” a project that was no longer maintained and not accepting PRs, I actually had to contact GitHub to “unlink” my repo so that my repo could be standalone and utilize all the features on GitHub (can’t remember which features off the top of my head).
In this instance, that means making sure that you understand the service you are using (including intricacies like this), or keeping an extra backup of your own (a locally cloned copy of the repo), or, for proper paranoia, both of the above.
No matter how much you trust github and those in positions of responsibility for the projects you interact with, if you don't know the exact details of the feature-set you can come unstuck due to operating under false impressions. Github has done nothing wrong, as that is how the service is intended to work and it is documented as such (though it would appear some of the documentation is incorrect or at least inconsistent?), but you still get an inconvenient surprise because of a misunderstanding.
A little paranoia can be very useful!
If you don't care about it that much, why do you keep it anyway?
> Private forks inherit the permissions structure of the upstream or parent repository.
This is the distillation. If something matters to you, you should personally take responsibility for its long-term storage. Apparently for the author, finding a true duplicate was just luck. To minimize the risk, maintain multiple copies on multiple media. Github is only one medium.
The context is especially notable: this is source code and source code is (in a majority of cases) very small. Maintaining personal archives of project source is a comparatively simple and low-cost matter.
The code in question was like 11 lines of code (with curly and spaces!!) for calculating left-hand padding for a syntax highlighting code box.
I have a close friend who passed away a few months ago, and I wanted to try to find some way to preserve all of his GH repos for his wife and children. I've locally cloned all of them, but the prospect that GH might delete all of his stuff if he fails to log in for a certain amount of time or something is deeply concerning.
I don't suppose GH has some kind of memorial/preservation mode the way Facebook does?
There are a few blog posts of people who got GH support to release the names of inactive accounts, though it's not clear if those accounts had any repos at all.
So if your friend's repositories are all public no payment is required.
I'm John, and I work at GitHub. Sorry to hear you're having difficulty.
From your post:
> I haven’t tried contacting customer support, but as this appears to be official policy I would not expect a change there.
I'd recommend you reach out to email@example.com, they're usually very helpful with things like this.
thanks for reaching out. The individual case was resolved (found a local copy). But I find the policy worrisome for future projects. If the customer support was happy to reenable access for me, what is the point of the current policy?
"Cloud" (read: someone else's server) is unreliable at best, or malicious at worst like this case. The people may not be bad actors, but the tech definitely is.
> I haven’t tried contacting customer support, but as this appears to be official policy I would not expect a change there.
I'm sure support would be able to do something for you there. Anyway, it's always good to contact them, even if they can't do anything, before saying things like this online.
I agree the UX is bad though and should be improved
I'm not arguing if it's a good or bad thing, it just seems consistent in that regard.
I'm imagining a browser plugin that notices when a repo you are about to fork is private, and (or even without that) adds a 'copy' button which gives you text to copy-and-paste into your terminal to make a true copy.
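For what it's worth, the text such a button would hand you could be quite short. A sketch, assuming you already have read access to the source and have created an empty destination repo of your own (the function name and both URLs in the usage line are placeholders):

```shell
# duplicate_repo <source-url> <destination-url>
# Makes a standalone copy with full history and no fork link to the source.
duplicate_repo() {
    src=$1
    dst=$2
    work=$(mktemp -d) || return 1
    # A bare clone carries the branches and tags but no working tree.
    git clone --quiet --bare "$src" "$work/copy.git" || { rm -rf "$work"; return 1; }
    # --mirror pushes every ref the bare clone has to the destination.
    git -C "$work/copy.git" push --quiet --mirror "$dst"
    status=$?
    rm -rf "$work"
    return $status
}

# Hypothetical usage:
#   duplicate_repo git@github.com:upstream-owner/project.git git@github.com:me/project-copy.git
```

This copies the git data only. Issues, pull requests and the "forked from" link do not come along, and whatever contract or licence covers the code still applies to the copy.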
When I left a prior job, I forked and took over updates of a relatively popular open-source library under my own GH account... I'd hate to see it all disappear because someone from a former job decides to nuke the upstream repo.
Part of the problem is that instead of being explicit about how things work they're being implicit. Instead of users being told a fork is not a, you know, fork they're lied to only to have the fork vanish later.
Why is there no banner at the top of the repo warning you this is a fake repo? Why is there no ability for the original owner to delegate a full fork to the children? If they implemented this correctly you'd expect to have control over how it works (or turn it off).
Fork a private repo. Make it public, thus forcing a new repository network to be made. Then take it private again if you choose.
In the case of a public fork, it seems you're safe from it getting taken out from you. (relatively speaking)
A lot of people are coming up with workarounds or blindly defending Github in this thread. I think I'm the only one who sees that this decision was made for commercial reasons ($), and has nothing to do with people "maintaining control of their private repo."
Frankly if fork doesn't, you know, fork due to some quirk in some random article you aren't going to read or find that to me is a massive black mark against Github. That's all I care about, Github working like Git, in most major ways.
Is it fair for me to expand "for commercial reasons" as:
Github is a for-profit business, and their business model is to provide a service to users and that also requires them to be able to turn that service off when customers stop paying.
`... and has nothing to do with people "maintaining control of their private repo."`
I disagree here, I think you're using a notion of "control" that is too narrow. A notion that captures the scope better would be that Github is adding value by enabling project management and collaboration. The ability to do pull requests, manage issues, etc. is all tied to forking.
That's what makes their service worth paying for, and it's also what they have to turn off if you don't want to pay.
`That's all I care about, Github working like Git, in most major ways.`
What you're saying here is reasonable enough that I expect you can find providers that are pretty close to that standard, and it's a fair point that Github's business model has some sharp edges. In this case, it's a consequence of "we turn the lights off if you don't pay," and an alternative has to deal with the same bottom line.
Companies write these "We can do whatever" clauses into license agreements so that they can point to them and make people go away.
IANAL, but EULA terms like this are typically in breach of consumer law protections against unfair contract terms. If you are a private person (consumer) paying for a service, this kind of term may be null and void.
At first blush, I thought, "GitHub is mean and just locking-in their paying customers forever!!" But now after reading the comments here, I think this is a very pretty way to represent the nuances of a fork relationship.
If the original owners of a private codebase disappear, then it is reasonable and just that all of the forks to their code should as well. I believe it's reasonable and just because GitHub have no way to interpret _why_ the original owner dropped off and must respect their ownership by breaking the relationship.
I'm not affiliated with either hosting service and I use both :)
Don't rely on GitHub being your only source location. Git is a DVCS, meaning you don't have to have a single upstream. Make sure you are pushing / mirroring to a third party (GitLab, Bitbucket, personal Git Server, etc)
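$ vi /usr/bin/gpa
#!/bin/sh
# Push to both hosts, forwarding whatever arguments were given ($? would be the last exit status).
git push github "$@"
git push bitbucket "$@"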
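An alternative to a wrapper script, as a sketch: git lets one remote carry several push URLs, so a single `git push <remote>` updates every host. `git remote set-url --add --push` is a real command; the function name, the remote name `all` and the URLs in the usage line are placeholders.

```shell
# add_push_mirrors <remote> <url> [<url>...]
# Configures <remote> so that pushing to it pushes to every listed URL.
add_push_mirrors() {
    remote=$1
    shift
    # Create the remote on first use; its fetch URL is the first mirror.
    if ! git config --get "remote.$remote.url" >/dev/null; then
        git remote add "$remote" "$1" || return 1
    fi
    # Drop push URLs left over from an earlier run so they do not pile up.
    git config --unset-all "remote.$remote.pushurl" 2>/dev/null || true
    for url in "$@"; do
        git remote set-url --add --push "$remote" "$url" || return 1
    done
}

# Hypothetical usage:
#   add_push_mirrors all git@github.com:me/project.git git@bitbucket.org:me/project.git
#   git push all master
```

Once any push URL is set, the remote's fetch URL is no longer used for pushing, which is why the first host is listed again as a push URL.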
I think it's time.
Fork seems like some feature cloud storage services invented.
Fork used to have a very negative connotation, and that was related to the difficulty required to pull one off. When you forked a project, you had to set up your own hosting, web space, issue tracker, version control system (and you probably couldn't clone and start off the upstream's VCS), and so forth. The technical barrier made it so that you only forked a project when you have severe disagreements with upstream; some of the most famous in history include EGCS from GCC, and XEmacs from Emacs.
With DVCSes, including Git, much of that technical difficulty is automatically removed. Every clone can potentially be a fork (as in, a new independent project), and this was one of Linus's intentions. If forking is easy, it keeps upstream on their toes. After being adopted by DVCS hosting sites like GitHub, it turned around to being a positive term even.
"Forks" are really just branches and branches are really just references. Forking just adds a few more references to the same repository.
A copy-on-write fork is most likely what is being implemented.
If you really wanted your own personal copy you need to make a duplicate. That duplicate will be yours not a reference to theirs.
It was strange the first time I ran into it. I deleted an account. I had shared the documents with another account of mine. Then I noticed when I deleted the first account I lost access to those docs. Now I know I hadn't copied the docs I'd only had references to them. Next time I'll know to make copies if I actually want copies and not just access to another account's shared docs.
The policy itself needs to be changed, the individual case is resolved.
Besides, you can request to disconnect your fork from an upstream project. I expect such a request will also resolve this problem.
"Central" server is not special in any way, other than being readily accessible by everyone in the team.
Some services break this by piling other stuff on top of git that's not distributed (like issues, ci, etc.), but that's not a problem of git.
With a truly decentralized system the source of truth would be content addressable. For example, it could be obtained via a public key rather than a server's domain name.
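Git already works this way one level down, which makes for a small illustration: an object's ID is a hash of its content, so it is the same no matter which server or laptop holds it. A sketch, assuming a repository using the default SHA-1 object format and a `sha1sum` binary (on some systems it is `shasum` instead); the function name is made up.

```shell
# blob_id <file>
# Computes the ID git would give this file's content as a blob, without
# calling git: SHA-1 over the header "blob <size>" + NUL byte, then the bytes.
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

# Hypothetical usage: compare with what git itself reports.
#   blob_id README.md
#   git hash-object README.md
```

What this does not cover is the other half of the comment: finding *where* to fetch an object from still depends on a host name, not on a key or a hash.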