
What is a fork, and how GitHub changed its meaning - ddevault
https://drewdevault.com/2019/05/24/What-is-a-fork.html
======
3JPLW
It's a little funny to focus on GitHub's fork/pull-request model if you're
trying to critique their centralization and lock-in. Every single successful
open source project — on GitHub or not — has a canonical "source" branch and
some sort of organization/leadership that decides what goes in. I'd argue that
GitHub's behavior here isn't a power grab so much as a reflection of reality.

The "real" lock-in is with the _discussion_ model — on both issues and pull
requests — and the organizational structure you're able to set up. It's easy
to move the source code to another service. Heck, you can even email patches
to maintainers of a project on GitHub without creating an account. It's not
easy to move historical discussions/reviews or "commit bits"/maintainership
roles, and of course you can't participate in reviews of an emailed patch that
a maintainer PRs for you unless you create an account.
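
To make the "email patches" point concrete, here is a minimal sketch of the round trip, assuming two local directories stand in for the contributor's machine and the maintainer's machine. The paths, file names, and identities are invented for the demo. In practice the patch file would travel by email rather than through a shared directory.

```shell
#!/bin/sh
# Sketch: round-trip one commit as a mailable patch between two local repos.
# All paths and identities below are made up for the demo.
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Contributor" GIT_AUTHOR_EMAIL="contrib@example.com"
export GIT_COMMITTER_NAME="Demo Contributor" GIT_COMMITTER_EMAIL="contrib@example.com"

# "Maintainer" repository with one initial commit.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "Initial commit"

# Contributor clones, commits a change, and renders it as a patch file.
git clone -q "$work/upstream" "$work/contrib"
echo "a fix" >> "$work/contrib/README"
git -C "$work/contrib" commit -q -am "Fix the README"
git -C "$work/contrib" format-patch -1 -o "$work/outbox" >/dev/null

# The patch file is what would be emailed. The maintainer applies it with
# git am; no account on any hosting service is involved.
git -C "$work/upstream" am -q "$work/outbox"/*.patch
applied_subject=$(git -C "$work/upstream" log -1 --format=%s)
echo "$applied_subject"
```

The source code and its authorship survive the trip. Any review discussion around the patch stays wherever it took place, which is the lock-in described above.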

~~~
perennate
I'd argue that at least their API indicates they try to reduce rather than
reinforce this lock-in. The API lets you build tools like
[https://github.com/colmsjo/github-issues-export](https://github.com/colmsjo/github-issues-export)
to automatically export data from issues and other project data stored outside
of git.

Moving this data is still not trivial but I don't think GitHub could
unilaterally make it much easier.

([https://developer.github.com/v3/issues/](https://developer.github.com/v3/issues/))

~~~
organsnyder
I'd love to see a standard develop for storing this information in-repo.
GitHub could certainly drive such an initiative.

~~~
Ironchefpython
You might be thinking about fossil. Every feature is in-repo.

[https://fossil-scm.org/home/doc/trunk/www/index.wiki](https://fossil-scm.org/home/doc/trunk/www/index.wiki)

~~~
carlmr
That looks really cool. Is anybody already using this?

~~~
rkeene2
Well, SQLite is using it of course ;-)

Other large projects include Tcl/Tk and many related extensions, as well as
all the cool stuff I write:

[http://chiselapp.com/user/rkeene/](http://chiselapp.com/user/rkeene/)

------
shafte
I don't know if it's correct to characterize Github's model as a power grab.
The design of Github definitely pushes things in a more centralized direction,
but I think that approach is superior in many cases and it's not purely for
profit.

For many projects, having a single "canonical" version is the best experience
for both users of the project and developers. Linux is large and important
enough that it may make sense to have many different distros running a
slightly different set of patches and accept the overhead of managing multiple
sources of truth. For smaller projects with more narrow contributor bases, it
would be noisy and confusing.

~~~
lixtra
> I don't know if it's correct to characterize Github's model as a power grab.

The important question is whether GitHub forks and pull requests are an open
protocol or tied to the platform. I'm not aware of a way to make a pull
request from, say, Bitbucket to GitHub. Can I? If so, it's not a power grab.
If not, then redefining fork/pull request is an extend-and-extinguish
move.

Edit: okay, extinguish is too harsh. GitHub doesn’t want to extinguish git.

~~~
TeMPOraL
Forks and PRs are UI sugar on top of Git operations, so it's technically open.
That said, merging in a PR is an operation that touches 2 repos, with
different user privileges, so I feel it would be hard (though not impossible)
to make the UI work with multiple services, without adding extra cross-service
auth headache.
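
As a rough illustration of the "sugar on top of Git operations" point, the sketch below performs the equivalent of a cross-service pull request with plain git. It assumes two local directories (`host-a` and `host-b`, names invented here) stand in for repositories on two different services. With real hosts the fetch source would be an `https://` or `ssh://` URL instead of a path.

```shell
#!/bin/sh
# Sketch: a "pull request" between two unrelated hosts, using only git.
# Two local directories stand in for the two hosting services.
set -eu
hosts=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Canonical repository on "host A".
git init -q "$hosts/host-a"
git -C "$hosts/host-a" symbolic-ref HEAD refs/heads/main
echo "v1" > "$hosts/host-a/notes.txt"
git -C "$hosts/host-a" add notes.txt
git -C "$hosts/host-a" commit -q -m "Initial commit"

# The contributor's copy on "host B", with work on a topic branch.
git clone -q "$hosts/host-a" "$hosts/host-b"
git -C "$hosts/host-b" checkout -q -b topic
echo "v2" >> "$hosts/host-b/notes.txt"
git -C "$hosts/host-b" commit -q -am "Add v2 note"

# The maintainer fetches the branch by location and merges it locally.
git -C "$hosts/host-a" fetch -q "$hosts/host-b" topic
git -C "$hosts/host-a" merge -q --no-ff -m "Merge topic from host B" FETCH_HEAD
last_line=$(tail -n 1 "$hosts/host-a/notes.txt")
echo "$last_line"
```

The maintainer only needs read access to the contributor's repository here. What this sketch leaves out is a merge button, which is where the cross-service privileges come in.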

~~~
the_duke
That's only true for private repos though, for open source repos it would work
just fine.

------
dmh2000
I'm a casual GitHub user, and for me the fork button enabled me to grab a copy
of some other repo, make changes, and commit to my GitHub repo without perturbing
the main project while still keeping an online history at GitHub. It's a step
above just downloading the zip file and below getting commit rights to the
original project. If I just made a personal branch, then I would have
everything but the online backup because I couldn't commit.

One example is used by many online learning companies: they provide some
baseline of code for use in a course. You need to get it to use in the lessons
(edit: and make changes that you want to save). You can download a zip, clone,
or fork the repo. Zip and clone don't get you the online backup.

It would be interesting to see how many people use Github for the reasons
cited in the article (using a fork as staging for a pull request) or like I
do.

As I wrote this, it made me realize I am a parasite on GitHub and the projects
I fork, since I rarely contribute back (mainly because I don't have anything
useful).

~~~
johannes1234321
You don't need a fork button for that.

    
    
      git clone ssh://example.com/repo.git
      [... Edit ...]
      git push ssh://myserver.example.com/myrepo.git master
    

Benefit of the fork button is that GitHub links those repos on their site.

~~~
zbentley
Ah yes, all of the students in GP's example merely need to learn how to
provision, DNS-associate, and maintain a server, then they can store their Git
repositories on it. That will surely not add any undue overhead for them.

~~~
johannes1234321
There is no need to run a server. This is only about the "fork" button. You
could put GitHub.com in the place of example.com. Git hosting existed before
GitHub.

------
morpheuskafka
I can't imagine doing pull requests over email, which this article notes is an
original feature of git. Maybe kernel developers could handle it, but there
are a lot of people who don't even know how to send plaintext emails and this
would not be conducive to reviewing PRs on mobile. I definitely like the
GitHub flow better than SourceHut's.

~~~
Sir_Cmpwn
I shared this video on the Lobsters thread which seems to have been helpful
for some:

[https://yukari.sr.ht/aerc-intro.webm](https://yukari.sr.ht/aerc-intro.webm)

Don't repost this, please, the official aerc announcement is coming in a few
weeks.

I'm working on making a UI similar to GitHub's for reviewing patches which is
built on email underneath, but can be used entirely on the web. The first step
of this became available a few weeks ago, and now email threads are being
rendered into a review UI which is similar to GitHub's with inline feedback
and such:

[https://lists.sr.ht/~philmd/qemu/patches/5556](https://lists.sr.ht/~philmd/qemu/patches/5556)

This page is fairly new and still needs a bit more work towards mobile
support, but I hope that gives you some more confidence in the platform. I
intend to extend this so that you can also review patches from the web, which
will generate emails on the mailing list, and prepare patchsets from the web
as well. The end result will be a UI which is remarkably similar to GitHub in
terms of usability on the web, but is backed up with distributed technologies
and seamlessly integrates with the workflows of devs who would rather use
email.

On the whole Sourcehut is actually quite a bit better at responsiveness than
GitHub, too. Rather than a dedicated and inferior mobile site, almost all
sr.ht pages are responsive and equally capable on all form-factors.

~~~
instantwhat
Friendly pedantry: in that video, you referred to vi as "vee"? :)

~~~
Sir_Cmpwn
Yes, because that's what it's called :)

~~~
instantwhat
So you're saying that you don't care what the original author named his
software, and you're going to say it your way, just to be different?

------
crazysim
It is a curse that GitHub's code search can't code search forked repositories.
If there's a really popular fork of a dead repo, its code is invisible.
Though, I have a feeling that is a technical, not business, limitation.

~~~
WorldMaker
The Network graph has gotten increasingly buried in GitHub UX (currently it is
under Insights > Network, at one point I recall it was a top level tab of its
own) because it's a very useful power user feature, but can be super confusing
to new users. (It's particularly a shame it is so hidden because IMNSHO it is
the best commit graph that GitHub has, much more useful than the default
commits list on the Code tab.)

The Network graph is extremely useful for getting a sense if a particular fork
is dead and if there's a lot of activity happening on another fork (and
letting you jump directly to the other fork).

~~~
dcbadacd
Assuming GitHub is willing to show you the graph, for some projects it isn't.

~~~
WorldMaker
I've only seen GitHub not show the graph at all when the total fork count for
a Repo was huge and they've since changed it to always try to show "top 100
most recently committed forks" instead as a performance optimization in that
case.

Easiest example off the top of my head:
[https://github.com/DefinitelyTyped/DefinitelyTyped/network](https://github.com/DefinitelyTyped/DefinitelyTyped/network)

------
jrandm
> On GitHub, a fork refers to a copy of a repository used by a contributor to
> stage changes they’d like to propose upstream

I'm not sure this is accurate, and it is the crux of the argument. IMO a fork on GH
refers to a "fresh" (in terms of the tooling around the repo, eg issues/PRs)
copy of the repo on another account, which may be used as a branch to push to
upstream but may also be the "traditional" fork, the difference is entirely in
how the copy is used. If you've got a public "personal branch" with its own
associated tooling it doesn't seem like a stretch to call that a fork, whether
temporary and intended to be sent back upstream or not, and to me it's a
difference without a distinction about why "personal branch" is a better term
than "fork." Forking a repo (spinning off a dev section, including CI/CD/issue
tracking/management -- a branch implies sharing that infrastructure IMO) isn't
the same thing as forking a project (spinning off a competing project).

To the cathedral and bazaar points, I don't see how GH affects the development
style at all. The only thing that really makes the mailing-list driven Linux
dev more decentralized is that it's done via email... yet someone still
chooses what goes into releases, maintains & hosts that mailing list/website
mirror and could limit access, or the "core" team could simply email each
other and lock out any public view. GH can be configured to be just as open
(if needing their "fork" model to keep anyone from being able to push over the
original hosted copy) or just as closed, depending on how the project is run.
The cathedral and bazaar is about project philosophy and management, not the
underlying tech, to my reading.

Given the disclaimer that the author is building a GH competitor my cynical
thought is this is really marketing aimed at the programmer niche; I would
have enjoyed it more as a contrast against centralization and the benefits of
his (as I understand it) more free/open/decentralized competitor.

All that said, I think sr.ht/sourcehut is a cool project and can easily see
myself switching to it. How have the ads in 2600 been working?

------
hyperpallium
I really like how github fork can be implemented as new HEAD etc references to
the same objects. New commits refer to the original project's objects.

i.e. git's content addressing, intended for identical distributed objects also
automatically enables de-duplication of identical centralized objects.
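
A small sketch of that content-addressing property, assuming two freshly created local repositories (`original` and `fork`, names invented here) and nothing about how GitHub actually stores forks: the same bytes get the same object ID in both, so a host could in principle keep a single copy.

```shell
#!/bin/sh
# Sketch: identical content hashes to the same object ID in unrelated repos.
set -eu
store=$(mktemp -d)
git init -q "$store/original"
git init -q "$store/fork"

# Write the same blob into each repository's object database.
id_original=$(printf 'same file contents\n' | git -C "$store/original" hash-object -w --stdin)
id_fork=$(printf 'same file contents\n' | git -C "$store/fork" hash-object -w --stdin)

# The IDs depend only on the content, not on which repository holds it.
echo "$id_original"
echo "$id_fork"
```

Git's own alternates mechanism (`objects/info/alternates`, used by `git clone --shared`) is one way for several repositories to borrow objects from a shared store.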

Regarding the article, it seems to be saying you can't bazaar-fork a project
on github. I don't see why not - although the default _fork_ is associated
with the forked project, why can't you start a new project, using a clone of
the forked project?

~~~
Sir_Cmpwn
>I really like how github fork can be implemented as new HEAD etc references
to the same objects. New commits refer to the original project's objects.

This can be done (more effectively, actually) without the user's explicit
involvement in the fork process. You can dedupe blobs across the entire
platform on git push.

>Regarding the article, it seems to be saying you can't bazaar-fork a project
on github

I'm not saying you _can't_ (I thought that was clear enough), but that GitHub
is designed to encourage you to use a different approach.

------
robbintt
Fork distributes control over history and abstracts permission management away
from the traditional git branch pattern so you don't clutter the namespace of
the canonical remote but can still offer work.

Fork is almost nothing, it's just a useful pattern.

------
k__
I don't understand. I forked quite some repos on GitHub, because they weren't
maintained anymore or I needed some feature the maintainer wouldn't implement.

------
dreamcompiler
In the absence of external energy inputs, decentralized systems tend toward
centralization. I don't much like it, but it's not really GitHub's fault that
it happens.

