
GitHub-backup: backs up everything GitHub knows about a repository or a user - progval
https://github.com/joeyh/github-backup
======
joeyh
There are a lot of different distributed issue trackers built on top of git.
None are widely known or used, and all are incompatible. If GitHub included
issues and pull requests in a git repo (either in a branch or in a separate git repo as
they do for wikis), it would become an instant de-facto standard.

With such a standard, many new things would immediately spring up to use it.
Bug reporting, and managing bug reports, and forwarding bug reports between
related projects is a big pain point, largely due to the fractured mess of bug
tracking systems. A thousand flowers would bloom.

The only reason GitHub has to not do this, as far as I can tell, is that
keeping the issues locked in their silo makes it a little bit harder for
competition to migrate repositories from GitHub. Although I hear GitLab
migrates issues anyway, and the API makes this not hard (although perhaps
needing lots of time due to rate limiting). And of course, they probably
slapped a SQL database down on day 1 for issues without thinking too much
about it, and so it would be some effort to move them into the git repo now.
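
To make the "API makes this not hard" point concrete, here is a minimal sketch
of pulling every issue for a repository through the GitHub REST API, following
the Link header for pagination and waiting when the rate limit runs out. The
names fetch_issues, next_link and http_get are made up for illustration, and
this is not how github-backup or GitLab's importer actually does it. The `get`,
`sleep` and `now` parameters are assumptions added so the loop can be exercised
without network access.

```python
import json
import re
import time
import urllib.request

API = "https://api.github.com"


def next_link(link_header):
    """Return the rel="next" URL from a Link header, or None if absent."""
    if not link_header:
        return None
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', link_header):
        if rel == "next":
            return url
    return None


def http_get(url):
    """Fetch one page; returns (parsed JSON, headers with lowercase keys)."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        headers = {k.lower(): v for k, v in resp.headers.items()}
        return json.load(resp), headers


def fetch_issues(owner, repo, get=http_get, sleep=time.sleep, now=time.time):
    """Collect all issues (hypothetical helper, unauthenticated).

    Note: this endpoint also returns pull requests; those entries carry a
    "pull_request" key.
    """
    url = f"{API}/repos/{owner}/{repo}/issues?state=all&per_page=100"
    issues = []
    while url:
        page, headers = get(url)
        issues.extend(page)
        url = next_link(headers.get("link"))
        # If more pages remain but the quota is spent, wait for the reset time.
        if url and headers.get("x-ratelimit-remaining") == "0":
            reset = int(headers.get("x-ratelimit-reset", "0"))
            sleep(max(0, reset - now()))
    return issues
```

Calling fetch_issues("joeyh", "github-backup") would walk the pages one at a
time. Without authentication the quota is small, which is where the "lots of
time" comes from.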

Not holding my breath, which is why I wrote github-backup. Well, also because
it recursively backs up forks, and I've lost changes in deleted forks enough
times to want to back them up automatically.

~~~
jstepka
What you're after is something like Fossil;

* [http://www.fossil-scm.org/](http://www.fossil-scm.org/)

... which is a distributed version control system that embeds issue and wiki
data.

What would be a great next step for version control (similar to how we went
from cvs to svn to git) would be to embed issue and PR data into the
repository data structure.

I haven't worked on Git / Bitbucket in a while (at Docker now), so I haven't
been tracking it very closely... what would be ideal, maybe it's already
there... would be to design and build on the popularity of Git with a module
that stores the issue / PR data giving you true portability. Over time that
module could be made mandatory giving you the flexibility you're after.

~~~
snuxoll
Never heard of Fossil until now; looks like an interesting project. I can see
the reasoning for the lack of a rebase command, but man, things like git merge
--squash make our commit history so much cleaner (but again, the argument
raised is that the developers care about history as it happened, not as they
wanted it to happen).

------
mintplant
Thanks for this!

A couple years ago I was having trouble falling asleep at night. I found a
series of music mixes that helped me drift off, and they were working well
until one day the creator decided to scrub every trace of themselves from the
internet and disappear. I generally believe that people should be able to
delete what they put online, but ever since then I've maintained my own
personal archive of things I wouldn't want to lose access to forever.

I use HTTrack for backing up static sites, youtube-dl for YouTube, SoundCloud,
and the like, and now I'll be using this for repos on GitHub. Any more good
archiving tools?

~~~
meowface
>I generally believe that people should be able to delete what they put online

I might get some flak for this, but I think the opposite. I think when someone
posts something online, it should remain online forever. It's a tragedy that
information ever gets deleted. The fact that someone could publish something
thousands or even millions of people enjoy and then take it away from them is
a shame. Even if it's something only a handful of people enjoy.

Even in cases of blatant slander, I don't think censorship or deletion is ever
justifiable. In cases of proven slander/libel, some sort of bright red notice
should be placed above and below the content indicating a court found it to be
false and libelous.

People do make mistakes, but I think it's better for information to always be
"append-only". If I had some old embarrassing blog posts, I wouldn't delete
them, but rather add a warning or disclaimer that I no longer hold those views
and regret making that post.

I have a _lot_ of embarrassing things about me on the Internet that I wouldn't
want an employer to find, from when I was much younger, but which they can
find through careful Googling. I just have to accept those things happened,
and to try and make a case for why these were mistakes from when I was a
teenager and not how I am today.

I totally support archives and web scraping, and am appalled by the EU's
"right to be forgotten" ruling.

~~~
breakingcups
There can be a lot of reasons to want to delete something you've posted
online, a stalker for example. I think it's careless to dismiss all such
reasons in one swoop.

~~~
meowface
I agree that's a valid reason, but on principle, I think they still shouldn't
have that option. They should address the stalking directly. Removing content
isn't going to dissuade a stalker; it'll probably only make them more
interested, honestly.

------
NotUsingLinux
This shows (again) the fundamental issue with digital identity.

People try to carry their data, or 'status' such as stars, from one provider
to another, and painfully discover that vendors lock in data that really
belongs to the users.

It will be very interesting when a platform for digital trust arises that
gains enough users. It could come from the programming community
first.

Does anyone see things like GitTorrent as a solution to the problem?

[https://github.com/cjb/GitTorrent](https://github.com/cjb/GitTorrent)

------
slantedview
So you back up things like stars. Can you restore them?

------
blainesch
In the "why" section you list 3 reasons.

> In case something happens to GitHub. More generally because keeping your
> data in the cloud and relying on the cloud to back it up is foolish

I disagree, cloud backups are more reliable.

> In case someone takes down a repository that you were interested in. If you
> run github-backup with your username, it will back up all the repositories
> you have watched and starred.

If the repo goes down it's already going to be a lot harder to use, but it
seems easier to fork a repo.

> So you can keep working on your repository while on a plane, or on a remote
> beach or mountaintop. Just like Linus intended.

When have you not been able to work on a local repo locally?

Overall, I think I'm missing the point.

~~~
lucaspiller
> In case something happens to GitHub. More generally because keeping your
> data in the cloud and relying on the cloud to back it up is foolish

Instead of the cloud, I read that as "the hands of a company that may get
sold or shut down, whose agenda may change, etc". I don't think that will happen with
GitHub anytime soon, but 10 years ago I would have said the same about
Sourceforge :-)

~~~
fapjacks
That is usually how it goes, isn't it? People kill the golden goose because
they start to think their platform is invincible and the laws of physics don't
apply to them anymore. Then they become vulnerable to all kinds of problems.

