
Show HN: Effort to clone unmaintained SourceForge projects to GitHub - hydragit
https://a-sf-mirror.github.io/
======
pavlov
Why Github? Copying from one commercial provider to another doesn't solve the
fundamental problem. Using git helps, but most of those old repos will never
get cloned.

In 10 years' time, Github may be the tired old service that gets acquired by a
hedge fund that decides to monetize their repos. Such things are part of the
corporate lifecycle.

~~~
Klathmon
Do you have any other suggestions? Hosting these repos on donated/personal
machines is (IMO) significantly less likely to stand the test of time.

At least with a commercial entity there is a bit more "trust" involved that
they won't disappear out of the blue one day. And if the time comes that
Github starts to collapse, the process can be repeated.

Just because something isn't permanent doesn't mean it's pointless.

~~~
wslh
> Do you have any other suggestions?

Archive.org ?

And weird that nobody suggested the Bitcoin block chain. I don't think
binaries are a good fit but source code doesn't require a lot of space. With
the current and future block size it will take some time to make it happen.

~~~
joshstrange
Archive.org (while an amazing resource created lovingly by amazing people) is
not a great front-end for stuff like this. Github is very easy to get started
with and excels at code hosting. As for the blockchain, that's a terrible idea.
There are so many things wrong with it, including the cost to push all of that
data into the blockchain and the fact that while source code can be small, it
isn't always, and it can be orders of magnitude larger. Right now each block is
1MB and blocks take some 10 minutes for just 1 confirmation, so you are looking
at < 1.7KBps (13.6Kbps) "upload" speeds. If you actually attempted this you
would have to have some sort of header on each transaction to tie it all
together, which lowers the speed even further. I'd bet money that if you
started uploading, nodes would either ban you or the core devs would do
something to stop the chain from being filled with shit that now has to get
replicated to tens of thousands of machines across the globe.
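
A back-of-the-envelope sketch of that arithmetic, assuming the idealised case
where every 1MB block is filled entirely with payload and one block arrives
every 10 minutes. The 100 MB repository size is a made-up figure for
illustration, and the function names are hypothetical.

```python
# Idealised upper bound: all block space carries payload, with no
# per-transaction header overhead and no competing transactions.
BLOCK_SIZE_BYTES = 1_000_000      # 1 MB block, as stated above
BLOCK_INTERVAL_SECONDS = 10 * 60  # roughly one block every 10 minutes


def throughput_bytes_per_second() -> float:
    """Best-case sustained 'upload' rate into the chain."""
    return BLOCK_SIZE_BYTES / BLOCK_INTERVAL_SECONDS


def days_to_upload(size_bytes: int) -> float:
    """Days needed to push size_bytes at the best-case rate."""
    seconds = size_bytes / throughput_bytes_per_second()
    return seconds / 86_400


if __name__ == "__main__":
    rate = throughput_bytes_per_second()
    print(f"{rate / 1000:.2f} KB/s ({rate * 8 / 1000:.1f} Kbps)")
    # Hypothetical 100 MB repository, chosen only for illustration.
    print(f"{days_to_upload(100_000_000):.2f} days for 100 MB")
```

Under these assumptions the rate comes out just under the 1.7KBps quoted
above, and any real header or fee overhead would only push it lower.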

~~~
wslh
I received a lot of downvotes but my comment was a bit ironical since in many
forums (even in HN in the past) when someone talked about backups many people
suggest the block chain.

I said: _With the current and future block size it will take some time to make
it happen_.

~~~
ajkjk
The absurdity of that suggestion is not absolved by the fact that someone else
has said it before, or by the fact that you said it will take 'some time for
it to happen'. It's completely not an option for the migration we're
discussing.

~~~
wslh
It is not absurd if it is irony, and I clearly said that it can't be done
now but maybe in the future. I can't see the future, can you?

~~~
hughw
It's not practical, but it's a worthy goal for blockchain computing, or some
descendant of it. So I am glad you brought it up.

------
TazeTSchnitzel
Why aren't you mirroring the binaries? These are vital for people in the
future who do not have the time to set up a build environment for software
from a decade ago.

I'd also echo the concerns of others about GitHub.

Proper archivists should do for SourceForge what they did for other projects.
Archive Team, maybe? Looks like they have a wiki page:
[http://www.archiveteam.org/index.php?title=SourceForge](http://www.archiveteam.org/index.php?title=SourceForge)

~~~
bentpins
This was in progress, 830GB was downloaded before a Sourceforge guy popped
onto the IRC and said he's ok with the archiving, but that the robots.txt
should be respected. This would put things at a practical standstill. So the
downloading was paused, I'm not really sure what's happened in the week since.

Right now Xfire's videos, several URL shorteners' links, and Toshiba Support
material are being archived. If you have spare cycles and bandwidth, and want
to contribute, running an instance of the "ArchiveTeam Warrior" is pretty easy
through docker or a VM.
[http://archiveteam.org/index.php?title=Warrior](http://archiveteam.org/index.php?title=Warrior)

~~~
nadams
Honestly I think ignoring robots.txt in this case is acceptable. Even if he
programs in code to respect robots.txt - once the management at sourceforge
get wind of what he is doing - what is stopping sourceforge from putting up
robots.txt everywhere blocking him?

~~~
ihatehn
Look at their current robots.txt; they're already prohibiting robots from
crawling the actual source code:
[http://sourceforge.net/robots.txt](http://sourceforge.net/robots.txt)
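
For anyone wondering what "respecting robots.txt" means for a crawler in
practice, here is a minimal sketch using Python's standard
`urllib.robotparser`. The rules, the `example.org` URLs and the
`example-archiver` user agent are all hypothetical; they are not
SourceForge's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT SourceForge's real file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /p/
Allow: /projects/
"""


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the subset of urls that robots_txt lets user_agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://example.org/projects/foo/",
        "http://example.org/p/foo/code/",
    ]
    for url in allowed_urls(EXAMPLE_ROBOTS_TXT, "example-archiver", candidates):
        print("may fetch:", url)
```

A crawler that honours the file simply skips everything the filter drops,
which is why a broad Disallow would bring an archiving run to a standstill.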

------
Osmium
Honestly, this is a serious issue for my field. There are so many obscure
academic binaries hosted on SF... I hope someone manages to mirror them. [The
fact that a lot of the scientific community is so backwards in adopting modern
coding standards is another conversation for another day.]

------
estrabd
Sourceforge is on the radar here, but maybe it's time to step it up.

[http://www.archiveteam.org/index.php?title=Fire_Drill](http://www.archiveteam.org/index.php?title=Fire_Drill)

Update: seems others have linked to archiveteam.org, so maybe that's the best
route. Is the OP part of the AT effort or do they know about each other? Maybe
they should.

------
lcswi
Nice! But in my opinion it would be better to help archiveteam with their
efforts!

~~~
hydragit
I confess my ignorance regarding archive.org's various collections. There seem
to be a lot of them, which one are you referring to?

~~~
danieloaks
They're not a part of archive.org (just a totally separate group with similar
interests, led by Jason Scott). The specific project page is here:
[http://archiveteam.org/index.php?title=SourceForge](http://archiveteam.org/index.php?title=SourceForge)

The best place to pop in is probably on IRC. For the Sourceforge project it's
#coldstorage on EFnet,
[http://chat.efnet.org:9090/?nick=&channels=%23coldstorage&Lo...](http://chat.efnet.org:9090/?nick=&channels=%23coldstorage&Login=Login)
for the web client. Though note, the ArchiveTeam project seems to be paused
right now.

~~~
bentpins
To add to this - ArchiveTeam often works with archive.org who have arranged
long term storage for a lot of retrieved content.

------
jmkni
Nice.

I agree with what the others are saying, there's a lot of source code for
solving obscure problems that is only on Sourceforge.

One example I found recently is a program called QLumEdit. I recently had to
figure out how to work with EuLumdat files, and if it wasn't for the source
code for this program on Sourceforge I would have been completely stumped
(well not quite, but it would have taken me ages).

If SF goes down the toilet, a lot of knowledge goes with it so this is awesome
to see!

If anybody is interested, I was converting this code from C++ to .net, my
horrible hacky unrefactored effort is here -
[https://github.com/bumblebeeman/eulum.net](https://github.com/bumblebeeman/eulum.net)

I am planning to make this code nicer, and develop it into a WPF app when I
have time!

I am getting pretty close too, here is my .net generated version of the images
this program produces: [http://imgur.com/PCmpnJ2](http://imgur.com/PCmpnJ2)

------
ksherlock
That's great. I started doing that myself (my own git server, not github) for
some projects I care about. This effort seems to include a very narrow list,
though.

For CVS, though, I suspect cvs-fast-export [1] will do a better job than
git-cvsimport.

[1] [http://www.catb.org/esr/cvs-fast-export/](http://www.catb.org/esr/cvs-fast-export/)

~~~
hydragit
Thanks, I'll have a look

------
jaytaylor
What about creating a torrent containing all these unmaintained SF projects
(with binary downloads included)?

This would dramatically increase the odds that the content is never lost.

~~~
nextweek2
The problem with torrents is the lack of incremental update support. If the
base torrent gets updated it gets a new hash identifier. How do you know it's
been changed, so that you get the latest version? And when it does change, the
swarm effectively gets diluted because some are on the new architecture and
some are on older versions.
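
A rough sketch of why the identifier changes, assuming a classic (v1)
single-file torrent, where the identifier is the SHA-1 of the bencoded
"info" dictionary and that dictionary embeds a hash of every piece of the
payload. The bencoder is a minimal one written for this example, and the
file name and contents are made up.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(name: str, payload: bytes, piece_length: int = 16384) -> str:
    """Hex identifier of a v1 single-file torrent for the given payload."""
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    info = {
        "name": name,
        "length": len(payload),
        "piece length": piece_length,
        "pieces": pieces,
    }
    return hashlib.sha1(bencode(info)).hexdigest()


if __name__ == "__main__":
    # Hypothetical archive contents before and after adding one project.
    print(info_hash("sf-mirror.tar", b"project-a"))
    print(info_hash("sf-mirror.tar", b"project-a project-b"))
```

Because any change to the payload changes the piece hashes, it changes the
identifier too, so an updated archive is a different torrent with its own
swarm.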

~~~
jaytaylor
If the SF projects aren't being updated, then what's the issue? The
information is, by definition, static.

------
frik
> Currently, for each cloned project, we mirror its CVS repository and its
> website.

Please add "SVN" (Subversion)

------
coliveira
This seems much more like a temporary fix, not really a solution. A few years
from now GitHub can do the same thing that sf did. This after all seems to be
the fate of commercial companies that explore open source, once they start to
lose users to new competitors.

------
egsec
Note 1: Moving things to GitHub or elsewhere does not remove them from
SourceForge. So SF can continue to host and enjoy links on unmaintained
websites, search engines etc.

Note 2: If their business model is offering popular binaries and source, they
can just copy these from other sites and repackage them. Open source software
allows you to do this. If no one else is interested in bundling and
monetizing, then they can buy traffic and still succeed.

Note 3: Remember that academy award winning movie from 1943? Not so great in
today's light. While perhaps one of the goals of the Internet and cheap
storage is to keep a copy of everything, and it's often better to _not_
re-invent the wheel, if something falls by the wayside and it's needed, it
will be created.

Note 4: There are plenty of websites which catalog useful abandonware, that
someone had to find a physical disk drive from. If the software has value,
chances are someone will eventually repost it somewhere without a massive
organized effort.

\----

There is clearly value in moving over some projects to GitHub or elsewhere, but
if some things are not migrated or moved life will go on.

~~~
GFK_of_xmaspast
Per the historical record, that "academy award winning movie from 1943" was
'Casablanca'.

