
GCC Still Months Away from Transitioning to Git, Reposurgeon Being Ported to Go - dcu
https://www.phoronix.com/scan.php?page=news_item&px=GCC-Reposurgeon-Py-To-Go-90
======
Jyaif
What the... how can converting a svn repo to a git repo take that much memory?
Can't you go through the svn commits one by one, and each time create a
matching git commit??

~~~
db48x
You certainly can, but then you lose all the branches. The SVN history is
purely linear, with branches represented as directories containing copies of
the files. Older versions of SVN kept less metadata about how those branches
were created, so reposurgeon has heuristics to make reasonable git branches
out of them, which the user can then fine-tune and correct.
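
To make the "branches are just directories" point concrete, here is a minimal sketch that splits a linear list of SVN revisions into per-branch histories. It assumes the conventional trunk/branches/tags layout, and the names `branch_of` and `split_by_branch` are hypothetical; this is not reposurgeon's actual heuristic, which also has to cope with repositories that don't follow the convention.

```python
from collections import defaultdict


def branch_of(path):
    """Guess which branch an SVN path belongs to.

    Assumes the conventional layout: /trunk, /branches/<name>, /tags/<name>.
    Returns None for paths outside that layout.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "master"
    if parts[0] == "branches" and len(parts) > 1:
        return parts[1]
    if parts[0] == "tags" and len(parts) > 1:
        return "tags/" + parts[1]
    return None  # nonstandard location; a real tool needs heuristics here


def split_by_branch(revisions):
    """Turn a linear list of (revision, [paths]) into per-branch histories."""
    histories = defaultdict(list)
    for rev, paths in revisions:
        # One SVN revision may touch several branch directories at once.
        touched = {branch_of(p) for p in paths}
        for branch in touched:
            if branch is not None:
                histories[branch].append(rev)
    return dict(histories)


revisions = [
    (1, ["/trunk/a.c"]),
    (2, ["/branches/fix/a.c"]),
    (3, ["/trunk/b.c", "/branches/fix/b.c"]),
]
print(split_by_branch(revisions))
```

Note that revision 3 lands on two branches at once, which is one reason a one-commit-at-a-time translation doesn't map cleanly onto git.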

Added to this is the damage introduced by various tools used to convert to SVN
from CVS, and the lack of metadata that CVS kept about branches.

You could easily partition the list of SVN nodes and process each subset
individually, but the results would be wrong. I don't know that it's
necessarily impossible to parallelize, but it's not very easy. It may be
possible to have a first pass that identifies the dependencies between
commits, with later passes using those dependencies to partition the work
(perhaps with a work-stealing queue). However, the user needs to be able to
arbitrarily modify those dependencies, and it's not even clear if detecting the
dependencies is less work than doing the whole conversion. Also, all the graph
traversal operations have terrible data locality.
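
The two-pass idea could look roughly like the sketch below. It is purely illustrative: it assumes each commit is already labelled with its branch and, where applicable, the branch it was copied from, which is exactly the information that is hard to recover in practice. The function name `partition` and the tuple layout are hypothetical.

```python
def partition(commits):
    """Group commits into sets that share no branch dependency.

    commits: list of (rev, branch, copied_from_branch_or_None).
    Branches linked by a copy are merged with a union-find structure,
    so each resulting group could in principle be processed independently.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # First pass: record dependencies between branches.
    for _rev, branch, source in commits:
        find(branch)
        if source is not None:
            union(branch, source)

    # Second pass: bucket revisions by the root of their branch.
    groups = {}
    for rev, branch, _source in commits:
        groups.setdefault(find(branch), []).append(rev)
    return sorted(groups.values())


commits = [
    (1, "trunk", None),
    (2, "fix", "trunk"),   # "fix" was copied from "trunk"
    (3, "vendor", None),   # unrelated to the other two
    (4, "trunk", None),
]
print(partition(commits))
```

In a real repository most branches descend from trunk, so nearly everything would collapse into one group, which is part of why this kind of partitioning buys less than it seems.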

I'm only peripherally involved, but it's a fun project. When I used Reposurgeon
to convert an SVN repository some years ago, I found that it took 40 hours to
complete. This was easily 20 times longer than the largest repository anyone
else had converted, but my repo was only 2 or 3 times larger; the performance
is certainly not linear in the number of commits. With some profiling and some
patches, I was able to get it down to a few hours, and others have improved it
more since then. It's still nonlinear though, so the time will still blow up
on large repositories; there's no escaping those annoying exponents.

------
pvorb
Could anyone elaborate on why he's not simply using git-svn?

