
EdenSCM – A cross-platform, scalable source control management system - imoldfella
https://github.com/facebookexperimental/eden
======
Game_Ender
Facebook rewrote Mercurial, while Microsoft has essentially extended git with
its own virtual file system, VFSforGit [0], and a bunch of performance
improvements.

0 - [https://vfsforgit.org/](https://vfsforgit.org/)

~~~
searchableguy
Curious why Facebook went with Mercurial?

~~~
wincent
They explain it here:
[https://engineering.fb.com/core-data/scaling-mercurial-at-facebook/](https://engineering.fb.com/core-data/scaling-mercurial-at-facebook/)

The official reason is that the "internals of Git" weren't conducive to the
kinds of invasive changes they needed/wanted. But I think the truth is closer
to this: getting those invasive changes past the Git mailing list was going to
be too hard and too slow.

Here's an example of an FB engineer reaching out to the mailing list:
[http://git.661346.n2.nabble.com/Git-performance-results-on-a-large-repository-td7250867.html](http://git.661346.n2.nabble.com/Git-performance-results-on-a-large-repository-td7250867.html)

~~~
gen220
This was a super fun read, thanks for sharing.

The mailing list piece is from 2012 and describes how git is very slow on a
synthetic repo with millions of files and commits. My current workplace has a
monorepo that’s approaching the size described in that thread, but git _seems_
to be holding up just fine. If you check out a branch that’s far enough away
from master, it takes a minute, but add, rebase, commit, status and blame are
all negligibly impacted speed-wise. The only issue we run into is rejected
non-conflicting pushes to master during peak hours, with maybe several dozen
engineers trying to merge and push to master simultaneously.

Does anybody have any insight into what’s changed in git internally since 2012
to support bigger repos?
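On the rejected pushes: a push to a branch only succeeds if the remote tip is still the commit the pusher built on (a fast-forward). If anyone else pushed in the meantime, the push is rejected even when the changes don't touch the same files, and the pusher has to fetch, rebase and retry. Here is a minimal Python sketch of that as a compare-and-swap on one ref; `Remote`, `push` and `push_with_retry` are hypothetical names for illustration, not git's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    """Hypothetical model of one server-side branch ref (e.g. master)."""
    tip: str = "c0"
    history: list = field(default_factory=lambda: ["c0"])

    def push(self, expected_tip: str, new_commit: str) -> bool:
        # Compare-and-swap: accept only if nobody moved the ref since the
        # pusher last fetched. Whether the changes conflict is irrelevant.
        if self.tip != expected_tip:
            return False  # rejected as non-fast-forward
        self.tip = new_commit
        self.history.append(new_commit)
        return True


def push_with_retry(remote: Remote, author: str, max_attempts: int = 5) -> int:
    """Fetch, rebase onto the latest tip, push; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        base = remote.tip                  # fetch the current tip
        commit = f"{author}@{base}"        # stand-in for a rebased commit
        if remote.push(base, commit):
            return attempt
    raise RuntimeError(f"{author} gave up after {max_attempts} attempts")


if __name__ == "__main__":
    remote = Remote()
    base = remote.tip                      # Alice and Bob both fetch c0
    print(remote.push(base, "alice"))      # True
    print(remote.push(base, "bob"))        # False: stale base, no conflict
    print(push_with_retry(remote, "bob"))  # 1: succeeds after re-fetching
```

With dozens of people pushing at once, each retry races against everyone else's, which is why the rejections cluster at peak hours.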

~~~
beagle3
I don’t think there is one single change that made a huge difference. I follow
the changelogs posted on the mailing list, and of the performance related
changes, it’s often “we got 3-5% speed up on this benchmark on this fs without
making things worse on others”.

Over 8 years and tens of those changes, it adds up to a significant
performance improvement.
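To put rough numbers on "it adds up": treat each change as cutting runtime by a fixed percentage. Because the cuts multiply, 30 such changes (an illustrative count, not a measured one) leave (1 - cut)^30 of the original runtime. The function name below is made up for this sketch.

```python
def remaining_fraction(cut_per_change: float, n_changes: int) -> float:
    """Fraction of the original runtime left after n multiplicative cuts."""
    return (1.0 - cut_per_change) ** n_changes


if __name__ == "__main__":
    # Illustrative numbers only, not measured git data.
    for cut in (0.03, 0.04, 0.05):
        frac = remaining_fraction(cut, 30)
        print(f"{cut:.0%} x 30 -> {frac:.2f} of runtime ({1 / frac:.1f}x faster)")
```

Under those assumptions, 30 cuts of 3-5% each come out to roughly 2.5x to 4.7x faster overall.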

------
eitland
BTW: with so many smart and connected people interested in source control
management in one place, does anyone know what happened to Veracity SCM
([http://veracity-scm.com](http://veracity-scm.com))?

It was very promising, but then it suddenly stopped updating, and then (more
or less intentionally, it seemed) links stopped working. The site is still up
7 years later...

~~~
jboynyc
The developers went on to work on other things:
[https://web.archive.org/web/20130915093113/veracity-scm.com/qa/questions/1935/is-veracity-still-under-development](https://web.archive.org/web/20130915093113/veracity-scm.com/qa/questions/1935/is-veracity-still-under-development)

~~~
eitland
Ah, thanks. I've been trying to figure that out for years, because I felt they
got so much right UX-wise.

------
beagle3
More tooling is needed in general: just one recursive grep will populate the
entire EdenFS.

I suppose FB has better tools. But I won’t touch this until the ecosystem is
mature enough (and also because git and hg are perfectly sufficient for the
monorepo I oversee).
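Why a single grep is enough to do this: in a virtual file system that knows every path up front but fetches file contents from the server only on first read, listing directories stays cheap, while anything that reads every file forces every file to be downloaded. The toy model below illustrates the effect. `LazyRepo` and the `fetch_blob` callback are hypothetical stand-ins, not EdenFS's interface.

```python
import re
from typing import Callable, Dict, Iterator, Tuple


class LazyRepo:
    """Hypothetical lazily materialized checkout: paths are known up front,
    but file contents are fetched from a server only on first read."""

    def __init__(self, manifest: Dict[str, str], fetch_blob: Callable[[str], bytes]):
        self.manifest = manifest            # path -> content hash
        self.fetch_blob = fetch_blob        # content hash -> bytes (network call)
        self.cache: Dict[str, bytes] = {}   # path -> locally materialized bytes

    def read(self, path: str) -> bytes:
        if path not in self.cache:          # first access triggers a fetch
            self.cache[path] = self.fetch_blob(self.manifest[path])
        return self.cache[path]

    def walk(self) -> Iterator[str]:
        # Listing names only needs the manifest, so it fetches nothing.
        yield from sorted(self.manifest)


def grep(repo: LazyRepo, pattern: str) -> Iterator[Tuple[str, int]]:
    """Naive recursive grep: it must read every file, so it fetches every file."""
    regex = re.compile(pattern.encode())
    for path in repo.walk():
        for lineno, line in enumerate(repo.read(path).splitlines(), start=1):
            if regex.search(line):
                yield path, lineno


if __name__ == "__main__":
    blobs = {f"h{i}": f"line {i}\nTODO fix {i}\n".encode() for i in range(1000)}
    fetched = []

    def fetch(content_hash: str) -> bytes:
        fetched.append(content_hash)
        return blobs[content_hash]

    repo = LazyRepo({f"src/file{i}.py": f"h{i}" for i in range(1000)}, fetch)
    repo.read("src/file7.py")
    print(len(fetched))                     # 1: only the file we opened
    hits = list(grep(repo, r"TODO"))
    print(len(fetched), len(hits))          # 1000 1000: everything materialized
```

The usual way around this is to answer searches from a server-side index or service instead of walking the checkout.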

~~~
wincent
The file system, the VCS, and the ability to index/grep the repo are just the
tip of the iceberg. Pretty much everything that needs to access the filesystem
or which depends on the contents/structure of the repo in some way needs to be
rebuilt from scratch or otherwise dramatically customized to operate in this
new landscape. That means a long catalog of tools and many years of effort.

This was already starting to become true even before FB switched to Eden, and
before it switched to Mercurial (grepping, for example, was already being
served by a custom grep service back in the Git days).

------
amelius
Interesting. This might be a step in the direction of being able to store big
files in repositories without hassle.

However, OS-level support might be preferable. Imagine you have a type
of symbolic link that is not just followed, but executed when you access it.
That would be really powerful and would allow this kind of optimization. And
you wouldn't even need to install or run anything.

~~~
oconnor663
Sounds a lot like FUSE?

------
m12k
Does this work together with a build cache? My dream setup for dealing with
building huge code bases is a file system integrated with the version control
system to only download files when they are accessed (which sounds like what
this does) but also employing a build cache and module system, so it doesn't
even need to download and compile any module that has not been touched, it
just downloads the result from the build cache instead.
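A common way to get this is a content-addressed build cache. Each module's cache key is a hash of its own source plus its dependencies' keys. If an artifact already exists under that key, it is downloaded instead of compiled. The in-memory sketch below shows the idea. `Module`, `cache_key`, `build_all` and the `build` callback are hypothetical names. Real systems (Bazel's remote caching, for example) also hash toolchains, flags and environment.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Module:
    name: str
    source: bytes
    deps: tuple = ()  # names of modules this one depends on


def cache_key(mod: Module, modules: Dict[str, Module], memo: Dict[str, str]) -> str:
    """Hash of the module's name and source plus its dependencies' keys."""
    if mod.name not in memo:
        h = hashlib.sha256(mod.name.encode() + b"\0" + mod.source)
        for dep in sorted(mod.deps):
            h.update(cache_key(modules[dep], modules, memo).encode())
        memo[mod.name] = h.hexdigest()
    return memo[mod.name]


def build_all(modules: Dict[str, Module], cache: Dict[str, bytes],
              build: Callable[[Module], bytes]) -> List[str]:
    """Reuse cached artifacts where possible; return the names actually built."""
    memo: Dict[str, str] = {}
    built = []
    for mod in modules.values():
        key = cache_key(mod, modules, memo)
        if key not in cache:        # miss: compile, then upload the artifact
            cache[key] = build(mod)
            built.append(mod.name)
        # hit: the artifact would simply be downloaded from cache[key]
    return built
```

Because `app`'s key includes `util`'s key, editing `util` rebuilds both. Editing only `app` rebuilds only `app`, and an untouched tree rebuilds nothing.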

~~~
kyrra
Seems like they can be separate systems. For example, Google's Bazel supports
build caches.

[https://docs.bazel.build/versions/master/remote-caching.html](https://docs.bazel.build/versions/master/remote-caching.html)

------
qznc
> A virtual filesystem for speeding up the performance of source control
> checkouts.

To describe it as a filesystem matches my thinking [0]: "It already resembles
a network file system, so it should provide an interface nearly as easy to
use."

If it really takes off as an open source project, we might eventually be able
to "mount" repositories.

[0]
[http://beza1e1.tuxen.de/monorepo_vcs.html](http://beza1e1.tuxen.de/monorepo_vcs.html)

~~~
cat199
> we might be able to "mount" repositories eventually.

a) [https://github.com/presslabs/gitfs](https://github.com/presslabs/gitfs)

b)
[https://en.wikipedia.org/wiki/Rational_ClearCase#History](https://en.wikipedia.org/wiki/Rational_ClearCase#History)

------
smitty1e
If I were going to try something other than git, it would be fossil =>
[https://fossil-scm.org/home/doc/trunk/www/index.wiki](https://fossil-scm.org/home/doc/trunk/www/index.wiki)

~~~
bch
Fossil is awesome, but (currently) would not scale to Facebook-size repos. It
seems to be fine with long histories (lots of commits), but bogs down on lots
of files (based on discussions surrounding a large repo considering moving to
fossil).

For personal projects, it’s absolutely the bee's knees. I can have multiple
checkouts of the same repo and ask fossil about every single repo I have. It
has a really pleasant CLI, has a web interface (which I personally hardly use
anymore), manages tickets, ... sort of GitHub-in-a-box, but more pleasant.

------
galaxyLogic
I think the one big differentiator for this is:

"EdenSCM is not a distributed source control system. In order to support
massive repositories, not all repository data is downloaded to the client
system when checking out a repository"

