Eden (github.com/facebookexperimental)
624 points by tosh on April 12, 2022 | 227 comments


The SCM ecosystem at Facebook is tremendously powerful and the result of some of the best minds at Facebook working on those systems for many years. From the scaling of the monorepos to the code review workflows, nothing really matches it. The ergonomics of most of the tooling was simply top notch (which it needed to be... engineers, particularly at Meta, are an opinionated lot who don't tolerate poor tools).

It's great to see this out in the wild now.


After doing a round of onsites with the fb tools team, I got the impression there were lots of bright engineers that wanted fb pay without having to touch fb products.


A lot of programmers would rather write tools for other programmers than for non-programmer end users. It's the target market that they know best, and it's most prestigious.

I wish more of those bright engineers would try to solve problems for other users rather than re-re-re-re-re-optimizing the life of their colleagues. But writing code for users involves, among other things, knowing something about users, which is more bogged down and less fun.


I work on developer tools at google, and it's not even remotely one of the prestigious teams. it is however the most fun and interesting time I've ever had in a job, and if I left it would only be for the same kind of work.

also you seem to have fallen for the same fallacy as the people who complain about open source devs not working on what they consider sufficiently important problems. trust me, there is no shortage of engineers willing and eager to work on end user problems, and the dev tools work enables them to deliver solutions to those problems faster, and makes those solutions work more reliably. it's a rising tide that lifts everyone's boats, not a zero sum game.


> I work on developer tools at google, and it's not even remotely one of the prestigious teams

Out of interest, what is? (In your opinion)


hard to say from inside the company, since you get to thinking of everyone as just colleagues who happen to be working on different parts of the system. but in terms of larger world prestige I would say the machine learning folks are definitely up there, also the internet/petabyte scale stuff that other developers know enough about the difficulties behind to be impressed by (spanner for instance), and perhaps go since it has achieved a massive amount of popularity. I honestly don't know if there are currently any end user google products "sexy" enough that working on them is extra prestigious.


You're missing webranking, and other core components of core products. Deciding in which order results pop up on google.com is both very challenging and well recognized. Same with people that do ad ranking, people that manage the enormous storage systems, etc. If it is core to the company it is prestigious; you never have to explain the impact of your job.


The core product... which in this case would be search, or the DL teams, etc.


Their colleagues ARE users, they are just a different type, and a type that engineers understand more.

I would rather a good engineer put their skills towards helping others use their skills more effectively than put that same engineer in a place where they don't like the work and produce below par results.


But on the contrary... many of the things well-built for the majority of 'non-programmer end users' are not the way I (as a 'programmer end user', whether the particular thing is a programming-adjacent tool or not) would ideally like them to be.

So I'd selfishly adjust it to 'solve other problems for the same users'! Asahi, for example, awesome stuff - I've lost count how many times I've had to explain (to both colleagues and non-engineer family/friends/etc.) 'no no, absolutely agree, love Mac hardware [...]'.


I'm very curious what made you say that building tooling for developers is "most prestigious". It doesn't really match my intuition, can you explain more?


+1, that statement needs a citation, dev tools team is not more prestigious than some medium teams in FAANG in my experience


I’d say a citation is not needed when the person has experience in the field; they can cite themselves giving their opinion and that will suffice. Your opinion may differ of course.


Considering the state of our tooling, where documentation frequently does not show up without extensive config or is not soft-wrapped in JetBrains products (which are arguably second only to Visual Studio in overall quality), and projects simply stop compiling and need magic tricks like invalidating caches to work again (and sometimes the trick reduces to reinstalling everything and deleting all config), I don’t see how more tooling work can be so scorned.


They are definitely not "it's [sic] most prestigious" market, but they inspire admiration among other devs


For some of us it's really hard to understand / solve the problems of people whose lives are spent browsing Instagram most of the time - when it's not TikTok.


You’re getting downvoted, and that’s not undeserved because this is very judgmental on its face. But I get the sense there’s a sincere answer to a few sincere questions worth at least asking: have you asked them (users you’ve categorized this way) that direct question? Or asked how work you’re invested in isn’t addressing their problems/goals? If so, and if you found their answers unrelatable, did you ask further questions which might clarify for you what they want?

I know all of that is a lot to ask of anyone. But you might find you learn things you have in common with people you don’t expect to. And it might make your job less frictionful too.


It's not like I don't understand their problems/goals. Just yesterday the person I was spending time with showed me a cat video from Facebook, and as I seemed only half interested, she asked me if I like animals.

I love animals in real life, but that doesn't mean that I love watching videos of animals on a screen, for me it's not the same experience.

The main goal that I see social networks addressing for many people is entertainment, but I feel that it takes away from my social life with people; that's why I have an aversion to short videos, for example.

Facebook didn't use to force these short videos on me, but just a few months ago it started showing me video reels, and I couldn't stop myself from watching them.

After bringing the settings back to just showing what's happening with my friends, the next time I opened it, it started showing addictive videos to me again.

Youtube disabled showing downvotes, which was the best signal for me to tell whether a 15-minute video was worth watching, or whether it was just spam talking about nothing for 15 minutes to optimize content length for the algorithm. And shorts are not a solution, as they don't have any depth either. Also, it started spamming me with new channels that reupload videos of interesting people from years ago just to make me think it's new content.

The main problem for me working on social networks is that making profit can't be separated from making people addicted to the lowest forms of entertainment (I'm a Xoogler).


> The main goal that I see social networks addressing for many people is entertainment, but I feel that it takes away from my social life with people; that's why I have an aversion to short videos, for example.

Having a different idea of fun than another person is very common. My partner, with whom I share a lot of personality traits, and I have some different definitions of fun. I can spend hours working through math problems and find it fun, while plenty of others would find that horribly boring. To _judge_ other people for having a different idea of fun than you is to be condescending; it deifies your idea of entertainment while putting down the others'.

> The main problem for me working on social networks is that making profit can't be separated from making people addicted to the lowest forms of entertainment (I'm a Xoogler).

Is it that hard to accept that other people have different ideas for what being social means? Does everyone have to be like you or you judge them negatively? Your take isn't uncommon on certain parts of the Web (especially the anti-social-media parts) but I've always found it trite and judgemental exactly for the reasons I expressed above. Not everyone has to think and feel the way you do and that's okay. Humanity is large. Some of us like to go to hyper-commercial malls, some like to go on challenging hikes, and others are at the club dancing and imbibing their hearts out. Sometimes people even change what they like to do for fun as they age! That's okay, it's human.


>Is it that hard to accept that other people have different ideas for what being social means? Does everyone have to be like you or you judge them negatively?

I think this is the wrong question - it isn't about being different, it's about not even having a customer to empathize with. I can't empathize with a product, no.

As for users, you can be so unfamiliar with a demographic that you have no conception of how to help them. I haven't touched Facebook seriously in over 10 years (probably close to 12 now); in the world of manufactured FB problems I wouldn't have the foggiest idea where to start.

Now, ignoring the fact FB is absolutely not in the business of helping their users, the question becomes, can or should I become enough like something I see no value in?

I think the answer is probably "no" - if people are busy walking themselves off of cliffs, I have no moral or ethical obligation to help them walk off cliffs, and I really don't have any obligation to walk myself off of one.


Before I respond to anything specific, I want to be really really clear that I share a lot of your (dis)tastes, and I understand where you’re coming from. I’m not even necessarily interested in convincing you to change any of your opinions, only in sharing my perspective.

My only real feedback, for whatever it might be worth, is that this comment—discussing how you feel about these social networks—invites a lot more sympathy, at least to me, than judging how others value them differently.

> It's not like I don't understand their problems/goals. Just yesterday the person I was spending time with showed me a cat video from Facebook, and as I seemed only half interested, she asked me if I like animals.

> I love animals in real life, but that doesn't mean that I love watching videos of animals on a screen, for me it's not the same experience.

I’m also not very interested in animal videos, and even fairly seldom interested in animal photos on social media. Videos I pretty much only watch if I’m pressured, and I’d say I skip by probably 80% of photos too.

And I’m an enormous hypocrite about this (I don’t really think so, but I would forgive anyone for thinking so). I post photos (and again occasionally video) of my pup almost daily. Partly this is because I am absolutely in love with her and love to share insights into our life, and adorable things she does.

Another part is there’s a small but very interested subset of my friends and family who, I’m certain, gets real joy out of seeing these updates and feels more connected with me through them. They take a different pleasure and emotional experience from not just my posts, but others I see when the algorithm surfaces them. And every one of them, like all of us, needs and deserves more joy in life.

My pup and I also have, I believe, a fairly unusual origin story which I intend to write about more at length, but I’ll share briefly here. I will not be offended at all if you decide to stop reading here (if I have any point in sharing this here, that’s my point).

I have a fairly light touch on social media these days, for a lot of the same misgivings you describe. A few years ago, not so much. I was on Facebook a lot, and Twitter quite a bit more.

I’m pretty much a nobody on Twitter, but I became first friendly and then eventually romantically involved with someone who’s what they call “Twitter famous”. That fact wasn’t a particular attraction for me or a romantic goal, and still not of much interest to me other than it makes the whole thing feel still surreal to this day. In all honesty, of all of the times I’ve been star struck… well let’s just put it this way, “Twitter famous” means about as much to me as finding out some wealthy person I’ve never heard of was in the next town over. It’s just what happened in my life.

Anyway, we dated for a while and we eventually moved in together. I moved to a very cold place at the start of a very cold winter. And we adopted a puppy! Like some people do when they’re in love. It eventually didn’t work out between us (the humans, I hope it’s obvious pup is still with me and well loved). Sometimes things don’t work out with romance. That’s okay.

I’m ever grateful that she decided it was best if pup went with me. I’ve since had—and continued to raise, and grow with—the best companion of my entire life. All because I implausibly connected with someone on Twitter of all freaking places at (IMO) its absolute worst most toxic point, and completely despite a zillion reasons it would be even more implausible (some not in detail here, but some will be written when I’m ready to tell a more full story).

Why am I telling you (and others who might be reading), of all people, this abridged version? I don’t know. I hope maybe it’ll be more valuable to you than if I just found some way to send a video of my pup? And I guess maybe because if you (or anyone) finds this story interesting, I hope one of the things that stands out is… several important life events and one incredible bond came out of me just being open to and pursuing a connection that feels still so unrelatable to me.

I’m not saying you should ask me for a pup video (but you can!), but just… okay yeah the actual point. People are experiencing a much more complex world of social media than you might see. Give them some grace, please?


I was not planning to respond more in this thread, but you are trying not to argue with me but rather to be constructive, which I appreciate.

Facebook originally started as a social network where I could see my friends' lives, babies, family life, puppies, and it was awesome and interesting, as it's something that I could talk with them about.

The problem with looking at friends' lives from Facebook's point of view is that it's not as addictive, and ads don't integrate naturally.

I loved the book "Chaos Monkeys", as it explained how impossible it was to monetize the huge active Facebook userbase in the early days, while users were prioritized and making money was just a secondary afterthought that Mark didn't care about.

It turns out that asymmetric relations, where some people/content creators have tens of thousands of followers and hundreds of thousands of views and are not usually friends with their audience (except in the case you are talking about), are much more addictive for people, and much easier to monetize as well. At this point, though, social media becomes more media than social, and picks up the same problems that other types of media had: creating unhealthy expectations for people whose brains don't really distinguish between what they see there and real life.


Start by removing a heap of condescension and recognizing that there's interesting problems all around you.


Dealing drugs involves solving interesting problems too nevertheless IMHO it is bad for society and I try to avoid both dealers and addicts if possible.


Funny because I've had a lot more positive experiences with and, frankly, respect for, addicts than software engineers.


Couldn't agree more. Anyone who survives and escapes that - including several of my close friends - is a fucking strong person. If I have to read one more comment disparaging addicts and then whining about how sOcIaL MeDiA Is a dAnGeRoUs aDdIcTiOn, I might actually blow my brains out.


This is an unkind comment. Why condescend like this?


I'm not sure why it is condescending. I have met people who do this all the time while I'm trying to just have a drink/dinner or get to know a person, but I'm just not able to connect with heavy Instagram/Facebook users. I prefer not to see them again, as for me it's very boring, and I see them as being unkind, when in all fairness they are probably just social network addicts.


It's condescending because you're focusing on the media that people use rather than what they have to say in life. There are (obviously) millions of interesting, passionate, intelligent, creative people using both of these apps.


Keep the holier-than-thou judgmental stuff to yourself. Your comment adds nothing.


> I got the impression there were lots of bright engineers that wanted fb pay without having to touch fb products.

I chose a job at Google Engprod (a tooling/developer productivity org) because I specifically like to work on dev tooling and I'm much less interested in working on products aimed at end users. Surely, many engineers at FB feel the same way.


As a facebook employee, and a former google employee, I prefer google's SCM system and tooling instead.

Cider, Piper, CitC, Blaze, etc make a really complete system. Nothing beats it.

This is kind of also surprising because product speed at Fb is much better :)


Without a doubt, CitC/Piper/Blaze/Critique/Code Search is what I miss the most from Google. As you've said it really feels like a complete system. I'd pay decent money for a hosted version of it.


I would pay for that, too, but only if it was provided by another company (not Google).


Did you get a chance to try cider by any chance? I miss it pretty often


I thought Cider was all right. It’s useful and good enough, but I still preferred VS Code even though it’s not as integrated.


Isn't buck a replacement for blaze?


It is, more or less equivalent, but blaze's integration with piper/citc/cider makes Google's setup currently more comprehensive than fb infra. Fb is getting there but much slower. So I was more or less talking overall...


FWIW this is also true on the design side. I've researched a lot of design teams and their custom tooling (I worked on design tools at Atlassian and am now working on them at Figma). Facebook's is leagues above the rest. I've been blown away at what they do there. Some really amazing engineering and design thinking happening on Facebook's internal tooling.


I'd love to know concretely what this means. I don't feel like I see hundreds of design-related things coming out of FB, so it's hard to imagine they even need design tools. Maybe I'm just not aware of all of FB's products and world-class design.


Not OP, but the first thing that comes to mind is https://origami.design/


Looked around on the page, couldn't find the info. Does it use React Native underneath?


It's a prototyping tool, I don't believe it has any codegen facilities. It used to be a Quartz Composer framework and then it evolved into a standalone app.


Do you know how this compares to Microsoft's efforts to scale git via the Scalar [https://github.com/microsoft/scalar] project?


It's powerful, but fbcode is sloooooooow.


How so? I never thought of it as slow.

Source: Was a Meta infra engineer until last month, working in CDN & LogDevice, among other teams.


What else have you used? Source: Meta infra PE until last month, but have used build and source system at Amazon and found it faster.


Yeah, the fbcode build system was hella slow. A minute plus just to build the "action graph." I did marvel at all the lost productivity, especially since a minute plus is enough time for me to context switch to another task, or at least go get (another) cup of tea...

Mercurial / Eden were pretty fast though. Never had complaints about them.


you should have tried buck2 ;) instantaneous


Now all they need to do is build https://www.edenhub.com and I'll move my personal projects. I have never grokked git, despite multiple attempts.


> code review workflows

I see nothing about review workflows in Eden’s README. Any pointers to example tooling — Presentations, code, screenshots — in the wild?


A Mercurial compatible SCM (not sure if it is a fork) built for their workflow (monorepo) and scale (enormous, git is not usable at their scale, at least for a monorepo). Uses Python and Rust. Designed for efficient centralization rather than decentralization.


So if you are at Facebook's or Google's scale, and also run a monorepo, this will be great for you.

Which is to say, this is a product for one company - Facebook.


To their credit, I don't think its use-case is limited to monorepos. I've personally had a multiple GB `.git` file due to storing many data files in a repo. In retrospect, data __shouldn't__ be version-controlled, but it's sometimes the simplest solution, e.g. having a unit-test suite intake a bunch of CSV data. Eden's "a file is checked out only if opened" and "scan only modified directories" would've allowed me to avoid decoupling data from code.


Data should be versioned if it’s one of your inputs, even if you can’t merge it. It’s just that the git tooling for it (including git-lfs) is horrible.

Perforce (and apparently Eden) make it usable.


data shouldn't be version controlled? Are you saying all the AAA game studios that version control their assets in p4 are doing it wrong?


Data can and should be versioned, but not by just `git add BLOAT`. Take a look at https://dvc.org/: blobs are uploaded to an S3-compatible blob store, the metadata is written to a small config file, and that file gets versioned in git.
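
Roughly, a minimal sketch of the flow (the bucket and file names are just placeholders):

    dvc init                                  # set up DVC alongside the existing git repo
    dvc add data/images.tar                   # hashes the blob, writes a small data/images.tar.dvc pointer file
    git add data/images.tar.dvc data/.gitignore
    git commit -m "Track images via DVC"
    dvc remote add -d storage s3://my-bucket/dvc-cache
    dvc push                                  # uploads the blob to the S3-compatible remote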


This. I've got a 35gb repo for the game I'm working on mostly solo.


Or Google, as per your first sentence :)


Google already has its own [=


Would kill to have piper open source though.


It would be interesting but basically useless.

- Requires a custom kernel to run. Although most of these patches are probably floating around the kernel mailing list in some form or another.

- Requires Google's RPC and authentication system. (I guess GRPC is open source now, IDK if piper has switched, but you still need auth)

- Requires Google's group membership service.

- Requires Google's storage engine. I don't remember if they have migrated to spanner yet but even then it would be using internal spanner APIs not the cloud ones.

There are probably more, less obvious ones, but the point is that when Google writes software to run in Google production it is built on top of a mountain of infrastructure. I'm not sure the design of Piper is in any way novel enough to be worth it. I mean, Piper works and scales well, but I don't think it is a fantastic VCS.


Why wouldn't it be great for other companies that run a monorepo? Would it be nuts to go from some other monorepo (like Subversion) to this?


I wouldn’t think there would be many companies running repos as large as Facebook.

If you use a tool like this you would basically be on your own. If you “stick to git” (or mercurial or whatever) at least you have all that momentum behind you, and you almost definitely won’t be the first people to encounter a problem.


You don’t need to have millions of files to have hg performance that leaves you wanting. A few GB of repo should do the trick.


Can't wait for the "We switched to Eden from Git" posts in 6 months.


Which will result in lots of people thinking "FB does it! It must be the future! We have to do this as well, or we will be left behind!" sigh.


It’s interesting that they prefer to develop such a tool rather than giving up on the monorepo concept.


One very counterintuitive truth of large scale software development is that as you scale to multiple services, you are gravitationally pulled into a monorepo. There are different forces at work but the strongest at this scale are your data models and API definitions.


The problems with data models can happen at small scale. I remember the first large-ish project that I built: it had a few different components, but the problems started when I tried to introduce data models to the Java and Python parts (I had it in my head, after reading a book, that I needed domain objects or some other nonsense in each language)... Mistake. The data was still changing, so it took forever to make changes; it wasn't critical, but I learned my lesson very quickly.

One perspective on this is that many ORM libraries don't take the DB as the source of truth (one very good ORM library that does, and which saved me in this case, was JOOQ). I think a lot of small-scale problems could be solved this way; a monorepo is just another variation of this solution: moving the source of truth into the repo.

It is surprising to me how often variations of this problem come up. Obviously, there are solutions from multiple directions: having cross-language definitions (ProtoBuf), Arrow (zero-copy abstractions suitable for high performance), maybe even Swagger which comes at the problem from documentation...but I think this problem still comes up anywhere (and the DB approach is, imo, a very strong approach with a decent ORM at smaller scale).


Does your schema for data models (inception!) have a revision associated with it? If not, deployments are going to be spicy. If so, you end up having to deal with version-rot. Part of why putting this in the repo with your source code is a winning solution is that when you're working off head, you will naturally pick up and test the latest thing, and in most cases your next deploy will also just naturally roll forward as well.


Is it? I'm working for a small company and even at our scale, when the workflow is centralized, I find git a bit painful at times. I mean it's still an amazing tool, don't get me wrong, but when you have to deal with several sub-projects that you have to keep in sync and need to evolve together, I find that it gets messy real fast.

I think the core issue with git is that the submodule thing is obviously an afterthought that's cobbled together on top of the preexisting SCM instead of something that was taken into account from the start. It's better than nothing, but it's probably the one aspect of git that sometimes makes me long for SVN.

At the scale of something like Facebook you'd either have to pay a team to implement and support your split repo framework, or you'd have to pay a team to implement and support your monorepo framework. I don't have enough experience with such large codebases to claim expertise, but I would probably go for a monorepo as well based on my personal experience. Seems like the most straightforward, flexible and easier to scale approach.


If your company is small, I don't think you should be using git submodules at all.

My last place was about 10 years young, 150 engineers, and was still working within a single git repo without submodules.

There is a non-zero amount of discoverable config that goes into managing a repo like that, but it's trivial compared to the ongoing headaches of managing submodules, like you suggest.


We need to track large external projects (buildroot, the Linux kernel for instance) so the ability to include them as submodules and update them fairly easily is worth it IMO. If you're at the scale of Google it probably makes vastly more sense just including the code in your monorepo and pay a bunch of engineers to merge back and forth with upstream and have the rest of your team not worry about it, but for us it would take a lot of time and effort to maintain a clone of these projects in a bespoke repository.


We have customer IP that not everyone is allowed to access and has to be deleted after the project is done. We use submodules and IMO it sucks but I don't see a way around it considering the restrictions.


For extremely small companies (N == 1) git submodules can be neat though. It’s a great way to create small libs without having to bother distribution through LuaRocks, npm, RubyGems and the like.


Submodules are a great way to break out libraries in a language-agnostic way without having them really be broken out. This is independent of team size.


Dan Luu wrote about monorepo. It's worth a read https://danluu.com/monorepo/


You need good tooling to work with large monorepos, you need good tooling to work with large multirepos. Neither option is easy at that scale.


Do Facebook and Google literally have repos with everything they write in there available to everyone that works there (modulo privileged stuff)?


For a little more color on your modulo, the major omission in google3 I can recall from ~9 years ago was Android. For Reasons, I think legal.

The others weren’t “oh huh” enough to be easily recalled writing this comment, which probably speaks to their interestingness. But yes, you can chdir from search to calendar to borg and their dependencies, internal and vendored. It’s pretty much all there. It was pretty splendid, actually, and influences my thoughts on monos to this day.


Not quite, but almost.


the monorepo is handy: simplifies dependency management


But it adds complexity and creates its own issues.


Monorepo (on git) has been awesome for us the last 5 years or so.


The docs still refer to the tool as Mercurial/hg:

https://github.com/facebookexperimental/eden/tree/main/eden/...


This doesn't surprise me. It's a fork, nobody bothered to update the readme, and as much as folks wanted to update things like documentation, improving the software was a higher priority.


It says it was originally based on Mercurial, but is no longer a distributed source code control system. Are you sure it's still compatible?


I thought Microsoft was working on a "mod" to git that made it work on huge repos, e.g., the Windows source. Did that ever come to fruition?



you mean git lfs? That is alive


Facebook/Meta’s dev tooling, libraries, and infrastructure have always impressed me. React, Hack, Jest, and now this. I am very surprised they haven’t tried to monetize this aspect of their business like Microsoft and Amazon did.

They could be a very strong contender in that space instead of being known as a terrible time suck for humanity.


FB did acquire Parse, a mobile-oriented PaaS (in a roughly similar market segment as Google’s Firebase).

It was shut down when FB apparently realized that an open cloud platform business is completely incompatible with how they run their infra.


I think that this was more of a margin issue: ads were higher margin, so Parse got deprioritised and eventually killed.


I think it's a tall order for another SCM to challenge git. I can't imagine how it could be any more entrenched in the industry.

Further, I'm happy with git. I played with Mercurial years ago, long enough to work with it day-to-day, and just didn't find any relevant advantages versus git.

I love that people are still out there trying to improve things, and certainly don't want that to stop, but it's difficult for me to imagine switching at this point.


I'm also happy with git, but there are 3 main things that could improve on git IMO:

1) Better handling of large files than git-lfs. As in 10+ GB repos. This is needed for game development (currently they tend to use Perforce or PlasticSCM)

2) Sparse checkout via file system integration (like Eden has)

3) Build system integration, so unchanged files and modules don't even need to be fetched to be compiled, because cached builds can be fetched from a build server instead (requires proper modularization, so e.g. C++ macro expansion doesn't just prevent anything from being cacheable)

These are all features that primarily have value for repos that push the limits on size, like big monorepos (with a huge amount of files) or game development (with big asset files). But get it right, and you could massively cut down the time it takes to check out a branch and build it.
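
For reference, the git-lfs baseline that point 1 is measured against looks roughly like this (the patterns and paths here are just examples):

    git lfs install                          # one-time: set up the LFS hooks
    git lfs track "*.uasset" "*.png"         # route matching files through LFS pointer files
    git add .gitattributes
    git add Content/Maps/Demo.uasset         # committed as a tiny pointer; the blob goes to the LFS server on push
    git commit -m "Track binary assets with LFS"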


This is by no means a perfect match for your requirements, but I'll share a CLI tool I built, called Dud[0]. At the least it may spur some ideas.

Dud is meant to be a companion to SCM (e.g. Git) for large files. I was turned off of Git LFS after a couple failed attempts at using it for data science work. DVC[1] is an improvement in many ways, but it has some rough edges and serious performance issues[2].

With Dud I focused on speed and simplicity. To your three points above:

1) Dud can comfortably track datasets in the 100s of GBs. In practice, the bottleneck is your disk I/O speed.

2) Dud checks out binaries as links by default, so it's super fast to switch between commits.

3) Dud includes a means to build data pipelines -- think Makefiles with less footguns. Dud can detect when outputs are up to date and skip executing a pipeline stage.

I hope this helps, and I'd be happy to chat about it.

[0]: https://github.com/kevin-hanselman/dud

[1]: https://dvc.org

[2]: https://github.com/kevin-hanselman/dud#concrete-differences-...


I'd be curious to see if you've tried git-annex, I use it instead of git-lfs when I need to manage big binary blobs. It does the same trick with a "check out" being a mere symlink.
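
A minimal sketch of that workflow, with made-up file names:

    git annex init
    git annex add renders/big-scene.blend    # content moves into .git/annex; a symlink is staged in its place
    git commit -m "Add scene via annex"
    git annex get renders/big-scene.blend    # on another clone: fetch the content behind the symlink
    git annex drop renders/big-scene.blend   # free local space; the symlink stays in the tree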


I haven't used it, no. Around the time Git LFS was released, my read from the community was that Git LFS was favored to supersede git-annex, so I focused my time investigating Git LFS. Given that git-annex is still alive and well, I may have discounted it too quickly :) Maybe I'll revisit it in the future. Thanks for sharing!


Neither is favored, git-annex solves problems that git LFS doesn't even try to address (distributed big files), at the cost of extra complexity.

Git LFS is intended more for a centralized "big repo" workflow, git annex's canonical usage is as a personal distributed backup system, but both can stretch into other domains.

In this case git-annex seems to have a feature that git LFS doesn't have that would be useful to you.


I work in games, and we use PlasticSCM with Unreal Engine.

> Better handling of large files than git-lfs.

PlasticSCM does really really well with binary files.

> Sparse checkout via file system integration

Windows only [0] but Plastic does this. I've been working through some issues but it's usable as a daily driver with a sane build system.

> Build system integration

UnrealBuildTool is at the same time both the coolest and the most frustrating build system I've ever used. It's _not_ a general-purpose build system for anyone and everyone; it's tailored to Unreal, it's slow to run the actual build system, and the implementation is shaky at times, but some of its features are incredible. Two standout features are Unity Builds and Adaptive Unity builds. Unity builds are common now [1], but adaptive unity is a game changer. It uses the source control integrations to check what files are modified and removes them from the unity blobs, meaning that you're only ever rebuilding + relinking what's changed, and you get the best of both worlds.

[0] https://blog.plasticscm.com/2021/07/dynamic-workspaces-alpha... [1] https://en.wikipedia.org/wiki/Unity_build


ClearCase (claimed to) support points two and three way back in the 90s. It never really worked right; it was always trying to "wink-in" someone else's binary, despite the fact that it was built using a different version of the source.


> These are all features that primarily have value for repos that push the limits on size, like big monorepos (with a huge amount of files) or game development (with big asset files). But get it right, and you could massively cut down the time it takes to check out a branch and build it.

Would this be a good fit for large machine learning data sets?

Every time a new model comes out that touts having been trained on something like a quarter of a billion images, I ask myself "how the heck does someone manage and version a pile like that?"


would you ever need to version individual images? at a high level, you could version the collection as a whole by filtering by timestamp, and moving deleted files to a "deleted" directory, or maintaining filename and timestamps in a database. I'm sure there are lots of corner cases that would come up when you actually tried to build such a system, but I don't think the overall scheme needs to be as conceptually complex as source code version control


> would you ever need to version individual images?

Potentially. It would depend on the image format and what metadata is (or can be) included in the file.


Not handling large binary files is a feature from my perspective. Git is for source code and treating it like a disk or a catch all for your project is how people get into scaling trouble. I don't see any reason why versioned artifacts can't be attached to a build, or better, in a warm cache. I get that it's easier to keep everything together, but we can look at the roots of git and it becomes fairly obvious, Linus wasn't checking in 48GB 4K mpeg files for his next FPS.


> I don't see any reason why versioned artifacts can't be attached to a build, or better, in a warm cache

Because now you've got two versioning systems with different quirks and tools that behave slightly different that _must_ be kept in sync.

> I get that it's easier to keep everything together,

Easier is a bit of an understatement. Putting binary files outside of the project in another versioning system means you need two version control systems, _plus_ custom tooling to glue the two together. Also, the people who interact with these assets primarily are not technical at all, they're artists or designers, and they're the most likely to have issues with falling between the cracks.


> Build system integration

It would be a game changer if any SCM actually manages to achieve this without needing multiple developer teams just for maintenance.

That would probably break git's current dominance within a relatively short timeframe... but I don't think it'll happen within the foreseeable future.


If you want an SCM with direct build system integration there's always GNU make with its support for RCS :)

More seriously, can you describe what "build system integration" would look like? Basically like what GNU make does with RCS? I.e. considering the SCM "files" as sources.

How would such a system build a dumb tarball snapshot?


At my work we have such a system for our monorepo. Basically, for each compilation unit (roughly, subdirectories), it takes the compilation inputs, flags, etc. and checks them against a remote cache. If it matches, it just pulls the results (binaries but also compiler warnings) from the remote instead of compiling anything. A new clean build is made every few commits to the main branch to populate the cache.

In practice it means that if I clone the repo and build it doesn't ever invoke the compiler, so that it takes only a few minutes instead of an hour.

Bazel has a similar system but I haven't used it myself:

https://docs.bazel.build/versions/main/remote-caching.html
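
From the docs, the setup looks roughly like a couple of .bazelrc lines (the cache URL here is a placeholder):

    # .bazelrc
    build --remote_cache=https://cache.example.com
    build --remote_upload_local_results=false   # developers only read from the cache; CI populates it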


That seems really useful, but how is this different from what ccache and various other compilation caches that cache things as a function of their input do?

The GP talks about "build system integration" being "a game changer", and I can see that being a useful thing for e.g. "priming" your cache, but that's surely just a matter of say:

    ccache-like-tool --prime $(git rev-parse HEAD:)
I.e. give it an arbitrary "root" key and it would pre-download the cached assets for that given key, in this case the key is git-specific, but it could also be a:

    find -type f -name '*.[ch]' -exec cat {} \; | sha1sum | cut -d' ' -f1
Or whatever.


> how is this different from what ccache and various other compilation caches that cache things as a function of their input do?

Vertical integration, basically. There is value in these things working, and working well. As an example of the issues with ccache etc, I've yet to find a compilation cache that works well on windows for large projects.


I've never worked on a project with a large remote ccache but I would guess it would be pretty much the same yes.

The "automation" of our in-house system is what really makes the difference but then again we have a team of developers that focus on tooling so it's not so much automated as it is maintained...


Is (3) doable with the abstractions over Git that Git{la,hu}b provide?


Coming from Darcs, Git has a horrible user interface. Something like Jujutsu has a chance to disrupt Git.

It can use the Git data store, so developers can in theory start using it without the whole team adopting it. Then it addresses the big problem with Git, the user interface:

https://github.com/martinvonz/jj

I'm not suggesting that "jj" in particular will disrupt Git, but I think eventually a tool which supports the git data store with a better user interface could take hold.


Git usage for most developers is 3-4 commands, plus, once in a blue moon when they fuck up badly, saving a copy of their changes and resetting hard. There aren't enough user interface improvements possible to get people to switch.


If you go to work for Google or Facebook, it quickly becomes apparent that switching SCMs is much cheaper than trying to use ordinary Git or Mercurial at scale. (Though it is clear that Google, Facebook and Microsoft are all trying to maximize the amount of familiarity and not reinvent the wheel too much; they all have been working on tools either building on, utilizing, or based on already existing SCM tools.)


This looks like it’s supposed to be more appropriate for very, very big repos. Which current Git doesn’t support and isn’t fundamentally designed to support.

So rather than use Git + Git Annex or something like that (maybe more), you’ll just use this alternative SCM.

(I keep hearing about how Git will eventually support big repos, but it’s still not on the horizon. Big repos with Git still seems to be limited to big corps who have the resources to make something bespoke. Personally I don’t use big repos so I have no skin in this game.)


Big repos work in Git today if you are able to play the sparse checkout dance. There are definitely more improvements to be made.

- Protocol for more efficiently widening sparse checkouts.

- Virtual filesystem or other solution for automatically widening sparse checkouts to greatly improve the UX.

- Ideally changing sparse checkouts from a repo state to just some form of lazy loading. Otherwise as you touch more files your performance will slowly degrade.
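
For reference, the "dance" today is roughly this (the repo URL and directory names are placeholders):

    git clone --filter=blob:none --no-checkout https://example.com/big-mono.git
    cd big-mono
    git sparse-checkout init --cone
    git sparse-checkout set services/web tools/build   # only these directories get materialized
    git checkout main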


An SCM like https://pijul.org/ would be a challenge to git. Eden is more a challenge for things like Perforce.


Yeah, I think the directions are quite different. Both are improving the user experience (commands/behaviour with pijul, speed with Eden), but pijul is distributed, with some effort to improve algorithms and a bigger focus on improving semantics (making it more natural in some sense and more correct in another), whereas Eden is more centralising, with a focus on massive size. (The thing large companies want for their repo is branching, not decentralisation, but DVCSes give the former mostly via the latter - I get that branches are a first-class thing in git, but much of their technical implementation can follow from things git must have to be distributed.)

One thing I recall was an effort from pijul to make blaming a file run in O(n log h), where n is the size of the file and h is the size of the (repo or file, I'm not sure) history. I wonder if Eden will also have improved blame performance. I noticed they mentioned large histories, but maybe it is still linear in the history size of the file. (The way hg works gives, I think, O(nh) where h is the size of the file history, which it stores per-file rather than per-commit like git.)


The biggest differentiation for me between Git and Mercurial is that Mercurial is far better for code reviews because it manages stacks of "as small as possible" changes much more easily. The git workarounds I've tried for replicating 'hg histedit' and 'hg absorb' are ... not good.

Similarly, I think Git(hub) has succeeded in open source because bundling more complete changes into PRs works well for unevenly distributed contributions.
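
To make the absorb point concrete, a rough sketch (assuming the stock "absorb" extension is enabled in .hgrc; the commit messages are made up):

    hg commit -m "parser: refactor tokenizer"
    hg commit -m "api: add new endpoint"
    # ...later, fix a typo in the working copy that logically belongs in the first commit...
    hg absorb    # folds each changed hunk into the draft commit that last touched those lines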


I used Meta's Mercurial, having previously used primarily git (and SVN, and CVS before that). It has a number of very cool improvements over git, and it's well integrated into the rest of their infrastructure.

You know the feeling of having to use SVN after using Git? This is what it feels like to use Git after getting used to Meta's Mercurial. I wish I could go into the details, but I don't know how much of it was ported back to Mercurial.


I don't think it's trying to compete with git, it's not decentralized or meant to support big distributed open source project development. This looks like a nice tool for Big Company to manage its internal, private code repositories.


The decentralized part of Git and Mercurial is nice (eg. no need for Github et.al), but I think most software projects using Git or Mercurial do have a centralized server/hub...


I've been keeping an eye on Sturdy [1], a more performant, high level, and opinionated version control system. As a bonus, it seems to be compatible with Git.

1: https://github.com/sturdy-dev/sturdy


I would just love it if git could handle large files; the lack of it leads to a lot of hacky solutions right now.


OK, slightly off-topic but maybe the right minds are here. We have been developing an introductory CS curriculum committed to thinking-with-powerful-tools, including the command line, real programming languages, and git. It's great until it isn't. We intentionally maintain a simplified workflow, but still get the occasional merge conflict or local state blocking a pull. I keep thinking there must be a simplified wrapper over git which maintains the conceptual power while avoiding the sharp edges, even if at the cost of robustness. I'd be more interested in an abstraction than a GUI, but would be interested to hear whatever others have come up with.


How about git-compatible https://github.com/martinvonz/jj ?


The git user interface just sucks. hg supposedly has a better UI, and darcs is apparently better again (except sometimes merges could run in exponential time). Pijul is meant to give you a darcs-like UI with good performance. But none of those things are git, which is maybe important to teach.

One possibility could be to use some kind of git ui. I only know about magit (which is built on/in Emacs) but I’m sure others exist.


Can you show a step by step list of git commands where you find the interface to be bad?


Have you checked out Sturdy? It's a git-compatible VCS, with a much simplified UI and workflow.

https://getsturdy.com/


The biggest problem I have with Git is the strong commit ordering. This leads to a lack of tracking through cherry-picks, which creates very real friction for a fairly common workflow.


it solves scale issues that git can't solve at the moment. fb monorepos are huge. so for most people/companies this issue is not critical to solve and git is great.


Actually Git isn't quite so entrenched as you think. Perforce is still the norm in the games industry, for example, partly because it's more artist-friendly.


I don't think them putting it out is an attempt to compete with anything, they're just putting it out to put it out.

If they are then I totally agree.


It looks like Google's Perforce/Piper; I think that would be a better comparison than Git, but I don't see any data on how the two compare. Anyways, Eden being open source is a great advantage already.


Never forget, though, that being open source doesn't mean that much when there is only _one_ party deciding what goes into the code base.


It's GPL 2.0, on github, which makes it ridiculously easy to fork, tear apart, debug, etc, given your time and ability.

That said, I would like to see it self hosted. I think then and only then will I give it a serious look.


>that being open source doesn't mean that much, when there is only _one_ party deciding, what goes into the code base.

Being open source doesn't mean I have to maintain your code contributions.


Being open source does mean it’s free. And you can dig in and read the code. And if you want to, fork it and add your own features.


That's just cathedral style free software.


Does this do anything special for handling binary data, or is it mostly for text like git? I've heard that Perforce (another centralized SCM) does a good job with large binaries.

I love git just as much as the next guy, but git-lfs sucks.


I worked at Meta and used Eden. I remember it as a virtual, FUSE based file system for a huge monorepo. Basically, if you have GBs of data in your monorepo, and any individual user accesses < 1% of it, then there's no point in having each "hg update" update the other 99%.

But we were explicitly dissuaded from having binary data. I worked on image processing, and wanted to store test images for unit tests in hg, but the consensus was, that was a bad idea. So we stored them in a separate data store, and had a make rule to fetch them as part of the build.
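
The rule was something of this shape (a made-up sketch; the URL, paths, and test runner names are placeholders):

    testdata/images.tar:
    	mkdir -p testdata
    	curl -fsSL https://artifacts.example.com/imageproc/test-images.tar -o $@

    test: testdata/images.tar
    	./run_image_tests --data-dir testdata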


Git only really has two problems with binary files:

1. They take up a lot of space, because the entire history needs to be downloaded with all past versions (and many types of binary files completely change when updated, so delta compression doesn't help much).

2. They are slow to update on checkout (not much of an issue if they don't change much).

Basically any solution for large repos will also solve Git's "binary file problem" because in order to allow large repos you need to allow shallow and partial checkouts as well as efficiently updating the working copy (usually via a virtual filesystem).

TL;DR Git doesn't have a binary file problem, it just has a big repo problem. Binary files are often mentioned because they are the quickest and easiest way to get a big repo.
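
For what it's worth, git's existing partial clone filters already get partway there (the size threshold here is arbitrary):

    # history comes down without blobs over 1 MB; they are fetched lazily when a checkout needs them
    git clone --filter=blob:limit=1m https://example.com/game-assets.git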


A similar effort was made by Microsoft a while back to scale up Git in order to use with the Windows monorepo: https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...


Junior engineer here, so git isn't used for the Windows repo? I assumed git was the holy tool for all version control.


I thought this a few years ago when I was more junior as well. In open source circles git has been used for quite a while, but my company just recently moved one of our repos from TFVC to git. A Fortune 500 company I worked for was still using TFVC for some legacy products. Another product from an acquisition used Subversion, and the migration to git (even with the company using GitHub Enterprise) still took almost three years.

Getting the inertia amongst developers to migrate to a different SCM can be quite the challenge.

Edit: initially said migrating to git was a challenge. But really it’s migrating from any SCM to another that’s challenging.


There are a lot of advantages to using git, in part because it was developed for a massive project (the Linux kernel) so it's very thoroughly tested, in part because it's exploded in popularity so a lot of a tools, integrations, and documentation are available for it. However, it's not the alpha and omega of version control. Some people prefer the interface of Mercurial, some people use Fossil because it integrates issues and wikis, others have stuck to older tools like Subversion, and still others are experimenting with new approaches like Pijul.


git was created in 2005. No idea what MS uses, though some history is here: https://en.wikipedia.org/wiki/Microsoft_Visual_SourceSafe#Mi...


Microsoft used their own fork of Perforce for a long time. Not sure what they use today.


Source Depot is what it was called. IIRC, Windows was moving to Git (through some virtual file system [0]) and had some major teams actively using it, but that was a while ago too.

[0] https://github.com/microsoft/VFSForGit


Unfortunately their open source support ranges from very little to non-existent.

It's great, and it doesn't work outside Facebook.


Facebook has a terrible track record when it comes to open-sourcing their internal tools. See: Phabricator, HHVM, Flow, Jest, ...

Even React, which is their most popular library, is not actually "open source." They're very transparent about the fact that their priorities are Facebook's needs -- even if they do take community input.

None of this is per-se bad, but you should definitely treat an open-source project out of Facebook with skepticism when it comes to adopting it for your own use cases (possibly making sure you're not too locked in when an incompatible v2 comes out with virtually no warning after FB's internal implementation drifts).


This is not a fair characterization of React. The tech lead of the project doesn't even work at FB anymore.


> Even React, which is their most popular library, is not actually "open source."

How do you define "open source"? It typically simply means the source code is available. By any definition I can think of, React is definitely both free and open source. How they design the software or if they take contributors isn't really relevant.


> How do you define "open source"? It typically simply means the source code is available.

I agree with you that react is definitely open source, but I'd also encourage you to use more specific wording around what "open source" means.

I think wikipedia gets this right: https://en.wikipedia.org/wiki/Open-source_software

"Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose"

The license is the key bit.

On the other hand, there's "source available" software (also on wikipedia https://en.wikipedia.org/wiki/Source-available_software ), which is what your definition equates to, and I personally don't want to see confused with open source or free software.


There's three domains where people usually use the term "open source".

- Freely licensed software (eg: MIT, GPL, etc)

- Code visibility (eg: ForgeRock)

- Community focus/contributions

Not all "open source" implements each of these, and just because they implement one and not the other doesn't mean they're not expressly open source. Just some ecosystems are more open than others.


React is MIT, which is pretty much one of the least restrictive ones, granting basically no rights to the original publisher.

That Wikipedia definition sounds more like what typically gets described as "Free and Open-Source Software"/FOSS, no?


I'm aware React is MIT and of the various licenses etc.

As I see it commonly defined, "open source software", FOSS, and FLOSS all mean the same thing more or less. That the project uses an OSI approved license, or one very close to it, whether it's MIT, Apache2, or GPL.

"Free software" is the only of the phrases that I see having two competing common definitions, the "free as in money", and "free as in Free Software Foundation's definition of free software". This seems pretty understandable, since "free" is overloaded.

I only infrequently see people mixing up "open source" and "source available", and that's the specific thing I'm trying to discourage people from mixing up. I think keeping those terms clear, and especially calling out "source available" software as _not_ being "open source" (i.e. not granting you the freedom to modify it or run your own copy in some cases) is important.


I see the opposite argued often too - that open source is too wide a term, as it could also be understood as source-available, and that FOSS/FLOSS should be preferred. But you're right; looking at most literature, most people seem to treat OSS and FOSS as largely the same thing. I guess my biases are showing lol

Coming back to the original argument - which was that React is not truly open source - being MIT, it 100% is, so I still don't understand it. That they prioritize their own needs for feature development is pretty much irrelevant; the source is there and you have permission to fork, tweak, and publish changes on your own at any time. You are legally within your rights, but they don't have to make it easy for you.


Free Software == my project's code, to others

Open Source == other peoples' contributions, to my project's code


I have not seen this distinction commonly.

Can you link to a reference?

As I understand it, "Free Software" is the term the FSF and general hacker community settled on for licenses that preserve user's freedom to modify and redistribute source code.

O'Reilly etc shifted to the term Open Source as part of making the idea less associated with "hacker culture", and more associated with businesses (as described here: https://en.wikipedia.org/wiki/Open-source_software#End_of_19... )

From there, I see "Open source" being slightly more often associated with companies or younger developers, and "free software" more often being associated with the GNU project, copyleft projects, etc.

I'm curious if you have references or more explanation about the difference you're trying to draw, since it's one I haven't seen before.


> I'm curious if you have references or more explanation about the difference you're trying to draw, since it's one I haven't seen before.

My distinction between the two is whether outside contributions make it back into the original project. Free Software is about the rights of end users to inspect the code and make and distribute their own modifications, but then Open Source takes it a bit further by explicitly soliciting contributions with the ostensible aim of building a better project through cooperative labor than an individual programmer could build alone.

In practice though "Open Source" has turned into unpaid project management work for billion-dollar corporations, bitter disputes between contributors over conflicting standards of morality, technical visions in constant flux as contributors come and go, and endless bikeshedding about semantic version numbers / code style guides / other things that don't matter. For years I thought I was totally burned out on Free Software and walked away from all of it, but what I was actually burned out on is Open Source and have been able to love programming again by working on things that are explicitly "Free Software but not Open Source".

The `actix-web` drama a few years ago is a perfect example, when a huge crowd of onlookers felt morally justified excoriating a popular project's creator / maintainer for not managing their project to the crowd's standards: https://steveklabnik.com/writing/a-sad-day-for-rust


I don't think your distinction is actually part of a common world view.


>Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose

??? Isn't React licensed under the MIT licence? It seems to me that it ticks all the boxes?


As I wrote at the top of my comment. "I agree with you that react is definitely open source" (and no, I did not edit that in, it was there when you read it)

I'm aware react is licensed under the MIT license. I was just talking about how the parent comment chose to define "open source".


nowadays, it is expected that open source is also a synonym for “community-driven”, which is a very bold assumption.

I don’t know how we got to this point but it’s interesting to notice that the terminology drifted.


By that definition, any software project that is driven by a BDFL wouldn't be open source, including Linux.

The terminology hasn't drifted that much. I think there is a small and vocal group of people who are trying to take back "open source" by reframing what it means, so as to exclude corporate projects. It doesn't make sense to me.


>any software project that is driven by a BDFL wouldn't be open source, including Linux.

There is certainly a push in that direction, and towards the idea that you need a group of people, a core team or council, to be considered Open Source. And Linux has that.

>The terminology hasn't drifted that much.

It depends how you measure it, but Twitter and HN are at least two places where lots of developers are suggesting Open Source equals community-driven. And yes, there is also some movement towards not considering MIT, BSD, or Apache 2.0 as Open Source because they do not require contributing changes back. Although that hasn't gotten any traction. (Yet.)


I was probably too implicit, but when I wrote "which is a very bold assumption", I also meant that I disagree with the statement.

For me, open source and community-driven are not similar and I don't understand why people seem to expect it.


Everyone is going on tangents, but yes, React is as open source as they come. What people are conflating are the notions of community-driven and open source.

React is a successful Facebook OSS project. An example of one which went poorly is Thrift. Facebook open-sourced it and then internally used fbthrift which diverged drastically. OSS Thrift isn't that popular these days any more.


Hive is another good example of the same issue.


> How do you define "open source"? It typically simply means the source code is available.

that is not what that means. "source is available" and "open source" are very different.


Disagree. We need to stop trying to cram meaning into phrases that are already defined. Open source means the source code is freely available. Adding anything else requires a different name.


"open" already has meaning.

> Disagree.

well, fortunately your agreement is not required. "open source" already has a definition and it requires openness.


Free Software is an established term, and React certainly is not Free Software, due to its patent. I very much doubt that React is Open Source either, for the same reason.


So react is MIT-licensed and has a patent. How do those two work together? If I modify React's source-code might I not infringe on the patent and then get sued by Meta?


I’m not a lawyer, but from my perspective, that’s indeed a concern. And perhaps you could get sued just by using React. It could differ between jurisdictions as well.

A standard open source license with a patent grant, like the Apache license, would have been a lot clearer, but Facebook has so far refused to license React in that way.

A problematic patent grant was offered for earlier versions of React but that’s not the case anymore (and didn’t really fix the problem anyway).


I'm not a lawyer either and I wonder about the scope of Apache patent grant. Does it give you the right to use any patents the software in question "uses" in any possible context? Or does it simply allow you to modify the software any way you like and not get sued for infringement? But then how much can I modify it and still retain the right to use those patents? I mean if I create a totally unrelated software package which however shares some code with the original work, can I keep on using those patents anyway I want?


No, open source means that the software is free software.

The source code to Windows is available.


Urgh. Typical reminder never to let engineers name things. Whoever thought of using "open source" for something other than "source is open"...


What's wrong with Flow, Jest or Graphql? I think these are all fantastic projects. I mean, Flow "lost out" to Typescript, but, it's usual for one winner to emerge from competing frameworks.


Seriously. FB chose to open source these things. They could have kept them private but wanted to give back where they could.

IMO that should be celebrated, not used as an excuse to shit on FB for its "poor track record" of going out of its way to open its source code to the public.

I hate meta as much as the next, but come on.


Graphql is great. Jest, though, feels a little abandoned because the TypeScript support (ts-jest) is pretty janky and has bad performance. Meanwhile, elsewhere in the 2022 ecosystem, first-class TypeScript support is becoming the norm.


React is MIT-licensed. It is, of course, open source. If you don't like their decisions, fork it.


React is patented and Facebook is actively choosing not to offer a patent grant, so unfortunately that’s not the whole story.

A fork may not be an option. Perhaps a given organization may not even be allowed to use React, if Facebook decides against it for some reason. Jurisdictions can differ as well.

In my opinion, the best thing would be if Facebook simply made the terms clear by using the Apache license or similar. But hey, it’s Facebook, so I’m not expecting much…


>React is patented and Facebook is actively choosing not to offer a patent grant,

By that definition, any "Open Source" MIT- or BSD-licensed project that does not offer a patent grant has problems.


What patents does Facebook have on React?


I'm actually curious what the strategy is here. To my knowledge only FB, Google, and MS do megascale monorepos, and Google and MS already have a solution. Are there now other companies outgrowing Git that Facebook is hoping to build a community with?


Stay tuned ;)


A bit skeptical when there are relatively recent commits like

> Re-sync with internal repository


Open sourcing an internal repository with extensive, ongoing work on it is always a difficult affair, because you're creating a second source of truth. (It isn't just how you manage external contributions, but also workflows like releases and CI.)

I wouldn't consider this to be a problem.


It means they haven't (yet?) transitioned to open-first, and they'll have to prove they'll do open-first development before I trust them. I'm not willing to bet my work on a product where the governance isn't open and everything is driven by and for a single company's needs.


For what, another Phabricator, that I’ll inevitably have to migrate my company away from again?

So if history is any guide, the next announcement to watch for is your departure from Facebook and the launch of Edenity, which will be sunset and abandoned inside a decade once it fails to IPO. Am I close?


vaporware?


Proceeds to book edenhub.com


eden.garden is for sale :)


> EdenFS speeds up operations in large repositories by only populating working directory files on demand, as they are accessed.

But if I grep through the source code, then it will download all of the stuff?


Yes, generally for a large codebase you will have a separate code search tool like Google's Code Search or Sourcegraph, which are super fast over large amounts of code.


Hm. I can see why they'd build some of these features, but there are some significant downsides. The VFS in particular will end up being a poor experience when a transitory network problem causes apps to hang while pulling code. What happens if you 'grep -r', or if 'mlocate' indexes it?

On the build side... holy jesus, are they really compiling 40 different dependencies from scratch every time they push code? This build has been running for 5 minutes and it's still just compiling dependencies: https://github.com/facebookexperimental/eden/runs/5997101905... Come on, y'all. You're supposed to be the "advanced FAANG people". Cache some build deps, will ya?
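
Even a basic cache step would go a long way. A rough sketch in GitHub Actions (the path and cache key here are placeholders for illustration, not Eden's actual build layout):

    # Hypothetical caching step; path and key are illustrative only.
    - name: Cache built dependencies
      uses: actions/cache@v2
      with:
        path: ~/.cache/eden-deps
        key: ${{ runner.os }}-deps-${{ hashFiles('**/CMakeLists.txt') }}
        restore-keys: |
          ${{ runner.os }}-deps-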


I wouldn't be surprised if the build system there is just cobbled together for the OSS version, and likely quite different from what they actually use at FB.


One thing to keep in mind for how development at larger tech companies works is that you’re often not building on your own desktop, you’re usually building on a development server that’s on a well-connected (effectively production-quality, if not literally the same) network. You don’t see a ton of drops in those cases, so it works well. Not that there hasn’t been effort to recover from networking issues encountered in this and other build tooling - at scale, someone’s development server is going to have a bad day every day.

You also need much better tools than grep and locate for a monorepo - or any sufficiently large repo probably. Just load the full repo into memory in a few places around the world, and use an API to find the text you’re looking for. If you already have expertise with search services in your company, this is not that challenging a step - and you can get fancy by using something like Tree-sitter to make those searches more advanced than text. Hitting disk (especially for whole directory trees for “grep -r”) is a losing approach in a large repo.
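
As a deliberately tiny illustration of that "whole repo in memory" idea (nothing like a production code search service; the paths and command-line arguments are just placeholders):

    import os
    import re
    import sys

    def load_repo(root):
        """Read every file under `root` into memory once (skipping .git)."""
        files = {}
        for dirpath, _dirnames, names in os.walk(root):
            if ".git" in dirpath.split(os.sep):
                continue
            for name in names:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "r", encoding="utf-8", errors="ignore") as f:
                        files[path] = f.read().splitlines()
                except OSError:
                    pass
        return files

    def search(files, pattern):
        """Yield (path, line_number, line) for every regex match."""
        rx = re.compile(pattern)
        for path, lines in files.items():
            for lineno, line in enumerate(lines, start=1):
                if rx.search(line):
                    yield path, lineno, line

    if __name__ == "__main__":
        # usage: python search.py /path/to/checkout 'some_pattern'
        corpus = load_repo(sys.argv[1])
        for path, lineno, line in search(corpus, sys.argv[2]):
            print(f"{path}:{lineno}: {line}")

A real service keeps that index resident and replicated so queries never touch the working copy's disk at all, which is the point being made above.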


Is there a FUSE filesystem for Git that takes a similar strategy to edenFS?

Only setting up files and folders as they are requested might be very helpful with various git monorepo access patterns. Maybe there is something inherent to Git's design that makes this less practical.


https://github.com/microsoft/VFSForGit

Which seems to have been superseded by https://github.com/microsoft/scalar according to the README.
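
For anyone who hasn't used it, the manual equivalent in stock git (which Scalar roughly automates) looks something like this; the URL and directory names are made-up placeholders:

    git clone --filter=blob:none --no-checkout https://example.com/big-monorepo.git
    cd big-monorepo
    git sparse-checkout init --cone
    git sparse-checkout set services/web libs/common
    git checkout main

With --filter=blob:none the clone skips file contents up front and fetches blobs lazily as the sparse checkout materializes paths, which is the same on-demand idea EdenFS pushes further down into the filesystem.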


tl;dr: automatic sparse checkouts for massive monorepos



How is the URL still bad in this story after 9 hours?


they use git (and github) to maintain their own source control management system?


No, they have a system similar to Google's copybara which exports this repository out of their internal monorepo.


Nothing would be wrong with it if they did. A source control management system suited for gigantic monorepos isn't itself necessarily a gigantic monorepo.


So, instead of breaking their repo up so git is more performant, they forked mercurial?

Madness


Why do you think this is madness? FB doesn't use git for these monorepos, so that's not really relevant, but I don't understand why you think it's better to break a repo up because the SCM can't handle the size rather than fix the SCM so it can. I work for Meta (nothing to do with the teams working on this, though), and I can assure you people have considered the tradeoffs of breaking repos up just to accommodate existing SCMs vs improving the SCMs. If you believe improving the SCM instead of breaking up the code is madness, you should probably provide a bit more of a justification.


You skipped a step. They actively contributed to mercurial before the full fork.


I wonder if Eden is a reference to https://en.wikipedia.org/wiki/Eden_(2021_TV_series)


The project's name goes back to 2016. Granted, the README called the project a "filesystem" back then not an SCM, but... still, the name seems to predate the series by quite a bit.


I think it's probably from Facebook's culture of pretending it's a force for good when it's a mixed bag like every other Megacorp. Here's an example of what Zuck has said:

> I believe the most important thing we can do is work to bring people closer together. It's so important that we're changing Facebook's whole mission to take this on.

No wonder they have a major project named Yoga and now this...


Pretty sure it's called yoga because it is a flexible layout engine. A joke.


It was called css-layout but got renamed to Yoga because it implemented the “flexbox” layout and not all of CSS. Flex -> Yoga was indeed the joke/reference.


Here's another one: https://facebook.github.io/prophet/

It's used to predict things, but still strikes me as an odd name, just like Yoga.


The comment you're replying to is a perfectly good reason why it's named Yoga, not sure why it'd still seem "odd" with that context. Not particularly witty, but it definitely makes sense.


> it's a mixed bag like every other Megacorp

No, you can't simply dismiss the fundamental problems with Facebook/Meta with whataboutism. Google and Apple and Microsoft are standard mixed bags. Not Facebook.

Facebook is pure evil. It has queued up the complete obliteration of western democracy, which even Rupert Murdoch couldn't quite manage on his own. You can absolutely find non-directly-evil things to do at Facebook/Meta, but it all supports the evil in the end.

https://www.theatlantic.com/magazine/archive/2022/05/social-...


Nah, it's older than 2021.


If Eden the TV series was based on a book, the book could be older, but it isn't.

It's also possible, but not likely, and not the case here, that a TV show could be a sensation long before the first episode comes out. I can't think of a time when this happened except for a prequel or sequel like Better Call Saul, where it was much awaited, but I'm sure there are instances of that occurring.

Edit: from another comment, there's this, which came out in 1997 but isn't mentioned in the Wikipedia article for the TV series so I'm not sure it's related: https://en.m.wikipedia.org/wiki/Eden:_It%27s_an_Endless_Worl... There is a Manga mentioned in the article for the TV series but it came out the same year as the TV show.




Given that your link says the show came out in 2021, and the GitHub repo is at least 3 years old, probably not.



