The SCM ecosystem at Facebook is tremendously powerful and the result of some of the best minds at Facebook working on those systems for many years. From the scaling of the monorepos to the code review workflows, nothing really matches it. The ergonomics of most of the tooling are simply top-notch (which they needed to be... engineers, particularly at Meta, are an opinionated lot who don't tolerate poor tools).
After doing a round of onsite interviews with the fb tools team, I got the impression there were lots of bright engineers that wanted fb pay without having to touch fb products.
A lot of programmers would rather write tools for other programmers than for non-programmer end users. It's the target market that they know best, and it's the most prestigious.
I wish more of those bright engineers would try to solve problems for other users rather than re-re-re-re-re-optimizing the life of their colleagues. But writing code for users involves, among other things, knowing something about users, which is more of a slog and less fun.
I work on developer tools at google, and it's not even remotely one of the prestigious teams. it is however the most fun and interesting time I've ever had in a job, and if I left it would only be for the same kind of work.
also you seem to have fallen for the same fallacy as the people who complain about open source devs not working on what they consider sufficiently important problems. trust me, there is no shortage of engineers willing and eager to work on end user problems, and the dev tools work enables them to deliver solutions to those problems faster, and makes those solutions work more reliably. it's a rising tide that lifts everyone's boats, not a zero sum game.
hard to say from inside the company, since you get to thinking of everyone as just colleagues who happen to be working on different parts of the system. but in terms of larger world prestige I would say the machine learning folks are definitely up there, also the internet/petabyte scale stuff that other developers know enough about the difficulties behind to be impressed by (spanner for instance), and perhaps go since it has achieved a massive amount of popularity. I honestly don't know if there are currently any end user google products "sexy" enough that working on them is extra prestigious.
You're missing webranking, and other core components of core products. Deciding in which order results pop up on google.com is both very challenging and well recognized. Same with the people that do ad ranking, the people that manage the enormous storage systems, etc. If it is core to the company, it is prestigious; you never have to explain the impact of your job.
Their colleagues ARE users, they are just a different type, and a type that engineers understand more.
I would rather a good engineer put their skills towards helping others use their skills more effectively than put that same engineer in a place where they don't like the work and produce below par results.
But on the contrary... many of the things well-built for the majority of 'non-programmer end users' are not the way I (as a 'programmer end user', whether the particular thing is a programming-adjacent tool or not) would ideally like them to be.
So I'd selfishly adjust it to 'solve other problems for the same users'! Asahi, for example, awesome stuff - I've lost count how many times I've had to explain (to both colleagues and non-engineer family/friends/etc.) 'no no, absolutely agree, love Mac hardware [...]'.
I'm very curious what made you say that building tooling for developers is "most prestigious". It doesn't really match my intuition, can you explain more?
I’d say a citation is not needed when the person has experience in the field; they can cite themselves telling you their opinion, and that will suffice. Your opinion may differ, of course.
Considering the state of our tooling - where documentation frequently does not show up without extensive config or is not soft-wrapped in JetBrains products (which are possibly second only to Visual Studio in overall quality), and projects simply stop compiling and need magic tricks like invalidating caches to work again (and sometimes the trick reduces to reinstalling everything and deleting all config) - I don't see how more tooling work is so scorned.
For some of us it's really hard to understand or solve the problems of people who spend most of their time browsing Instagram - when it's not TikTok.
You’re getting downvoted, and that’s not undeserved because this is very judgmental on its face. But I get the sense there’s a sincere answer to a few sincere questions worth at least asking: have you asked them (users you’ve categorized this way) that direct question? Or asked how work you’re invested in isn’t addressing their problems/goals? If so, and if you found their answers unrelatable, did you ask further questions which might clarify for you what they want?
I know all of that is a lot to ask of anyone. But you might find you learn things you have in common with people you don’t expect to. And it might make your job less frictionful too.
It's not like I don't understand their problems/goals. Just yesterday the person I was spending time with showed a cat video from Facebook, and as I seemed half interested only, she asked me if I like animals.
I love animals in real life, but that doesn't mean that I love watching videos of animals on a screen, for me it's not the same experience.
The main goal that I see social networks addressing with many people is entertainment, but I feel that it takes away from my social life with people, that's why I have aversion to short videos for example.
Facebook didn't force these short videos on me originally, but just a few months ago it started showing video reels to me, and I couldn't stop myself from watching them.
After bringing the settings back to just showing what's happening with my friends, the next time I opened it, it started showing addictive videos to me again.
YouTube disabled showing downvotes, which was the best signal for me to see whether a 15-minute video was worth watching, or whether it was just spam talking about nothing for 15 minutes to optimize content length for the algorithm. And Shorts are not a solution, as they don't have any depth either. It also started spamming me with new channels that reupload videos of interesting people from years ago just to make me think it's new content.
The main problem for me working on social networks is that making profit can't be separated from making people addicted to the lowest forms of entertainment (I'm a Xoogler).
> The main goal that I see social networks addressing with many people is entertainment, but I feel that it takes away from my social life with people, that's why I have aversion to short videos for example.
Having a different idea of fun than another person is very common. My partner and I, whom I share a lot of personality traits with, have some different definitions of fun. I can spend hours working through math problems and find it fun while plenty of others would find that horribly boring. To _judge_ other people for having a different idea of fun than you is to be condescending; it deifies your idea of entertainment while putting down the others'.
> The main problem for me working on social networks is that making profit can't be separated from making people addicted to the lowest forms of entertainment (I'm a Xoogler).
Is it that hard to accept that other people have different ideas for what being social means? Does everyone have to be like you or you judge them negatively? Your take isn't uncommon on certain parts of the Web (especially the anti-social-media parts) but I've always found it trite and judgemental exactly for the reasons I expressed above. Not everyone has to think and feel the way you do and that's okay. Humanity is large. Some of us like to go to hyper-commercial malls, some like to go on challenging hikes, and others are at the club dancing and imbibing their hearts out. Sometimes people even change what they like to do for fun as they age! That's okay, it's human.
>Is it that hard to accept that other people have different ideas for what being social means? Does everyone have to be like you or you judge them negatively?
I think this is the wrong question - it isn't about being different, it's about not even having a customer to empathize with. I can't empathize with a product, no.
As for users, you can be so unfamiliar with a demographic that you have no conception how to help them. I haven't touched facebook seriously in over 10 years (probably close to 12 now), in the world of manufactured FB problems I wouldn't have the foggiest to start.
Now, ignoring the fact FB is absolutely not in the business of helping their users, the question becomes, can or should I become enough like something I see no value in?
I think the answer is probably "no" - if people are busy walking themselves off of cliffs, I have no moral or ethical obligation to help them walk off cliffs, and I really don't have any obligation to walk myself off of one.
Before I respond to anything specific, I want to be really, really clear that I share a lot of your (dis)tastes, and I understand where you’re coming from. I’m not even necessarily interested in convincing you to change any of your opinions, only in sharing my perspective.
My only real feedback, for whatever it might be worth, is that this comment—discussing how you feel about these social networks—invites a lot more sympathy, at least to me, than judging how others value them differently.
> It's not like I don't understand their problems/goals. Just yesterday the person I was spending time with showed a cat video from Facebook, and as I seemed half interested only, she asked me if I like animals.
> I love animals in real life, but that doesn't mean that I love watching videos of animals on a screen, for me it's not the same experience.
I’m also not very interested in animal videos, and even fairly seldom interested in animal photos on social media. Videos I pretty much only watch if I’m pressured, and I’d say I skip by probably 80% of photos too.
And I’m an enormous hypocrite about this (I don’t really think so, but I would forgive anyone for thinking so). I post photos (and again occasionally video) of my pup almost daily. Partly this is because I am absolutely in love with her and love to share insights into our life, and adorable things she does.
Another part is there’s a small but very interested subset of my friends and family who, I’m certain, gets real joy out of seeing these updates and feels more connected with me through them. They take a different pleasure and emotional experience from not just my posts, but others I see when the algorithm surfaces them. And every one of them, like all of us, needs and deserves more joy in life.
My pup and I also have, I believe, a fairly unusual origin story which I intend to write about more at length, but I’ll share briefly here. I will not be offended at all if you decide to stop reading here (if I have any point in sharing this here, that’s my point).
I have a fairly light touch on social media these days, for a lot of the same misgivings you describe. A few years ago, not so much. I was on Facebook a lot, and Twitter quite a bit more.
I’m pretty much a nobody on Twitter, but I became first friendly and then eventually romantically involved with someone who’s what they call “Twitter famous”. That fact wasn’t a particular attraction for me or a romantic goal, and still not of much interest to me other than it makes the whole thing feel still surreal to this day. In all honesty, of all of the times I’ve been star struck… well let’s just put it this way, “Twitter famous” means about as much to me as finding out some wealthy person I’ve never heard of was in the next town over. It’s just what happened in my life.
Anyway, we dated for a while and we eventually moved in together. I moved to a very cold place at the start of a very cold winter. And we adopted a puppy! Like some people do when they’re in love. It eventually didn’t work out between us (the humans, I hope it’s obvious pup is still with me and well loved). Sometimes things don’t work out with romance. That’s okay.
I’m ever grateful that she decided it was best if pup went with me. I’ve since had—and continued to raise, and grow with—the best companion of my entire life. All because I implausibly connected with someone on Twitter of all freaking places at (IMO) its absolute worst most toxic point, and completely despite a zillion reasons it would be even more implausible (some not in detail here, but some will be written when I’m ready to tell a more full story).
Why am I telling you (and others who might be reading), of all people, this abridged version? I don’t know. I hope maybe it’ll be more valuable to you than if I just found some way to send a video of my pup? And I guess maybe because if you (or anyone) finds this story interesting, I hope one of the things that stands out is… several important life events and one incredible bond came out of me just being open to and pursuing a connection that feels still so unrelatable to me.
I’m not saying you should ask me for a pup video (but you can!), but just… okay yeah the actual point. People are experiencing a much more complex world of social media than you might see. Give them some grace, please?
I was not planning to respond any more in this thread, but you're not trying to argue with me - you're being constructive, which I appreciate.
Facebook originally started as a social network where I could see my friends' lives, babies, family life, puppies, and it was awesome and interesting, as it's something that I could talk with them about.
The problem with looking at friends' lives from Facebook's point of view is that it's not as addictive, and ads don't integrate naturally.
I loved the book "Chaos Monkeys", as it explained how impossible it was to monetize the huge active Facebook userbase in the early days, while users were prioritized and making money was just a secondary afterthought that Mark didn't care about.
It turns out that asymmetric relations, where some people/content creators have tens of thousands of followers and hundreds of thousands of views and usually aren't friends with their audience (except in cases like yours), are much more addictive for people, and much easier to monetize as well. At this point, though, social media becomes more media than social, and gets the same problems that other types of media had: creating unhealthy expectations for people whose brains don't really distinguish between what they see there and real life.
Dealing drugs involves solving interesting problems too nevertheless IMHO it is bad for society and I try to avoid both dealers and addicts if possible.
Couldn't agree more. Anyone who survives and escapes that - including several of my close friends - is a fucking strong person. If I have to read one more comment disparaging addicts and then whining about how sOcIaL MeDiA Is a dAnGeRoUs aDdIcTiOn, I might actually blow my brains out.
I'm not sure why it is condescending. I have met people who do this all the time while I'm trying to just have a drink/dinner or get to know a person, but I'm just not able to connect with heavy Instagram/Facebook users. I prefer not to see them again, as for me it's very boring, and I see them as being unkind, when in all fairness they are probably just social network addicts.
It's condescending because you're focusing on the media that people use rather than what they have to say in life. There are (obviously) millions of interesting, passionate, intelligent, creative people using both of these apps.
> I got the impression there were lots of bright engineers that wanted fb pay without having to touch fb products.
I chose a job at Google Engprod (a tooling/developer productivity org) because I specifically like to work on dev tooling and I'm much less interested in working on products aimed at end users. Surely, many engineers at FB feel the same way.
Without a doubt, CitC/Piper/Blaze/Critique/Code Search is what I miss the most from Google. As you've said it really feels like a complete system. I'd pay decent money for a hosted version of it.
It is more or less equivalent, but buck integration with piper/citc/cider makes it currently more comprehensive than fb infra. Fb is getting there, but much slower. So I was more or less talking overall...
FWIW this is also true on the design side. I've researched a lot of design teams and their custom tooling (I worked on design tools at Atlassian and now am working on them at Figma). Facebook's is leagues above the rest. I've been blown away at what they do there. Some really amazing engineering and design thinking happening on Facebook's internal tooling.
I'd love to know concretely what this means. I don't feel like I see hundreds of design-related things coming out of FB, so it's hard to imagine they even need design tools. Maybe I'm just not aware of all of FB's products and world-class design.
It's a prototyping tool; I don't believe it has any codegen facilities. It used to be a Quartz Composer framework and then it evolved into a standalone app.
Yeah, the fbcode build system was hella slow. A minute plus just to build the "action graph." I did marvel at all the lost productivity, especially since a minute plus is enough time for me to context switch to another task, or at least go get (another) cup of tea...
Mercurial / Eden were pretty fast though. Never had complaints about them.
A Mercurial compatible SCM (not sure if it is a fork) built for their workflow (monorepo) and scale (enormous, git is not usable at their scale, at least for a monorepo). Uses Python and Rust. Designed for efficient centralization rather than decentralization.
To their credit, I don't think its use-case is limited to monorepos. I've personally had a multi-GB `.git` directory due to storing many data files in a repo. In retrospect, data __shouldn't__ be version-controlled, but it's sometimes the simplest solution, e.g. having a unit-test suite intake a bunch of CSV data. Eden's "a file is checked out only if opened" and "scan only modified directories" would've allowed me to avoid decoupling data from code.
Data should be versioned if it’s one of your inputs, even if you can’t merge it. It’s just that the git tooling for it (including git-lfs) is horrible.
Data can and should be versioned, but not by just `git add BLOAT`. Take a look at https://dvc.org/: blobs are uploaded to an S3-compatible blob store, the metadata goes into a small pointer/config file, and that file is what gets versioned in git.
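Roughly, the workflow looks like this (the bucket and directory names are made up) - the big files never enter git history, only the tiny pointer files do:

    pip install 'dvc[s3]'
    dvc init                                 # sets up .dvc/ next to .git/
    dvc remote add -d storage s3://my-bucket/dvc-cache
    dvc add data/images                      # hashes the data, writes data/images.dvc
    git add data/images.dvc data/.gitignore  # only the small pointer/metadata files go into git
    git commit -m "Track image dataset with DVC"
    dvc push                                 # uploads the actual blobs to S3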
- Requires a custom kernel to run. Although most of these patches are probably floating around the kernel mailing list in some form or another.
- Requires Google's RPC and authentication system. (I guess GRPC is open source now, IDK if piper has switched, but you still need auth)
- Requires Google's group membership service.
- Requires Google's storage engine. I don't remember if they have migrated to spanner yet but even then it would be using internal spanner APIs not the cloud ones.
There are probably other, less obvious ones too, but the point is that when Google writes software to run in Google production, it is built on top of a mountain of infrastructure. I'm not sure the design of Piper is in any way novel enough to be worth it. I mean, Piper works and scales well, but I don't think it is a fantastic VCS.
I wouldn’t think there would be many companies running repos as large as Facebook.
If you use a tool like this you would basically be on your own. If you “stick to git” (or mercurial or whatever) at least you have all that momentum behind you, and you almost definitely won’t be the first people to encounter a problem.
One very counterintuitive truth of large scale software development is that as you scale to multiple services, you are gravitationally pulled into a monorepo. There are different forces at work but the strongest at this scale are your data models and API definitions.
The problems with data models can happen at small scale. I remember the first large-ish project that I built: it had a few different components, but the problems started when I tried to introduce data models into the Java and Python parts (I had it in my head, after reading a book, that I needed domain objects or some other nonsense in each language). Mistake - the data was still changing and it took forever to make changes; it wasn't critical, but I learned my lesson very quickly.
One perspective on this is that many ORM libraries don't take the DB as the source of truth (one very good ORM library that does, and which saved me in this case, was JOOQ). I think a lot of small-scale problems could be solved this way, monorepo is just another variation of this solution: moving the source of truth into the repo.
It is surprising to me how often variations of this problem come up. Obviously, there are solutions from multiple directions: having cross-language definitions (ProtoBuf), Arrow (zero-copy abstractions suitable for high performance), maybe even Swagger which comes at the problem from documentation...but I think this problem still comes up anywhere (and the DB approach is, imo, a very strong approach with a decent ORM at smaller scale).
Does your schema for data models (inception!) have a revision associated with it? If not, deployments are going to be spicy. If so, you end up having to deal with version-rot. Part of why putting this in the repo with your source code is a winning solution is that when you're working off head, you will naturally pick up and test the latest thing, and in most cases your next deploy will also just naturally roll forward as well.
Is it? I'm working for a small company and even at our scale, when the workflow is centralized, I find git a bit painful at times. I mean it's still an amazing tool, don't get me wrong, but when you have to deal with several sub-projects that you have to keep in sync and need to evolve together, I find that it gets messy real fast.
I think the core issue with git is that the submodule thing is obviously an afterthought that's cobbled together on top of the preexisting SCM instead of something that was taken into account from the start. It's better than nothing, but it's probably the one aspect of git that sometimes makes me long for SVN.
At the scale of something like Facebook you'd either have to pay a team to implement and support your split repo framework, or you'd have to pay a team to implement and support your monorepo framework. I don't have enough experience with such large codebases to claim expertise, but I would probably go for a monorepo as well based on my personal experience. Seems like the most straightforward, flexible and easier to scale approach.
If your company is small, I don't think you should be using git submodules at all.
My last place was about 10 years young, 150 engineers, and was still working within a single git repo without submodules.
There is a non-zero amount of discoverable config that goes into managing a repo like that, but it's trivial compared to the ongoing headaches of managing submodules, like you suggest.
We need to track large external projects (buildroot and the Linux kernel, for instance), so the ability to include them as submodules and update them fairly easily is worth it IMO. If you're at the scale of Google it probably makes vastly more sense to just include the code in your monorepo and pay a bunch of engineers to merge back and forth with upstream, and have the rest of your team not worry about it, but for us it would take a lot of time and effort to maintain a clone of these projects in a bespoke repository.
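Concretely, the update workflow is not much more than this (the URL and path are just illustrative), which is cheap compared to maintaining bespoke clones:

    git submodule add https://github.com/torvalds/linux.git third_party/linux
    git commit -m "Pin the kernel as a submodule"
    # later, pull in upstream changes and record the new pin:
    git submodule update --remote third_party/linux
    git add third_party/linux
    git commit -m "Bump kernel submodule"
    # fresh clones just need:
    git clone --recurse-submodules <our-repo-url>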
We have customer IP that not everyone is allowed to access and has to be deleted after the project is done. We use submodules and IMO it sucks but I don't see a way around it considering the restrictions.
For extremely small companies (N == 1) git submodules can be neat though. It’s a great way to create small libs without having to bother distribution through LuaRocks, npm, RubyGems and the like.
Submodules are a great way to break out libraries in a language-agnostic way without having them really be broken out. This is independent of team size.
For a little more color on your modulo, the major omission in google3 I can recall from ~9 years ago was Android. For Reasons, I think legal.
The others weren’t “oh huh” enough to be easily recalled writing this comment, which probably speaks to their interestingness. But yes, you can chdir from search to calendar to borg and their dependencies, internal and vendored. It’s pretty much all there. It was pretty splendid, actually, and influences my thoughts on monos to this day.
This doesn't surprise me. It's a fork, nobody bothered to update the readme, and as much as folks wanted to update things like documentation, improving the software was a higher priority.
Facebook/Meta’s dev tooling, libraries, and infrastructure have always impressed me. React, Hack, Jest, and now this. I am very surprised they haven’t tried to monetize this aspect of their business like Microsoft and Amazon did.
They could be a very strong contender in that space instead of being known as a terrible time suck for humanity.
I think it's a tall order for another SCM to challenge git. I can't imagine how it could be any more entrenched in the industry.
Further, I'm happy with git. I played with Mercurial years ago, long enough to work with it day-to-day, and just didn't find any relevant advantages versus git.
I love that people are still out there trying to improve things, and certainly don't want that to stop, but it's difficult for me to imagine switching at this point.
I'm also happy with git, but there are 3 main areas where something could improve on git, IMO:
1) Better handling of large files than git-lfs. As in 10+ GB repos. This is needed for game development (currently they tend to use Perforce or PlasticSCM)
2) Sparse checkout via file system integration (like Eden has)
3) Build system integration, so unchanged files and modules don't even need to be fetched to be compiled, because cached builds can be fetched from a build server instead (requires proper modularization, so e.g. C++ macro expansion doesn't just prevent anything from being cacheable)
These are all features that primarily have value for repos that push the limits on size, like big monorepos (with a huge amount of files) or game development (with big asset files). But get it right, and you could massively cut down the time it takes to check out a branch and build it.
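For context on 1), this is the git-lfs baseline I'm comparing against (file patterns are just an example) - it works, but the pointer/smudge model and the server requirements get painful at the 10+ GB scale:

    git lfs install                       # one-time, per machine
    git lfs track "*.uasset" "*.psd"      # writes the patterns into .gitattributes
    git add .gitattributes
    git add Content/Maps/Level01.uasset   # committed as a small pointer; the blob goes to the LFS server
    git commit -m "Add level assets via LFS"
    git push                              # pointers to git, blobs to the LFS endpoint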
This is by no means a perfect match for your requirements, but I'll share a CLI tool I built, called Dud[0]. At the least it may spur some ideas.
Dud is meant to be a companion to SCM (e.g. Git) for large files. I was turned off of Git LFS after a couple failed attempts at using it for data science work. DVC[1] is an improvement in many ways, but it has some rough edges and serious performance issues[2].
With Dud I focused on speed and simplicity. To your three points above:
1) Dud can comfortably track datasets in the 100s of GBs. In practice, the bottleneck is your disk I/O speed.
2) Dud checks out binaries as links by default, so it's super fast to switch between commits.
3) Dud includes a means to build data pipelines -- think Makefiles with fewer footguns. Dud can detect when outputs are up to date and skip executing a pipeline stage.
I hope this helps, and I'd be happy to chat about it.
I'd be curious to see if you've tried git-annex, I use it instead of git-lfs when I need to manage big binary blobs. It does the same trick with a "check out" being a mere symlink.
I haven't used it, no. Around the time Git LFS was released, my read from the community was that Git LFS was favored to supersede git-annex, so I focused my time investigating Git LFS. Given that git-annex is still alive and well, I may have discounted it too quickly :) Maybe I'll revisit it in the future. Thanks for sharing!
Neither is favored, git-annex solves problems that git LFS doesn't even try to address (distributed big files), at the cost of extra complexity.
Git LFS is intended more for a centralized "big repo" workflow, git annex's canonical usage is as a personal distributed backup system, but both can stretch into other domains.
In this case git-annex seems to have a feature that git LFS doesn't have that would be useful to you.
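For anyone following along, the basic git-annex flow looks roughly like this (file names made up):

    git init big-assets && cd big-assets
    git annex init "my laptop"
    git annex add video/raw-take1.mp4   # content moves into .git/annex, a symlink gets committed
    git commit -m "Add raw footage"
    # in another clone, fetch only the files you actually need:
    git annex get video/raw-take1.mp4
    # and free local space while the content stays on other remotes:
    git annex drop video/raw-take1.mp4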
I work in games, and we use PlasticSCM with Unreal Engine.
> Better handling of large files than git-lfs.
PlasticSCM does really really well with binary files.
> Sparse checkout via file system integration
Windows only [0] but Plastic does this. I've been working through some issues but it's usable as a daily driver with a sane build system.
> Build system integration
UnrealBuildTool is at the same time both the coolest and most frustrating build system I've ever used. It's _not_ a general purpose build system for anyone and everyone; it's tailored to Unreal, it's slow to run the actual build system, and the implementation is shaky at times, but some of its features are incredible. Two standout features are Unity Builds and Adaptive Unity Builds. Unity builds are common now [1], but adaptive unity is a game changer. It uses the source control integration to check which files are modified and removes them from the unity blobs, meaning that you're only ever rebuilding + relinking what's changed, and you get the best of both worlds.
ClearCase (claimed to) support points two and three way back in the 90s. It never really worked right; it was always trying to "wink-in" someone else's binary, despite the fact that it was built using a different version of the source.
> These are all features that primarily have value for repos that push the limits on size, like big monorepos (with a huge amount of files) or game development (with big asset files). But get it right, and you could massively cut down the time it takes to check out a branch and build it.
Would this be a good fit for large machine learning data sets?
Every time a new model comes out that touts having been trained on something like a quarter of a billion images, I ask myself "how the heck does someone manage and version a pile like that?"
would you ever need to version individual images? at a high level, you could version the collection as a whole by filtering by timestamp, and moving deleted files to a "deleted" directory, or maintaining filename and timestamps in a database. I'm sure there are lots of corner cases that would come up when you actually tried to build such a system, but I don't think the overall scheme needs to be as conceptually complex as source code version control
Not handling large binary files is a feature from my perspective. Git is for source code and treating it like a disk or a catch all for your project is how people get into scaling trouble. I don't see any reason why versioned artifacts can't be attached to a build, or better, in a warm cache. I get that it's easier to keep everything together, but we can look at the roots of git and it becomes fairly obvious, Linus wasn't checking in 48GB 4K mpeg files for his next FPS.
> I don't see any reason why versioned artifacts can't be attached to a build, or better, in a warm cache
Because now you've got two versioning systems, with different quirks and tools that behave slightly differently, that _must_ be kept in sync.
> I get that it's easier to keep everything together,
Easier is a bit of an understatement. Putting binary files outside of the project in another versioning system means you need two version control systems, _plus_ custom tooling to glue the two together. Also, the people who interact with these assets primarily are not technical at all, they're artists or designers, and they're the most likely to have issues with falling between the cracks.
If you want an SCM with direct build system integration there's always GNU make with its support for RCS :)
More seriously, can you describe what "build system integration" would look like? Basically like what GNU make does with RCS? I.e. considering the SCM "files" as sources.
How would such a system build a dumb tarball snapshot?
At my work we have such a system for our monorepo. Basically, for each compilation unit (roughly, subdirectories), it takes the compilation inputs, flags, etc. and checks them against a remote cache. If they match, it just pulls the results (binaries but also compiler warnings) from the remote instead of compiling anything. A new clean build is made every few commits to the main branch to populate the cache.
In practice it means that if I clone the repo and build it doesn't ever invoke the compiler, so that it takes only a few minutes instead of an hour.
Bazel has a similar system, but I haven't used it myself.
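A toy sketch of the idea, just to make it concrete (paths and the cache URL are made up; the real thing also hashes the compiler version, environment, etc.):

    key=$(cat $(find src/mymodule -name '*.cc' -o -name '*.h' | sort) compile_flags.txt \
          | sha256sum | cut -d' ' -f1)
    if curl -sfo objs.tar "https://build-cache.internal.example/$key.tar"; then
        mkdir -p build/mymodule && tar -xf objs.tar -C build/mymodule   # cache hit: unpack prebuilt objects and recorded warnings
    else
        make -C src/mymodule                                            # cache miss: compile locally (and upload the result after)
    fi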
That seems really useful, but how is this different from what ccache and various other compilation caches that cache things as a function of their input do?
The GP talks about "build system integration" being "a game changer", and I can see that being a useful thing for e.g. "priming" your cache, but that's surely just a matter of say:
ccache-like-tool --prime $(git rev-parse HEAD:)
I.e. give it an arbitrary "root" key and it would pre-download the cached assets for that given key, in this case the key is git-specific, but it could also be a:
> how is this different from what ccache and various other compilation caches that cache things as a function of their input do?
Vertical integration, basically. There is value in these things working, and working well. As an example of the issues with ccache etc, I've yet to find a compilation cache that works well on windows for large projects.
I've never worked on a project with a large remote ccache but I would guess it would be pretty much the same yes.
The "automation" of our in-house system is what really makes the difference but then again we have a team of developers that focus on tooling so it's not so much automated as it is maintained...
Coming from Darcs, Git has a horrible user interface. Something like Jujutsu has a chance to disrupt Git.
It can use the Git data store, so developers can in theory start using it without the whole team adopting it. Then it addresses the big problem with Git, the user interface:
I'm not suggesting that "jj" in particular will disrupt Git, but I think eventually a tool which supports the git data store with a better user interface could take hold.
Git usage for most developers is 3-4 commands - or, once in a blue moon when they fuck up badly, saving a copy of their changes and resetting hard. There aren't enough user interface improvements possible to get people to switch.
If you go to work for Google or Facebook, it quickly becomes apparent that switching SCMs is much cheaper than trying to use ordinary Git or Mercurial at scale. (Though it is clear that Google, Facebook and Microsoft are all trying to maximize the amount of familiarity and not reinvent the wheel too much; they all have been working on tools either building on, utilizing, or based on already existing SCM tools.)
This looks like it’s supposed to be more appropriate for very, very big repos. Which current Git doesn’t support and isn’t fundamentally designed to support.
So rather than use Git + Git Annex or something like that (maybe more), you’ll just use this alternative SCM.
(I keep hearing about how Git will eventually support big repos, but it’s still not on the horizon. Big repos with Git still seems to be limited to big corps who have the resources to make something bespoke. Personally I don’t use big repos so I have no skin in this game.)
Big repos work in Git today if you are able to play the sparse checkout dance (roughly sketched after this list). There are definitely more improvements to be made:
- Protocol for more efficiently widening sparse checkouts.
- Virtual filesystem or other solution for automatically widening sparse checkouts to greatly improve the UX.
- Ideally changing sparse checkouts from a repo state to just some form of lazy loading. Otherwise as you touch more files your performance will slowly degrade.
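The "dance" today looks roughly like this (repo URL and directory names made up):

    git clone --filter=blob:none --no-checkout https://example.com/big-monorepo.git
    cd big-monorepo
    git sparse-checkout init --cone
    git sparse-checkout set services/payments libs/common
    git checkout main

It works, but every widening of the checkout is a manual step, which is what the items above would smooth over.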
Yeah, I think the directions are quite different. Both are improving the user experience (commands/behaviour with pijul, speed with Eden), but pijul is distributed, with some effort on better algorithms and a bigger focus on improving semantics (making it more natural in some sense and more correct in another). Eden is more centralising, with a focus on massive size (the thing large companies want for their repo is branching, not decentralisation, but DVCSes give the former mostly via the latter - I get that branches are a first-class thing in git, but much of their technical implementation follows from things git must have to be distributed).
One thing I recall was an effort from pijul to make blaming a file run in O(n log h), where n is the size of the file and h is the size of the (repo or file, I'm not sure) history. I wonder if Eden will also have improved blame performance. I noticed they mentioned large histories, but maybe it is still linear in the history size of the file. (The way hg works gives, I think, O(nh) where h is the size of the file history, which it stores per-file rather than per-commit like git.)
The biggest differentiation for me between Git and Mercurial is that Mercurial is far better for code reviews because it manages stacks of "as small as possible" changes much easier. The git workarounds I've tried to replicate 'hg histedit' and 'hg absorb' are ... not good.
Similarly, I think Git(hub) has succeeded in open source because bundling more complete changes into PRs works well for unevenly distributed contributions.
I used Meta's Mercurial, having previously used primarily git (and SVN, and CVS before that). It has a number of very cool improvements over git, and it's well integrated into the rest of their infrastructure.
You know the feeling of having to use SVN after using Git? This is what it feels like to use Git after getting used to Meta's Mercurial. I wish I could go into the details, but I don't know how much of it was ported back to Mercurial.
I don't think it's trying to compete with git, it's not decentralized or meant to support big distributed open source project development. This looks like a nice tool for Big Company to manage its internal, private code repositories.
The decentralized part of Git and Mercurial is nice (e.g. no need for GitHub et al.), but I think most software projects using Git or Mercurial do have a centralized server/hub...
I've been keeping an eye on Sturdy [1], a more performant, high level, and opinionated version control system. As a bonus, it seems to be compatible with Git.
OK, slightly off-topic but maybe the right minds are here. We have been developing an introductory CS curriculum committed to thinking-with-powerful-tools, including the command line, real programming languages, and git. It's great until it isn't. We intentionally maintain a simplified workflow, but still get the occasional merge conflict or local state blocking a pull. I keep thinking there must be a simplified wrapper over git which maintains the conceptual power while avoiding the sharp edges, even if at the cost of robustness. I'd be more interested in an abstraction than a GUI, but would be interested to hear whatever others have come up with.
The git user interface just sucks. hg supposedly has a better UI, and darcs is apparently better again (except sometimes merges could run in exponential time). Pijul is meant to give you a darcs-like UI with good performance. But none of those things are git, which is maybe important to teach.
One possibility could be to use some kind of git ui. I only know about magit (which is built on/in Emacs) but I’m sure others exist.
The biggest problem I have with Git is the strong commit ordering. This leads to lack of tracking through cherry-picks which has very real friction for a fairly common workflow.
it solves scale issues that git can't solve at the moment.
fb monorepos are huge.
so for most people/companies this issue is not critical to solve and git is great.
Actually Git isn't quite so entrenched as you think. Perforce is still the norm in the games industry, for example, partly because it's more artist-friendly.
It looks like Google's Perforce/Piper; I think that would be a better comparison than Git, but I don't see any data on how the two compare. Anyway, Eden being open source is a great advantage already.
Does this do anything special for handling binary data, or is it mostly for text like git? I've heard that Perforce (another centralized SCM) does a good job with large binaries.
I love git just as much as the next guy, but git-lfs sucks.
I worked at Meta and used Eden. I remember it as a virtual, FUSE based file system for a huge monorepo. Basically, if you have GBs of data in your monorepo, and any individual user accesses < 1% of it, then there's no point in having each "hg update" update the other 99%.
But we were explicitly dissuaded from having binary data. I worked on image processing, and wanted to store test images for unit tests in hg, but the consensus was, that was a bad idea. So we stored them in a separate data store, and had a make rule to fetch them as part of the build.
Git only really has two problems with binary files:
1. They take up a lot of space, because the entire history needs to be downloaded with all past versions (and many types of binary files completely change when updated, so delta compression doesn't help much).
2. They are slow to update on checkout (not much of an issue if they don't change much).
Basically any solution for large repos will also solve Git's "binary file problem" because in order to allow large repos you need to allow shallow and partial checkouts as well as efficiently updating the working copy (usually via a virtual filesystem).
TL;DR Git doesn't have a binary file problem, it just has a big repo problem. Binary files are often mentioned because they are the quickest and easiest way to get a big repo.
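For example, the partial-clone options that already exist today point in that direction (repo URL made up):

    # skip every blob bigger than 1 MB until a checkout actually needs it:
    git clone --filter=blob:limit=1m https://example.com/asset-heavy-repo.git
    # or skip all blobs and fetch file contents lazily on demand:
    git clone --filter=blob:none https://example.com/asset-heavy-repo.git
    # or keep only the newest commit's history:
    git clone --depth=1 https://example.com/asset-heavy-repo.git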
I thought this a few years ago when I was more junior as well. In open source circles git has been used for quite a while, but my company just recently moved one of our repos from TFVC to git. A Fortune 500 company I worked for was still using TFVC for some legacy products. Another product from an acquisition used Subversion, and the migration to git (even with the company using GitHub Enterprise) still took almost three years.
Getting the inertia amongst developers to migrate to a different SCM can be quite the challenge.
Edit: initially said migrating to git was a challenge. But really it’s migrating from any SCM to another that’s challenging.
There are a lot of advantages to using git, in part because it was developed for a massive project (the Linux kernel) so it's very thoroughly tested, in part because it's exploded in popularity so a lot of tools, integrations, and documentation are available for it. However, it's not the alpha and omega of version control. Some people prefer the interface of Mercurial, some people use Fossil because it integrates issues and wikis, others have stuck to older tools like Subversion, and still others are experimenting with new approaches like Pijul.
Source Depot is what it was called. IIRC, Windows was moving to Git (through some virtual file system [0]) and had some major teams actively using it, but that was a while ago too.
Facebook has a terrible track record when it comes to open-sourcing their internal tools. See: Phabricator, HHVM, Flow, Jest, ...
Even React, which is their most popular library, is not actually "open source." They're very transparent about the fact that their priorities are Facebook's needs -- even if they do take community input.
None of this is per se bad, but you should definitely treat an open-source project out of Facebook with skepticism when it comes to adopting it for your own use cases (possibly making sure you're not too locked in when an incompatible v2 comes out with virtually no warning after FB's internal implementation drifts).
> Even React, which is their most popular library, is not actually "open source."
How do you define "open source"? It typically simply means the source code is available. By any definition I can think of, React is definitely both free and open source. How they design the software or if they take contributors isn't really relevant.
"Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose"
The license is the key bit.
On the other hand, there's "source available" software (also on wikipedia https://en.wikipedia.org/wiki/Source-available_software ), which is what your definition equates to, and I personally don't want to see confused with open source or free software.
There are three domains where people usually use the term "open source".
- Freely licensed software (eg: MIT, GPL, etc)
- Code visibility (eg: ForgeRock)
- Community focus/contributions
Not all "open source" implements each of these, and just because they implement one and not the other doesn't mean they're not expressly open source. Just some ecosystems are more open than others.
I'm aware React is MIT and of the various licenses etc.
As I see it commonly defined, "open source software", FOSS, and FLOSS all mean the same thing more or less. That the project uses an OSI approved license, or one very close to it, whether it's MIT, Apache2, or GPL.
"Free software" is the only of the phrases that I see having two competing common definitions, the "free as in money", and "free as in Free Software Foundation's definition of free software". This seems pretty understandable, since "free" is overloaded.
I only infrequently see people mixing up "open source" and "source available", and that's the specific thing I'm trying to discourage people from mixing up. I think keeping those terms clear, and especially calling out "source available" software as _not_ being "open source" (i.e. not granting you the freedom to modify it or run your own copy in some cases) is important.
I see the opposite argued often too - that open-source is too wide a term, as it could also be understood as source available, and that FOSS/FLOSS should be preferred. But you're right; looking at most literature, most people seem to treat OSS and FOSS as largely the same thing. I guess my biases are showing lol
Coming back to the original argument - which was that React was not truly open-source - being MIT, it 100% is, so I still don't understand it. That they prioritize their own needs for feature development is pretty much irrelevant; the source is there and you have permission to fork, tweak and publish changes on your own at any time. You are legally within your rights, but they don't have to make it easy for you.
As I understand it, "Free Software" is the term the FSF and general hacker community settled on for licenses that preserve user's freedom to modify and redistribute source code.
From there, I see "Open source" being slightly more often associated with companies or younger developers, and "free software" more often being associated with the GNU project, copyleft projects, etc.
I'm curious if you have references or more explanation about the difference you're trying to draw, since it's one I haven't seen before.
> I'm curious if you have references or more explanation about the difference you're trying to draw, since it's one I haven't seen before.
My distinction between the two is whether outside contributions make it back into the original project. Free Software is about the rights of end users to inspect the code and make and distribute their own modifications, but then Open Source takes it a bit further by explicitly soliciting contributions with the ostensible aim of building a better project through cooperative labor than an individual programmer could build alone.
In practice though "Open Source" has turned into unpaid project management work for billion-dollar corporations, bitter disputes between contributors over conflicting standards of morality, technical visions in constant flux as contributors come and go, and endless bikeshedding about semantic version numbers / code style guides / other things that don't matter. For years I thought I was totally burned out on Free Software and walked away from all of it, but what I was actually burned out on is Open Source and have been able to love programming again by working on things that are explicitly "Free Software but not Open Source".
The `actix-web` drama a few years ago is a perfect example, when a huge crowd of onlookers felt morally justified excoriating a popular project's creator / maintainer for not managing their project to the crowd's standards: https://steveklabnik.com/writing/a-sad-day-for-rust
>Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose
??? Isn't React licensed under the MIT licence? It seems to me that it ticks all the boxes?
As I wrote at the top of my comment. "I agree with you that react is definitely open source" (and no, I did not edit that in, it was there when you read it)
I'm aware react is licensed under the MIT license. I was just talking about how the parent comment chose to define "open source".
By that definition, any software project that is driven by a BDFL wouldn't be open source, including Linux.
The terminology hasn't drifted that much. I think there is a small and vocal group of people who are trying to take back "open source" by reframing what it means, so as to exclude corporate projects. It doesn't make sense to me.
>any software project that is driven by a BDFL wouldn't be open source, including Linux.
There is certainly a push in that direction, and towards the idea that you need a group of people, a core team or council, to be considered Open Source. And Linux has that.
>The terminology hasn't drifted that much.
It depends how you measure it, but Twitter and HN are at least two places where lots of developers are suggesting Open Source equals community driven. And yes, there is also some movement towards MIT, BSD or Apache 2.0 not being considered Open Source because they do not require contributing changes back. Although that hasn't gotten any traction. (Yet.)
Everyone is going on tangents, but yes React is as open-source as they come. What people are conflating are the notion of community-driven and open-source.
React is a successful Facebook OSS project. An example of one which went poorly is Thrift. Facebook open-sourced it and then internally used fbthrift which diverged drastically. OSS Thrift isn't that popular these days any more.
Disagree. We need to stop trying to cram meaning into phrases that are already defined. Open source means the source code is freely available. Adding anything else requires a different name.
Free Software is an established term, and React certainly is not Free Software, due to its patent. I very much doubt that React is Open Source either, for the same reason.
So react is MIT-licensed and has a patent. How do those two work together? If I modify React's source-code might I not infringe on the patent and then get sued by Meta?
I’m not a lawyer, but from my perspective, that’s indeed a concern. And perhaps you could get sued just by using React. It could differ between jurisdictions as well.
A standard open source license with a patent grant, like the Apache license, would have been a lot clearer, but Facebook has so far refused to license React in that way.
A problematic patent grant was offered for earlier versions of React but that’s not the case anymore (and didn’t really fix the problem anyway).
I'm not a lawyer either and I wonder about the scope of Apache patent grant. Does it give you the right to use any patents the software in question "uses" in any possible context? Or does it simply allow you to modify the software any way you like and not get sued for infringement? But then how much can I modify it and still retain the right to use those patents? I mean if I create a totally unrelated software package which however shares some code with the original work, can I keep on using those patents anyway I want?
What's wrong with Flow, Jest or Graphql? I think these are all fantastic projects. I mean, Flow "lost out" to Typescript, but, it's usual for one winner to emerge from competing frameworks.
Graphql is great. With Jest, that project feels a little abandoned because the Typescript support (ts-jest) is pretty janky and has bad performance. Meanwhile in the ecosystem in 2022, it's becoming the norm to have first-class Typescript support.
React is patented and Facebook is actively choosing not to offer a patent grant, so unfortunately that’s not the whole story.
A fork may not be an option. Perhaps a given organization may not even be allowed to use React, if Facebook decides against it for some reason. Jurisdictions can differ as well.
In my opinion, the best thing would be if Facebook simply made the terms clear by using the Apache license or similar. But hey, it’s Facebook, so I’m not expecting much…
I'm actually curious what the strategy is here. To my knowledge only FB, Google and MS do megascale monorepo, and Google and MS already have a solution. Are there now other companies outgrowing Git that Facebook is hoping to build a community with?
Open sourcing an internal repository with extensive, ongoing work on it is always a difficult affair, because you're creating a second source of truth. (It isn't just how you manage external contributions, but also workflows like releases and CI.)
It means they haven't (yet?) transitioned to open-first and it'll require proving themselves that they'll do open-first development before trusting them. Not willing to bet my work on a product where the governance isn't open but everything is driven by and for a single company's needs.
For what, another Phabricator, that I’ll inevitably have to migrate my company away from again?
So if history serves the next announcement to watch for is your departure from Facebook and the launch of Edenity, which will be sunset and abandoned inside a decade once it fails to IPO. Am I close?
Yes, generally for a large codebase you will have a separate code search tool, like Google's Code Search or Sourcegraph, which are super fast for large amounts of code.
Hm. I can see why they'd build some of these features, but there's some significant downsides. The VFS in particular will end up a poor experience when a transitory network problem causes apps to hang when pulling code. What happens if you 'grep -r', or if 'mlocate' indexes it?
On the build side.... holy jesus, are they really compiling 40 different dependencies from scratch every time they push code? This build has been running for 5 minutes and it's still just compiling dependencies: https://github.com/facebookexperimental/eden/runs/5997101905... Come on, ya'll. You're supposed to be the "advanced FAANG people". Cache some build deps, will ya?
I wouldn’t be surprised if the build system there is just cobbled together for the OSS version, and likely quite different from what they actually use at FB.
One thing to keep in mind for how development at larger tech companies works is that you’re often not building on your own desktop, you’re usually building on a development server that’s on a well-connected (effectively production-quality, if not literally the same) network. You don’t see a ton of drops in those cases, so it works well. Not that there hasn’t been effort to recover from networking issues encountered in this and other build tooling - at scale, someone’s development server is going to have a bad day every day.
You also need much better tools than grep and locate for a monorepo - or any sufficiently large repo probably. Just load the full repo into memory in a few places around the world, and use an API to find the text you’re looking for. If you already have expertise with search services in your company, this is not that challenging a step - and you can get fancy by using something like Tree-sitter to make those searches more advanced than text. Hitting disk (especially for whole directory trees for “grep -r”) is a losing approach in a large repo.
Is there a FUSE filesystem for Git that takes a similar strategy to edenFS?
Only setting up files and folders as they are requested might be very helpful with various git monorepo access patterns. Maybe there is something inherent to Git's design that makes this less practical.
Why do you think this is madness?
FB doesn't use git for these monorepos, so that's not really relevant, but I don't understand why you think that it's better to break a repo up because the SCM can't handle the size vs fixing the SCM so it can handle the size.
I work for meta (nothing to do with the teams working on this though) and I can assure you people have considered the tradeoffs of breaking repos up just to accommodate existing SCMs vs improving the SCMs. I think if you believe improving the SCM instead of breaking up the code is madness, you should probably provide a bit more of a justification.
The project's name goes back to 2016. Granted, the README called the project a "filesystem" back then not an SCM, but... still, the name seems to predate the series by quite a bit.
I think it's probably from Facebook's culture of pretending it's a force for good when it's a mixed bag like every other Megacorp. Here's an example of what Zuck has said:
> I believe the most important thing we can do is work to bring people closer together. It's so important that we're changing Facebook's whole mission to take this on.
No wonder they have a major project named Yoga and now this...
It was called css-layout but got renamed to Yoga because it implemented the “flexbox” layout and not all of CSS. Flex -> Yoga was indeed the joke/reference.
The comment you're replying to is a perfectly good reason why it's named Yoga, not sure why it'd still seem "odd" with that context. Not particularly witty, but it definitely makes sense.
No, you can't simply dismiss the fundamental problems with Facebook/Meta with whataboutism. Google and Apple and Microsoft are standard mixed bags. Not Facebook.
Facebook is pure evil. It has queued up the complete obliteration of western democracy, which even Rupert Murdoch couldn't quite manage on his own. You can absolutely find non-directly-evil things to do at Facebook/Meta, but it all supports the evil in the end.
If Eden the TV series was based on a book, the book could be older, but it isn't.
It's also possible, but not likely, and not the case here, that a TV show could be a sensation long before the first episode comes out. I can't think of a time when this happened except for a prequel or sequel like Better Call Saul, where it was much awaited, but I'm sure there are instances of that occurring.
Edit: from another comment, there's this, which came out in 1997 but isn't mentioned in the Wikipedia article for the TV series so I'm not sure it's related: https://en.m.wikipedia.org/wiki/Eden:_It%27s_an_Endless_Worl... There is a Manga mentioned in the article for the TV series but it came out the same year as the TV show.
It's great to see this out in the wild now.