
We use one repository at Google.

We've found that "Removing the ability to make large scale changes easy and thus increasing reliability" isn't actually correct.

As an example, suppose most of your codebase uses an RPC library, and you discover an issue with that library that requires an API change, one which will reduce network usage fleetwide by an order of magnitude.

With a single repo it's easy to automate the API change everywhere, run all the tests for the entire codebase, and then submit the changes, accomplishing the API change safely in weeks when it might otherwise take months with higher risk.
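Roughly, a change like that boils down to a rewrite script plus one test invocation over the whole repo. Here's a toy Python sketch of the shape of it; the regex rewrite, the helper names, and the pytest invocation are purely illustrative assumptions, not the actual tooling:

    import re
    import subprocess
    from pathlib import Path

    OLD_CALL = re.compile(r"rpc_client\.Send\(")   # hypothetical old API
    NEW_CALL = "rpc_client.SendCompressed("        # hypothetical replacement

    def rewrite_call_sites(repo_root):
        """Rewrite every call site under one repo root; return touched files."""
        touched = []
        for path in Path(repo_root).rglob("*.py"):
            text = path.read_text()
            new_text = OLD_CALL.sub(NEW_CALL, text)
            if new_text != text:
                path.write_text(new_text)
                touched.append(path)
        return touched

    def run_all_tests(repo_root):
        """With one repo, 'all affected tests' is a single test invocation."""
        return subprocess.run(["python", "-m", "pytest", repo_root]).returncode == 0

    if __name__ == "__main__":
        changed = rewrite_call_sites(".")
        print(f"Rewrote {len(changed)} files")
        if run_all_tests("."):
            print("All tests pass; ready to send one atomic change for review.")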

Keep in mind that the API change equals real money in networking costs, so time to ship is a very real factor.

It sounds like Facebook also has a very real time-to-ship need, even for core libraries.




What you've got here is a different kind of tradeoff.

Normally, libraries are written under the assumption that clients cannot be modified or updated when the library changes. This brings in the concept of a breaking change, and a set of design constraints for versioning. For example, modifying interfaces becomes verboten, final methods start becoming preferable to virtual methods, implementation detail classes require decreased visibility, etc.
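To make that concrete, here's a tiny hypothetical Python example of the constraint: once you assume callers can't be updated alongside the library, only additive changes are safe.

    # Hypothetical library API, illustrating why only additive changes are
    # safe when callers can't be updated with the library.

    class RpcClientV1:
        def send(self, payload):
            return len(payload)

    # Breaking change: a new required parameter. Every existing call to
    # send(payload) now fails, so this forces a new major version.
    class RpcClientV2Breaking:
        def send(self, payload, compression):
            return len(payload)

    # Compatible change: the new parameter has a default, so old call sites
    # keep working and new ones can opt in.
    class RpcClientV2Compatible:
        def send(self, payload, compression=None):
            return len(payload)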

The advantage is that library producers are decoupled from consumers. Ideally the library is developed with care, breaking changes between major versions are minimized, and breakage due to implementation changes is minimal, owing to the lack of scope (literally) for clients to depend on implementation details.

But under the model you describe, you're leveraging the Theory of the Firm as much as possible - specifically, reducing the transaction costs of potentially updating clients of libraries simultaneously with the library itself.

The downside is the risk of unnecessary coupling between clients and libraries: the costs of a breaking change aren't so severe, so the incentive to avoid them is lessened, and so the abstraction boundaries between libraries are weakened. If the quality of engineers isn't kept high, or they don't know enough about how and why to minimize coupling, there's a risk of a kind of sclerosis that increases the costs of change anyway.


The risk you describe is a real one, but we mitigate it at Google with strong code review and a high bar for our core libraries teams.

All of our core libs are owned by a team, and you can't make changes to them without permission and a thorough code review. Our Perforce infrastructure lets us block submits that don't meet these criteria, so we get the benefits of ownership; we just use ACLs instead of separate repos. It has worked very well for us so far.
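As a rough illustration of ownership by ACL rather than by repo, a presubmit check can reject any change touching owned paths unless an owner approved it. This is a hypothetical Python sketch, not the actual Perforce trigger setup; the OWNERS map and the names are made up:

    OWNERS = {
        "core/rpc/": {"alice", "bob"},     # hypothetical core RPC library owners
        "core/storage/": {"carol"},
    }

    def presubmit_ok(changed_files, approvers):
        """Allow the submit only if every owned path has an owner's approval."""
        for path in changed_files:
            for prefix, owners in OWNERS.items():
                if path.startswith(prefix) and not (owners & approvers):
                    print(f"{path}: needs approval from one of {sorted(owners)}")
                    return False
        return True

    # A change to core/rpc/client.py approved only by a non-owner is blocked;
    # the same change approved by alice goes through.
    assert not presubmit_ok(["core/rpc/client.py"], {"dave"})
    assert presubmit_ok(["core/rpc/client.py"], {"alice"})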


You don't need a single repo to be able to run tests across all existing tools. If you have proper dependency management set up, you make the change, push it, and a CI server goes off and builds it; then all dependent projects get rebuilt and tested...
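For reference, the multi-repo flow described here is roughly: when a library changes, CI walks a dependency graph and rebuilds and tests everything downstream. A small Python sketch, with a made-up project graph:

    from collections import defaultdict, deque

    DEPS = {                      # project -> projects it depends on (hypothetical)
        "rpc-lib": [],
        "search-service": ["rpc-lib"],
        "ads-service": ["rpc-lib", "search-service"],
    }

    def downstream_of(changed):
        """Every project that transitively depends on the changed one.
        A real CI would also rebuild them in topological order."""
        reverse = defaultdict(list)
        for project, deps in DEPS.items():
            for dep in deps:
                reverse[dep].append(project)
        seen, queue, result = {changed}, deque([changed]), []
        while queue:
            for dependent in reverse[queue.popleft()]:
                if dependent not in seen:
                    seen.add(dependent)
                    result.append(dependent)
                    queue.append(dependent)
        return result

    print(downstream_of("rpc-lib"))   # ['search-service', 'ads-service']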


No, but you do need a single repo if you want to make the API change and update all the dependent code in one fell swoop.


Not just make the change in one fell swoop, but also roll the change back in one fell swoop if it turns out to have broken something.


Does this matter that much if you version everything?

Commit in the library -> successful library build, failed client build. Follow-up commit in the client -> successful library build, successful client build.

In both situations you don't really care about the intermediate broken build: you still have the previous library/client versions and can use those. Once everything is committed and fixed you have the new versions and you can upgrade.

The only problem I see is if there's a long delay before the second commit. But this can be prevented by a fast CI cycle (always a good idea) and by sending notifications for failures across teams (i.e. the library committer is notified that the client build for his commit failed).


If you care about how long it takes to make the change in all your codebases, and you care about verifying that all the code that needed to change actually did change, then it matters.


This is what Facebook means when they say "atomic commits."


What version control system do you use, or is it a secret? Perforce? Git? I've seen the talk Linus gave about Git at the Googleplex, so perhaps you use Git. If so, how have you not run into Facebook's scaling issues?


This question is actually really complicated to answer. The short answer is that we run Perforce.

The long answer is that we run Perforce with a bunch of caching servers and custom stuff in front of it, plus some special client wrappers. In fact, there is more than one client wrapper. One of them uses git to manage a local thin client of the relevant section of the repo and sends the changes to Perforce. This is the one I typically use, since I get the nice tooling of git for my daily coding and can then package all the changes up into a single Perforce change later.

Google has invested a lot in code management infrastructure and tooling to make one repo work, and we've reaped a number of benefits as a result.

As others have mentioned, though, there are trade-offs. We made the trade-off that best suited our needs.


Interesting, has Google made the client parts open source?

It would be great if someone used that to write a high performance git server.

Could they?


There is no such thing as high-performance git anything. People at Google who use git (I used to be one of them) suffer far worse performance than the people who just use Perforce directly. In particular, the cost of a git gc, which invariably kicks in right when you're on the verge of an epic hack or fighting a gigantic outage, is unreasonable, and Perforce has no analogous periodic process.


a wild Perforce shill appears

We actually have an open source tool that allows you to carve off parts of your Perforce server as Git repos. The repos can overlap in Perforce, allowing you to share code seamlessly between different Git repos. You can generate new repos from existing code easily, and can even generate shallow Git repos that are usable for development.

Details are at: http://www.perforce.com/product/components/git-fusion

I'm happy to answer questions here or on Twitter: @p4mataway


Google runs the largest Perforce server out there.

http://www.perforce.com/blog/110607/how-do-they-do-it-google...


From what I heard, mostly Perforce.

The story I got from a Googler was that there are many separate projects in a single repository (which is normal for Perforce), and dependencies are handled by making your libraries subprojects (or whatever they're called; like subrepos in Git, essentially symlinks), rather than using an artifact-centred approach.


Thanks for this remark. I've been struggling with the choice of splitting into multiple repos, but I have similar thoughts to what you just confirmed.




