Notes from Facebook's Developer Infrastructure at Scale F8 Talk (gregoryszorc.com)
162 points by cpeterso on Mar 28, 2015 | 46 comments



The most interesting part for me was

    > They appear to use code coverage to determine what tests  
    > to run. "We're not going to run a test unless your diff 
    > might actually have broken it."
This seems like such an obvious optimization in hindsight, I'm surprised it's not more common. Does anybody know of other CIs that can do this?


With Bazel, it's pretty easy to make this transformation:

    - Get a list of source files changed.
    - Use bazel query with rdeps to get a list of tests that depend on that source file.
    - Run those tests.
A little more explicit than the code coverage approach, but with roughly the same effect.
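A minimal sketch of that flow, assuming the changed paths can be mapped naively onto Bazel labels (real repos usually need a smarter path-to-label step):

    # Source files changed relative to the merge base.
    files=$(git diff --name-only origin/master...HEAD)

    # Naively turn "path/to/file.py" into the source label "//path/to:file.py".
    labels=$(echo "$files" | sed -E 's|(.+)/([^/]+)$|//\1:\2|' | tr '\n' ' ')

    # Every test rule that transitively depends on one of those files...
    tests=$(bazel query "kind('.*_test', rdeps(//..., set($labels)))")

    # ...and only those get run.
    bazel test $tests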


I have used that (only running the few tests that match the diff) in the past to great success as the first level of testing, run on the user's computer before committing code. It reduced the number of build failures by 99% and test failures by a similarly high amount. On the server we would run those tests first, and we would continuously re-sort the test order so that tests which had failed in the past ran first, reducing the build+test time and finding failures faster.
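A minimal sketch of the failure-first ordering, assuming the CI job appends a line such as "tests/test_login.sh" to failures.log whenever a test fails (both file names are placeholders):

    # Count historical failures per test, most-failed first.
    sort failures.log | uniq -c | sort -rn | awk '{print $2}' > ordered.txt

    # Append tests that have never failed, in discovery order.
    ls tests/*.sh | grep -vxF -f ordered.txt >> ordered.txt

    # Run in that order; bail at the first failure for fast feedback.
    while read -r t; do
        "$t" || { echo "FAILED: $t" >&2; exit 1; }
    done < ordered.txt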

There are lots of tweaks that can be done, but these days having an automatic build+test before a commit gets into the main branch feels like the norm. Even open source GitHub projects can use something like Travis CI to prevent bad changes from getting into a project.


We have a home-grown solution for this. We use pants (https://pantsbuild.github.io/) for our Python builds and Phabricator for code reviews. Pants gives us a dependency graph, so from the files touched in a given diff we can ask pants for the targets to build and the test suites to run. It's not as granular as running individual tests within a suite for small changes, but it definitely helps cut most runs down to just a small percentage of all the tests. The ability of pants to build test suites in isolation is also pretty key, since it means we can run a whole bunch of suites in parallel, so in many cases the total test run time is just the time of the longest individual suite.
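A rough sketch of that kind of wiring (the goal and flag names here are placeholders that vary between pants releases, not necessarily what this setup uses):

    # Targets owning the files touched by this diff, plus everything
    # that transitively depends on them.
    targets=$(./pants --changed-since=origin/master --changed-dependees=transitive list)

    # Suites build in isolation, so they can be farmed out in parallel;
    # total wall time approaches that of the slowest individual suite.
    echo "$targets" | xargs -n1 -P8 ./pants test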

As others have mentioned, kicking off full runs on a regular basis is important to make sure there aren't weird cases that break and aren't caught in this approach.


I'm not sure how Facebook does it, but there are a lot of ways to get something like this up and running.

I have experience using Facebook's pfff [0]. Its codegraph feature builds up a graph of the code that you can query. Doing something like 'get me a list of all classes this node depends on' on Java code is as simple as executing 'Graph.use(node name)', where the node name can be a class, package, method... Pfff supports many more languages, and it's built in OCaml.

Facebook has a history of open sourcing stuff; I hope they open source the tools mentioned in the talk.

[0] https://github.com/facebook/pfff


I'd be hesitant to rely 100% on this approach (and from the notes it sounds like FB still runs the other tests - just less frequently in some cases), especially where the code coverage tools aren't guaranteed to account for unintended interactions between seemingly unrelated pieces of code. In theory, well-written code/tests shouldn't have those kinds of problems in the first place, but larger projects often don't deserve that level of trust (in my opinion).

Still, I think it's a great idea for giving quick, "preliminary" feedback to developers - as long as the full test suite is still run periodically.


Yeah, absolutely. You'd want to run your full test suite (which probably includes heavier integration tests as well) before deploying, but I think in a lot of cases it's valuable to provide, say, 95% accurate feedback quickly.


Another way to put it: I will happily take 95% if the other option is doing nothing, where people can commit stuff that doesn't even build, let alone pass tests.


This seems like a good optimization in theory.

In practice? I think it may end up selecting for hard-to-find bugs, which is not a desirable feature of a test suite.


@tenderlove was experimenting with this idea for ruby/rails:

http://tenderlovemaking.com/2015/02/13/predicting-test-failu...

Here's hoping someone will make a gem for this stuff.


This talk and my experiences at Microsoft are why I tend to try to steer our projects at Mozilla Research to partner with external offerings rather than doing too much building of our own. It's hard to imagine a world where we build and continue to improve a source control / issue tracking system better than GitHub; a continuous integration solution better than Travis CI; a build system; etc.

Even if we can imagine building a better product today, the ongoing maintenance and extension of those services quickly become full-time multi-person investments with many hundreds of dedicated machines, neither of which even our core infra teams have, much less the little tiny research org.


Except it isn't internal only. Many of Facebook's tools are open sourced. And not in the "throw it over the wall" sense. Their open sourced projects tend to gain traction and attract a significant number of community contributors. By turning a number of their core tools into successful open source projects, they are leveraging the community effect to lessen the ongoing maintenance burden for these projects.

Yes, they are paying a high initial cost to develop these offerings. But since they tend to produce high quality products that attract nearly-free-to-Facebook labor via open source, the investment tends to pay off in the long run. It's a savvy business move, and one that can arguably only be pulled off by a talented engineering organization.


> Yes, they are paying a high initial cost to develop these offerings.

I certainly agree that, for organizations that can afford these high initial costs internally, this is a great development model. My challenge (as a manager at Mozilla) is: given that we don't have such resources available, how do we build partnerships with groups or other projects that do, in order to ensure such offerings get built?

Of course, another interesting approach would be to say that GitHub, Travis, Heroku, etc. show that DevOps-as-company is a viable business model and really what we should be doing is spinning up companies that raise funding and build products around each of the areas that are still lacking products that serve enterprises (read: customers with money) today.


It's not hard to imagine a world in which someone builds or continues to improve an issue tracking system better than GitHub, because GitHub seems to be actively uninterested in improving its issue tracking system in various dimensions that matter to a lot of issue tracking system users (ability to attach testcases, dependency tracking, and release management are the big ones for me personally). It's not just that GitHub is lacking in those areas; they've repeatedly rejected suggestions for improving them.


> they've repeatedly rejected suggestions for improving them.

I certainly agree that it's easy to imagine better issue management or patch review than GitHub has today - like many other projects we use other systems. When I've talked with GitHub engineers, all of our issues are well understood and they have plans to address them.

My skepticism is around the strategy of building and supporting our own things vs. partnering with places like GitHub that make a living building and supporting such software. In the Mozilla-specific case, they have proven open to working with us in the past, by pony-trading enhancements to Firefox for enhancements to GitHub, and the inability to deliver has been more on our side :-/


Interesting. I haven't seen any requests for enhancements to Firefox (or more precisely Gecko) from GitHub, and I watch all incoming Gecko bug reports... Were they asking for these enhancements from Firefox itself, not Gecko, or through some sort of opaque channels or something?

Past that, partnering with people makes sense to me if they're flexible enough to add things when we need them. My point is simply that historically GitHub has not been adding the things we need.


If it makes you feel any better, out there is a really good issue tracking system with a bolted-on git repository system that is actively rejecting improvements to its repo stuff.

If GitHub didn't have issues, people would still be there. If GitHub didn't have Git, people would leave. Issues are an accessory to the main event, like Apple TV is to the iPhone/MacBook.


I'd rather have you steering it to open-source systems/services, you know, in the spirit of the open web.


On the topic of having one IDE that just works for multiple languages/platforms, you should check out srclib (https://srclib.org), an open-source language analysis tool (I'm one of the creators) that abstracts away the language-specific parts of IDE support so that IDEs/editor plugins can hook into one API and automatically get support for a bunch of different languages. This way, you don't have to build MxN plugins (M editors, N languages) to support all combos of editors/languages people might want to use.

The currently supported languages are Go, Java, Python, JavaScript, and Ruby, so this wouldn't work yet for iOS, but Objective-C and Swift are high on our priority list. Editor plugins exist for Emacs, Sublime, and Atom. Would love to hear anyone's feedback from trying it out.


>On the topic of having one IDE that just works for multiple languages/platforms,

I've recently given Eclipse another chance after 4 years. It has impressed me with its C, PHP and Haskell support so far. I really recommend people give it a shot. Of course I installed a half decent GTK theme to make it look nice :)


Please oh please make one for Rust.


This sounds a lot like Steve Yegge's Grok:

http://bsumm.net/2012/08/11/steve-yegge-and-grok.html

It's unlikely that his version will ever be open sourced or usable outside of Google though.


IIRC http://kythe.io is at least related by heritage to grok.


Yep, Kythe is the open-source version of grok. It's awesome, and the srclib team is in touch with the Kythe team at Google to build on top of their analysis libs (for C++ and Java).

Long term vision is to get rid of the "my editor X doesn't support language Y" problem as well as build a lot of other multi-language analysis tools on top of it. Shameless plug: if you care about developer productivity, join us!


"We made mercurial up to 50x faster than git, more than 2000 improvements" on big repos with lot of history.

This would be their blog post about this: https://code.facebook.com/posts/218678814984400/scaling-merc... and the code at http://selenic.com/repo/hg/shortlog/


Interesting info about Xcode and the FB app. I wonder if Apple is collaborating with FB on this or if it's mostly deaf ears...

"So they built their own IDE. Set of plugins on top of ATOM. Not a fork. They like hackable and web-y nature of ATOM.

The demo showing iOS development looks very nice! "

Seems that they have the right idea about "optimizing for Developers"


Although I don't really like Xcode (but use it all the time), what the hell takes 5 minutes to open a project? The FB app isn't that big. Maybe it includes downloading an entire repo or something? I've worked on big iOS apps and this was never an issue.


Facebook has a 54 GB monolithic repo (as of almost a year ago, April 2014).

Primary source: https://twitter.com/feross/status/459259593630433280

HN article: https://news.ycombinator.com/item?id=7648237


A lot of nice summaries and points, but one caught my eye: the mention of building Nuclide on top of Atom. My experience with Atom is that it gets painfully slow quickly, and I experience a crash every few days.

I would like to hear more thoughts from FB folks on their experiences.


Java desktop applications have shown that with enough resources thrown at the problem, slow and flaky things can become fast and stable. JavaScript desktop applications seem to be at the "slow and flaky but has tons of resources being thrown at them" stage.

The VM, as with Java ten years ago, is also seeing remarkable improvements.


"Push-pull-rebase bottleneck: if you rebase and push and someone beats you to it, you have to pull, rebase, and try again. This gets worse as commit rate increases and people do needless legwork. Facebook has moved to server-side rebasing on push to mostly eliminate this pain point. (This is part of a still-experimental feature in Mercurial, which should hopefully lose its experimental flag soon.)"

What if there are merge conflicts? I don't know about Mercurial, but in Git there are tons of cases where rebasing cannot happen automatically.
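For context, the client-side loop the quoted note describes looks roughly like this (shown with git commands for familiarity; server-side rebase-on-push moves this retry dance onto the server):

    # Keep retrying until our push lands; every time someone beats us to
    # it, pull their commits, replay ours on top, and try again.
    until git push origin master; do
        git pull --rebase origin master || {
            echo "rebase stopped on a conflict; fix it by hand" >&2
            exit 1
        }
    done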


The process is optimistic. You submit your commit and it runs async in the background with success or failure sent to you via email and SMS. You still run into merge conflicts, especially if you are touching a frequently tweaked bit of core, but not having to babysit the process is almost always a win.


Although I love reading about how big companies solve development problems, I am also glad I don't work there. Programming at FB seems more like working in a salt mine.


Does anyone have a .torrent of all the videos from F8? It's a shame that they are self-hosting; otherwise it would be possible to youtube-dl it :-)


The engineering sessions are in this playlist: https://www.youtube.com/playlist?list=PLb0IAmt7-GS1_7FcSupSJ...


Thank you.


Here's a bash script that will download all of the videos:

    $ cat download-facebook-f8-2015-videos.sh

    #!/usr/bin/env bash
    youtube-dl "https://www.facebook.com/video.php?v=10152795258318553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795258318553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795404193553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795420103553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795617423553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795634488553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795636318553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795737378553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795739268553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795771278553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795771278553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795779043553"
    youtube-dl "https://www.facebook.com/video.php?v=10152795794003553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797088108553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797098293553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797107658553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797148263553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797156538553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797345373553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797350653553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797720553553"
    youtube-dl "https://www.facebook.com/video.php?v=10152797736763553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800459113553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800485138553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800517193553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800539128553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800550043553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800554083553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800569888553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800582978553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800594638553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800597428553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800611948553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800614133553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800617653553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800624928553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800744793553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800779103553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800781003553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800789943553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800795178553"
    youtube-dl "https://www.facebook.com/video.php?v=10152800797663553"


Anyone have a mirror? Getting gateway timeouts.


The site is hosted via GitHub pages. Maybe it's not available to you due to the lingering GitHub DoS. If you care to read HTML and can manage to get through to GitHub: https://github.com/indygreg/indygreg.github.com/blob/master/...


* Yep, another "better than ever"/"last one you'll ever need" IDE. Thank you, Facebook. (See you soon for the next one.)

* A DVCS that relies on a central server for merging (sandcastle) is no longer... distributed... (and you cannot have distributed team work here; this is wrong in multiple ways)

* I think I'll never let a centralized, monolithic repository be set up in my company. All the great stuff/talent I ever learned came from different sources, different independent projects (from git, hg or SVN). Lose that and I think I'll get narrow-minded.

* All that fancy stuff makes Facebookers better "Facebook developers" but less likely to share with others (we don't share the same language, culture or tools anymore; even the design of the monolithic repo cannot help here).

* This is more a lesson on "how we made X thousand ordinary devs work together" than about actual tools I might need to use someday.

* I'll bet that Facebook "infra" developers are less likely to use the tools they describe, and that is ironic. Those tools mostly apply to some hidden mass (100k commits/week..) of uniform developers that I'll never meet.


"A DVCS that rely on a central server for merging (sandcastle) is no longer.. distributed... (and you cannot have distributed team work here, this is wrong in multiple way)"

Yet doing things like rebasing or merging pull requests properly requires that you have an up-to-date master, making it a centralized operation.

People seem quite happy with this.


>Yet doing things like rebasing or merging pull requests properly

Rebasing is a smell of bad source code management. Merging pull requests is fine whether or not you are up to date. I don't know what you mean by 'properly'.


Why do you think rebasing is a bad code smell? Most devs where I work don't rebase and our source tree ends up looking like this https://i.imgur.com/r0QCw7F.png


I don't know what you think happens if you merge against something that isn't the tip of tree and then try to commit against the tip of tree.

Hint: it has to merge against the tip of tree again if it wants to get back to a single head.


It is much easier to bisect linear history than history with merges.
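For example, with a linear history you can let 'git bisect run' drive the whole search unattended (the tag and script names are placeholders):

    git bisect start
    git bisect bad HEAD
    git bisect good v1.0            # last release known to be good
    git bisect run ./run_tests.sh   # exits non-zero when the bug is present
    git bisect reset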


How is it any different? I do it all of the time with no problems.



