Bazel – Correct, reproducible, fast builds for everyone (bazel.io)
625 points by drivebyubnt on Mar 24, 2015 | 174 comments

Of all the technologies I work with at Google, Blaze is one of the ones that amazes me most. Any engineer can build any Google product from source on any machine just by invoking a Blaze command. I may not want to build Gmail from source (could take a while), but it's awesome to know that I can.

I think this could be hugely useful to very large open source projects (like databases or operating systems) that may be intimidating for contributors to build and test.

Standard caveat: I don't speak for my employer.

Using Bazel (aka Blaze) every day is one of the things that has made me dread ever leaving Google. Fast, reproducible builds are amazing. Once you have used this tool, it is very hard to go back. Personally, I'm thrilled that it has been open sourced.

Since I recently left Google, my biggest concern was GRPC (Stubby); I spent about two weeks hacking together a good code generator for GoRPC before GRPC came out and made that work unnecessary. Now I'm glad I haven't bothered with a build system, which was going to be next.

Nice to see a bunch of projects that've been generalizable and heavily used internally finally see the light of the outside world. Now, to start evangelizing them.

A huge +1 on this as well.

I left Google a couple of years ago and we ended up building our own RPC around protos (Thrift just doesn't cut it), and our Make/Maven-based build has the standard problems of such things, so I'm really looking forward to using gRPC and Bazel in the near future. A huge thumbs up to Google!

Can you give a (very quick) pointer/explanation to what about Thrift didn't cut it for you?

The description language is oddly bulky and can't decide what order it's in.

The api to open a connection is, again, needlessly bulky: I make a socket, I wrap it in a buffer, I wrap it in a protocol, I create a client using it, then I connect? There's the same level of complexity offered in grpc, but it's offered through an options object with sane defaults.

There's no security protocol; or, if there is, nobody seems to use it. Is there an async call structure? If there is, nobody's ever heard of it. All the code I can find seems to be written at a preschool level. This may be due to working at a company that was an early adopter, or simply because the company is staffed by preschoolers.

As someone that used to work at Google, and currently works at Facebook, there's a lot of legacy API that you can mostly ignore. Async is there, but more or less just works, and uses C++11 lambdas quite nicely.

I think I preferred Google RPC, but it's not a huge difference to me.

It wasn't invented in Google and is therefore inferior. Google has a massive incentive to develop projects specific to their requirements and then open source and evangelize them to stomp out approaches not optimized for them.

Don't forget that there is a constant inflow of developers to Google. They all bring the tools and practices that they know, and will definitely leverage what they can.

It's not like we're run through a brainwashing machine when walking through the door :P

There just happen to be a lot of challenges to make things run on Google's infrastructure - and not all the tools we know and love happen to work well on it. Some of that is legacy, some of that doesn't have many parallels externally.

So, we write solutions that work well on Google's infrastructure. Some of that gets open sourced, in the hopes that it's also useful to the greater community.

But I definitely would not consider Google-written code to be superior to other solutions out there - just an alternative option to choose from - and certainly not the best for many situations.

Well, that's also largely due to all the source (transitive dependencies) being present in one monolithic repo.

Yeah, what's up with that? Sounds like a pretty terrible practice to me. Is that something that C++ forces you to do?

Just philosophical. I'm honestly not sure which approach I like more after having done it both ways: highly isolated projects (open source world, at Amazon), and monolithic (at Google).

It all boils down to dependency management in the end.


For the monolithic world:

* You're always developing against the latest version of your dependencies (or very near it).

* This comes at the cost of a continuous, but minimal, maintenance burden as upstream folk make changes.

* * However, because things are monolithic, upstream projects can change _your_ code, as well. You can be confident that you know exactly who is affected by your API change.

* * Similarly, being able to make an API change and run the tests of _everyone that depends on you_ is a huge benefit.

* You have to be more diligent when testing things that go out to users, as your code is constantly evolving.


For the isolated world:

* You can develop without fear of interruptions; your dependencies are pinned to specific versions.

* You get to choose when to pay the cost of upgrading dependencies (but typically, the cost is pretty high, and risks introducing bugs).

* * Security patches can be particularly annoying to manage, though (if you let your dependencies drift too far from current).

* During deployment, you can be extremely confident about the bits that go out.

* You can get away with less rigorous infrastructure (and the maintenance costs related to it).
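The "you can be confident that you know exactly who is affected by your API change" point comes down to walking the dependency graph backwards. A minimal sketch, assuming the whole repo's graph is available as a dict from each target to its direct dependencies (the target labels below are made up for illustration):

```python
from collections import defaultdict, deque

def affected_targets(deps: dict[str, list[str]], changed: str) -> set[str]:
    """Return every target that directly or transitively depends on `changed`."""
    # Invert the edges: for each target, record who depends on it.
    rdeps: dict[str, list[str]] = defaultdict(list)
    for target, target_deps in deps.items():
        for dep in target_deps:
            rdeps[dep].append(target)

    # Breadth-first walk over the reversed graph.
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in rdeps[node]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

# Hypothetical monorepo graph: target -> direct dependencies.
graph = {
    "//search:frontend": ["//base:strings", "//rpc:client"],
    "//mail:server": ["//rpc:client"],
    "//rpc:client": ["//base:strings"],
    "//base:strings": [],
}
print(sorted(affected_targets(graph, "//base:strings")))
# ['//mail:server', '//rpc:client', '//search:frontend']
```

In a monorepo this set is exactly whose tests you would run before submitting the change; with pinned versions spread across separate repos, there is no single graph to walk.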

No, it's something that Perforce allows (because it scales sufficiently). It has nothing to do with C++.

You can impose order within the monolithic repo by partitioning projects into their own branches or directories and only pulling down the necessary pieces.

Whether this is better than a bunch of small repos is debatable.

Perforce (the product) doesn't really scale sufficiently for the Googles/Amazons/Microsofts of the world, sadly.

I think they've all moved to custom forks/implementations due to the insane SPOF that Perforce servers are (and their hardware requirements). But up until that point, heck yeah!

Well, Perforce in its default state may not scale sufficiently, but at least two of the companies on that list have managed to make it work (presumably with a lot of investment, though). :)

I didn't know Amazon was using Perforce. I interviewed someone from Amazon recently and he indicated they were on Git for most things now.

Amazon phased out Perforce a year ago. There is very little left; most repositories (Perforce, SVN, etc.) were migrated to Git.

I started at Google 4 months ago, and it's one of the best things I've discovered. Now open-sourced :)

Though not really open source ;)

Whoops, I am an idiot.

But you're our idiot!

> Any engineer can build any Google product from source on any machine

A little too optimistic :) You can't build Android, Chrome, ChromeOS, iOS apps, etc. via blaze.

When I worked at Google I built a Blaze extension to be able to build Android apps. It worked really well, though I'm not sure how well it was maintained after I left in 2010. Internally at Google, Blaze was extremely customizable, and I hope Bazel is too, so one can easily add support for building iOS apps etc.

EDIT #1: I see support for building Objective-C apps is already present in Bazel. EDIT #2: Bazel uses Skylark, a Python-like language, which could be used to implement all sorts of extensions, including the one I was referring to.

There's an extension language in bazel named Skylark, which will be familiar to you if you wrote build_defs internally: http://bazel.io/docs/skylark/concepts.html

The Chromium tool chain is pretty insane. ninja, fetch, etc... There's really no excuse for this since Google has such a strong (and now open source) build system.

Look at Nix & NixOS: http://nixos.org/ It would be interesting to see a comparison of Bazel/Blaze to Nix.

Wow, sounds like enterprise Gentoo, in a good way.

So the builds are reproducible automatically?

See http://bazel.io/docs/FAQ.html, "Will Bazel make my builds reproducible automatically?

For Java and C++ binaries, yes, assuming you do not change the toolchain. If you have build steps that involve custom recipes (eg. executing binaries through a shell script inside a rule), you will need to take some extra care:

Do not use dependencies that were not declared. Sandboxed execution (--spawn_strategy=sandboxed, only on Linux) can help find undeclared dependencies.

Avoid storing timestamps in generated files. ZIP files and other archives are especially prone to this.

Avoid connecting to the network. Sandboxed execution can help here too.

Avoid processes that use random numbers, in particular, dictionary traversal is randomized in many programming languages."

Specifically, people should note that many code generators are not carefully designed for strict reproducibility, and will stick timestamps in generated output.

Even if you undo that, code generation tools are liable to at some point traverse a dictionary without caring about whether the result is deterministic. I spent some time at Google fighting with antlr to try to get it to have deterministic output and I still think that I left some corner case uncovered.
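Both pitfalls can be shown with a tiny, hypothetical code generator (`generate_header` is not from any real tool). Written as a pure function of its inputs, it emits no timestamp, and it sorts the names instead of iterating a Python `set`, whose order for strings can change from one process to the next because of hash randomization:

```python
def generate_header(constants: dict[str, int], exported: set[str]) -> str:
    """Hypothetical generator: emit #defines for the exported constants.

    No timestamp, hostname, or user name goes into the output, and names are
    sorted rather than taken in set order, which for strings can vary between
    Python processes because of hash randomization (PYTHONHASHSEED).
    """
    lines = ["// Generated file. Do not edit.", "#pragma once", ""]
    for name in sorted(exported):
        lines.append(f"#define {name} {constants[name]}")
    return "\n".join(lines) + "\n"

print(generate_header({"PORT": 8080, "MAX_USERS": 100}, {"PORT", "MAX_USERS"}), end="")
# // Generated file. Do not edit.
# #pragma once
#
# #define MAX_USERS 100
# #define PORT 8080
```

The same input now always produces byte-identical output, which is what lets a build cache treat the generator step as a pure function.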

When you say Java, does that include Android? I see that Android is supported, but couldn't find anything about reproducibility.

Reproducible Android builds would be very interesting.


It looks like they only have iOS and not Android in this first release, but they are planning on adding Android support ~June of this year.

Yes, Blaze and a hojillion computers will give you a spiffy build system. The public now has the former, but not the latter :)

One piece at a time! Also, who is to say that Google's way of orchestrating those hojillion computers is best? Separating the two pieces, as has been done here, makes it possible for others to create different (and maybe better) orchestrations.

I've been burned by so many build tools over the years. I've finally settled (for C/++/asm) on the combination of Make + ccache: I build a _very_ paranoid Makefile that recompiles everything if it feels like anything changes. For instance, every rule that compiles a C/++ file is invoked if _any_ header/inc/template file changes. I let ccache do the precise timestamp/checksum-based analysis. The result is that (for large builds, under 10M LOC) I rarely wait more than a few hundred milliseconds on incremental builds, _and_ I have confidence that I never miscompile.
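The reason a deliberately over-conservative Makefile stays fast is that ccache keys cached objects on content rather than timestamps. A minimal in-memory sketch of that idea, assuming the caller supplies a compiler identifier, the flags, and the already-preprocessed source (this is not ccache's actual hashing scheme, and `CompileCache` and `fake_compiler` are hypothetical):

```python
import hashlib
from typing import Callable

class CompileCache:
    """Toy content-addressed cache for object files, kept in memory."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(compiler: str, flags: list[str], preprocessed: str) -> str:
        # Hash everything that can change the object file. File timestamps are
        # deliberately absent: touching a header without editing it still hits.
        h = hashlib.sha256()
        for part in [compiler, *flags, preprocessed]:
            h.update(part.encode())
            h.update(b"\0")  # separator, so ["ab", "c"] and ["a", "bc"] differ
        return h.hexdigest()

    def compile(self, compiler: str, flags: list[str], preprocessed: str,
                run_compiler: Callable[[str], bytes]) -> bytes:
        k = self.key(compiler, flags, preprocessed)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = run_compiler(preprocessed)
        return self._store[k]

def fake_compiler(source: str) -> bytes:
    # Hypothetical stand-in for actually invoking gcc or clang.
    return b"OBJ" + hashlib.sha256(source.encode()).digest()

cache = CompileCache()
src = "int add(int a, int b) { return a + b; }"
cache.compile("gcc-4.9", ["-O2"], src, fake_compiler)
cache.compile("gcc-4.9", ["-O2"], src, fake_compiler)  # identical inputs: hit
cache.compile("gcc-4.9", ["-O0"], src, fake_compiler)  # different flags: miss
print(cache.hits, cache.misses)  # 1 2
```

Because the key covers the preprocessed text, a header edit changes the key of every file that includes it, while a spurious recompile triggered by the paranoid Makefile costs only a hash. Putting the compiler identity in the key also covers compiler upgrades.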

I just wish that I had a high-performance, cross-platform replacement for linking (deterministic mode for ar), and something similar for non-C/++ flows. Writing a deterministic ar is about 20 lines of C code, but then I have to bake that into the tool in awkward ways. For generalized flows, I've looked at fabricate.py as a ccache replacement, but the overhead of spinning up the Python VM always nukes performance.
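For a sense of what those ~20 lines involve, here is a sketch in Python rather than C, assuming the common Unix ar layout with GNU-style short names (at most 15 bytes, terminated by `/`) and no symbol table. Every field that normally varies between machines (mtime, uid, gid, mode) is written as a constant, similar to what GNU ar's `D` (deterministic) modifier does where binutils is available. `deterministic_ar` is a hypothetical helper:

```python
import io

def deterministic_ar(members: list[tuple[str, bytes]]) -> bytes:
    """Build an ar archive whose bytes depend only on member names, order, and contents.

    Assumptions: names are at most 15 bytes (GNU style, terminated by "/"),
    the caller passes members in a stable order, and no symbol index is
    written (a real static library would normally also need one).
    """
    out = io.BytesIO()
    out.write(b"!<arch>\n")                        # global archive magic
    for name, data in members:
        field = (name + "/").encode()
        if len(field) > 16:
            raise ValueError(f"name too long for this sketch: {name}")
        header = (
            field.ljust(16)                        # file name
            + b"0".ljust(12)                       # mtime, zeroed
            + b"0".ljust(6)                        # owner uid, zeroed
            + b"0".ljust(6)                        # group gid, zeroed
            + b"644".ljust(8)                      # mode in octal, fixed
            + str(len(data)).encode().ljust(10)    # size in bytes, decimal
            + b"`\n"                               # header terminator
        )
        out.write(header)                          # always 60 bytes
        out.write(data)
        if len(data) % 2:
            out.write(b"\n")                       # pad members to an even offset
    return out.getvalue()
```

Two builds from the same objects now produce identical archives, so a cache keyed on the archive's hash no longer sees false changes from timestamps or ownership.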

> I build a _very_ paranoid Makefile that recompiles everything if it feels like anything changes.

Do you have some kind of way to verify that your makefile dependencies conform to your source dependencies? Is clang/gcc tracking sufficient for your use case? What about upgrading the compiler itself, does your makefile depend on that? If so, how?

Have you considered tup[0]? Or djb-redo[1]? Both seem infinitely better than Make if you are paranoid. tup even claims to work on Windows, although I have no idea how they do that (or what the slowdown is like). Personally, I'm in the old Unix camp of many small executables, none of which goes over 1M statically linked (modern "small"), so it's rarely more than 3 secs to rebuild an executable from scratch.

> (deterministic mode for ar)

Why do you care about ar determinism? Shouldn't it be ld determinism you are worried about?

[0] http://gittup.org/tup/

[1] https://github.com/apenwarr/redo

> Do you have some kind of way to verify that your makefile dependencies conform to your source dependencies?

Nope. I explicitly use a conservative approximation—this guarantees correctness, over speed. Building everything every time with a clean tree is where I begin; I start optimizing after that.

> Is clang/gcc tracking sufficient for your use case? What about upgrading the compiler itself, does your makefile depend on that? If so, how?

Self-rewriting Makefiles (to consume the .d files), combined with the cleaning necessary for them, become a large technical debt—especially given the complexity of the Makefile needed to generate them. Modern CCen just aren't capable of this. Perhaps Doug Gregor's module system will land in C21/C++21, and we'll see some good, then.

> Have you considered tup[0]? Or djb-redo[1]?

Yes. Neither provides significantly better correctness guarantees combined with sufficiently better performance to justify the cost of porting to older Unixen. (This is a consensus opinion at my shop; I, personally, enjoy tup.)

> Why do you care about ar determinism? Shouldn't it be ld determinism you are worried about?

Determinism lets me cache *.o/a/so/dylib/exe/whatnot without getting false positives due to timestamp changes and owner/group permissions in the obj/ar files (see ar(1)). ld is deterministic under all the CCen I use by setting the moral equivalent of -frandom-seed.


> this guarantees correctness, over speed.

Wouldn't "promotes" be a better word? What guarantee do you have?

> Self-rewriting Makefiles (to consume the .d files), combined with the cleaning necessary for them, become a large technical debt—especially given the complexity of the Makefile needed to generate them. Modern CCen just aren't capable of this.

Haven't needed it in a long time, but back when I did, generating one was just a matter of running the compiler with "-MD" in the compile phase and including the result in the Makefile: no special "make depend" phase, no noticeable slowdown. What technical debt are you referring to?

> Yes. Neither provides significantly better correctness guarantees combined with sufficiently better performance to justify the cost of porting to older Unixen.

Interesting. It is my experience that redo (from apenwarr) is trivial to run and use anywhere there's Python and isn't Windows -- it's almost as fast as Make, and it makes correctness guarantees that Make cannot (e.g., .o file replacement is atomic).
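The atomic-replacement property is easy to get in your own build scripts too. A minimal sketch (the `atomic_write` helper is hypothetical, not redo's code): write to a temporary file in the same directory, then rename it over the target with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem:

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old file or the new one, never half of it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem, where it is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

If a build is killed midway, the worst case is a stray `.tmp-` file, never a truncated .o with a fresh timestamp that a later Make run would treat as up to date.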

Maybe... 'prefer'. I'm more confident that a really conservative Makefile will build my code correctly.

My issue with -MD was not that it didn't provide precise (and correct!) dependencies; my issue was that the build system's most mysterious breakages are when modules (and dependencies) are changed. In that case, there are three situations:

1. Your .d files are out-of-date, and thus your build is broken;

2. You have to have a policy of "updating the .d files"; or,

3. Your makefile has to be .d savvy.

The last option is the one I see most often taken, but with rare success.
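For readers who haven't looked inside one: a .d file written by `gcc -MD` is just a Makefile fragment, with backslash-continued prerequisite lines (and, with `-MP`, an empty phony rule per header so that deleting a header doesn't break Make). A minimal parser sketch, assuming POSIX paths with no spaces or colons (real .d files escape spaces, which this does not handle); `parse_depfile` is hypothetical:

```python
def parse_depfile(text: str) -> dict[str, list[str]]:
    """Parse a Make-style dependency file into {target: [prerequisites]}."""
    # Join backslash-continued lines into single logical lines.
    logical = text.replace("\\\n", " ")
    rules: dict[str, list[str]] = {}
    for line in logical.splitlines():
        if ":" not in line:
            continue
        targets, _, prereqs = line.partition(":")
        for target in targets.split():
            rules.setdefault(target, []).extend(prereqs.split())
    return rules

sample = """foo.o: foo.c foo.h \\
  include/bar.h
foo.h:
include/bar.h:
"""
print(parse_depfile(sample)["foo.o"])  # ['foo.c', 'foo.h', 'include/bar.h']
```

Option 3 amounts to feeding this structure back into Make on every run; the hazard in option 1 is that the map reflects the includes as of the last compile, not the current source.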

> it makes correctness guarantees that Make cannot (e.g., .o file replacement is atomic).

I wish Make wasn't so entrenched.

Correct, reproducible, fast builds for everyone not running Windows

Convince your employer to ship a half decent unix environment with its OS and it will run on windows too. It's mostly a choice by microsoft to ship a half-baked command line interface with its products, you can't blame google for that.

By 'half-baked' you mean a 'not *nix-compatible' command line. PowerShell is amazing.

I'm a Microsoft fanboy and Powershell is bad Python.

How 'bout we meet in the middle and y'all make the changes the Cygwin guys need? And hey, where'd that POSIX subsystem go? Finally got C99 support almost shipped now how 'bout you reverse that other horrible decision, too?

How does it go with Cygwin or MinGW?

I just want to boost big time for Cygwin, it really is amazing how much it improves the experience of using a Windows computer. It gives me the tools I want/need (gcc and python) and the programs I write run without modification in Cygwin, Linux and OS X!

Also, the default terminal emulator that comes with Cygwin, called MinTTY, is fantastic. It doesn't have the flashy features of many native Linux clients, but it has all the basic features one could want (full color support, nice fonts, very durable, no 'gotchas').

The only problem is that support outside of C-based languages is touch-and-go. Mainline Python works flawlessly since it's implemented in C, and all its tools also work flawlessly (virtualenv, numpy, scipy, matplotlib, django, flask, etc.), but other languages like Go or Rust don't really support Cygwin. I'm unaware how well other languages are supported (e.g. Ruby).

So Cygwin will definitely "give you that Unix feeling" on a Windows machine, but it can't always replace a virtual machine.

Whenever I'm stuck on Windows, Cygwin is essential. (I use ssh from the Cygwin terminal.)

It's also great in production - tolerable command-line remote administration, and great for e.g. Nagios plugins that are bash scripts (vastly better than attempting to find a Windows-native plugin that someone else wrote).

Not really sure why this is getting downvoted - it's kind of an important detail when choosing a build system if you have to do multiplatform deployment.

I didn't downvote him but it sounds like "he's driving angry". He does work for that "evil" company in Redmond. :-). Maybe he can help make F# a great cross-platform language and increase the goodwill? Microsoft did an incredible job with F# but it really only runs well on Windows.

If he works for MS he has zero right to even be mildly annoyed, since we have for decades been living in a world where "only runs on Windows" isn't even in the small print.

I've run into some (but def. not all) current and former MS folks who carry this ironic shoulder-chip. One such indignantly complained about a major open-source project's janky Windows support something like "well, that's just because they choose not to support the platform!"

I found this really interesting, as after a bit of conversation, the speaker was clearly unaware of how MS' technical and business models around Windows have impeded open source work. A for-pay operating system with a profit-center development toolchain presents a very large barrier to Unix-centric OSS projects. Not to mention the numerous technical impedance mismatches between the $unix and Windows worlds. (And these days we have tools like libuv to help with that, but still.)

What particular problems on non-Windows do you have with F#? I've been running F# applications in telecom, in production, on Mono, for several years.

F# doesn't really seem to be the factor there at all, it's just general .NET support. In fact, F# can do a bit better than C#, as F# actually includes a static linker.

Strange, most blogs that I've read discuss the pain involved.


I guess it's the same with any new product or language: when there are a lot more great testimonials than cries of pain, then it's actually safe to use that product.

Those complaints mostly seem to be about the developer experience on Mac, which, yeah might suck. I use VS + vim, build on Windows, then copy over to Linux for deployment.

The biggest issue is with complicated frameworks, like ASP.NET, since there could be all sorts of runtime things missing. Fortunately, with MS's new open source kick, this should be a thing of the past relatively soon.

So, you just told me to buy a Windows machine. Kind of funny because our little subthread started because the Microsoft F# developer didn't want to be told to buy a Linux or Mac.

The world we're all looking for is one where we'd all like to mix and match our software as much as possible, and not be told to buy a different computer.

Fair enough; I went off track just about running (executing) F# code.

But they're already working on doing that.

I didn't say they weren't. I imagine the project needs lots of help. Is there an ETA? Will the ports be kept in sync with the Windows versions so we don't need to wait years for updates? It's a lot of work.

> Is there an ETA?

I think their goal was a year for everything under .Net to get ported but I don't know off hand. You can already use it via Mono if you wanted to play with it today.

> Will the ports be kept in sync with the Windows versions so we don't need to wait years for updates?

It's all being open sourced. Every week Microsoft open sources more of their .Net platform and language tools. It will be compilable on all platforms. So yes.

And every court case fights to close off other things.

I am pretty sure it has nothing to do with the average population being biased.

s/not running Windows/not running Windows or refusing to install a free VM/

5 or 6 years ago I had to have Windows to run CAD software, but I found it easier to have a VirtualBox install with Ubuntu in it for software development than to try writing code on Windows. The performance was good enough and the usability was pretty good. I imagine it has only gotten better since then.

One company I work with has a firewall of a certain brand. It only works with Windows or a Mac; you can't connect to it from a command line. It needs some stupid app that you download and install just to make a VPN connection.

Not having a Unix-specific build system work on Windows seems to be pretty much the expected behaviour, as opposed to a firewall that runs Linux internally requiring Windows or OS X to talk to it...

I struggled with Cisco AnyConnect's linux client until I eventually found the open-source OpenConnect replacement. The Cisco website even detected that I ran linux and offered the linux client for download, whereupon it dutifully pushed the Windows client... The crappy linux client I did have had to come from our cloud vendor.

OpenConnect 'just worked', thankfully.

The one I'm battling is a Fisher-Price based firewall from a company called Palo Alto. It makes the complex simple and the simple complex.

Using a VM would only work if the individual build tools themselves run under Unix. That said, no open source project owes anybody anything.

Not quite. You can map shared folders to the guest and edit the code in the VM with your favorite editor, but build it on Windows.

...Yet. It is open source: you are welcome to port it to Windows and I'm sure they would be happy to accept your patch.


If you've got the expertise to add it, sounds like it would be welcomed.

A good point, which is why I will be sticking with CMake for my cross-platform build needs.

While running bazel isn't supported on Windows, you might be able to generate Windows binaries by cross-compiling from Linux.

Or Plan 9, Haiku, OS/2, Amiga OS...

With newer Windows servers you can use Hyper-V to run *nix apps.

This is an open sourcing of Google's internal build tool.

I know it as Blaze, which Bazel is an anagram of. Many files in the source have references to Blaze.

Is Google departing from just throwing white papers over the wall and letting the community figure out the implementation details? A Blaze white paper was dropped a while ago, and there are already two clones in Pants and Buck at Twitter and FB. It would be interesting to see how far off the clones are from the original implementation.

Do you have a link to that white paper? A quick search on their research site doesn't really yield any results.

I'm a developer on Bazel, and AFAIK there is no white paper. We definitely don't want to "throw it over the wall," we're going to try to push more and more development into the open over time.

There's not a white paper, but there's this series of posts: http://google-engtools.blogspot.com/2011/06/build-in-cloud-a...

(and an accompanying presentation)

Ah, OK, thanks. I do get a kick out of reading various Google white papers (GFS, Spanner, and the Multi-Paxos implementation for Chubby come to mind), so I'd definitely be interested in the prospect of a future white paper :).

Getting rid of the timestamps in jar files is a huge improvement. I really hate it that when I recompile some huge java project I can't run a checksum on the jar to verify that the build is identical to a previous run (or when being dumped into some project that my current source tree is an accurate reflection of what is running in production).

I had a bit of a read but I didn't find where it explains (code or doc) how it achieves reproducible builds.

It seems like a stricter, huge make-like harness (in fact it reminds me of the mozilla firefox python build system a bit).

It's not bad by any means, but it seems to me it doesn't "magically" fix the "be reproducible" problem at all (which is what it seems to claim).

Am I missing something?

You are absolutely correct: Bazel by itself does not make your builds reproducible. If a tool calls rand() or bakes the current time into its output, reproducibility goes out of the window.

What Bazel does, however, is to make it possible to run build steps in a sandbox (although the current one is kinda leaky) so that your build is isolated from the environment and thus behaves in the same way on any computer. It also tracks dependencies correctly so that it knows when a specific action needs to be re-run.

This makes it possible to diagnose non-reproducible build steps easily. At Google, the hit rate of our distributed build cache usually floats around 99%, and this would be impossible without reproducible build steps.
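A hedged sketch of why hermetic steps make nondeterminism easy to diagnose (this is not Bazel code; `nondeterministic_outputs` and the two example steps are hypothetical): run the same step twice into fresh, empty directories and compare output digests. Anything that differs is stamping in time, randomness, or some other hidden input.

```python
import hashlib
import os
import tempfile
import uuid
from typing import Callable

def digest_tree(root: str) -> dict[str, str]:
    """Map each file under `root` (by relative path) to the SHA-256 of its bytes."""
    digests = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digests[os.path.relpath(full, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests

def nondeterministic_outputs(action: Callable[[str], None]) -> list[str]:
    """Run `action` twice, each time into a fresh empty directory, and return
    the relative paths of outputs that differ (or exist in only one run)."""
    runs = []
    for _ in range(2):
        with tempfile.TemporaryDirectory() as out_dir:
            action(out_dir)
            runs.append(digest_tree(out_dir))
    first, second = runs
    return sorted(p for p in first.keys() | second.keys() if first.get(p) != second.get(p))

def pure_step(out_dir: str) -> None:
    with open(os.path.join(out_dir, "greeting.txt"), "w") as f:
        f.write("hello\n")

def stamping_step(out_dir: str) -> None:
    # Embeds a fresh build ID: the same class of bug as a timestamp or rand().
    with open(os.path.join(out_dir, "stamp.txt"), "w") as f:
        f.write(f"build {uuid.uuid4()}\n")

print(nondeterministic_outputs(pure_step))      # []
print(nondeterministic_outputs(stamping_step))  # ['stamp.txt']
```

A distributed cache only reaches high hit rates when this check comes back empty for essentially every step.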

Does work done by Debian to make Linux packages build reproducibly help Bazel?


Would Bazel help with the remaining long tail of packages in Debian?

Conceptually, your build results should be a pure function of your source tree. If I understand correctly, within Google, the cross-compilers are actually checked in to the source tree, so that the distributed jobs will use the same compiler to build your code. It seems like currently bazel only uses whatever is in /usr/bin though[0]. For Java compilations, bazel additionally has its own jar builder that sorts the filenames and zeros the timestamps within the zip file[1].
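The jar-builder behavior described here (sorted entries, zeroed timestamps) is straightforward to reproduce with Python's standard `zipfile` module. A sketch, not Bazel's actual tool, assuming the same zlib build on every machine since DEFLATE output can differ between zlib versions. It also pins `create_system`, which `ZipInfo` otherwise fills in differently on Windows and Unix:

```python
import hashlib
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest date a ZIP entry can record

def deterministic_zip(files: dict[str, bytes]) -> bytes:
    """Zip `files` so the archive bytes depend only on names and contents."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):                  # stable entry order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3                  # record "Unix" on every host
            info.external_attr = 0o644 << 16        # fixed file permissions
            zf.writestr(info, files[name])
    return buf.getvalue()

first = deterministic_zip({"b/Two.class": b"\xca\xfe", "a/One.class": b"\xca\xfe\xba\xbe"})
second = deterministic_zip({"a/One.class": b"\xca\xfe\xba\xbe", "b/Two.class": b"\xca\xfe"})
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())  # True
```

With archives built this way, comparing checksums of two builds (or of a build and what is running in production) becomes a meaningful test.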

[0]: https://github.com/google/bazel/tree/master/tools/cpp [1]: https://github.com/google/bazel/tree/master/src/java_tools/b...

You're right - it doesn't magically solve build reproducibility. Bazel pushes you towards a build configuration where you have to describe (in a terse way) the entire dependency graph of what is being built. It allows Bazel to be smart about where in the graph things are stale.

If you run a script that outputs intermediate files, Bazel needs to know about that script's inputs and outputs. And it works better if it knows them ahead of time.
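A rough sketch of what knowing a script's inputs and outputs ahead of time buys you, assuming POSIX paths (`run_declared_action` is hypothetical and far weaker than a real sandbox): stage only the declared inputs in a scratch directory, run the command there, and reject the step if a declared output is missing or an undeclared file appears.

```python
import os
import shutil
import subprocess
import sys
import tempfile

def run_declared_action(argv: list[str], inputs: dict[str, str],
                        outputs: list[str], dest: str) -> None:
    """Run `argv` in a scratch directory holding only the declared inputs.

    `inputs` maps a relative path inside the scratch directory to a source file
    on disk. After the command exits, every declared output must exist and no
    undeclared file may have appeared. This only approximates a sandbox: the
    command can still read absolute paths or reach the network.
    """
    with tempfile.TemporaryDirectory() as work:
        for rel, src in inputs.items():
            staged = os.path.join(work, rel)
            os.makedirs(os.path.dirname(staged), exist_ok=True)
            shutil.copyfile(src, staged)

        subprocess.run(argv, cwd=work, check=True)

        present = set()
        for dirpath, _, names in os.walk(work):
            for name in names:
                present.add(os.path.relpath(os.path.join(dirpath, name), work))
        created = present - set(inputs)
        missing = set(outputs) - created
        undeclared = created - set(outputs)
        if missing or undeclared:
            raise RuntimeError(f"missing: {sorted(missing)}, undeclared: {sorted(undeclared)}")

        # Only declared outputs leave the scratch directory.
        for rel in outputs:
            final = os.path.join(dest, rel)
            os.makedirs(os.path.dirname(final), exist_ok=True)
            shutil.copyfile(os.path.join(work, rel), final)

# Example: a one-line "script" that upper-cases its single declared input.
src_dir, out_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(src_dir, "in.txt"), "w") as f:
    f.write("hello")
upper = "import pathlib; pathlib.Path('out.txt').write_text(pathlib.Path('in.txt').read_text().upper())"
run_declared_action([sys.executable, "-c", upper],
                    inputs={"in.txt": os.path.join(src_dir, "in.txt")},
                    outputs=["out.txt"], dest=out_dir)
```

Because the declarations exist before the command runs, the tool can tell which steps a source change invalidates without running anything.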

I invented the Python bits of the Firefox build system (moz.build files). I learned after I implemented them that Google's internal approach with Blaze was very similar. It felt reassuring that I independently reinvented a similar solution :)

There are a handful of Blaze derivatives built by Xooglers. Pants and Buck come to mind. They also share the trait of using sandboxed Python to define a build configuration. I'll take it over make syntax any day!

It's not magic; you have to work at it. (For example, make sure that zip doesn't put timestamps in the file.) But it's designed so that code generators should act as pure functions from input files to output files, and many generators actually are, especially the built in ones. If you do this then the build system will help you.

Writing generators to run this way is kind of a pain, actually, sort of like writing code to run in a sandbox. Also, the generators themselves must be checked in, and often built from source. But we consider the results worth it.

I gather that it runs builds inside a chroot where the only available files are the dependencies you specified explicitly (including the compiler[1]), at least in "strict" mode[2]. Or else it must monitor what files are opened during the build step and fail the build if it sees an unexpected file being opened.

It never explains any of this explicitly, but there are hints. [1], [2], [3].

[1] "Many rules also have additional attributes for rule-specific kinds of dependency, e.g. 'compiler'" -- http://bazel.io/docs/build-ref.html#types_of_dependencies

[2] http://bazel.io/docs/build-encyclopedia.html#cc_binary.hdrs_...

[3] "The build system runs tests in an isolated directory where only files listed as 'data' are available" -- http://bazel.io/docs/build-ref.html#data

Edit: A comment below seems to suggest that this is not the case: "Within Google we use a form of sandboxing to enforce that" (emphasis mine). -- https://news.ycombinator.com/item?id=9259147

Surprisingly, significant parts of the code are not open source. According to this page, http://bazel.io/docs/governance.html:

   Is Bazel developed fully in the open?

   Unfortunately not. We have a significant amount of code
   that is not open source; in terms of rules, only ~10% of 
   the rules are open source at this point. We did an 
   experiment where we marked all changes that crossed the
   internal and external code bases over the course of a few 
   weeks, only to discover that a lot of our changes still 
   cross both code bases.

I don't think you're interpreting that section quite right. That section is talking about whether or not Bazel is fully _developed_ in the open, and the answer is "Unfortunately not".

What they mean is that changes to the internal source of Blaze often involve changes to both the open sourced part, which is Bazel, and the closed parts, which are additional rules that are neither open sourced, nor included in Bazel (Blaze has about 5x as many rules as Bazel).

It's best to make atomic changes. Rather than splitting a change (reviewing and submitting the open source part externally and the closed-rules part internally, then pulling in the external changes, which would complicate reviews, testing, syncing, and rollbacks), they submit these cross-code-base changes internally, then dump the change into the external repo. The next paragraph on that page makes it clear that the code is open, even if not all of the development process is.

To be clear, all of Bazel is open source and the source is available here: https://github.com/google/bazel

Can you explain or give an example of a "rule"? It's unclear to me what this means.

http://bazel.io/docs/build-ref.html#rules For example, cc_binary is a rule. Rules are the things that know how to take whatever is specified as their inputs, do something to them, then produce some specified set of outputs.
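To make that concrete, here is a toy model of what a rule does. It is hypothetical and much simpler than the real cc_binary (no headers, toolchains, or flags): a function from a target's attributes to a list of planned actions, each with declared inputs and outputs. The build tool, not the rule, decides which actions actually need to run.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """One command the build tool may run, with declared inputs and outputs."""
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    argv: tuple[str, ...]

def cc_binary_like(name: str, srcs: list[str], libs: list[str]) -> list[Action]:
    """Hypothetical, heavily simplified rule: plan a compile per source, then a link.

    The rule itself runs nothing; it only turns attributes into actions that the
    build tool can schedule, cache, and execute later.
    """
    actions = []
    objects = []
    for src in srcs:
        obj = src.rsplit(".", 1)[0] + ".o"
        objects.append(obj)
        actions.append(Action((src,), (obj,), ("cc", "-c", src, "-o", obj)))
    link_inputs = tuple(objects + libs)
    actions.append(Action(link_inputs, (name,), ("cc", "-o", name) + link_inputs))
    return actions

for action in cc_binary_like("hello", ["main.cc", "util.cc"], ["libgreet.a"]):
    print(" ".join(action.argv))
# cc -c main.cc -o main.o
# cc -c util.cc -o util.o
# cc -o hello main.o util.o libgreet.a
```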

Google has a large number of rules (i.e., far more than just the rules you see in Bazel). As part of open sourcing, they have started out by open sourcing about 10% of those rules.

Some of this is because they are Google-entangled. Some of them don't make sense to the open source community, etc.

Currently about 60% of our code (in terms of lines of Java code, excluding tests) is open sourced. The rest is glue logic to internal Google systems or build rules that we haven't open sourced. Some of these rules, we are planning to open source in the future, and some others are specific to Google, so they don't really make much sense in the open source tree.

What about skyframe? http://bazel.io/docs/skyframe.html looks like an overview without any examples. Couldn't find any references to it in the bazel code at github too.

https://github.com/google/bazel/tree/master/src/main/java/co... is the implementation of the skyframe engine (the general-purpose memoizing, incremental, functional evaluation framework) and https://github.com/google/bazel/tree/master/src/main/java/co... contains the implementation of bazel-on-top-of-skyframe.
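As a rough illustration of what a "memoizing, incremental, functional evaluation framework" means, here is a toy evaluator. It is not Skyframe's API; the `Evaluator` class and the key names are made up. Each value is computed by a function that records which other keys it reads, results are cached, and invalidating a key discards only the values downstream of it:

```python
from typing import Callable

Read = Callable[[str], object]

class Evaluator:
    """Toy memoizing, incremental evaluator (not Skyframe's actual API)."""

    def __init__(self, compute: Callable[[str, Read], object]) -> None:
        self._compute = compute
        self._values: dict[str, object] = {}
        self._rdeps: dict[str, set[str]] = {}  # key -> keys that read it
        self.computations = 0

    def get(self, key: str) -> object:
        if key in self._values:
            return self._values[key]  # memoized

        def read(dep: str) -> object:
            # Record the edge while computing, so invalidation can follow it.
            self._rdeps.setdefault(dep, set()).add(key)
            return self.get(dep)

        self.computations += 1
        value = self._compute(key, read)
        self._values[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Forget `key` and everything that transitively read it."""
        stack = [key]
        while stack:
            k = stack.pop()
            if k in self._values:
                del self._values[k]
                stack.extend(self._rdeps.get(k, ()))

files = {"a.txt": "hello", "b.txt": "world"}

def compute(key: str, read: Read) -> object:
    kind, _, arg = key.partition(":")
    if kind == "file":
        return files[arg]                           # leaf: a source file's contents
    if kind == "len":
        return len(read("file:" + arg))
    return read("len:a.txt") + read("len:b.txt")    # "total"

ev = Evaluator(compute)
print(ev.get("total"), ev.computations)  # 10 5
print(ev.get("total"), ev.computations)  # 10 5  (fully memoized)
files["a.txt"] = "hi"
ev.invalidate("file:a.txt")
print(ev.get("total"), ev.computations)  # 7 8  (b.txt's nodes were reused)
```

Real systems can go further, for example by not re-running a dependent when a recomputed input turns out to be unchanged; this sketch simply discards everything downstream.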

Do they mean that 10% of the original Blaze rules are now open source or that 10% of the Bazel rules they've released are open source?

The former.

What would be needed to get this to work with Haskell?

I read in the "Getting started":

> You can now create your own targets and compose them.

So does this mean it is a replacement for `make`? => Yes

Found the answer here: http://bazel.io/docs/FAQ.html

If you're interested in adding rules for a new language, check out Skylark: http://bazel.io/docs/skylark/concepts.html.

It's another impressive feat from Google and reading the comments I've kind of established that

1. Binaries are checked in to source
2. It's more structured than Gradle
3. It's for very large code bases
4. It's nix only


1. We've already had the "chuck it in a lib directory" approach. The distributed approach of Maven/Ivy etc. seems to be working for the millions of developers out there who just have to get through the end of the day without production going up in flames. I suppose it's like moving a portion of Maven Central into your code base. Checked in. Feels very odd, and kinda against one of the pillars of the JVM: Maven. Love it or hate it, it's one of the most mature build/repository types out there. npm, bower anyone?

2. Got to agree with astral303. This isn't really something to shout about. Better reproducibility? Gradle/SBT have had incremental builds for quite a while. We all know there's no silver bullet: if you don't declare your inputs and outputs to Gradle/Blaze tasks, or you seed with random values, then you're only going to get unreproducible builds.

3. Very large, I get that.

4. Very large code bases tend to be enterprise systems. Enterprise systems tend to have a plethora of platforms/OSs, so it being nix only is a drawback. However, I suppose that if I were in charge of a 10MLOC code base, I could mandate nix-only builds? Then again, in my experience they also tend to gravitate towards standards that seem to have longevity.

I'm yet to give it a go so I'll reserve final judgement. However, I do wonder how far we'd be if Google had thrown its brightest minds at Maven/Gradle/SBT etc. and worked with them to scale their builds. (Yes, I realise it's multi-language; so is Gradle.) Perhaps the whole community would benefit from the performance gains.

Anyway, hats off, Google guys. It looks impressive and no doubt I'll be jumping all over it in 12 months. In the meantime I'm off to go read up on Angular 2.0, or TypeScript, or ES6, or ES7, or whatever else I need to know to get me through the day.

Really I'm just jealous I don't have a 10MLOC code base :D

I don't know about Bazel, but Blaze doesn't "check in binaries". Build artifacts are cached, but not "checked in".

The problem with Maven and Gradle is that their build actions/plugins can have unobservable side effects.

This approach is more 'pure functional'. You have rules which take inputs, run actions, produce outputs and memoize them. If inputs don't change, then you use memoized outputs and don't run the action.

As long as your actions' side effects are observable in their outputs (and they don't produce side effects outside the outputs that something else depends on), you can do a lot of optimizations on this graph.
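A minimal Python sketch of the "memoize outputs, skip the action if inputs didn't change" idea, assuming every action is a pure function of its declared inputs (exactly the condition the comment above requires). `ActionCache` and `digest` are hypothetical names, not anything from Bazel, and the action's name stands in for what a real tool would key on (its full command line and tool versions).

```python
import hashlib
from typing import Callable, Dict, Tuple

Files = Dict[str, bytes]


def digest(contents: Files) -> str:
    """Hash the declared inputs (paths and contents) into one cache key."""
    h = hashlib.sha256()
    for path in sorted(contents):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha256(contents[path]).digest())
    return h.hexdigest()


class ActionCache:
    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], Files] = {}
        self.executions = 0  # how many times an action really ran

    def run(self, name: str, action: Callable[[Files], Files], inputs: Files) -> Files:
        # The action name stands in for its command line in this sketch.
        key = (name, digest(inputs))
        if key not in self._cache:
            # First run, or the inputs changed: actually execute the action.
            self._cache[key] = action(inputs)
            self.executions += 1
        return self._cache[key]


def upper(ins: Files) -> Files:
    return {"out.txt": ins["in.txt"].upper()}


cache = ActionCache()
cache.run("upper", upper, {"in.txt": b"abc"})
cache.run("upper", upper, {"in.txt": b"abc"})  # same inputs: memoized, not rerun
cache.run("upper", upper, {"in.txt": b"abd"})  # input changed: action reruns
```

If an action secretly read an undeclared file, its output could change without the key changing, which is why the unobservable side effects mentioned above break this scheme.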

In my experience with maven and gradle, they are way way slower, and that's on relatively small projects

Apologies for the comment; I'd just gotten home from the pub and was drunk :D

I look forward to trying it out. The Objective-C rules sound interesting, especially given the state of Xcode, which is a laughable IDE.

If I'm sticking primarily to Java, is there a benefit to using Bazel as opposed to Maven / Gradle / Sbt?

At first impression, unless you have a single gigantic source code base, unlikely. From their FAQ:

>> "Gradle: Bazel configuration files are much more structured than Gradle's, letting Bazel understand exactly what each action does. This allows for more parallelism and better reproducibility"

The value of "more parallelism" depends on the complexity of your Java source code base. I can easily imagine why this extra structure can lead to more parallelism.

However, I am not buying "better reproducibility" without justification or explanation. I've had very reproducible Maven builds for years (and I don't see how Gradle would be different). So I would love to know which aspects are improved upon with this structure, if someone could expand or explain.

Finally, I'm very wary of "much more structure". The worst thing about Maven is its extreme insistence on structure and schema and very specific architecture of your build tasks and components. In contrast, with Gradle, you can freely shape your build scripts to reflect the "build architecture" of your source tree in a minimal, maintainable way. Furthermore, when your application's needs change, refactoring your build is far easier in Gradle, thanks to its internal-DSL style (the build script is code).

If the structure isn't "free", you pay for structure with reduced build script development speed. For Google, it's a tradeoff worth having with that massive source tree.

I work on Bazel.

We've put a bunch of work into making sure that we know about every file that goes into the Java compilation, and if any of them changes (and only then) do we recompile. Within Google, we use a form of sandboxing to enforce that.

You're also right that it isn't free - we have reason to believe that larger projects and larger teams will see benefits from using Bazel. Use your best judgement.

blaze is nothing remotely like the wall of cruft that maven forces you to climb for everything you do. I would describe it as "almost entirely unlike maven".

The Bazel query language has a far nicer syntax than Maven's XML without the risk of Gradle's full procedural language Groovy.

Oh, but my favourite option "blaze menu" is missing :)

Huh. I never knew that was there. I'll remember this next time I'm around Charleston.

A couple of questions:

* If I have a Maven-based project with heavy reliance on pre-built jars from Maven Central, what's the recipe to port it to Bazel?

* Related, if I have multiple GitHub repos, say a couple of open source libraries and a couple of private repos, what's a good recipe to use in conjunction with Bazel?

Check out http://bazel.io/docs/build-encyclopedia.html#maven_jar. In the root of your build, specify the jars you want from maven and then add them as dependencies in your BUILD files. The first time you run "bazel build", they'll be downloaded and cached from then on. It's somewhat limited in functionality at the moment, but should work for basic "download and depend on a jar".

For multiple Github repos, use http://bazel.io/docs/build-encyclopedia.html#http_archive or http://bazel.io/docs/build-encyclopedia.html#new_http_archiv... (depending on if it's a Bazel repository or not). Let us know if you have any questions or issues!

Thanks for the tips. I'm super-hyped that blaze was open sourced, it is one of the best systems I've ever had the pleasure to work with.

A couple more questions :)

* Any pointers for adding Scala (sbt?) support? I'd start here: http://bazel.io/docs/skylark/rules.html.

* Suppose I develop using multiple repos and http_archive. I'd like to make changes both to a library and to a project that depends on it simultaneously, without committing the library patches to master github repo just yet. Is there a way to configure the http_archive, let's say by saying "bazel --mode=local", and have it customize the remote archive http to use a different url (say, my github's fork instead of the master github) for that build?

Scala support: yes, add using Skylark. Definitely let us know if you run into any rough edges, Skylark is a work-in-progress.

For multiple repos: there's no command line flag, but you could change the WORKSPACE file to use http://bazel.io/docs/build-encyclopedia.html#local_repositor.... Unfortunately, this may be of limited use to you. At the moment it's optimized/bugged to assume that your local repos don't change, so it won't rebuild them (this is great for things like the JDK and gcc, but not so much for actual in-development repos). Feel free to file feature requests for any functionality you need, I'll be working on this a lot over the next couple months.

Regarding Maven:
- how do you resolve artifacts (e.g. are you using Aether)?
- are you supporting classifier and type for dependencies?

Here's the Gradle Team's perspective on Bazel


Any reason the Python support was ripped out? I've got my suspicions about not wanting/not being able to properly release the Python packaging method in use internally, but I'm curious if I'd be tilting at windmills to try and get it to output pexes.

I suspect the reason was: "They need to start with something and go from there".

So they started with the use cases likely to be the most popular.

Additionally, there are definitely cases where the implementations of rules at Google are a morass, and rather than dump it on the open source community, it makes more sense to clean them up when they get rebuilt.

Same question about JS. Closure Compiler never made much sense to me without Blaze.

If only our code search and code review systems were public too.

BTW, do you have a Blaze build for GWT? Ant seems unwieldy to me.

Internally at Google, gwt_application, gwt_module and gwt_test are built-in rules. GWT itself is built with Blaze internally (not Ant) as well.

Do you have plans to open this stuff?


How does it compare with Java 9's sjavac (http://stackoverflow.com/a/26424760/750563)?

EDIT: I fully understand that this is a build tool for multiple languages. But its raison d'etre is speed. So I'm asking what techniques does Bazel use to accelerate builds and how do they differ from those used by sjavac, which is also designed to accelerate builds of huge projects?

I work on Bazel.

Bazel also builds other languages, such as C++ and Objective-C.

We do invoke the Java compiler through a wrapper of our own. We think we can make that work as a daemon process to benefit from a hot JVM, but haven't gotten round to that.

Any plans on supporting Windows? That will definitely increase the adoption of Bazel.

http://bazel.io/docs/FAQ.html - "What about Windows?

We have experimented with a Windows port using MinGW/MSYS, but have no plans to invest in this port right now. Due to its Unix heritage, porting Bazel is significant work. For example, Bazel uses symlinks extensively, which has varying levels of support across Windows versions."

In other words: it's a lot of work, and frankly, our team doesn't know enough about windows to be very good at porting it. We would welcome contributions to make it work on Windows, of course.

That would be great. I'm not a Windows user, but having Windows support is a pre-requisite for adoption in many corporate environments, and proper symlinks are available since Windows Vista.

Thanks for the feedback! Hopefully the community will contribute.

Do you also use timestamps like sjavac or some other mechanism, like hashing?

Bazel uses checksums to determine if builds are up-to-date, but shortcuts the checksumming if metadata (timestamp, filesize) have not changed between builds.
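A small Python sketch of that strategy: content checksums decide whether a file changed, and the (mtime, size) metadata is only used to skip re-reading files that look untouched. `DigestCache` is a hypothetical name and this is not Bazel's code; it just shows the shortcut.

```python
import hashlib
import os
from typing import Dict, Tuple


class DigestCache:
    """Per-file content digests, recomputed only when (mtime, size) changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, int, str]] = {}
        self.hashed = 0  # how many times file contents were actually read and hashed

    def digest(self, path: str) -> str:
        st = os.stat(path)
        meta = (st.st_mtime_ns, st.st_size)
        entry = self._entries.get(path)
        if entry is not None and entry[:2] == meta:
            return entry[2]  # metadata unchanged: trust the cached checksum
        with open(path, "rb") as f:
            value = hashlib.sha256(f.read()).hexdigest()
        self.hashed += 1
        self._entries[path] = (meta[0], meta[1], value)
        return value


import tempfile

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "a.txt")
    with open(p, "w") as f:
        f.write("v1")
    cache = DigestCache()
    first = cache.digest(p)
    cache.digest(p)  # nothing changed: served from the cache, no rehash
    with open(p, "w") as f:
        f.write("v2!")  # size changes, so the metadata check fails
    second = cache.digest(p)
```

The upside of deciding on the checksum rather than the timestamp is that touching a file, or reverting it to identical content, produces the same digest and so doesn't force a rebuild.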

Your FAQ says this:

Bazel configuration files are much more structured than Gradle's, letting Bazel understand exactly what each action does. This allows for more parallelism and better reproducibility.

Could you please elaborate on that (i.e. with regards to both parallelism and reproducibility)?

I'm not a Gradle expert, so take this with a grain of salt.

Gradle has single-threaded execution in some parts (I believe it may be configuration?), because the build rules are written in a full-blown language with access to the filesystem and to internal parts of Gradle.

In Bazel, rules have to declare inputs and outputs, and this can be enforced with sandboxing. This allows us to predict that two rules do not interfere with each other, so we know we can run them in parallel. Our extension language disallows direct access to the file system, and also forbids access to other sources of non-determinism, such as hash tables and the clock.
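To show why declared inputs and outputs are enough to find parallelism, here is a Python sketch that groups actions into "waves": everything in a wave only needs files produced by earlier waves, so it can run concurrently. `Action` and `waves` are hypothetical names, the scheduling is a plain topological layering rather than Bazel's actual scheduler, and it also rejects two actions declaring the same output.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Action:
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def waves(actions: List[Action]) -> List[List[str]]:
    # Map every declared output to the single action that produces it.
    producer: Dict[str, str] = {}
    for a in actions:
        for out in a.outputs:
            if out in producer:
                raise ValueError(f"{out} is declared by both {producer[out]} and {a.name}")
            producer[out] = a.name
    # An action depends on whichever actions produce its declared inputs.
    deps = {a.name: {producer[i] for i in a.inputs if i in producer} for a in actions}
    done: set = set()
    result: List[List[str]] = []
    while len(done) < len(actions):
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("dependency cycle")
        result.append(ready)  # these actions can all run in parallel
        done.update(ready)
    return result


acts = [
    Action("compile_a", frozenset({"a.cc"}), frozenset({"a.o"})),
    Action("compile_b", frozenset({"b.cc"}), frozenset({"b.o"})),
    Action("link", frozenset({"a.o", "b.o"}), frozenset({"app"})),
]
print(waves(acts))  # [['compile_a', 'compile_b'], ['link']]
```

None of this works if an action can quietly read or write files it never declared, which is what sandboxing and the restricted extension language guard against.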

Ah, OK. BTW, while Gradle doesn't enforce it, even tasks that directly touch the filesystem may (and should) declare their inputs (that can be files or other tasks) and outputs[1] so that their freshness can be evaluated. All default tasks (like compilation), do that out of the box.

[1]: 15.9 in https://www.gradle.org/docs/current/userguide/more_about_tas...

I wonder if this could be taken even further, like using inotify-like change detection on custom srcfs/objfs like filesystems so that you filter down the number of files you need to rehash.

I don't think they're even related, Bazel is a general build tool, sjavac looks like a smarter Java compiler ?

... that exploits parallelism and caching (and a hot VM) to accelerate build of huge projects, and supports build clusters.

Inside Google we do that as well, but it's actually distinct from bazel. Bazel is a build language, and I believe they provide a reference implementation. It's not unimaginable (and as I mentioned, it has been done internally) to use caching, incremental, distributed builds. In fact, it was originally designed with those goals in mind.

huge Java projects...why would you compare a general build tool to a language-specific one?

What difference does that make? I am not saying you could replace this tool with sjavac, but as they both tout speed as their main feature (and justification), I am wondering whether they both employ the same techniques or different ones.

Is this the tool that Google uses to build its Golang source? Or is that something else which is not available?

The Golang source code for the server code at Google is built with this tool. The rules that accomplish this are rather complex due to their interactions with our C++ libraries, and predate the open source "go" tool. The experience with the Google internal rules motivated some of the choices in the "go" tool, I believe.

If you're interested, hanwen wrote a bunch of rules with similar semantics to the internal rules, see https://github.com/google/bazel/tree/master/base_workspace/e... .

It would be nice to make these semantics match the external ones better, but it requires us to open up more tooling, so people won't need to write BUILD files.

There's a typo in your link. Should be:


In what cases would using Bazel make sense to build Go projects? If they're extremely large? If they have a lot of dependencies on code in other languages? If you need sophisticated build/release tooling?

BTW, thanks for the release! Will have a fun time digging through this over the next few days. I heard some murmurs that Blaze was going to be open sourced from around the watercooler but didn't think it'd be so soon.

I guess if you want to integrate Go tools with builds in other languages. If you are using pure Go for your entire ecosystem, there is not much point in using Bazel, as the "go" tool is very capable for that scenario.

You can build golang from source pretty easily. If I remember right, it's just downloading the tarball and running ./all.bash or something like that.

> Why doesn't Google use …? Make, Ninja: These tools give very exact control over what commands get invoked to build files, but it's up to the user to write rules that are correct.

> Users interact with Bazel on a higher level. For example, it has built-in rules for "Java test", "C++ binary", and notions such as "target platform" and "host platform". The rules have been battle tested to be foolproof.

But does it give the optional custom level of control that, for example, CMake + Ninja provide? Or is it only high-level rules?


You can [at least internally] define custom rules to handle pretty much anything, in almost-but-not-quite-python.

From the FAQ :

Multi-language support: Bazel supports Java, Objective-C and C++ out of the box, and can be extended to support arbitrary programming languages.

c'mon, not even the Go language from Google itself ?

Maven doesn't work so well when there are loads of small self-contained 'micro-libraries' (yes, sub-projects, but they are so involved to set up they almost defeat the purpose). I was considering Pants (which doesn't seem to have great adoption?), but this seems substantially more fully featured.

Presumably will also make opensourcing internal projects easier. That can't be a bad thing :)

WRT Java support: since it doesn't appear to generate POMs or publish to Maven repositories, it doesn't seem very useful on the open source side of things. It seems explicitly for generating internal, proprietary software from a monolithic source tree. I would have much rather seen the incremental compiler and jar generator integrated into Maven than replacing the entire build system.

Actually, Maven 3.3 was released recently, which has a smart builder for building separate parts in parallel, and using Takari plugins you can use the Eclipse compiler, which is itself parallelising. See http://takari.io for more details.

I worked at Ning for a couple of years (http://www.ning.com/) and the internal codename of our create-your-own social network was Bazel.

When I first saw the headline I thought they'd open-sourced it.

The "b"-with-leaves-sprouting-from-it logo is also used by http://beanstalkapp.com/

Will GYP/GN be deprecated in favor of Bazel?

What, if anything, does the convergence among these projects look like, longevity-wise?

Will there ever be Windows support?

This seems very promising. Does anyone know if this would work with the OSGi framework?

Fast - compared to what?

Wohoo! This is awesome :)

Depends what you mean by reproducible: build a jar twice, and its md5sum will change because there are timestamps in the archive.
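The usual fix for the timestamp problem is to write the archive with a constant timestamp and a fixed entry order. A minimal Python sketch of that technique using the standard `zipfile` module (a jar is a zip file); `build_jar` is a hypothetical helper and this says nothing about how Bazel itself writes jars.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest date a ZIP (DOS) timestamp can hold


def build_jar(files: dict) -> bytes:
    """Write a zip/jar in memory with sorted entries and a constant timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed Unix permissions, not the builder's umask
            zf.writestr(info, files[name])
    return buf.getvalue()


files = {
    "META-INF/MANIFEST.MF": b"Manifest-Version: 1.0\n",
    "com/example/Main.class": b"\xca\xfe\xba\xbe",
}
first = hashlib.md5(build_jar(files)).hexdigest()
second = hashlib.md5(build_jar(files)).hexdigest()
print(first == second)  # True
```

Calling `zf.writestr(name, data)` with a plain string name instead stamps each entry with the current local time, which is exactly what makes two otherwise identical builds hash differently.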

What is this lameness? https://github.com/google/bazel/tree/master/third_party - why not use gradle repos to download jars with known hashes? Sticking all those jars in the git repo is just... well, I expected better from Google.

Try not to be so rude.

The FAQ is pretty clear about their reasons. It talks about tools, not other dependencies, but I'm sure the reasoning is the same: "Your project never works in isolation... To guarantee builds are reproducible even when we upgrade our workstations, we at Google check most of these tools into version control, including the toolchains and Bazel itself."

It's a sensible policy and one I use myself. Do you have a better reason for disliking this policy than a knee-jerk "yuck?"

Right, I'll try not to be so grouchy :-D

Some reasons are the bloat, the possibility of "accidental" forks when a non-upstream version is compiled and checked-in binary-only, crufty old versions hanging around, and security problems. It adds extra work for downstream packagers having to pick it apart for distros.

Bundling gets particularly bloaty for git repos, since the history is always included in each clone. For perforce or SVN it doesn't matter so much as you only get the latest version of everything. In git each time there's a dependency update, it will pretty much add the size of the new jar to the .git directory. Over time it's going to grow huge. If at a later date the repository owner decides on a new policy where the third party files are not bundled, then even removing the directory from the current head doesn't shrink the repo size.

There are binaries in there for Mac, Linux and Windows (.exe file at least). You either need one or the other, not all at the same time.

This sort of thing is fine for proprietary software used in a controlled environment, but for open source it looks kludgy.

An alternative could be to have a "dependencies" repository that would be shallow-cloned as needed. At least that way the source code repo only would have source in it, not jars or executables. It'd ensure separation was enforced and you could still track requirements per version or change the policy later.

Tbh, if your company relies on this software, I would also make sure that it cannot just vanish, and that's the most effective solution. Artifacts can disappear from the internet and you don't know if the downloaded stuff is still the same as before. Especially if you look outside of the Maven ecosystem, but even there you have to rely on Apache and their partners. An outage can mean that you cannot deploy critical bugfixes to your platform.

This is why you should have a repository manager like JFrog Artifactory or Sonatype Nexus, which can transparently proxy third-party repositories (like Maven Central).

This is true for Maven. The Maven repository project is very impressive and outstanding of its kind anyway. But once you leave this great repository you are often on your own. Look at what's happening to Google Code. How much stuff will be lost when it's shut down? Hopefully nothing that you depend on. It's nice to have the binaries in a backup, maybe, but without access to the code, maintenance will be a nightmare.

Both of those products also handle other kinds of package repositories as well. You're right that they don't cover the full spectrum, however.

If you have a centralized version control system such as ClearCase or SVN, it's not such a grief to have binaries in VCS, whereas it's kind of a problem for git & co.

Google has a legendarily awesome centralized version control system.

"legendarily awesome centralized version control system"

I thought it was just perforce.

It started as Perforce. Then they built better wrappers around it, then they stopped using perforce commands at all, then they rewrote the wrappers without reference to Perforce at all.

So there's no longer any perforce.

It is not. It is awesome, warts and all.

Well keeping dependencies in source means no third party dependency at build time, right?

Right... but this will be an anchor on adoption. I can see why e.g. the Android build system does the same thing since it's all off in its own world anyway. I doubt you'd be popular with Linux distro packagers if you required bazel for some C library.

We realize that not everyone wants to check in all of their dependencies, and that's cool too. Check out the build encyclopedia for rules to talk to 'plain http' and maven repositories.
