I think this could be hugely useful to very large open source projects (like databases or operating systems) that may be intimidating for contributors to build and test.
Using Bazel (aka Blaze) every day is one of the things that has made me dread ever leaving Google. Fast, reproducible builds are amazing. Once you have used this tool, it is very hard to go back. Personally, I'm thrilled that it has been open sourced.
Nice to see a bunch of projects that've been generalizable and heavily used internally finally see the light of the outside world. Now, to start evangelizing them.
I left Google a couple of years ago and we ended up building our own RPC around protos (Thrift just doesn't cut it), and our Make/Maven-based build has the standard problems with such things, so I'm really looking forward to using gRPC and Bazel in the near future. A huge thumbs up to Google!
The API to open a connection is, again, needlessly bulky: I make a socket, I wrap it in a buffer, I wrap it in a protocol, I create a client using it, then I connect? The same level of control is available in gRPC, but it's offered through an options object with sane defaults.
There's no security protocol; or, if there is, nobody seems to use it. Is there an async call structure? If there is, nobody's ever heard of it. All the code I can find seems to be written at a preschool level. This may be due to working at a company that was an early adopter, or simply because the company is staffed by preschoolers.
I think I preferred Google RPC, but it's not a huge difference to me.
It's not like we're run through a brainwashing machine when walking through the door :P
There just happen to be a lot of challenges to make things run on Google's infrastructure - and not all the tools we know and love happen to work well on it. Some of that is legacy, some of that doesn't have many parallels externally.
So, we write solutions that work well on Google's infrastructure. Some of that gets open sourced, in the hopes that it's also useful to the greater community.
But I definitely would not consider Google-written code to be superior to other solutions out there - just an alternative option to choose from - and certainly not the best for many situations.
It all boils down to dependency management in the end.
For the monolithic world:
* You're always developing against the latest version of your dependencies (or very near it).
* This comes at the cost of a continuous, but minimal, maintenance burden as upstream folk make changes.
  * However, because things are monolithic, upstream projects can change _your_ code, as well. You can be confident that you know exactly who is affected by your API change.
  * Similarly, being able to make an API change and run the tests of _everyone that depends on you_ is a huge benefit.
* You have to be more diligent when testing things that go out to users, as your code is constantly evolving.
For the isolated world:
* You can develop without fear of interruptions; your dependencies are pinned to specific versions.
* You get to choose when to pay the cost of upgrading dependencies (but typically, the cost is pretty high, and risks introducing bugs).
  * Security patches can be particularly annoying to manage, though (if you let your dependencies drift too far from current).
* During deployment, you can be extremely confident about the bits that go out.
* You can get away with less rigorous infrastructure (and maintenance costs related to that)
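For what it's worth, the "run the tests of everyone that depends on you" part of the monolithic model boils down to a reverse transitive closure over the dependency graph. A minimal Python sketch, assuming a hypothetical in-memory map from target to direct dependencies (the target names are made up; a real monorepo would get this graph from its build tool):

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: target -> direct dependencies.
DEPS = {
    "//app:server": ["//lib:rpc", "//lib:log"],
    "//app:cli": ["//lib:log"],
    "//lib:rpc": ["//lib:proto"],
    "//lib:log": [],
    "//lib:proto": [],
}


def reverse_deps(deps):
    """Invert target -> deps into dep -> set of direct dependents."""
    rdeps = defaultdict(set)
    for target, direct in deps.items():
        for dep in direct:
            rdeps[dep].add(target)
    return rdeps


def affected_targets(changed, deps):
    """Every target that transitively depends on `changed`, plus `changed` itself."""
    rdeps = reverse_deps(deps)
    seen = {changed}
    queue = deque([changed])
    while queue:  # breadth-first walk up the reverse edges
        node = queue.popleft()
        for dependent in rdeps[node]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # Changing the proto library affects rpc and the server, but not the CLI.
    print(affected_targets("//lib:proto", DEPS))  # ['//app:server', '//lib:proto', '//lib:rpc']
```

The returned set is the list of test targets you would run before submitting an API change.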
You can impose order within the monolithic repo by partitioning projects into their own branches or directories and only pulling down the necessary pieces.
Whether this is better than a bunch of small repos is debatable.
I think they've all moved to custom forks/implementations due to the insane SPOF that Perforce servers are (and their hardware requirements). But up until that point, heck yeah!
I didn't know Amazon was using Perforce. I interviewed someone from Amazon recently and he indicated they were on Git for most things now.
A little too optimistic :) You can't build Android, Chrome, ChromeOS, iOS apps, etc. via blaze.
EDIT #1: I see support for building Objective-C apps is already present in Bazel.
EDIT #2: Bazel uses Skylark, a Python-like language, which could be used to implement all sorts of extensions, including the one I was referring to.
For Java and C++ binaries, yes, assuming you do not change the toolchain. If you have build steps that involve custom recipes (eg. executing binaries through a shell script inside a rule), you will need to take some extra care:
Do not use dependencies that were not declared. Sandboxed execution (--spawn_strategy=sandboxed, only on Linux) can help find undeclared dependencies.
Avoid storing timestamps in generated files. ZIP files and other archives are especially prone to this.
Avoid connecting to the network. Sandboxed execution can help here too.
Avoid processes that use random numbers; in particular, dictionary traversal is randomized in many programming languages.
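As a sketch of the timestamp point: Python's zipfile stamps every entry with a date, so pinning the date, the permissions, the creator-platform field and the entry order makes the archive bytes depend only on the contents (assuming the same zlib build produces the compressed data). The FIXED_DATE value and the deterministic_zip helper are illustrative assumptions, not what Bazel does internally:

```python
import hashlib
import io
import zipfile

# 1980-01-01 is the earliest timestamp the ZIP format can represent.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def deterministic_zip(files):
    """Illustrative helper: build a ZIP from {name: bytes}, ignoring the clock and dict order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # stable entry order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.create_system = 3  # always record "Unix", whatever the host OS
            info.external_attr = 0o644 << 16  # fixed rw-r--r-- permissions
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


if __name__ == "__main__":
    a = deterministic_zip({"b.txt": b"two", "a.txt": b"one"})
    b = deterministic_zip({"a.txt": b"one", "b.txt": b"two"})
    print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # True
```

Using plain `zf.write(path)` instead would pick up each file's mtime, which is exactly how two otherwise identical builds end up with different archive hashes.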
Even if you undo that, code generation tools are liable to at some point traverse a dictionary without caring about whether the result is deterministic. I spent some time at Google fighting with antlr to try to get it to have deterministic output and I still think that I left some corner case uncovered.
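The usual fix on the generator side is to sort anything unordered before emitting it. A toy Python generator (hypothetical, not ANTLR) that would otherwise depend on string-hash randomization when iterating a set:

```python
# Hypothetical toy code generator. Iterating a set of strings directly follows
# hash order, which Python randomizes per process (see PYTHONHASHSEED), so the
# emitted text could change between runs; sorting first makes it stable.


def generate_constants(class_name, members):
    """Emit a class of integer constants, assigning values in sorted member order."""
    lines = [f"class {class_name}:"]
    for value, member in enumerate(sorted(members)):
        lines.append(f"    {member} = {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_constants("Color", {"RED", "GREEN", "BLUE"}), end="")
```

The hard part, as described above, is finding every place in a large generator where such an unordered traversal leaks into the output.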
Reproducible Android builds would be very interesting.
It looks like they only have iOS and not Android in this first release, but they are planning on adding Android support ~June of this year.
I just wish that I had a high-performance replacement for linking that was cross-platform (deterministic mode for ar), and for non-C/++ flows. Writing a deterministic ar is about 20 lines of C-code, but then I have to bake that into the tool in awkward ways. For generalized flows, I've looked at fabricate.py as a ccache replacement, but the overhead of spinning up the Python VM always nukes performance.
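To give a feel for how small a deterministic ar writer is, here is a rough Python sketch (rather than C) of the common ar format with mtime, uid and gid zeroed and the mode fixed. Assumptions: member names of at most 15 bytes, GNU-style "/"-terminated names, and no symbol index, so many linkers would still want ranlib (or an equivalent) run on the result. The helper names are hypothetical:

```python
AR_MAGIC = b"!<arch>\n"


def ar_member_header(name, size):
    """60-byte ar member header with mtime/uid/gid zeroed and mode fixed at 644."""
    if len(name.encode("ascii")) > 15:
        raise ValueError("this sketch only handles member names up to 15 bytes")
    header = (
        (name + "/").ljust(16)  # ar_name, GNU-style '/' terminator
        + "0".ljust(12)         # ar_date: no timestamp
        + "0".ljust(6)          # ar_uid
        + "0".ljust(6)          # ar_gid
        + "644".ljust(8)        # ar_mode, octal
        + str(size).ljust(10)   # ar_size, decimal bytes
        + "`\n"                 # ar_fmag
    ).encode("ascii")
    if len(header) != 60:
        raise ValueError("member too large for the ar_size field")
    return header


def deterministic_ar(members):
    """Pack [(name, bytes), ...] into an ar archive whose bytes depend only on the inputs.

    Member order is kept as given rather than sorted.
    """
    out = bytearray(AR_MAGIC)
    for name, data in members:
        out += ar_member_header(name, len(data))
        out += data
        if len(data) % 2:  # members start on even offsets
            out += b"\n"
    return bytes(out)


if __name__ == "__main__":
    archive = deterministic_ar([("foo.o", b"abc"), ("bar.o", b"de")])
    print(len(archive))  # 8 + (60 + 3 + 1) + (60 + 2) = 134
```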
Do you have some kind of way to verify that your makefile dependencies conform to your source dependencies? Is clang/gcc tracking sufficient for your use case? What about upgrading the compiler itself, does your makefile depend on that? If so, how?
Have you considered tup? Or djb-redo? Both seem infinitely better than Make if you are paranoid. tup even claims to work on Windows, although I have no idea how they do that (or what the slowdown is like). Personally, I'm in the old Unix camp of many small executables, none of which goes over 1M statically linked (modern "small"), so it rarely takes more than 3 seconds to rebuild an executable from scratch.
> (deterministic mode for ar)
Why do you care about ar determinism? Shouldn't it be ld determinism you are worried about?
Nope. I explicitly use a conservative approximation—this guarantees correctness, over speed. Building everything every time with a clean tree is where I begin; I start optimizing after that.
> Is clang/gcc tracking sufficient for your use case? What about upgrading the compiler itself, does your makefile depend on that? If so, how?
Self-rewriting Makefiles (to consume the .d files), combined with the cleaning necessary for them, become a large technical debt, especially given the complexity of the Makefile needed to generate them. Modern CCen just aren't capable of this. Perhaps Doug Gregor's module system will land in C21/C++21, and we'll see some good then.
> Have you considered tup? Or djb-redo?
Yes. Neither provides significantly better correctness guarantees combined with sufficiently better performance to justify the cost of porting to older Unixen. (This is a consensus opinion at my shop; I, personally, enjoy tup.)
> Why do you care about ar determinism? Shouldn't it be ld determinism you are worried about?
Determinism lets me cache *.o/a/so/dylib/exe/whatnot without getting false positives due to timestamp changes and owner/group permissions in the obj/ar files (see ar(1)). ld is deterministic under all the CCen I use by setting the moral equivalent of -frandom-seed.
> this guarantees correctness, over speed.
Wouldn't "promotes" be a better word? What guarantee do you have?
> Self-rewriting Makefiles (to consume the .d files), combined with the cleaning necessary for them, become a large technical debt—especially given the complexity of the Makefile needed to generate them. Modern CCen just aren't capable of this.
Haven't needed it in a long time, but back when I did generating one for me was all of running the compiler with "-MD" in the compile phase, and including it in the Makefile - no special "make depend" phase, no noticeable slowdown. What technical debt are you ref
> Yes. Neither provides significantly better correctness guarantees combined with sufficiently better performance to justify the cost of porting to older Unixen.
Interesting. It is my experience that redo (from apenwarr) is trivial to run and use anywhere there's Python and isn't Windows -- it's almost as fast as Make, and it makes correctness guarantees that Make cannot (e.g., .o file replacement is atomic).
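The atomic-replacement guarantee is the classic write-to-a-temp-file-then-rename pattern. A minimal Python sketch of that general technique (the atomic_write helper is hypothetical; this is not redo's actual code):

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as `path` for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave a half-written temp file behind
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "foo.o")
        atomic_write(target, b"old object code")
        atomic_write(target, b"new object code")
        with open(target, "rb") as f:
            print(f.read())  # b'new object code'
```

A compiler killed halfway through then leaves the previous foo.o intact instead of a truncated one that Make would consider up to date.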
My issue with -MD was not that it didn't provide precise (and correct!) dependencies; my issue was that the build system's most mysterious breakages are when modules (and dependencies) are changed. In that case, there are three situations:
1. Your .d files are out-of-date, and thus your build is broken;
2. You have to have a policy of "updating the .d files"; or,
3. Your makefile has to be .d savvy.
The last option is the one I see most often taken, but with rare success.
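For reference, the .d files in question are ordinary make rules. A rough Python sketch of reading one and spotting prerequisites that no longer exist, which is the case where a stale .d makes make stop with "No rule to make target" (unless -MP phony rules cover them). Assumptions: no spaces, escaped characters or drive-letter colons in paths; parse_depfile and missing_prereqs are hypothetical helpers:

```python
import os


def parse_depfile(text):
    """Parse make-style rules (as written by gcc/clang -MD) into {target: [prerequisites]}."""
    joined = text.replace("\\\n", " ")  # fold backslash-newline continuations
    rules = {}
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, prereqs = line.partition(":")
        rules.setdefault(target.strip(), []).extend(prereqs.split())
    return rules


def missing_prereqs(rules, exists=os.path.exists):
    """Prerequisites listed in the .d that no longer exist on disk."""
    return sorted({p for deps in rules.values() for p in deps if not exists(p)})


if __name__ == "__main__":
    rules = parse_depfile("foo.o: foo.c foo.h \\\n  bar.h\n")
    print(rules)  # {'foo.o': ['foo.c', 'foo.h', 'bar.h']}
    # Pretend bar.h was deleted when a module was refactored:
    print(missing_prereqs(rules, exists=lambda p: p != "bar.h"))  # ['bar.h']
```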
> it makes correctness guarantees that Make cannot (e.g., .o file replacement is atomic).
I wish Make wasn't so entrenched.
How 'bout we meet in the middle and y'all make the changes the Cygwin guys need? And hey, where'd that POSIX subsystem go? Finally got C99 support almost shipped now how 'bout you reverse that other horrible decision, too?
Also, the default terminal emulator that comes with Cygwin, called MinTTY, is fantastic. It doesn't have the flashy features of many native Linux clients, but it has all the basic features one could want (full color support, nice fonts, very durable, no 'gotchas').
The only problem is that support outside of C-based languages is touch-and-go. Mainline Python works flawlessly since it's implemented in C, and all its tools also work flawlessly (virtualenv, numpy, scipy, matplotlib, django, flask, etc), but other languages like Go or Rust don't really support Cygwin. I'm unaware how well other languages are supported (e.g. Ruby).
So Cygwin will definitely "give you that Unix feeling" on a Windows machine, but it can't always replace a virtual machine.
It's also great in production - tolerable command-line remote administration, and great for e.g. Nagios plugins that are bash scripts (vastly better than attempting to find a Windows-native plugin that someone else wrote).
I found this really interesting, as after a bit of conversation, the speaker was clearly unaware of how MS' technical and business models around Windows have impeded open source work. A for-pay operating system with a profit-center development toolchain presents a very large barrier to Unix-centric OSS projects. Not to mention the numerous technical impedance mismatches between the $unix and Windows worlds. (And these days we have tools like libuv to help with that, but still.)
F# doesn't really seem to be the factor there at all, it's just general .NET support. In fact, F# can do a bit better than C#, as F# actually includes a static linker.
I guess it's the same with any new product or language: when there are a lot more great testimonials than cries of pain, then it's actually safe to use that product.
The biggest issue is with complicated frameworks, like ASP.NET, since there could be all sorts of runtime things missing. Fortunately with MS's new open source kick, this should be a thing of the past relatively soon.
The world we're all looking for is one where we'd all like to mix and match our software as much as possible, and not be told to buy a different computer.
I think their goal was a year for everything under .Net to get ported but I don't know off hand. You can already use it via Mono if you wanted to play with it today.
> Will the ports be kept in sync with the Windows versions so we don't need to wait years for updates?
It's all being open sourced. Every week Microsoft open sources more of their .Net platform and language tools. It will be compilable on all platforms. So yes.
5 or 6 years ago I had to have Windows to run CAD software, but I found it easier to have a VirtualBox install w/ Ubuntu in it for software development than trying to write code on Windows. The performance was good enough and the usability was pretty good. I imagine it has only gotten better since then.
Not having a Unix-specific build system work on Windows seems to be pretty much the expected behaviour. As opposed to a firewall that runs Linux internally requiring Windows or OS X to talk to it...
OpenConnect 'just worked', thankfully.
If you've got the expertise to add it, sounds like it would be welcomed.
I know it as Blaze, which Bazel is an anagram of. Many files in the source have references to Blaze.
(and an accompanying presentation)
It seems like a stricter, huge make-like harness (in fact it reminds me of the mozilla firefox python build system a bit).
It's not bad by any means, but it seems to me it doesn't "magically" fix the "be reproducible" problem at all (which is what it seems to claim).
Am I missing something?
What Bazel does, however, is to make it possible to run build steps in a sandbox (although the current one is kinda leaky) so that your build is isolated from the environment and thus behaves in the same way on any computer. It also tracks dependencies correctly so that it knows when a specific action needs to be re-run.
This makes it possible to diagnose non-reproducible build steps easily. At Google, the hit rate of our distributed build cache usually floats around 99%, and this would be impossible without reproducible build steps.
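One generic way to find non-reproducible steps, independent of Bazel, is to run an action twice in fresh directories and compare output digests. A minimal Python sketch; the helpers are hypothetical, and a mismatch proves nondeterminism while a match doesn't prove the step is deterministic. Running in a different temp dir each time also catches tools that embed the working directory in their output:

```python
import hashlib
import os
import subprocess
import sys
import tempfile


def output_digest(cmd, output_name):
    """Run `cmd` in a fresh temp dir and return the SHA-256 of the file it writes."""
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(cmd, cwd=workdir, check=True)
        with open(os.path.join(workdir, output_name), "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()


def looks_reproducible(cmd, output_name, runs=2):
    """True if `runs` independent executions produced byte-identical outputs."""
    return len({output_digest(cmd, output_name) for _ in range(runs)}) == 1


if __name__ == "__main__":
    stable = [sys.executable, "-c", "open('out.txt', 'w').write('hello')"]
    noisy = [sys.executable, "-c", "import os; open('out.txt', 'wb').write(os.urandom(16))"]
    print(looks_reproducible(stable, "out.txt"))  # True
    print(looks_reproducible(noisy, "out.txt"))   # False (with overwhelming probability)
```

A build cache only gets a high hit rate when nearly every step passes a check like this, since one nondeterministic output changes the cache keys of everything downstream.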
Would Bazel help with the remaining long tail of packages in Debian?
If you run a script that outputs intermediate files, Bazel needs to know about that script's inputs and outputs. And it works better if it knows them ahead of time.
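To make that concrete, here is a crude Python sketch in the spirit of sandboxing (not Bazel's actual sandbox): copy only the declared inputs into a scratch directory, run the script there, and compare what it wrote against the declared outputs. It doesn't stop a script from reading absolute paths outside the scratch dir, and run_action is a hypothetical helper:

```python
import os
import shutil
import subprocess
import sys
import tempfile


def _list_files(root):
    """Relative paths of all regular files under `root`."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


def run_action(cmd, inputs, outputs, src_root):
    """Run `cmd` with only the declared inputs present; insist it writes exactly `outputs`.

    Undeclared relative inputs simply aren't there, so the command usually fails;
    undeclared or missing outputs raise RuntimeError.
    """
    with tempfile.TemporaryDirectory() as scratch:
        for rel in inputs:
            dst = os.path.join(scratch, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(os.path.join(src_root, rel), dst)
        before = set(_list_files(scratch))
        subprocess.run(cmd, cwd=scratch, check=True)
        produced = set(_list_files(scratch)) - before
        undeclared = produced - set(outputs)
        missing = set(outputs) - produced
        if undeclared or missing:
            raise RuntimeError(f"undeclared outputs: {sorted(undeclared)}, missing outputs: {sorted(missing)}")
        return sorted(produced)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        with open(os.path.join(src, "in.txt"), "w") as f:
            f.write("hello")
        upper = [sys.executable, "-c",
                 "open('out.txt', 'w').write(open('in.txt').read().upper())"]
        print(run_action(upper, ["in.txt"], ["out.txt"], src))  # ['out.txt']
```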
There are a handful of Blaze derivatives built by Xooglers. Pants and Buck come to mind. They also share the trait of using sandboxed Python to define a build configuration. I'll take it over make syntax any day!
Writing generators to run this way is kind of a pain, actually, sort of like writing code to run in a sandbox. Also, the generators themselves must be checked in, and often built from source. But we consider the results worth it.
It never explains any of this explicitly, but there are hints:
"Many rules also have additional attributes for rule-specific kinds of dependency, e.g. 'compiler'" -- http://bazel.io/docs/build-ref.html#types_of_dependencies
"The build system runs tests in an isolated directory where only files listed as 'data' are available" -- http://bazel.io/docs/build-ref.html#data
Edit: A comment below seems to suggest that this is not the case: "Within Google we use a form of sandboxing to enforce that" (emphasis mine). -- https://news.ycombinator.com/item?id=9259147
Is Bazel developed fully in the open?
Unfortunately not. We have a significant amount of code that is not open source; in terms of rules, only ~10% of the rules are open source at this point. We did an experiment where we marked all changes that crossed the internal and external code bases over the course of a few weeks, only to discover that a lot of our changes still cross both code bases.
What they mean is that changes to the internal source of Blaze often involve changes to both the open sourced part, which is Bazel, and the closed parts, which are additional rules that are neither open sourced, nor included in Bazel (Blaze has about 5x as many rules as Bazel).
It's best to make atomic changes. So rather than split each change (reviewing and submitting the open source part externally and the closed rules part internally, then pulling in the external changes, which would complicate reviews, testing, syncing and rollbacks), they submit these cross-code-base changes internally, then dump the change into the external repo. The next paragraph on that page makes it clear that the code is open, even if not all of the development process is.
To be clear, all of Bazel is open source and the source is available here: https://github.com/google/bazel
Google has a large number of rules (i.e. far, far more than just the rules you see in Bazel).
As part of open sourcing, they have started out by open sourcing about 10% of those rules.
Some of this is because they are google-entangled. Some of them don't make sense to the open source community.
I read in the "Getting started":
> You can now create your own targets and compose them.
So does this mean it is a replacement for `make`? => Yes
Found the answer here: http://bazel.io/docs/FAQ.html
1. Binaries are checked in to source
2. It's more structured than Gradle
3. It's for very large code bases
4. It's *nix only
1. We've already had the "chuck it in a lib directory" approach. The distributed approach (Maven/Ivy etc.) seems to be working for the millions of developers out there who just have to get through the end of the day without production going up in flames. I suppose it's like moving a portion of Maven Central into your code base. Checked in. Feels very odd, and kinda against one of the pillars of the JVM: Maven. Love it or hate it, it's one of the most mature build/repository types out there. npm, bower anyone?
2. Got to agree with astral303. This isn't really something to shout about. Better reproducibility? Gradle/SBT have had incremental builds for quite a while. We all know there's no silver bullet: if you don't declare your inputs and outputs to Gradle/Blaze tasks, or you seed with random values, then you're only going to get unreproducible builds.
3. Very large, I get that.
4. Very large code bases tend to be enterprise systems. Enterprise systems tend to have a plethora of platforms/OSs, so it being *nix only is a drawback. However, I suppose that if I were in charge of a 10MLOC code base then I could mandate *nix-only builds? In my experience, though, they also tend to gravitate towards standards that seem to have longevity.
I'm yet to give it a go so I'll reserve final judgement. However, I do wonder how far we'd be if Google had thrown its brightest minds at Maven/Gradle/SBT etc. and worked with them to scale their builds. (Yes, I realise it's multi-language, but so is Gradle.) Perhaps the whole community would have benefited from the performance improvements.
Anyway, hats off Google guys. It looks impressive and no doubt I'll be jumping all over it in 12 months. In the meantime I'm off to go read up on Angular 2.0, or TypeScript or ES6 or ES7 or whatever else I need* to know to get me through the day.
Really I'm just jealous I don't have 10MLOC code base :D
The problem with Maven and Gradle is that their build actions/plugins can have unobservable side effects.
This approach is more 'pure functional'. You have rules which take inputs, run actions, produce outputs and memoize them. If inputs don't change, then you use memoized outputs and don't run the action.
As long as your actions' side effects show up only in their outputs (and they don't produce side effects that aren't part of the outputs but create state that something else depends on), you can do a lot of optimizations on this graph.
In my experience, Maven and Gradle are way, way slower, and that's on relatively small projects.
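A small Python sketch of that memoization idea: key each action on its name plus a hash of its input contents (not timestamps), and skip execution on a hit. ActionCache and concat are hypothetical; a real system would also fold the full command line and toolchain into the key and store outputs on disk or remotely:

```python
import hashlib


class ActionCache:
    """Memoize pure actions: same action + same input *contents* -> reuse the outputs."""

    def __init__(self):
        self._store = {}
        self.executions = 0  # how many times an action actually ran

    @staticmethod
    def _key(action, inputs):
        h = hashlib.sha256(action.encode() + b"\0")
        for name in sorted(inputs):  # order-independent, content-based key
            h.update(name.encode() + b"\0")
            h.update(hashlib.sha256(inputs[name]).digest())
        return h.hexdigest()

    def run(self, action, fn, inputs):
        """Return fn(inputs), running fn only if this exact action/input combination is new."""
        key = self._key(action, inputs)
        if key not in self._store:
            self.executions += 1
            self._store[key] = fn(inputs)
        return self._store[key]


def concat(inputs):
    """A pure toy action: concatenate inputs in name order."""
    return b"".join(inputs[name] for name in sorted(inputs))


if __name__ == "__main__":
    cache = ActionCache()
    cache.run("concat", concat, {"a.txt": b"x", "b.txt": b"y"})
    cache.run("concat", concat, {"b.txt": b"y", "a.txt": b"x"})  # hit: same contents
    cache.run("concat", concat, {"a.txt": b"x", "b.txt": b"z"})  # miss: b.txt changed
    print(cache.executions)  # 2
```

This only works because `concat` is pure; a plugin that also wrote to some shared location would silently skip that side effect on a cache hit.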
I look forward to trying it out. The Objective-C rules sound interesting, especially given the state of Xcode, which is a laughable IDE.
>> "Gradle: Bazel configuration files are much more structured than Gradle's, letting Bazel understand exactly what each action does. This allows for more parallelism and better reproducibility"
The value of "more parallelism" depends on the complexity of your Java source code base. I can easily imagine why this extra structure can lead to more parallelism.
However, I am not buying "better reproducibility" without justification or explanation. I've had very reproducible Maven builds for years (and I don't see how Gradle would be different). So I would love to know which aspects are improved upon with this structure, if someone could expand or explain.
Finally, I'm very wary of "much more structure". The worst thing about Maven is its extreme insistence on structure and schema and very specific architecture of your build tasks and components. In contrast, with Gradle, you can freely shape your build scripts to reflect the "build architecture" of your source tree in a minimal, maintainable way. Furthermore, when your application's needs change, refactoring your build is far easier in Gradle, thanks to its internal-DSL style (the build script is code).
If the structure isn't "free", you pay for structure with reduced build script development speed. For Google, it's a tradeoff worth having with that massive source tree.
We've put a bunch of work into making sure that we know about every file that goes into the Java compilation, and if any of them changes (and only then) do we recompile. Within Google, we use a form of sandboxing to enforce that.
You're also right that it isn't free - we have reason to believe that larger projects and larger teams will see benefits from using Bazel. Use your best judgement.
* If I have a Maven-based project with heavy reliance on pre-built jars from Maven Central, what's the recipe to port it to Bazel?
* Related, if I have multiple github repos, say a couple open source libraries and a couple private repos, what's a good recipe in conjunction to Bazel?
For multiple Github repos, use http://bazel.io/docs/build-encyclopedia.html#http_archive or http://bazel.io/docs/build-encyclopedia.html#new_http_archiv... (depending on if it's a Bazel repository or not). Let us know if you have any questions or issues!
A couple more questions :)
* Any pointers for adding Scala (sbt?) support? I'd start here: http://bazel.io/docs/skylark/rules.html.
* Suppose I develop using multiple repos and http_archive. I'd like to make changes both to a library and to a project that depends on it simultaneously, without committing the library patches to master github repo just yet. Is there a way to configure the http_archive, let's say by saying "bazel --mode=local", and have it customize the remote archive http to use a different url (say, my github's fork instead of the master github) for that build?
For multiple repos: there's no command line flag, but you could change the WORKSPACE file to use http://bazel.io/docs/build-encyclopedia.html#local_repositor.... Unfortunately, this may be of limited use to you. At the moment it's optimized/bugged to assume that your local repos don't change, so it won't rebuild them (this is great for things like the JDK and gcc, but not so much for actual in-development repos). Feel free to file feature requests for any functionality you need; I'll be working on this a lot over the next couple of months.
So they started with the use cases likely to be the most popular.
Additionally, there are definitely cases where the implementations of rules at Google are a morass, and rather than dump it on the open source community, it makes more sense to clean them up when they get rebuilt.
EDIT: I fully understand that this is a build tool for multiple languages. But its raison d'etre is speed. So I'm asking what techniques does Bazel use to accelerate builds and how do they differ from those used by sjavac, which is also designed to accelerate builds of huge projects?
Bazel also builds other languages, such as C++ and Objective-C.
We do invoke the Java compiler through a wrapper of our own. We think we can make that work as a daemon process to benefit from a hot JVM, but haven't gotten round to that.
We have experimented with a Windows port using MinGW/MSYS, but have no plans to invest in this port right now. Due to its Unix heritage, porting Bazel is significant work. For example, Bazel uses symlinks extensively, which has varying levels of support across Windows versions.
In other words: it's a lot of work, and frankly, our team doesn't know enough about Windows to be very good at porting it. We would welcome contributions to make it work on Windows, of course.
Bazel configuration files are much more structured than Gradle's, letting Bazel understand exactly what each action does. This allows for more parallelism and better reproducibility.
Could you please elaborate on that (i.e. with regards to both parallelism and reproducibility)?
Gradle has single-threaded execution in some parts (I believe it may be configuration?), because the build rules are written in a full-blown language with access to the filesystem and to internal parts of Gradle.
In Bazel, rules have to declare inputs and outputs, and this can be enforced with sandboxing. This allows us to predict that two rules do not interfere with each other, so we know we can run them in parallel. Our extension language disallows direct access to the file system, and also forbids access to other sources of non-determinism, such as hash tables and the clock.
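Why declared inputs/outputs buy parallelism, sketched in Python: once every action declares what it reads and writes, the dependency edges fall out mechanically, and actions whose inputs are all available can run in the same wave. This is an illustrative level-by-level scheduler under the assumptions in its docstring, not Bazel's scheduler; Action and schedule_waves are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    inputs: frozenset
    outputs: frozenset


def schedule_waves(actions):
    """Group actions into waves that could run in parallel.

    No action in a wave reads a file written by another action in the same wave.
    Assumes unique action names and that each file is the declared output of at
    most one action.
    """
    producer = {}
    for a in actions:
        for out in a.outputs:
            if out in producer:
                raise ValueError(f"{out} is declared by both {producer[out]} and {a.name}")
            producer[out] = a.name
    # Edges come purely from declarations: who produces each of my inputs?
    deps = {a.name: {producer[i] for i in a.inputs if i in producer} for a in actions}
    waves, done = [], set()
    while len(done) < len(deps):
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("dependency cycle among remaining actions")
        waves.append(ready)
        done.update(ready)
    return waves


if __name__ == "__main__":
    actions = [
        Action("compile_a", frozenset({"a.c"}), frozenset({"a.o"})),
        Action("compile_b", frozenset({"b.c"}), frozenset({"b.o"})),
        Action("link", frozenset({"a.o", "b.o"}), frozenset({"app"})),
    ]
    print(schedule_waves(actions))  # [['compile_a', 'compile_b'], ['link']]
```

With arbitrary code in the build script, none of these edges can be derived up front, which is the contrast being drawn with Gradle.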
See 15.9 in https://www.gradle.org/docs/current/userguide/more_about_tas...
If you're interested, hanwen wrote a bunch of rules with similar semantics to the internal rules, see https://github.com/google/bazel/tree/master/base_workspace/e... .
It would be nice to make these semantics match the external ones better, but it requires us to open up more tooling, so people won't need to write BUILD files.
BTW, thanks for the release! Will have a fun time digging through this over the next few days. I heard some murmurs that Blaze was going to be open sourced from around the watercooler but didn't think it'd be so soon.
> Users interact with Bazel on a higher level. For example, it has built-in rules for "Java test", "C++ binary", and notions such as "target platform" and "host platform". The rules have been battle tested to be foolproof.
But does it give the optional custom level of control that for example CMake + Ninja provide? Or it's only high level rules?
You can [at least internally] define custom rules to handle pretty much anything, in almost-but-not-quite-python.
Multi-language support: Bazel supports Java, Objective-C and C++ out of the box, and can be extended to support arbitrary programming languages.
C'mon, not even the Go language from Google itself?
Presumably will also make opensourcing internal projects easier. That can't be a bad thing :)
When I first saw the headline I thought they'd open-sourced it.
What, if any, does the convergence among these projects look like longevity-wise?
The FAQ is pretty clear about their reasons. It talks about tools, not other dependencies, but I'm sure the reasoning is the same: "Your project never works in isolation... To guarantee builds are reproducible even when we upgrade our workstations, we at Google check most of these tools into version control, including the toolchains and Bazel itself."
It's a sensible policy and one I use myself. Do you have a better reason for disliking this policy than a knee-jerk "yuck?"
Some reasons are the bloat, the possibility of "accidental" forks when a non-upstream version is compiled and checked-in binary-only, crufty old versions hanging around, and security problems. It adds extra work for downstream packagers having to pick it apart for distros.
Bundling gets particularly bloaty for git repos, since the history is always included in each clone. For perforce or SVN it doesn't matter so much as you only get the latest version of everything. In git each time there's a dependency update, it will pretty much add the size of the new jar to the .git directory. Over time it's going to grow huge. If at a later date the repository owner decides on a new policy where the third party files are not bundled, then even removing the directory from the current head doesn't shrink the repo size.
There are binaries in there for Mac, Linux and Windows (.exe file at least). You either need one or the other, not all at the same time.
This sort of thing is fine for proprietary software used in a controlled environment, but for open source it looks kludgy.
An alternative could be to have a "dependencies" repository that would be shallow-cloned as needed. At least that way the source code repo only would have source in it, not jars or executables. It'd ensure separation was enforced and you could still track requirements per version or change the policy later.
Google has a legendarily awesome centralized version control system.
I thought it was just perforce.
So there's no longer any perforce.