I would encourage anyone in doubt to give Bazel a try, especially for C/C++ projects. I find it to be very robust nowadays, and much saner than CMake or Makefiles. Even if you don’t have a huge monorepo, the reproducibility and speed are great.
Did a crazy thing and adopted Bazel for our two-person company. It took a while to get everything working, but the average build takes about 30 seconds to run, across Go / Rust / TypeScript.
For languages like Go and Rust there aren't a lot of added benefits. For C++, though, it's much easier in my opinion. You can also manage dependencies through it.
It has that "worst, except for all the others" quality. It's getting much easier to use with bzlmod, at least for the C++ stuff and especially if you have no weird requirements. I also strongly prefer its toolchain model over others, having recently needed to convert a large C++ project from host GNU tools to hermetic LLVM tools. From time to time I ponder converting projects to CMake, but I no longer perceive any advantages.
Any advantage for using Bazel for newer languages that have a standardized dependency setup like Go or Rust? Or even for JS/TS, where almost everyone uses npm/pnpm/yarn?
Cargo workspaces do not support compiling two subprojects for two different targets in a single build invocation. The workaround was (still is?) to use each subproject as a standalone project just for builds. The dependency between these targets is now expressed by the Makefile invoking cargo. This also breaks IDE integrations because your IDE does not know how to ask cargo to pick the right target for each workspace member.
Bazel makes this consistent with platforms. I have not used Bazel with Rust, but it worked fine for my multi-target C++ builds.
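Roughly, a Bazel platform is just a named set of constraints (a minimal sketch; the platform and package names are made up):

    # BUILD.bazel (sketch)
    platform(
        name = "rpi_firmware",
        constraint_values = [
            "@platforms//os:linux",
            "@platforms//cpu:armv7",
        ],
    )

    # One invocation builds everything for that target:
    #   bazel build --platforms=//:rpi_firmware //firmware/...
    # Building two subprojects for two different platforms in the same
    # invocation is done with a configuration transition on the depending rule.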
Other folks have said “mixed languages” and “monorepos” and I agree; I’ll also add that it incorporates steps like building Docker/OCI images, tarballs, potentially even deb/rpm packages all in one place, and with the same sandboxing and deterministic reproducibility as you get with the base language rules.
Like, “I want an OCI image with these contents” gets you an image with the exact files bit-for-bit from the same source, every time. At least aspirationally.
I personally don't use it for Go or Rust, so maybe the answer is "no"? There's no native builder for C++, so the utility of Bazel for that language is more apparent.
Talking about Bazel in the context of a specific language misses the point. Bazel is for huge monorepos that aren't based on a single language, so that you have a common interface for all the code in the monorepo regardless of what language individual bits are written in.
Bazel does a lot of non-obvious things that other build tools don't, that become important in larger codebases (100-10,000 active developers):
- Automatic caching of everything: In most build tools, caching is opt-in, so the core build/compile steps usually end up cached but everything else is not and gets wastefully recomputed all the time. In Bazel, caching is the default, so everything is cached. Even tests are cached: if you run a test twice on the same code and inputs (transitively), the second run is skipped.
- Dependency-based test selection: You can use bazel query to determine the targets and tests affected by a code change, allowing you to trivially set up CI to run only the tests downstream of a PR diff and skip unrelated ones (a query sketch follows after this list). Any large codebase that doesn't use Bazel ends up re-inventing this manually; e.g. consider this code in apache/spark that re-implements it in a Python script wrapping `mvn` or `sbt` (build tools that do not provide this functionality): https://github.com/apache/spark/blob/290b4b31bae2e02b648d2c5...
- Automatic sandboxing of your build steps in cgroup/namespace containers, to ensure your build steps do not make use of undeclared files. In most build tools, this kind of mistake results in confusing nondeterministic parallelism and cache-invalidation problems down the road, where e.g. a build step relies on a file on disk but doesn't realize it needs to re-compute when that file changes. In Bazel, these misconfigurations result in a deterministic error up front.
- At my last job we extended these cgroups to limit CPU/memory usage as well, which eliminates the noisy-neighbour problem and ensures a build step or test gets the same compute footprint whether run alone during development or 96x parallel on a CI worker (https://github.com/bazelbuild/bazel/pull/21322). Otherwise it's common for tests to pass when run during manual development, then time out or OOM when run in CI under resource pressure from other tests hogging the CPU or RAM.
- Built-in support for seamless shared caches: e.g. I compile something on my laptop, you download it to your laptop for usage. This also applies to tests: if TestFoo was run in CI on master, and I pull master and run all tests without changing the Foo code, TestFoo is skipped and uses the CI result.
- Built-in support for shared compute clusters: e.g. I compile stuff and it automatically happens in the cloud on 96-core machines, or I run a lot of tests (e.g. after a big refactor) on my laptop and they automatically get farmed out to run 1024x parallel on the cluster (which, despite running faster, shouldn't cost any more than running 1x parallel, since it costs 1024x as much per second but should finish in 1024x fewer seconds!)
- Support for deep integration between multiple languages: e.g. building a Go binary and a Rust library that are both used in a Python executable, which is then tested using a Bash script and deployed as part of a Java backend server. Most build tools support one language well and others not at all; Bazel supports them all (not perfectly, but adequately).
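The query sketch mentioned above, with a hypothetical //pfs/server target standing in for whatever a PR touched:

    # All test targets downstream of the changed target:
    bazel query 'kind(test, rdeps(//..., //pfs/server))'

    # CI then runs exactly that set and nothing else:
    bazel test $(bazel query 'kind(test, rdeps(//..., //pfs/server))')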
If you never hit these needs, you don't need Bazel, and you probably shouldn't use it because it is ferociously complicated. But if you do hit these needs, most other build tools simply do not cut it at all.
We're trying to support some of these use cases and provide a simpler alternative in https://mill-build.org, but Bazel really is a high bar to reach in terms of features that support large monorepos
Woah!! You say "If you never hit these needs, you don't need Bazel", but even for a small-ish project I would like to have automatic caching of everything, dependency-based test selection, automatic sandboxing of build steps, and seamless shared caches!
I guess for a small team (say, fewer than 10 people) the complexity of Bazel is too much to be worth it, but the features you mention seem universally useful for any project. I bet there are lots and lots of reinvented wheels out there trying to do just a subset of all this.
Empirically speaking, unless you are already a Bazel expert or are lucky enough to hire one, most orgs I've seen need at least 3-5 full-time engineers to become experts in Bazel and about one person-decade of engineering to roll it out (i.e. it takes those 3-5 people 2-3 years). Adopting Bazel is not something you do over a weekend hackathon; the cost starts becoming reasonable once you have 100-200 engineers who can benefit from the 3-5 maintaining their build tool.
For your small-ish project I'm guessing these are wants, but if the project grows, at some point they will turn into needs and you may have to make hard choices. That's my experience rolling out Bazel myself and maintaining it for a growing organization and codebase over 7 years.
I'm hoping the Mill build tool I'm working on can become an easier, friendlier alternative that provides many of the same benefits, even if it can't support the extreme Google scale that Bazel does. So if these things sound cool, but you don't want to spend a person-decade on your build tool, you should check out Mill!
You make this sound considerably more involved than it has been in my experience observing/supporting Bazel infra at three companies ranging from 10 to 1K engineers. Adopting Bazel for C, C++, Java, and Go has been pretty straightforward for years now, unless you want power features like remote execution, custom rules/macros, etc.
I don't doubt your experience, I can only provide mine. I've seen one rollout from the inside (~1000 engineers), performed one rollout myself (100-1000 engineers), and talked to ~10 other companies trying to do their own rollouts (100-1000 engineers). Everyone has their own unique circumstances, and I can only speak for what I have seen myself (first- or second-hand).
I agree. We moved to Bazel at my last job and it took about 6 person-months. I was most of that person, and that includes other tooling not related to Bazel. Some extremely competent engineers also moved over our frontend code (including stuff like Cypress) and Python code (which needs to run against 3 different versions of Python). They had no Bazel experience beforehand, asked me maybe a couple hours' worth of questions, and just got it done. So I don't think you need to be a Bazel genius to get this done, but it helps to have someone with a Vision, which was me in this case. All in all, I'd do it again. I'm in the process of moving all my open source code to a monorepo (jrockway/monorepo, which really should have been called jrockway/jrockway) because the development experience is so much better.
My biggest motivation for the project at work was that new employees couldn't run the code they were working on. We hired people. They tried. They weren't very productive. That's my fault, and I wanted to fix it while supporting our policy of "you can use any Linux distribution you want, or you can use an arm64 Mac". Many people suggested things like "force everyone to use NixOS", which I would have been in favor of, but it wasn't the solution that won. (I honestly prefer Debian myself and didn't think my preference should dictate how the team works. The fact that I disagreed with the proposed solution is a good indicator that people would be unhappy with anything I declared by fiat.) Rather, using Bazel to provide a framework for retrieving third-party tooling and also building our code was a comfortable compromise.
A secondary goal was test caching. If you edit README.md, CI doesn't need to rerun the Cypress tests. (As a corollary, if you edit "metadata_server.go", the "pfs_server.go" tests don't need to run, as the PFS server does not depend on the Metadata server.)
The biggest piece of slowness and complexity in the workflow was building our code into a container to run in k8s. We used goreleaser, and that involved building the code twice, once for each architecture, to assemble an OCI image index, which was our main release artifact. The usual shell scripts for local development just reused this, and it was terribly slow. Throw in Docker to do the builds, which deletes the Go build cache after every build, and you have a recipe for not getting anything done. Bazel is a much better way to build containers. Containers are just some JSON files and tar files; Bazel (rules_oci) assembles your build artifacts into the necessary JSON files and tar files. To build a multi-architecture image index, you build twice and add a JSON file. Bazel handles this with platform transitions: you make a rule to build for the host architecture (technically transitioned to Linux on a macOS host), and then the image index rule builds that target with two configurations (cross-compiled, not emulated with binfmt_misc like "docker buildx") and assembles the two artifacts into the desired multi-arch image. When running locally, you skip two of the three steps and just build for the host machine. Combined with proper build caching (thanks BuildBuddy!), this means that making an image to run in k8s takes about 10 seconds instead of 6 minutes. With the previous system you could try your code 80 times a day; with the new system, 2880 times ;) This increased productivity.
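The shape of it, roughly (a sketch using rules_oci and rules_pkg; the target and base-image names are made up, and the per-platform transition wiring is elided):

    # BUILD.bazel (sketch)
    load("@rules_oci//oci:defs.bzl", "oci_image", "oci_image_index")
    load("@rules_pkg//pkg:tar.bzl", "pkg_tar")

    # Layer the (per-platform) Go binary into a tar.
    pkg_tar(
        name = "app_layer",
        srcs = ["//cmd/server"],
    )

    # An image really is just base + tar layers + some JSON config.
    oci_image(
        name = "image",
        base = "@distroless_base",
        entrypoint = ["/server"],
        tars = [":app_layer"],
    )

    # The index bundles per-platform builds of :image; each entry comes
    # from a platform transition (cross-compiled, not emulated).
    oci_image_index(
        name = "index",
        images = [":image_linux_amd64", ":image_linux_arm64"],
    )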
I also wrote a bunch of tools to make setting up k8s easier, which would have been perfectly possible without Bazel, but it helped. (Before, everyone pushed their built image to DockerHub and then reconfigured k8s to pull that. Now we have a local registry, and if two people do a build at the same time, you always get yours and not theirs. I did not design this previous system, I merely set out to fix it because it's Wrong.) Bazel makes vendoring tools pretty easy. For our product we needed things like kubectl, kind, skopeo, postgres, etc. These are all in //tools/whatever and can be run for your host machine with `bazel run //tools/whatever`. So once you ran my program to create and update your environment, you automatically had the right version of the tool to interact with it. We upgraded k8s regularly, nobody noticed. They would just get the new tool the next time they tried to run it. (A centrally managed linux distribution would do the same thing, but it couldn't revert you to an old tool when you checked out an old version to debug. A README with versions would work, but I learned that nobody really reads the READMEs until you ask them to. "How do I do X" "See this section of the README" "Oh damn I wish I thought of that" "Me too." ;)
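That vendored-tool pattern really can be about five lines per tool, e.g. with bazel-skylib's native_binary (a sketch; the external repos would be http_file entries, pinned by sha256, in MODULE.bazel):

    # tools/kubectl/BUILD.bazel (sketch)
    load("@bazel_skylib//rules:native_binary.bzl", "native_binary")

    # `bazel run //tools/kubectl` always runs the version pinned
    # in the currently checked-out commit.
    native_binary(
        name = "kubectl",
        src = select({
            "@platforms//os:macos": "@kubectl_darwin_arm64//file",
            "//conditions:default": "@kubectl_linux_amd64//file",
        }),
        out = "kubectl",
    )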
The biggest problem I had with Bazel in the past was dealing with generated files. Editor support, "go get <our thing>", etc. I got by when I used Blaze at Google, but realistically, there was no editor support for Go at that time, so I didn't notice how badly it worked. There is now GOPACKAGESDRIVER, which technically helps, but it didn't work well for me and I wasn't going to inflict it upon my team. I punted this time and continued to check in generated files. We have a target //:make_proto that rebuilds the proto files, and a test that checks that you did it. You check in the generated protos when you change the protos. It works and I have a general rule for all generations like this. (We also generate a bunch of JSON from Jsonnet files; this mechanism helps with that.)
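The "did you regenerate it" test can be a plain bazel-skylib diff_test (a sketch; the file names are made up):

    # proto/BUILD.bazel (sketch)
    load("@bazel_skylib//rules:diff_test.bzl", "diff_test")

    # Fails the build if the checked-in copy drifts from the
    # freshly generated output of the codegen rule.
    diff_test(
        name = "proto_up_to_date_test",
        file1 = ":generated_metadata_pb_go",  # codegen output
        file2 = "metadata.pb.go",             # checked-in copy
    )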
All in all, you can get a fresh machine, install git and Bazelisk, check out our repo, and get "bazel test ..." to succeed. That, to me, is the minimum dev experience that your employer owes you, and if you joined my team, you'd get it. That made me happy and it wouldn't be as good without Bazel. I'd do it again!
Just as an aside: after the Bazel conversion, I did a really complicated change, and Bazel didn't make it any harder. We made our main product depend on pg_dump, and adjusting the container-building rules from "distroless plus a Go binary" to "Debian plus postgres plus a Go binary" was pretty easy. rules_debian is very nifty, and it gives me a sense of reproducibility that I never got from "FROM debian:stable@sha256...; RUN apt-get update && apt-get install ....". Indeed, the reproducibility is there. You can take any release tag, "bazel build //oci:whatever", and see that the resulting multi-arch image has the same sha256 as what's on DockerHub. I couldn't have done that without Bazel, or at least not without writing a lot of code.
I don't work there anymore but I'm really happy about the project. I don't even do Build & Release stuff. I just add features to the product. But this needed to be done and I was happy to wear the hat.
As someone who is somewhat experienced with build systems in general (though not with Bazel) and has had to solve a lot of the issues you mention in different ways (i.e. without Bazel), I have been interested in learning Bazel for a long time, as its building principles seem very sound to me. However, the few times I looked into it I found it rather impenetrable. In particular, defining build steps "declaratively" in Starlark seemed to me just a slightly less bad way of writing magic incantations in YAML. In other words, you still have to understand what exactly every magic incantation does under the hood and how to configure it, and the documentation generally didn't seem great.
Is there some resource (blog/book/…) you can recommend for learning Bazel?
I feel like I got the basics from using Blaze for years at Google. Things like "oh yeah, buildifier will autoformat my BUILD files" and the basic flow of how a build system is supposed to work.
Figuring out how to complete a large project with Bazel involved a few skills that one should be ready to employ.
1) Programming. The stuff out there can't do things exactly the way you want. I wanted to use a bunch of golangci-lint checks with "nogo", so I opened up the golangci-lint source code and copy-pasted their code into my project to adapt the checks to how nogo works (a wiring sketch follows after this list). People have tried fixing this problem generically before, but their solutions ended up not working, and there are just a bunch of half-abandoned git repositories floating around that don't work. Write it yourself. (I had to write a lot of code for this project: compiling protos the way we want, producing reproducible tar files with more complex edits than I wanted to do with mtree -> awk -> bsd tar, installing built binaries, building "integration test" Go coverage binaries, etc. Lots of code.)
2) Debugging. A lot happens behind the scenes and you always need to be situationally aware of what's being done for you. For example, I was pretty sure our containers would be "reproducible" i.e. have the same sha256 no matter the configuration of the build machine. That was ... not true. I tested it and it wasn't happening. So I had to dive into the depths of the outputs and see which bytes were "wrong" in which place, and then debug the code involved to fix the problem. (It was a success, and oddly I sent the PR to fix it about 5 seconds before someone else sent the exact same PR.)
3) Depth. There probably isn't a way to be functional where you pick something out of your search results, follow the quickstart, and then happily enjoy the results. Rather you should expect to read all of the documentation, then read most of the code, then check out the code and add a bunch of print statements... with each level of this involving some recursion to the same step for a sub-dependency. For example, I never really knew how "go build" worked, but needed to learn when I suspected linking time was too high. (Is it the same for 'go build'? Yes. Why? It's spending all of its time in 'gold'. What's gold, the go linker? No, it's the random thing Debian installed with gcc. Is there an alternative? Yes, lld and mold. Are those faster? Yes. How do I use one of those with Bazel? I'll add some print statements to rules_go and use that copy instead of the upstream one.)
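For item 1, the nogo side of the wiring is small once you have the analyzers; the analyzer targets are the part you end up writing (or copy-pasting) yourself (a sketch; the analyzer target name is hypothetical):

    # BUILD.bazel (sketch)
    load("@io_bazel_rules_go//go:def.bzl", "nogo")

    nogo(
        name = "my_nogo",
        # Each dep is a go_library exporting an analysis.Analyzer,
        # e.g. a check adapted from golangci-lint.
        deps = ["//tools/analyzers:bodyclose"],
        vet = True,
        visibility = ["//visibility:public"],
    )

    # rules_go is then pointed at //:my_nogo in MODULE.bazel/WORKSPACE.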
With all that in mind, I never figured out "everything". There is a lot of stuff I took at face value, like configuration transitions for multi-arch builds. The build happens 3 times but we only build for 2 platforms (the third platform is the host machine). I don't know why, or how to prevent the host build. (I did figure out how to do this for some platform-independent outputs, though, like generating static content with Hugo.) I also wired up a bunch of tools but never used Bazel's actual toolchain machinery. I had my works-with-5-lines-of-code way of running vendored tools for the host machine and never saw the need to type in 50 lines of boilerplate to do things the "right" way. I'm sure this will burn someone someday.
In the end, I guess motivation was the key. People on my team couldn't get their work done, and CI was so slow that people spent half their day in the "I'm going to go read Reddit until CI is done" cycle. Hacks had been attempted in the past, with a lot of effort put into them, and they still didn't work. So we had to rebuild the Universe from first principles, doing things the "right" way. The results were good.
I will always prefer this approach to the simpler ones. For one thing, Bazel always gives the "right answer" when it's set up correctly. It doesn't rely on developers to be experts at managing their dev machines; you include all the tools that they need and you can update them whenever you want a new feature, and they get it for free. That's the big selling point for me. I also can't deal with stuff that is obviously unnecessary, like how Dockerfile-based container builds require an ARM64 emulator to run "mkdir" in a Dockerfile. You're just generating a stack of tar files and some JSON. Let me just tell you where the tar files and the JSON is. We do not need a virtual machine here.
I migrated my team of ~20 developers to Bazel, and I have to say I don’t think it is that complex. Not only do we not have 3 full-time engineers devoted to Bazel (I do everything myself), maintaining the build system isn’t even my main role.
We do not use some of the more complex features like remote execution, but we do enjoy all the other features including remote caching. We reduced build times by 92% after the migration.
So my advice would be, try it and see for yourself if you think it’s worth the hassle. For my team, it definitely has been.
It is really not that hard to grok and use. I think CMake is actually a bit harder. It doesn’t hurt that Bazel builds are implemented in Starlark instead of a zany state-manipulation language-that-isn’t-a-language.
I personally like using it for my monorepo projects. At least for Java, the Maven dependencies for all of the services are defined in a single top-level MODULE.bazel file, which prevents drift in the dependency versions I'm using across the project.
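Concretely, with rules_jvm_external it's roughly this in MODULE.bazel (a sketch; the artifacts are just examples):

    bazel_dep(name = "rules_jvm_external", version = "6.0")

    maven = use_extension("@rules_jvm_external//:extensions.bzl", "maven")
    maven.install(
        artifacts = [
            "com.google.guava:guava:33.0.0-jre",
            "org.slf4j:slf4j-api:2.0.9",
        ],
    )
    use_repo(maven, "maven")

    # Every service depends on e.g. @maven//:com_google_guava_guava,
    # so there is exactly one place to bump a version.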
I'm sure it's possible to do the exact same thing in <NAME ANOTHER BUILD TOOL>, but I found it to be easy enough in Bazel without the configuration being overly verbose.
On one hand, exactly: anything that's grabbed from the distro is a problem for reproducibility. On the other hand, it means there could be some shim to make installing things easier, assuming `apt/pacman/whatever install bazel` isn't enough to get going - which may well be the case, since bootstrapping toolchains isn't straightforward and needs configuration (read: smashing your head against the wall) for each language, especially one that isn't already well supported like C++ or Java.
I've been using it to package the opentelemetry-c++ sdk for us, with a single command:
bazel run make_otel_sdk
It compiles debug, fastdebug, and release, then places the .dll, include/ folders, etc. into a single zip. I'm also invoking sentry-cli to collect all the source code used and place that in the zip as well.
At work, a top-level .bazelrc points caching at the intranet, but on GitHub I'm relying (not ideal) on actions/cache with disk_cache blobs - it works, but storing all the "blobs" back makes it a bit slower than expected.
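The split is just a couple of lines in .bazelrc (a sketch; the endpoints are placeholders):

    # .bazelrc (sketch)
    # At work: shared cache on the intranet.
    build --remote_cache=grpc://bazel-cache.internal:9092
    # On GitHub Actions: a local disk cache that actions/cache
    # saves and restores between runs.
    build:ci --disk_cache=~/.cache/bazel-disk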
Neat setup. I would point out that this shows some of the problems with the way Bazel is managed as a project. You had to fix your build for Bazel 8, which isn't great. But the bigger issue is that you were forced to adopt rules_cc, which, because of its weird adherence to "semantic versioning", means you are now subject to unannounced breaking changes, because rules_cc hasn't hit 1.0.
rules_pkg went through the same problem: it was built into Bazel, then moved out into rules_pkg 0.0.1, and then the maintainer of rules_pkg felt free to break everything multiple times, mostly for unjustifiable aesthetic reasons.
Yup! I'm using this project as a playground to get better at Bazel, and also to evaluate it for myself: "Is this really a tool I can use at work?" - "Not yet, because someone else would have to be as mad as me and love it enough to use it for its own sake, instead of just using it to solve problems" - especially on Windows :)
I think it's still very representative.
https://earthly.dev/blog/bazel-build/