
Microsoft Build Accelerator – open-source build engine for large systems - shanselman
https://github.com/Microsoft/BuildXL
======
mikece
"BuildXL runs 30,000+ builds per day on monorepo codebases up to a half-
terabyte in size with a half-million process executions per build... You may
find our technology useful if you face similar issues of scale."

I know this wasn't supposed to be a humorous announcement but I couldn't help
laughing out loud at that! Kudos to the managers at Microsoft who now seem to
be asking "Why not?" instead of "Why should we?" when the topic of releasing
code to the community is raised.

~~~
algorithmsRcool
I'm a bit confused.

Are you critical of the size of the monorepo, or the number of builds?

~~~
nickpeterson
I believe the joke is that very few companies are the size of Microsoft, and
as such very few would find this useful.

~~~
hermitdev
You'd be surprised at the volume of code a smaller company can produce.

Former employer was a big C++ shop in finance. Of around 1000 employees,
roughly 3/4 of those were developers. They definitely could take advantage of
something like this. I don't know how many hundreds of millions of LOC they have
between C++ and later C#, but I was responsible for around 3 million alone
(largely generated). A full coordinated firm wide rebuild could take weeks.

~~~
nickpeterson
I'm always stunned by these sorts of stories. Was the opinion that the scale
of code was low, high, or appropriate given the problems being tackled?

~~~
cryptonector
I've a similar story or three of finance companies that in just ten years
produced enormous amounts of legacy code. It's really not that hard. Solaris
was enormous too, with just 2k devs for all the time I was there, and about
30-40 years of history, depending on how you count it. If your 1k developers
each write 10Kloc/year on average, then after a decade you can expect to have
10Mloc, but since a lot of code will be forked external open source (or even
not forked, but just imported to freeze at a particular version, or for some
other reason) you might find your devs building and looking after many tens
more Mloc than that. If you hire lots of 5x and 10x engineers, that too leads
to a sizeable increment.

There are many many companies out there that have huge megarepos.

~~~
bryanrasmussen
Ok but, what's the byte size of 10MLoc, and how many process executions per
build - since these were actually the metrics used. My experience is that
lines of code don't actually take up that much space.
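
A rough back-of-envelope for the byte-size question, as a sketch only: the
40-bytes-per-line average below is an assumed figure, not a measurement from
any real codebase, and `estimated_source_bytes` is a hypothetical helper.

```python
def estimated_source_bytes(lines_of_code, avg_bytes_per_line=40):
    """Estimate raw source size; avg_bytes_per_line is an assumption."""
    return lines_of_code * avg_bytes_per_line


size = estimated_source_bytes(10_000_000)  # the 10 MLoc figure from the thread
print(f"{size / 1e6:.0f} MB")  # 400 MB under the assumed 40 bytes per line
```

Under that assumption the source text alone stays well below a terabyte, so
most of a very large repo would have to come from something other than
hand-written lines.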

~~~
hermitdev
Depends largely on how the code is structured with C++.

There's the number of compilation units within a lib vs overall. Typically you
can parallelize within a module, but not externally unless you have some
smarts.

Edit: I use module in this sense as a producible result, not the future
language concept of modules.

------
pianoben
Congratulations to the BuildXL team! Domino, incidentally, was/is the internal
name for BuildXL; there are a few papers published where the system is described
under that name. I had the privilege to see it gradually rolled out in the
Office codebase over the course of a year or two. It was a massive
improvement, and the lengths to which the team went to see it through are
really beyond description here.

I _have_ to wonder why we didn't just pick up Bazel, which is Google's open-
source distributed build engine for large systems, which also happens to have
been stable for years. Perhaps its Windows support wasn't up to snuff at the
time, but it feels like it would have been easier to fix that than to build
a whole new build system.

Regardless, congrats again! So cool to see this out in the open.

~~~
bazza451
Don’t know if it’s just me but looked at Bazel a few weeks ago. Insane levels
of complexity for just building multiple NodeJS projects.

Anyone know of anything similar but without that steep learning curve?

~~~
sterlind
Nix is similar to Bazel, supports distributed cache + build + CI (via Hydra),
and has a huge amount of existing build support tools. It's not typically used
as the only build system though; it's more like glue that wires all your build
systems together deterministically. Build steps (derivations) are simple bash
scripts executed within a sandbox.

The only catch is that while the language and tools are simple, Nix is really
a pure functional language and you have to treat your build process like code
rather than config. It's easy to go down rabbit holes...

(I considered Bazel but it's very half-baked and lacks all of Blaze's
proprietary toolchain support.)

~~~
bazza451
Cheers, will check it out

------
mikerg87
The why and how of this can be found here :

[https://github.com/Microsoft/BuildXL/blob/master/Documentati...](https://github.com/Microsoft/BuildXL/blob/master/Documentation/Wiki/WhyBuildXL.md)

Seems what drove this was build times of 90 hours for an end-to-end build of
Office. From what I can gather, this has a means of capturing all read/write
operations for a build step and placing them in a cache store to determine if
a change necessitates the rebuild of a component. Since it can hook in at a
lower level, it isn't sensitive to time stamps for building. Actually quite
interesting.
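
To illustrate the general idea of content-based (rather than timestamp-based)
rebuild decisions, here is a minimal sketch. It is not BuildXL's actual
implementation or API: `fingerprint`, `StepCache`, and the in-memory store are
hypothetical names, and the dict of path to bytes stands in for the file reads
a real engine would observe while the step runs.

```python
import hashlib


def fingerprint(command, input_files):
    """Hash the command plus the content of every file the step read."""
    h = hashlib.sha256()
    h.update(command.encode("utf-8"))
    for path in sorted(input_files):  # sorted so the key is order-independent
        content = input_files[path]
        h.update(b"\0" + path.encode("utf-8") + b"\0")
        h.update(len(content).to_bytes(8, "big"))
        h.update(content)
    return h.hexdigest()


class StepCache:
    """In-memory stand-in for a cache store keyed by input fingerprints."""

    def __init__(self):
        self._store = {}

    def run(self, command, input_files, action):
        """Return (outputs, cache_hit); run the action only on a miss."""
        key = fingerprint(command, input_files)
        if key in self._store:
            return self._store[key], True
        outputs = action(input_files)
        self._store[key] = outputs
        return outputs, False
```

Because the key depends only on the command and file contents, touching a file
without changing it still produces a cache hit, which is the property a
timestamp-based check lacks.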

------
azhenley
I interned with TSE in 2016 and had a blast. It is nice seeing one of their
projects get open sourced.

------
whalesalad
“Its own internal scripting language, DScript, an experimental TypeScript
based format used as an intermediate language by a small number of teams
inside Microsoft”

Anyone else notice this?

~~~
daemin
Yes, but really is it any different to what every build system does by
inventing its own DSL or making a new DSL in some other language?

Ant did it in XML, Premake uses Lua, CMake has its own, etc

------
scanr
Anyone know a bit more about DScript?

------
pojntfx
No Linux support, so who cares?

~~~
pjmlp
All of us that don't use Linux daily.

