This computes a single hash over the contents of all files. I suppose that's appropriate for languages/build systems that don't have file-level compilation. Otherwise, it would be more efficient to keep per-file hashes and rebuild only the files that have changed.
A criticism of this implementation: the "state" directory is never cleaned; new states are only ever added. Therefore, if you go back to a state (hash) that you had already built in the past, you will not be able to build it again.
And there is no need to create separate shell scripts when you can keep all the relevant code inside your Makefile. Presumably those scripts are not going to be called independently anyway.
As I've written on previous discussions:
Make by default uses the file change timestamp to trigger actions. But this is definitely not the only way, and you can code your Makefile so that rebuilds happen when a file's checksum changes. IIRC, the GNU Make Book has the code ready for you to study...
Or, you might get more clever and say "when only a comment is changed, I don't want to rebuild"; file checksums are not the correct solution for this, so you can code another trigger.
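As a minimal sketch of that checksum trigger (plain shell rather than the book's Makefile version; the file names src.txt, .src.hash, and out.txt are made up for illustration):

```shell
#!/bin/sh
# Rebuild only when a file's checksum changes, using a stored hash
# rather than the mtime as the trigger.
set -eu
cd "$(mktemp -d)"

echo 'hello' > src.txt

build_if_changed() {
    new_hash=$(sha256sum src.txt | cut -d' ' -f1)
    old_hash=$(cat .src.hash 2>/dev/null || true)
    if [ "$new_hash" != "$old_hash" ]; then
        cp src.txt out.txt              # stand-in for the real build step
        printf '%s' "$new_hash" > .src.hash
        echo rebuilt
    else
        echo up-to-date
    fi
}

build_if_changed   # first run: no stored hash, so it rebuilds
build_if_changed   # second run: checksum unchanged, build is skipped
```

Note that `touch src.txt` would not trigger a rebuild here, which is exactly the property mtime-based Make lacks.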
I wrote "the GNU Make Book has the code ready for you to study."
I'm currently traveling, without access to my books, but a quick look at the book's online table of contents (https://nostarch.com/download/GNU_Make_dTOC.pdf) shows that p. 82 has the code for "Rebuilding When a File's Checksum Changes".
The GNU Make Book by John Graham-Cumming, No Starch Press, April 2015, 256 pp., ISBN-13: 978-1-59327-649-2
Correct, the state directory isn't being cleared. As this is mostly aimed at making ephemeral build agents faster, this shouldn't matter much in practice.
Having said that, there is definitely some work to do on keeping the remote storage somewhat clean.
I paid full price for this book on No Starch, but I recently purchased it again as part of the "Linux by No Starch"[1] bundle that's going on now. I think that bundle would be appreciated by a lot of the crowd here. It's in the $10 and up tier.
Bravo! I haven't read the code carefully yet to fully understand the details of the implementation, but this is brilliant hackery.
As someone who has recently developed a focused interest in build systems, I have noticed the large vacuum in the space for a content-based Make-alike with a similarly low barrier to entry. HN user bobsomers said it well the other day:
"There is a much tighter, cleaner, simpler build system within Bazel struggling to get out. A judicious second take at it with a minimal focus, while taking some of the best ideas, could be wildly successful I think."
This is clever, but true content-based change detection in Make would fix a whole bunch of issues. I'm somewhat surprised it hasn't been done already.
I expect you could get rid of most of the non-POSIX features by using more complex recipes, probably exiling them into separate shell scripts. (The main limitation is that V7/POSIX/BSD-style suffix rules don’t let you specify a rule for producing ANYTHING.foo from ANYTHING, whereas V8/GNU-style pattern rules do.)
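For concreteness, a sketch of that difference (generic rule names, not from the article): a GNU-style pattern rule can derive a target from a stem with any name at all, while a suffix rule can only map one known suffix to another:

```makefile
# GNU-style pattern rule: build NAME.hash from NAME itself; the stem "%"
# matches any file name, with or without a suffix. There is no
# V7/POSIX/BSD suffix-rule equivalent for this shape.
%.hash: %
	sha256sum $< > $@

# POSIX suffix rules only map one declared suffix to another, e.g. .c -> .o:
.SUFFIXES: .c .o
.c.o:
	$(CC) $(CFLAGS) -c $< -o $@
```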
We use all kinds of tools that require a database that is effectively hidden from us. (e.g. git). I don't think this is a significant blocker or problem.
I might be misunderstanding, but I think this fails when you revert to a previous revision of a file, as the hash will revert to a previously seen value, i.e.:
    make
    # compiled foo.ts
    vim foo.ts
    make
    # compiled foo.ts
    git stash
    make
    # nothing to do
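A hypothetical shell sketch of that failure mode (the state directory and foo.ts names follow the discussion above, but the code is invented for illustration): if build markers are keyed by content hash and old markers are never removed, reverting a file to a previously built state looks up to date even though the output has since been overwritten:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
mkdir state

hash_of() { sha256sum foo.ts | cut -d' ' -f1; }

echo 'v1' > foo.ts
touch "state/$(hash_of)"; echo "compiled foo.ts"   # first build

echo 'v2' > foo.ts
touch "state/$(hash_of)"; echo "compiled foo.ts"   # rebuild after edit

echo 'v1' > foo.ts                                 # revert (e.g. git stash)
if [ -e "state/$(hash_of)" ]; then
    # stale marker from the first build still exists, so the build is
    # skipped even though the v2 artifact is what's on disk
    echo "nothing to do"
fi
```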
Run all your scripts through https://www.shellcheck.net/ (you can install it locally too) and correct all the errors it finds; click through to the explanation pages to understand why. In the future, improve your style so you don't generate these errors.
How would that work in the C/C++ world? For example, a.cpp #include's <a.h>, and a.h doesn't produce any compilation artifacts. We change a.h but not a.cpp; since a.cpp is unchanged, there's nothing to do in the build?
Normally, when writing Makefiles by hand, people assume the host system doesn't change, because handling that is painful and the only alternative is to sacrifice portability. If you want a more reliable/portable/stable build, the only real option is to use a build system like meson/cmake/autotools/bazel/... to generate the appropriate Makefile for the current machine (and once you start doing that, you might as well stop using make and use ninja instead).
Normally you ask your compiler to generate the dependency graph on first run, then include it from your Makefile (make already knows to rerun itself when the Makefile deps change).
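For GCC/Clang that usually looks something like the following sketch (the file names and variables are generic, not taken from the article):

```makefile
# -MMD writes a .d file alongside each object, listing the headers it
# #includes; -MP adds phony targets so deleted headers don't break make.
CXXFLAGS += -MMD -MP
OBJS := a.o

a.out: $(OBJS)
	$(CXX) $^ -o $@

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@

# Pull in the generated dependency files; the leading "-" silences the
# error on a first run, when they don't exist yet.
-include $(OBJS:.o=.d)
```

With this in place, editing a.h marks a.o out of date on the next run even though a.cpp itself is untouched.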
ccache https://ccache.dev/ is great for C/C++, but it would be excellent if it exposed a more generic interface for use in the way this article describes.
Inotify (and various equivalent change notification mechanisms on other systems) is probably ideal, but needs a fallback for a cold start. If a cold start isn’t the common case (as it is in Make), using slower cryptographic hashing instead of less reliable mtimes might be a good tradeoff. But I don’t think you could marry this to Make without major surgery. (Tup, mentioned elsethread, is this + automatic dependencies based on syscall tracing.)
This is not what the article is about, however—it's about building a build-input-addressable artifact cache that can be shared across machines, à la Bazel or Nix. (That Nix is a build system—and NixOS is a distro built on said build system, the same way that Gittup is built on Tup, the BSDs on Make, or Gentoo on its thing—seems to be a well-kept secret. The distinguishing idea that in that case "build artifact cache" = "binary repo" is brilliant, though.)
What I don’t get is what purpose Make serves in the context of the article: when the build process turns a whole bunch of files into one with no intermediate steps or separate compilation, it seems a plain shell script would do just as well as Make + auxiliary shell scripts. You could perhaps obtain some more structure (less strict source ordering?) with Make, but the article doesn’t seem to do that.
To address the purpose of make: this was written with a large multi-workspace TypeScript project in mind. In the real project there are many more make targets with dependencies on each other, and we don't want to waste time rebuilding the different workspaces if they haven't changed.
I wasn't sure whether including all that extra information in the post was worth it, so I hope this answers your question.