
Why not just use a complete checksum of the file?

The average project is what, a few MB? Less? Most of which is going to be cached after the first compile anyway.

Even on an enormous codebase, as long as you have an SSD or a nontrivial amount of RAM I can't see this being an issue.

You don't care if a file is newer - you care if it's different!
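Roughly what I mean, as a Python sketch (the .build-hashes.json cache file is made up here, purely for illustration): only rebuild whatever actually has a different content hash than last time.

    import hashlib
    import json

    HASH_CACHE = ".build-hashes.json"  # hypothetical cache file, just for this sketch

    def file_digest(path):
        """Hash file contents in chunks so big files don't eat memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def changed_files(paths):
        """Return the paths whose content differs from the last recorded hash."""
        try:
            with open(HASH_CACHE) as f:
                old = json.load(f)
        except FileNotFoundError:
            old = {}
        new = {p: file_digest(p) for p in paths}
        with open(HASH_CACHE, "w") as f:
            json.dump(new, f)
        return [p for p in paths if old.get(p) != new[p]]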




This is discussed near the end of the article. It works best when the file system itself stores a checksum in its metadata, so the checksum doesn't have to be recalculated for every file on every build. It's also not appropriate when your build depends on side effects other than file content. For example, sometimes you depend on the timestamp of an empty stamp file, or on the success or failure of another step (signalled by, e.g., a log message), to trigger other actions.
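If the filesystem doesn't hand you a checksum, the usual compromise is to cache the hash keyed by the file's stat info and only rehash when size/mtime change. A rough sketch of that idea (not what any particular build tool actually does):

    import hashlib
    import os

    # Hypothetical in-memory cache: path -> ((size, mtime_ns), digest).
    # Real tools persist something like this; the exact scheme is illustrative only.
    _hash_cache = {}

    def cached_digest(path):
        """Recompute the content hash only when the file's size or mtime changes."""
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        cached = _hash_cache.get(path)
        if cached and cached[0] == key:
            return cached[1]
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        _hash_cache[path] = (key, h.hexdigest())
        return h.hexdigest()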


Is it common to build over NFS? I can't immediately see a use case - collaborative editing or something? Even in that case, wouldn't it be easier to build on the box?

The other case seems valid, in a sort of "if your build intentionally makes use of mtime, you'll need to look at mtime" way. It seems like an odd thing to do in the first place, though - I guess Makefile as deployment rather than for building?

A 470K LoC project I have here with >1000 files takes 0.04 seconds to do a full sha256sum traversal on my box from the cache. That's single-threaded.

If I drop caches, it takes approximately 1 second (from spinning rust, not SSD).
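For anyone who wants to try the same measurement on their own tree, here's a quick way to time it (just a sketch; it skips symlinks, and the numbers will obviously depend on hardware and cache state):

    import hashlib
    import os
    import time

    def hash_tree(root):
        """Walk the tree, sha256 every regular file, and report (files, seconds)."""
        start = time.monotonic()
        count = 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue  # skip symlinks so broken links don't abort the walk
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                count += 1
        return count, time.monotonic() - start

    print(hash_tree("."))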


I do a lot of nfs builds at work as part of development on a proprietary OS. Some things will only nicely build on-OS, which for dev purposes is generally running in a non-local VM. The relevant git repos are massive enough (I'll frequently be building a small part of the tree, but still have to clone the whole thing), and the VMs disposable enough, that dealing with slower builds via an nfs mount from my workstation and/or homedir server is faster than repeatedly cloning.

You can probably argue that this is a consequence of bad tooling rather than any strength of nfs builds, but it is an example of a non-trivial number of developers frequently building over nfs.


For the median project, you could just rebuild everything from scratch every time and avoid the problem.

It's the huge projects with millions of files and tens of gigabytes of source and assets that need these optimizations the most, and that's also where checksumming is the most painful.

It's not as unrealistic or monstrous as it sounds. It happens in monorepos when you include all of a project's thousand dependencies (down to things like openssl and libpng).


At work, we frequently build in a Linux VM (using VirtualBox through Vagrant) from a macOS host, and the default shared folder does not support symlinks. We use NFS as a workaround.


NFS is getting so rare that some systems aren't even organized to accommodate "mount -o ro /usr" anymore.


NFS is widely used in HPC to mount user home directories on compute nodes.


I use NFS a decent amount, just not for anything like this, because even on a link with e.g. 5ms latency you end up with issues all over the place.

It seems like solving a problem that could be fixed more easily by just rsyncing or cloning the codebase. Storage is cheap.


Hm! I wonder if ZFS exposes the hash to the outside world...


Check out the source sizes for Firefox or Unreal Engine. A project that's so small is a project that isn't putting big demands on the build system anyway, so that exception proves the rule.



