If you squint, mtime/inode/etc. behave like a weak content signature of the input. Once you adopt that perspective, the rule becomes "if the mtime differs from the mtime I recorded last time, rebuild", with no comparison of relative values, and that sidesteps a lot of clock-skew issues. It does "the wrong thing" if someone intentionally pushes timestamps into the past (e.g. when switching to an older branch) as an attempt to game such a system, but playing games with mtime is not the right approach for that anyway; totally hermetic builds are.
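The "signature, not ordering" idea is tiny to implement. A minimal sketch in Python (the exact tuple of stat fields is my choice for illustration, not taken from any particular build tool):

```python
import os

def stat_signature(path):
    """Treat file metadata as an opaque signature, not an ordered timestamp."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_ino, st.st_size)

def needs_rebuild(path, recorded_signature):
    # Rebuild on *any* difference; never ask "is it newer?".
    return stat_signature(path) != recorded_signature
```

Because only (in)equality matters, a branch switch that moves mtimes backwards still triggers a rebuild, which is usually what you want.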
One nice trick is that you can capture all the "inputs" (files, command lines, etc.) with a single checksum that combines them, and that representation transitions easily between truly looking at file content and just looking at file metadata. The one downside is that when the build system decides to rebuild something, it's hard to tell the user why; you end up just saying "some input changed somewhere".
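A sketch of the single-combined-checksum idea (the function name and fields hashed are mine; a real system would also fold in tool versions, environment variables, and so on):

```python
import hashlib
import os

def input_signature(paths, command_line, use_content=True):
    """Fold every input -- file data or metadata, plus the command line --
    into one hash, so a single opaque value answers "did anything change?"."""
    h = hashlib.sha256()
    h.update(command_line.encode())
    for path in sorted(paths):            # sorted so ordering is stable
        h.update(path.encode())
        if use_content:
            with open(path, "rb") as f:   # truly look at file content
                h.update(f.read())
        else:
            st = os.stat(path)            # metadata-only mode: cheaper, weaker
            h.update(repr((st.st_mtime_ns, st.st_size)).encode())
    return h.hexdigest()
```

Any change to any file or to the command line changes the final digest, but the digest alone can't say which input was responsible -- exactly the downside mentioned above.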
Does Ninja use a database like sqlite? It seems like it has to if it does something better than Make's use of mtimes. (e.g. the command line, which Make doesn't consider.)
I looked at redo (linked in the article) and it uses sqlite to store the extra metadata.
Ninja does use some database-like things, but they are just in a simple text/binary format. It's actually been long enough that I have forgotten the details.
https://ninja-build.org/manual.html#ref_log (contains a hash of the commands used) / https://ninja-build.org/manual.html#_deps (database-like thing with some mtimes, see https://github.com/ninja-build/ninja/blob/master/src/deps_lo... )
The sqlite3 database used in redo was just something I threw together in the first few minutes. sqlite was always massive overkill for the problem space, but because it never caused any problems, it's been hard to justify working on it. I'd like to port redo from python to C, though, and then the relative size of depending on a whole database will matter a lot more.
Ninja isn't too big though. It looks like about 13K of non-test code, which is great for a mature and popular project. Punting build logic to a higher layer seems to have been a big win :)
* On precision, he notes "almost no filesystems provide that kind of precision" (nanoseconds), but I would honestly say the opposite: ext4, xfs, btrfs, and ZFS are some very common filesystems that support it. He cites his ext4 system as having only 10ms granularity, which is most certainly not the default, but likely a result of upgrading from ext2/ext3 to ext4. As an aside, NTFS has a granularity of 100ns.
* It is unclear what he means by "If your system clock jumps from one time to another...". If this is talking about NTP, it's probably accurate. My first reading was "daylight saving" or time zone changes, in which case, everyone uses UTC internally and such changes don't affect the actual mtimes. (You might get strange cases where a file listing regards a file modified at 01:45 to be older than a file modified at 01:20, but if you display in UTC, you can see it's just DST nonsense)
Ext4 has been stable for over a decade and has been a default filesystem on many distributions. It was the default on RHEL 6, first released over 8 years ago, and the default ext variant after that. Debian has used it since 6.0/Squeeze, released in 2011, and Ubuntu since 9.10, released in late 2009.
To be clear, your argument is that a filesystem which only became the common default 6-8 or more years ago, and has been carried forward through in-place upgrades since, is not a rare edge case for the vast majority of installations?
In-place upgrades do have the potential to leave some non-default options on the final ext4 filesystem, such as 128B inodes instead of 256B ones, which is where limitations like reduced timestamp granularity come from.
I don't see what the user has to do with how time is internally kept on the system.
I don't know what he's talking about either. Most popular NTP clients (ntpd, chrony) will try very hard to make sure this never happens by simply slowing down or speeding up time. You don't know what will break if you just gap time like that.
ntpd and chrony might do that (I'm not sure), but systemd's NTP implementation, which is widely used even though it has many other issues (such as not implementing the spec properly, IIRC), does just jump the time when you enable NTP on a system where it was disabled. From memory, back when I used ntpd it did the same thing, but I could be mistaken.
Interesting, might you have any links to these discussions?
Not sure about chrony, since I haven't used it (or heard of it, admittedly).
The whole article is about edge cases. It doesn’t really matter if they aren’t super common: the result is that mtimes do act weird sometimes, and if you build a system that depends on them, it will also act weird sometimes.
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
>>> 4e9 + 0.000001
4000000000.000001
>>> 4e9 + 1e-9
4000000000.0
Apparently, according to https://stackoverflow.com/questions/14392975/timestamp-accur..., the ext4 driver just uses the cached kernel clock value without the counter correction that gives you a ns-precise value.
One could perhaps have an LD_PRELOADED fsync (or whatever) that updates the mtime with clock_gettime() to store it in its full nanosecond precision glory but it's probably not worth the performance penalty. That wouldn't address the mmap issue of course...
I use Perl and found this to be a problem. Like Python, it uses a double for st_mtime, and the nanoseconds value is truncated, so it fails equality tests with nanoseconds recorded by other programs (e.g. in a cache).
It even fails equality tests against itself, when timestamp values are serialised to JSON or strings with (say) 6 or 9 digits of precision and back again. Timestamps serialised that way don't round trip reliably due to double truncation.
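The loss is easy to show with plain arithmetic at current-epoch magnitudes (~1.5e9 seconds), where adjacent doubles are roughly 238ns apart (math.ulp needs Python 3.9+):

```python
import math

t = 1.5e9                 # seconds since the epoch, roughly "now"
print(math.ulp(t))        # gap between adjacent doubles at this magnitude (~2.4e-07)

# A nanosecond vanishes entirely...
assert t + 1e-9 == t
# ...while a microsecond (barely) survives.
assert t + 1e-6 != t
```

So any language that exposes mtime as a single double (Python's st_mtime, Perl's stat) cannot round-trip nanosecond timestamps, no matter how carefully you serialise them.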
What is the granularity of your file system? Mine appears to be 3.33ms.
I was surprised and disappointed to find Linux sets mtime to the nearest clock tick (250Hz on my laptop) on filesystems whose documentation says they provide nanosecond timestamps.
It's not obvious because the numbers actually stored still have 9 random looking digits. But the chosen mtime values actually go up only on clock ticks. If you're running those filesystems on Linux, try it yourself:
(n=1; while [[ $n -le 10000 ]]; do > test$n; n=$((n+1)); done)
ls -ltr --full-time
That's why some of my programs on Linux now set the mtime explicitly with a call to clock_gettime() followed by futimens() after writing the file, to make sure the timestamps change each time files are replaced, in case it happens more than once inside a 250Hz tick.
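In Python the same workaround looks roughly like this: os.utime with the ns= keyword maps to utimensat(2), and time.time_ns() reads the realtime clock directly rather than the cached tick value. (The helper name is mine; whether the filesystem preserves the full precision is up to it.)

```python
import os
import time

def write_with_precise_mtime(path, data):
    """Write a file, then stamp it with the clock's full nanosecond reading
    so back-to-back rewrites inside one scheduler tick still get distinct mtimes."""
    with open(path, "wb") as f:
        f.write(data)
    now_ns = time.time_ns()
    os.utime(path, ns=(now_ns, now_ns))   # (atime_ns, mtime_ns)
    return now_ns
```

On ext4/tmpfs with full-precision timestamps, os.stat(path).st_mtime_ns then reads back exactly the value that was set.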
This part implies that the index file is written via mmap, but that's not true. It is fully rewritten to a new tempfile/lockfile, and then atomically renamed into place.
Git does not ever mmap with anything but PROT_READ, because not all supported platforms can do writes (in particular, the compat fallback just pread()s into a heap buffer).
This insight makes me want to try redo.
One thing I dislike about redo is that it probably does not work well on Windows. Has anybody ever tried?
Redo also makes me wonder: is a build directory just a habit from using Make, or is it a flaw of redo not to support one well? By "build directory" I mean the concept where the build process generates files in a separate directory which can simply be deleted, so nothing pollutes the source directory.
Yes, although only for a toy project. I'm sure it'd work fine with Cygwin or WSL, it might work with MSYS, but I've definitely had it working with busybox-w32.
> Is a build directory just a habit from using Make or is it a flaw of redo to not support that well?
You can use redo in a Make-like fashion by putting a single `default.do` in the root of your project that decides what to do by examining the filename it's been asked to build. That does give up some of the benefits of redo, however (since a single file builds everything, when you edit that file redo wants to rebuild everything).
Having a separate build directory that can be easily wiped is a good idea, but I'm a lot less worried about it (or things like 'make clean') now that I have 'git clean -dxf'.
And any time I do an initial checkout from any version control, "make clean" is always the first step.
hoping jdebp will chime in..
Nevertheless, a build system which does this implicitly is better, imho.
These dependencies can be recorded automatically with LD_PRELOAD. LD_PRELOAD can redefine functions such as fopen and let you record what files are read when running e.g. cc.
This makes it feasible to record the entire relevant state of a system, files and environment variables, at build time.
An argument in favor of checksums is the use of build caches. Switching between branches on large codebases triggers a lot of rebuilds. With a build cache, that can be avoided. SCons is a build system that uses such a cache.
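A toy of the build-cache idea (the dict-as-cache and stand-in "compile" are mine; SCons uses an on-disk store): work is keyed by (content, command), so switching back to an already-built branch costs nothing.

```python
import hashlib

class BuildCache:
    """Map (source content, command line) -> built artifact, skipping
    recompilation whenever the same key has been seen before."""
    def __init__(self):
        self.store = {}
        self.compilations = 0

    def build(self, source: bytes, command: str) -> bytes:
        key = hashlib.sha256(command.encode() + b"\0" + source).hexdigest()
        if key not in self.store:
            self.compilations += 1                       # only pay for new content
            self.store[key] = b"artifact-for:" + source  # stand-in "compile" step
        return self.store[key]
```

Simulating a branch switch (build v1, build v2, switch back to v1) compiles only twice: the third request hits the cache even though the file's mtime would have changed.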
Not only can a process sidestep libc entirely by calling the `open`(2) syscall, but there are often many ways of combining function calls to achieve the same outcome. This method will also fail completely on systems that have new, previously unknown functions that are not monitored by the LD_PRELOAD solution.
Worst of all, an LD_PRELOAD solution would not cover operations done on behalf of the target program by external programs via IPC (think system daemons and D-Bus), at least not without intercepting and interpreting all I/O the target does.
In short, it doesn't scale.
I don’t think this is true anymore with APFS.
Obviously this faces the "not every toolchain will support this" but you could have a switch to use checksums and continue to use the older approach by default.
You would have to call cksum instead of touch in your Makefile.
With a modules system (like C++20 is trying to embrace) you could in theory generate a vector of checksums (with subranges in the file) for each file and only recompile portions of the file that needed it. It's rather an accident of history that we use the granularity of a file at all.
The average project is what, a few MB? Less? Most of which is going to be cached after the first compile anyway.
Even on an enormous codebase, as long as you have an SSD or a nontrivial amount of RAM I can't see this being an issue.
You don't care if a file is newer - you care if it's different!
The other case seems valid in a sort of 'if your build intentionally makes use of mtime, you'll need to look at mtime'. It seems like an odd thing to do in the first place - I guess Makefile as deployment rather than for building?
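The "different, not newer" distinction fits in a few lines (hypothetical helper names): touching a file fools an mtime comparison but not a content check.

```python
import hashlib
import os

def stale_by_mtime(src, recorded_mtime_ns):
    # Fires on any timestamp change, even if the bytes are identical.
    return os.stat(src).st_mtime_ns != recorded_mtime_ns

def stale_by_content(src, recorded_digest):
    # Fires only when the bytes actually differ.
    with open(src, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() != recorded_digest
```

After an os.utime() that bumps the timestamp without changing the contents, stale_by_mtime reports a rebuild while stale_by_content correctly stays quiet.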
A 470K LoC project I have here with >1000 files takes 0.04 seconds to do a full sha256sum traversal on my box from the cache. That's single-threaded.
If I drop caches, it takes approximately 1 second (from spinning rust, not SSD).
You can probably argue that this is a consequence of bad tooling rather than any strength of nfs builds, but it is an example of a non-trivial number of developers frequently building over nfs.
It's the huge projects with millions of files and tens of gigabytes of source and assets that need these optimizations the most, and that's also where checksumming is the most painful.
It's not as unrealistic or monstrous as it sounds. It happens in monorepos when you include all of a project's thousand dependencies (down to things like openssl and libpng).
It seems like solving a problem that could be fixed more easily by just rsyncing or cloning the codebase. Storage is cheap.
the only message I read is that someone wants to believe they're as smart as Dijkstra. /s