Unix version control lore: what, ident

larschdk · 2024-05-14T07:37:00 1715672220

I work on a code base dating back to 2000. In the first years, the change log was kept in a comment at the top of the file. Commit comments were automatically inserted into the file by a "$Log$" directive (CVS). The first 200 lines are just 20-year-old commit log entries, none of them providing any valuable information ("minor changes", "changed XYZ to be consistent with spec", "reenable reporting"). So much cargo culting.

theta_d · 2024-05-14T14:39:03 1715697543

I worked on a telecom codebase that dated back to 1985. Similar experiences. Was wild to think of contributing to something that was almost as old as me.

fanf2 · 2024-05-14T14:45:39 1715697939

Everyone I knew who was using CVS in the 1990s said that $Log$ was a bad idea that should never be used. Why haven’t you deleted the useless comments?

phinnaeus · 2024-05-14T13:46:04 1715694364

This reminds me of my fond memories of migrating from perforce to git and getting to delete similar perforce style comments at the top of every file. So satisfying to clean that up.

puetzk · 2024-05-13T18:49:41 1715626181

Relatedly, GCC/binutils still has a `.ident` assembly directive (https://sourceware.org/binutils/docs/as/Ident.html) and `#ident` preprocessor command (https://gcc.gnu.org/onlinedocs/cpp/Other-Directives.html) that emit data into an ELF .comment for `what` to read...

wpollock · 2024-05-13T20:13:13 1715631193

The original name was .SCCS. I put the SCCS strings in there to save memory. In the early 1980s, the 3B20 computers used to manage the U.S. phone network had a 32MiB memory limit. By 1985 the network had outgrown that memory limit, by just a few kilobytes. So I hacked the C compiler to look for @(#) strings and put them into .SCCS rather than .data. Since .SCCS didn't load into memory by default, I saved just enough to run one more process! Each binary was built from about 2,000 source files, so those strings added up to a significant amount of memory.

This was at Bell Labs Columbus Ohio. Also, I think it was COFF and not ELF. Used System V r2.
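
For readers who have not met these strings: `what` finds them by scanning a file for the four-byte marker `@(#)` and printing whatever follows, up to the first `"`, `>`, newline, backslash, or NUL. A minimal sketch of that scan in Python follows. The function name and the sample bytes are made up for illustration, and this is not the real tool's implementation.

```python
import re

# what(1) looks for the marker "@(#)" and prints the text after it,
# stopping at the first ", >, newline, backslash, or NUL byte.
WHAT_PATTERN = re.compile(rb'@\(#\)([^"\x00>\n\\]*)')


def what_strings(data: bytes) -> list[str]:
    """Return every identification string found after an "@(#)" marker."""
    return [m.group(1).decode("latin-1") for m in WHAT_PATTERN.finditer(data)]


if __name__ == "__main__":
    # Hypothetical binary contents: two ident strings among other bytes.
    blob = b'\x7fELF\x00junk@(#)main.c 1.4 85/03/12\x00more@(#)util.c 2.1\n\x00'
    for s in what_strings(blob):
        print(s)
```

Because the scan only needs the marker bytes, it works the same whether the strings sit in a loaded data section or in a non-loaded one such as .SCCS or .comment.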

jmix · 2024-05-14T00:49:21 1715647761

Good riddance to these tools because this technique, of relying on embedded strings in the code, is inherently insecure and unreliable. You can only rely on it when you know you can trust the build, and yet they are used in cases where the build is of unknown etiology, so there's an inherent mismatch between when the tool is used and what it does.

stevekemp · 2024-05-13T18:39:00 1715625540

Golang binaries embed the version of the compiler into them, and can easily add git revision information too.

That's a nice feature for adding "foo -version", or similar, to show the users what their binaries were built from.

fanf2 · 2024-05-13T20:05:13 1715630713

Here's a nice blog post with more information about that https://blog.carlana.net/post/2023/golang-git-hash-how-to/

peterldowns · 2024-05-14T00:01:36 1715644896

I'm currently using the ldflags approach [0], but the linked library [1] looks much nicer. I'll probably switch over to it soon. Thanks!

[0]: https://github.com/peterldowns/localias/blob/main/Justfile#L...

[1]: https://github.com/earthboundkid/versioninfo

paradox460 · 2024-05-13T22:39:00 1715639940

Years ago, when moving off svn to git, I cursed the fact that there was no such string replacement feature. I understand why it doesn't exist, but when it was an obstacle to my job, I loathed it.

It was easy enough to replace with a short script, and I use a variation of that to this day.

bananskalhalk · 2024-05-14T04:48:56 1715662136

There is a string replacement feature, and to my knowledge it has been there the whole time.

https://git-scm.com/book/en/v2/Customizing-Git-Git-Attribute... look under "Keyword Expansion" halfway down the page.

ggm · 2024-05-13T22:40:01 1715640001

FreeBSD finally removed the last of the $Id$ in the source some months ago. I don't entirely know why they were so keen to do this but I'm sure they had a reason.

I don't see this, or binary .ident strings as e.g. clashing with idempotent builds.

fanf2 · 2024-05-14T14:55:45 1715698545

I gather it was because $FreeBSD$ stopped working with the move to git, so they removed it and its predecessors. (TBH the historical idents were not practically useful.) IMO it would be nice to at least retain a git hash in the kernel that can be found by what and ident.

HeadlessChild · 2024-05-14T14:51:14 1715698274

Does anyone have information about these lines? Are they auto-generated?

njt · 2024-05-14T18:52:56 1715712776

Both Subversion[1] and CVS[2] had keyword substitution, which replaced those tags with useful information like the commit id, author, date, etc.

They were very useful when you were looking at a source file, to see what version of that file you had.

Git had something similar with Git Attributes[3], but AFAIK, they were just references to blob ids, so they never really took off.

For git, I now use tags (and versioning based on tags), that more or less replaced svn/cvs keyword substitution in the git ecosystem.

[1] - https://svnbook.red-bean.com/en/1.7/svn.advanced.props.speci...

[2] - https://www.gnu.org/software/trans-coord/manual/cvs/html_nod...

[3] - https://git-scm.com/book/en/v2/Customizing-Git-Git-Attribute...

neilv · 2024-05-13T17:52:44 1715622764

I used to use these embedded version strings, and occasionally they were very helpful.

As the article says, it's not as easy a fit for Git.

The embedded version etc. strings could also make reproducible builds slightly more tricky than they already are.

Even if they're not worth the trouble for current software, they could be a big timesaver for archaeology/reconstruction of old software.

fanf2 · 2024-05-13T18:02:00 1715623320

It should be safe for reproducible builds because the version strings come from the commit metadata, so they are fixed for a given version of the code.

(My scripts have some cruft for marking builds from dirty source trees, in which case they are not reproducible – but in that case it’s OK.)

rjsw · 2024-05-13T18:02:41 1715623361

For Git, you could embed a string of the form:

  <git commit id>:path/in/repo/to/file.ext

to be able to retrieve the exact source file contents used to build something.

bananskalhalk · 2024-05-14T04:46:20 1715661980

Or just the hash from the blob of the file.
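
As a rough illustration of what "the hash from the blob" means: in a SHA-1 repository, Git names a file's contents by hashing the header `blob <size>\0` followed by the bytes themselves. The sketch below reproduces that in Python. The function name is invented here, and SHA-256 repositories are not covered.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git gives to file contents (SHA-1 repos only)."""
    # Git hashes a short header plus the raw bytes, not the bytes alone.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known id.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

A blob id identifies the contents but not the path or the commit, so it answers "which version of this file" without saying where in history it came from.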

GauntletWizard · 2024-05-13T20:30:43 1715632243

I think that Git solves an important problem that those embedded version strings didn't - Git commits are guaranteed to be either unique or dirty. Generating a summary of your repository state should be done early, and included exactly in the artifacts, as should build flags. Google's internal build system included a bunch of build server information, including build time and build host, but importantly it included these variables in every built package as a simple .env file. This env file would cause the build system to "falsely" report all of these things, so you had a reproducible build, even with uniquifying data embedded.
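
To make the idea concrete, here is a small sketch of a build step that prefers values recorded in a stamp file over live ones. It is not Google's system: the `KEY=VALUE` format, the key names `BUILD_TIME` and `BUILD_HOST`, and both function names are assumptions made up for this example.

```python
import socket
import time


def parse_stamp(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    stamp = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        stamp[key.strip()] = value.strip()
    return stamp


def build_info(stamp: dict[str, str]) -> dict[str, str]:
    """Use recorded values when present; fall back to live ones otherwise."""
    return {
        "BUILD_TIME": stamp.get("BUILD_TIME", str(int(time.time()))),
        "BUILD_HOST": stamp.get("BUILD_HOST", socket.gethostname()),
    }


if __name__ == "__main__":
    # Hypothetical stamp shipped alongside an earlier build.
    recorded = "BUILD_TIME=1715632243\nBUILD_HOST=builder-01\n"
    print(build_info(parse_stamp(recorded)))
```

Feeding the recorded stamp back in yields the same embedded values on any machine at any time, which is what lets the rebuilt artifact match the original.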

alerighi · 2024-05-13T18:01:24 1715623284

I have also used this mechanism more than once on embedded projects. Define a static variable holding some firmware metadata between markers, so that tools (e.g. an update tool) can get metadata such as the version by simply running a regex over the binary file (which is a couple of kB at most). This way you don't need to add another file header around the raw binary, which is not always possible.
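
A minimal sketch of the tool side, assuming a made-up marker format of `<<FWMETA:key=value;key=value>>` stored as ASCII somewhere in the image. The marker, the field names, and the function name are all hypothetical; a real project would pick its own.

```python
import re

# Hypothetical marker: "<<FWMETA:" then printable ASCII fields, then ">>".
META_PATTERN = re.compile(rb"<<FWMETA:([\x20-\x7e]*?)>>")


def read_metadata(image: bytes) -> dict[str, str]:
    """Extract key=value metadata from the first marker in a raw image."""
    match = META_PATTERN.search(image)
    if match is None:
        return {}
    fields = match.group(1).decode("ascii").split(";")
    # Ignore malformed fields that have no "=" separator.
    return dict(f.split("=", 1) for f in fields if "=" in f)


if __name__ == "__main__":
    # Stand-in for a firmware image with the metadata string embedded in it.
    image = b"\x00\x01\x02<<FWMETA:version=1.2.3;board=rev-b>>\xff\xfe"
    print(read_metadata(image))
```

The closing marker matters: without it the regex would have to guess where the metadata ends, and neighbouring printable bytes could leak into the last field.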

PhilipRoman · 2024-05-13T18:08:20 1715623700

haha I've been bitten often enough by complex build systems (cough Yocto cough) to develop a habit of adding random strings to see what actually gets included in the final executable. When you have patches, patches patching other patches, hundreds of ifdefs and python mixed with shell scripts, it is often the only way to make sense of it all.

m463 · 2024-05-14T06:34:35 1715668475

A lot of these techniques are fragile, and complicate lots of other things.

It is very useful if the same code compiles to the same binary if no changes occurred.

But having the date and time or a version control comment change a binary may lead to unnecessary churn with dependencies, packages, and integrity checks.

fanf2 · 2024-05-14T14:49:31 1715698171

This has no effect on reproducible builds since the version strings are stable.

renewiltord · 2024-05-13T18:06:30 1715623590

I use absolute overkill for this. A `build.rs` file that persists a metadata file into `OUT_DIR`. Something simpler would be nice, for sure.

fanf2 · 2024-05-13T20:03:09 1715630589

You might like https://docs.rs/git-version/latest/git_version/ or https://docs.rs/git-testament/latest/git_testament/

kazinator · 2024-05-14T05:20:22 1715664022

We now know that keyword expansion is idiotic and have moved past it.

The worst ones are the ones that expand the log messages, when they are implemented in such a way that it becomes permanent.

A wall of short commit messages condensed into a block comment doesn't help anyone understand anything, without the actual changes to refer to.

Keywords are supposed to help someone who works outside of the context of the version control, but that's ironically the person who is trying to apply a patch that is failing because of the expanded cruft, whereas the person working in the version control system may have a way to do the merge on unexpanded artifacts.