Back dating Git commits based on file modification dates (simonwillison.net)

Thanks for sharing, I love stuff like this. I get frustrated at mtimes being munged. The macOS Finder is terrible about this.

If anyone is interested, I wrote a more generalized tool for storing metadata called storetouch[1], which takes a snapshot of files' mtimes and stores them in a nicely formatted file for convenient restoration. You can run its output by itself with bash, or easily parse its output.

I found that the commit or author dates did not always make sense as a representation of the true age of a file, especially for non-source-code documents: for example, a repo of scanned documents, or a git-annex repo of podcasts going back 20 years.

Instead, I treat the commit date as the commit date, and I have a versioned sidecar that stores the actual original mtimes of the files.
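To make the sidecar idea concrete, here is a minimal Python sketch. It is not how storetouch works (its format and options are not shown in this thread); the function names and the JSON layout are made up for illustration, and it makes no attempt to cover every valid POSIX filename.

```python
import json
import os
import tempfile
from pathlib import Path


def snapshot_mtimes(root: Path) -> dict[str, int]:
    """Map each file's path (relative to root) to its mtime in nanoseconds."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_mtime_ns
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_mtimes(root: Path, snapshot: dict[str, int]) -> None:
    """Reapply recorded mtimes, keeping each file's current atime."""
    for rel, mtime_ns in snapshot.items():
        path = root / rel
        if path.is_file():
            os.utime(path, ns=(path.stat().st_atime_ns, mtime_ns))


def demo() -> bool:
    """Round trip: record an old mtime, let it get bumped, then restore it."""
    original_ns = 946_684_800 * 10**9  # 2000-01-01 00:00:00 UTC
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        doc = root / "scan.txt"
        doc.write_text("scanned page")
        os.utime(doc, ns=(0, original_ns))
        sidecar = json.dumps(snapshot_mtimes(root))  # the text you would commit
        doc.write_text("scanned page")  # rewriting the file bumps its mtime
        restore_mtimes(root, json.loads(sidecar))
        return doc.stat().st_mtime_ns == original_ns
```

The point of the design is that the commit date stays an honest commit date, while the file's own age travels in a versioned text file next to it.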

It can also handle all valid POSIX filenames (a surprising number of utilities don't, including tools like git-restore-mtime, last time I checked). If you're already using a tool like git-annex or datalad, though, they have the ability to store metadata for each file as well.

I would also suggest checking out git-store-meta[2] which was the original inspiration. It stores more than just mtime, but it is more closely tied to git.

[1]: https://github.com/unqueued/storetouch

[2]: https://github.com/danny0838/git-store-meta

The mtimes aren't being munged. mtimes are filesystem metadata that reflect when the file was changed; trying to use them as version control information (or application level metadata) is wrong. For example, mtimes are used by programs that cache file data, like file indexing programs; backdating mtimes will break everything that uses mtimes for their intended purpose.

Yep, and a lot of backup software will use them (along with file size and other stat() fields) to determine if a file has changed, so backdating them can actually lead to silently stale backups.
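A small Python sketch of that failure mode, assuming a hypothetical backup tool that compares only size and mtime (real tools differ in which stat() fields they check, and some hash contents):

```python
import os
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> tuple[int, int]:
    """The cheap change check: size and mtime only, no content hashing."""
    st = path.stat()
    return (st.st_size, st.st_mtime_ns)


def demo_stale_backup() -> bool:
    """Return True if a changed file still looks unchanged to the cheap check."""
    fixed_mtime_ns = 10**18  # an arbitrary fixed timestamp, in nanoseconds
    with tempfile.TemporaryDirectory() as tmp:
        f = Path(tmp) / "notes.txt"
        f.write_text("version A")
        os.utime(f, ns=(0, fixed_mtime_ns))
        seen = fingerprint(f)  # what the tool recorded at backup time

        f.write_text("version B")  # same length, different content
        os.utime(f, ns=(0, fixed_mtime_ns))  # mtime set back to the old value

        return fingerprint(f) == seen
```

Here `demo_stale_backup()` comes back true: the content changed, but size and mtime match, so a tool relying on them alone would skip the file.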

Sometimes search indexing and build tools get it wrong, so if you know what you're doing, then tools like this allow you to provide an override.

Just browsing an old photo collection without making any changes can cause a 15-year-old file to show up as a "new file", even though its contents have not changed. This is especially true with media files that the OS might take it upon itself to regenerate metadata for.

Or you may wish to store an accurate snapshot of the timestamps of cached or intermediate build artifacts.

It's 2024; any backup mechanism that isn't based on a delta between two snapshots is a legacy of the bad old times.

Both Finder and explorer.exe will mung mtimes from casual browsing, usually due to metadata updates. It is a common enough problem that there are utilities to bind mtimes to files.

IMO mtime is the de facto file timestamp metadata, since it is the most widely supported. The other file timestamp metadata I found were not as useful or portable.

And people use git for storing more than just source code.

I personally find it very useful to be able to clone my podcast index and have accurate timestamps not just from the commit, but from the file itself.

Timestamps can be important for a collection of files, but they are fragile and often an afterthought.

Not exactly what's in this original post but here's a little shell function that could be useful to conveniently backdate Git commits:

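  dated() {
      date="$1"   # first argument is the date; the rest is the command to run
      shift 1
      set -x      # trace the command so you can see the dates being applied
      GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" "$@"
      set +x
      unset date
  }

Some usage examples: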

  dated "1990-01-01 10:10" git commit
  dated "1990-01-01 10:10:00 +0100" git commit
  dated "1990-01-01 10:10:10 +0100" git commit --amend
  dated "1990-01-01 10:10:10 +0100" git commit --amend --reset-author

If all you want is author date (which I think is more correct), then just this works fine:

  git commit --date '1990-01-01 10:10'

IIRC the GitHub UI started showing commit date instead of author date some years ago.

I use the following to create a commit using the latest modification date of the staged files. If you put it in your PATH as git-tcommit, you can invoke it as `git tcommit`. If GIT_AUTHOR_DATE or GIT_COMMITTER_DATE is passed, it overrides the modified time, and it honors TZ. It works with GNU or BSD.

    #!/usr/bin/env bash
    set -eEuo pipefail

    # `git commit`, using the latest file modification time as the commit and author
    # dates, when GIT_AUTHOR_DATE or GIT_COMMITTER_DATE, respectively, is not set.

    # Use fractional seconds when available to display the newest file more
    # accurately, even though Git only uses seconds.
    if which gstat >/dev/null 2>&1; then
      stat=(gstat --format='%.Y %n') # Detected aliased GNU coreutils
    elif stat --version 2>/dev/null | grep -q 'GNU coreutils'; then
      stat=(stat --format='%.Y %n') # Detected GNU coreutils
    else
      stat=(stat -f '%m %N') # Fallback to BSD-style, which only reports seconds
    fi

    # Select the latest modified time of all staged files, excluding deletions.
    modified="$(
      cd "$(git rev-parse --show-toplevel)" &&
      git diff --staged --diff-filter=d --name-only -z |
        xargs -0 "${stat[@]}" |
        sort -rn |
        head -1
    )"

    # NOTE: the assignments to author_seconds and committer_seconds did not survive
    # in this copy of the script; the ones below are an assumed reconstruction.
    # An empty date makes Git fall back to the current time.
    author_seconds=''
    committer_seconds=''

    if [[ -n $modified ]]; then
      modified_seconds="${modified%% *}"
      modified_file="${modified#* }"
      author_seconds="${modified_seconds%.*}" # Git only uses whole seconds
      committer_seconds="$author_seconds"
      # GNU date reads epoch seconds with -d @N; BSD date reads them with -r N.
      if date --version 2>/dev/null | grep -q 'GNU coreutils'; then
        modified_date="$(date -d "@$author_seconds" +'%Y-%m-%d %H:%M:%S %z')"
      else
        modified_date="$(date -r "$author_seconds" +'%Y-%m-%d %H:%M:%S %z')"
      fi
      echo "Modify date: $modified_date ($modified_file)"
    fi
    if [[ -n ${GIT_AUTHOR_DATE+.} ]]; then
      echo "Author date: ${GIT_AUTHOR_DATE:-now}"
    fi
    if [[ -n ${GIT_COMMITTER_DATE+.} ]]; then
      echo "Commit date: ${GIT_COMMITTER_DATE:-now}"
    elif [[ -n $committer_seconds ]]; then
      last_commit_seconds="$(git show -s --format=%at HEAD 2>/dev/null || echo 0)"
      # Compare numerically; `<` inside [[ ]] would compare the values as strings.
      if (( committer_seconds < last_commit_seconds )); then
        echo "Commit date: now (last commit: $(git show -s --format=%ai HEAD 2>/dev/null))"
        committer_seconds=''
      fi
    fi

    GIT_AUTHOR_DATE="${GIT_AUTHOR_DATE-"$author_seconds"}" \
    GIT_COMMITTER_DATE="${GIT_COMMITTER_DATE-"$committer_seconds"}" \
    exec git commit "$@"

You probably don't want to reset committer date like that. It tracks repo information, not file information, and is generally not visible in things like log and blame.

I first tried just using author date but that didn’t display correctly in GitHub, and the whole point of this exercise was to get the GitHub web UI to show “33 years ago”.

Weird. Wonder why they did that. This is what it looks like using git directly:

  $ git init foobar
  Initialized empty Git repository in /tmp/foobar/.git/
  $ cd foobar/
  $ vim Bar
  $ git add Bar 
  $ git commit --date '2021-01-01 01:01:01' -m'Bar'
  [master (root-commit) fd1d31c] Bar
   Date: Fri Jan 1 01:01:01 2021 -0600
   1 file changed, 1 insertion(+)
   create mode 100644 Bar
  $ git blame Bar
  ^fd1d31c (Izkata 2021-01-01 01:01:01 -0600 1) Bar
  $ git log
  commit fd1d31c2d29e3e8f6325d9abca883ce7e00d48e3 (HEAD -> master)
  Author: Izkata <***>
  Date:   Fri Jan 1 01:01:01 2021 -0600
  $ git log --pretty=fuller
  commit fd1d31c2d29e3e8f6325d9abca883ce7e00d48e3 (HEAD -> master)
  Author:     Izkata <***>
  AuthorDate: Fri Jan 1 01:01:01 2021 -0600
  Commit:     Izkata <***>
  CommitDate: Sat Aug 3 12:50:45 2024 -0500

I think using pathlib instead of the more old-school os.walk and os.path.getmtime would clean that code up substantially!

I agree, and if I’d written this code by hand I would have used pathlib. I basically stopped caring as soon as I ran Claude’s code and it did the thing I wanted it to do.
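For anyone curious what the pathlib flavor might look like, here is a rough sketch. It is not the code from the post; `files_by_mtime` is a made-up helper that only covers the "walk the tree and read each mtime" part, and skipping `.git` is my own assumption about what you would want.

```python
from pathlib import Path


def files_by_mtime(root: Path) -> list[tuple[float, Path]]:
    """Return (mtime, path) for every regular file under root, oldest first."""
    return sorted(
        (p.stat().st_mtime, p)
        for p in root.rglob("*")
        # Skip anything inside a .git directory so repo internals are ignored.
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


if __name__ == "__main__":
    # Print the files under the current directory in the order a
    # backdating script would probably want to commit them.
    for mtime, path in files_by_mtime(Path(".")):
        print(f"{mtime:.0f}  {path}")
```

`Path.rglob("*")` replaces the `os.walk` loop, and `p.stat().st_mtime` replaces `os.path.getmtime`, which is most of the cleanup being suggested.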

I had heard that git was incompatible with dates before the Unix epoch (1970). This makes it hard if you want to use git as a format for showing how documents were collaborated on historically. Has there been any progress on general agreement on how to do this?
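To show where the difficulty would come from: if dates are stored as a non-negative count of seconds since 1970-01-01 UTC (an assumption here, not something confirmed in this thread, so check it against your git version), then any earlier date simply has no representation. A small Python sketch with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_seconds(when: datetime) -> int:
    """Whole seconds between the Unix epoch and `when` (negative if earlier)."""
    return (when - EPOCH) // timedelta(seconds=1)


def fits_unsigned_timestamp(when: datetime) -> bool:
    """True if `when` can be stored as a non-negative seconds-since-epoch value."""
    return epoch_seconds(when) >= 0


if __name__ == "__main__":
    for label, when in [
        ("18th-century document", datetime(1787, 9, 17, tzinfo=timezone.utc)),
        ("1990 source file", datetime(1990, 1, 1, tzinfo=timezone.utc)),
    ]:
        print(label, epoch_seconds(when), fits_unsigned_timestamp(when))
```

Under that assumption, the historical date would have to live somewhere else, such as the commit message or a sidecar file.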

Ref: https://github.com/JesseKPhillips/USA-Constitution?tab=readm...

I did something similar from backup files for one of my projects. Although I did it mostly manually. It's on GitHub now at https://github.com/jurakovic/Comets-Archive

When I saw the repo yesterday I thought how did Simon do that? Now I know. Great work mate.

Very nice, if one wants something more meaningful than only perhaps you can add a Claude summary aka https://github.com/RomanHotsiy/commitgpt

Love that they included the Claude workflow they used to write the script.

I’ve used a similar technique to create a git history for configuration files in an AWS S3 bucket that has version control enabled.

Misread the title, thinking someone had invented a nerdy dating service on GitHub. But it's not that. Disappointed.

> clicking a link to a .m file triggers a download

Firefox can be convinced to simply show you any resource that you know is some flavor of plain text if you open it in View Source mode, i.e. by constructing a view-source: URI out of the original URL and accessing it that way. So to bypass the download nag screen for Anchor.m, you'd access it as <view-source:https://www.w3.org/History/1991-WWW-NeXT/Implementation/Anch...> instead. This even works on Firefox for Android, but not Chrome. (Of course you don't need to construct these by hand most of the time; if you're on a page linking to a bunch of files that otherwise prompt you to download them, like the Apache directory listing in this case, you just view the source of the directory listing page and click the links in the href attribute values there.)
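If you want to build those URIs in bulk, the construction is just a prefix. A tiny Python sketch, where `view_source_uri` is a made-up helper and the example.org URL is a placeholder rather than the real W3C path:

```python
def view_source_uri(url: str) -> str:
    """Prefix an http(s) URL with the view-source: scheme."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return "view-source:" + url


if __name__ == "__main__":
    # Placeholder URL; paste the result into the Firefox address bar.
    print(view_source_uri("https://example.org/files/Anchor.m"))
```

Whether a given browser honors the resulting URI is up to the browser, as described above.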

This is pretty useful in multiple circumstances—which of course means that someone at Mozilla Corp will remove it from Firefox or find some other way to stop you from doing this any day now.

> which of course means that someone at Mozilla Corp will remove it from Firefox or find some other way to stop you from doing this any day now

Mozilla has its fatal flaws and I'm all for pointing them out (loudly), but it seems like time would be better invested in actually pointing out the real flaws.

For all their flaws, I don't think they are out there actively trying to piss people off, even if it can feel like they are. I don't see why they would remove this. It doesn't seem like supporting this use case causes any complexity in the code or in the UI / UX.

Hatred is not free, I would suggest spending it wisely.

(of course, I know you are probably being sarcastic)


Your response's logic applies to your response far more than to theirs.


