Hacker News
Back dating Git commits based on file modification dates (simonwillison.net)
69 points by nailer 38 days ago | 42 comments



Thanks for sharing, I love stuff like this. I get frustrated at mtimes being munged. MacOS finder is terrible about this.

If anyone is interested, I wrote a more generalized tool for storing metadata called storetouch[1], which takes a snapshot of files' mtimes and stores them in a nicely formatted file for convenient restoration. You can run its output directly with bash, or easily parse it.

I found that the commit or author dates did not always make sense as the true age of a file, especially for non-source-code documents: for example, a repo of scanned documents, or a git-annex repo of podcasts going back 20 years.

Instead, I treat the commit date as the commit date, and I have a versioned sidecar that stores the actual original mtimes of the files.

It can also handle all valid POSIX filenames (a surprising number of utilities don't, including git-restore-mtime, last time I checked). If you're already using a tool like git-annex or datalad, though, they can store per-file metadata as well.

I would also suggest checking out git-store-meta[2] which was the original inspiration. It stores more than just mtime, but it is more closely tied to git.

[1]: https://github.com/unqueued/storetouch

[2]: https://github.com/danny0838/git-store-meta
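A minimal sketch of the sidecar idea described above, assuming bash and GNU coreutils (`stat -c`, `touch -d @SECONDS`). The function name `snapshot_mtimes` is hypothetical and this is not storetouch's actual format. It emits a bash script of `touch` commands, so restoring is just running the snapshot; NUL-delimited paths and `printf %q` let any valid POSIX filename (spaces, newlines) survive the round trip.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a bash script that, when run later, restores the current mtime of
# every regular file under DIR. GNU stat/touch/sort assumed.
snapshot_mtimes() {
  local dir=$1 f
  # -print0 and read -d '' keep arbitrary filenames intact.
  find "$dir" -type f -print0 | sort -z |
    while IFS= read -r -d '' f; do
      # %q quotes the path so the generated line is valid bash.
      printf 'touch -m -d @%s -- %q\n' "$(stat -c %Y -- "$f")" "$f"
    done
}
```

Usage: `snapshot_mtimes photos > photos.mtimes.sh` (written outside the directory so the snapshot doesn't list itself), version that file alongside the content, and later run `bash photos.mtimes.sh` to put the original mtimes back.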


The mtimes aren't being munged. mtimes are filesystem metadata that reflect when the file was changed; trying to use them as version control information (or application level metadata) is wrong. For example, mtimes are used by programs that cache file data, like file indexing programs; backdating mtimes will break everything that uses mtimes for their intended purpose.


Yep, and a lot of backup software will use them (along with file size and other stat() fields) to determine whether a file has changed, so backdating them can actually lead to silently stale backups.
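To make the stale-backup failure concrete: rsync's default "quick check", for example, treats a file as unchanged when its size and modification time match. The sketch below imitates that kind of check with hypothetical `fingerprint`/`looks_unchanged` helpers (GNU `stat -c` assumed; BSD `stat` would use `-f '%z %m'`), then rewrites a file with same-size contents and backdates its mtime.

```shell
#!/usr/bin/env bash
# Hypothetical size+mtime check, similar in spirit to many backup/sync tools.
fingerprint() { stat -c '%s %Y' -- "$1"; }

# looks_unchanged FILE RECORDED_FINGERPRINT
looks_unchanged() { [[ $(fingerprint "$1") == "$2" ]]; }

f=$(mktemp)
printf 'hello' > "$f"
touch -m -d @1000000000 "$f"
recorded=$(fingerprint "$f")      # what the backup tool saw last time

printf 'HELLO' > "$f"             # new contents, same size (5 bytes)
touch -m -d @1000000000 "$f"      # mtime backdated to the old value

if looks_unchanged "$f" "$recorded"; then
  echo "skipped: the backup copy of $f is now stale"
fi
```

The contents changed, but nothing the check looks at did, so the file is skipped.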


Sometimes search indexing and build tools get it wrong, so if you know what you're doing, then tools like this allow you to provide an override.

Just browsing an old photo collection without making any changes can cause a 15-year-old file to show up as a "new file", even though its contents have not changed. This is especially true with media files whose metadata the OS may take it upon itself to regenerate.

Or you may wish to store an accurate snapshot of the timestamps of cached or intermediate build artifacts.


It's 2024, any backup mechanism that isn't based on a delta between two snapshots is legacy from the old bad times.


Both Finder and explorer.exe will mung mtimes from casual browsing, usually due to metadata updates. It is a common enough problem that there are utilities to bind mtimes to files.

IMO mtime is the de facto file timestamp metadata, since it is the most widely supported. The other file timestamps I found were not as useful or portable.

And people use git for storing more than just source code.

I personally find it very useful to be able to clone my podcast index and have accurate timestamps not just from the commit, but from the file itself.

Timestamps can be important for collections of files, but they are fragile and often an afterthought.


Not exactly what's in this original post but here's a little shell function that could be useful to conveniently backdate Git commits:

  dated() {
      local date=$1
      shift 1
      # Trace the command in a subshell so `set -x` doesn't stay on afterwards.
      (
          set -x
          GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" "$@"
      )
  }
Some usage examples:

  dated "1990-01-01 10:10" git commit
  dated "1990-01-01 10:10:00 +0100" git commit
  dated "1990-01-01 10:10:10 +0100" git commit --amend
  dated "1990-01-01 10:10:10 +0100" git commit --amend --reset-author
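To confirm that both dates were set, `git log` can print the author date (`%ai`) and committer date (`%ci`) of the new commit. A self-contained sketch in a throwaway repository; it redefines a trace-free `dated` so it runs on its own, and the user name and email are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same idea as dated() above: set both Git dates for a single command.
dated() {
  local date=$1
  shift
  GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" "$@"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"              # placeholder identity
git config user.email "example@example.com"
echo hello > file.txt
git add file.txt
dated "1990-01-01 10:10:00 +0100" git commit -q -m "Backdated"

# %ai = author date, %ci = committer date
git log -1 --format='author:    %ai%ncommitter: %ci'
# prints:
# author:    1990-01-01 10:10:00 +0100
# committer: 1990-01-01 10:10:00 +0100
```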


If all you want is author date (which I think is more correct), then just this works fine:

  git commit --date '1990-01-01 10:10'


IIRC the GitHub UI started showing commit date instead of author date some years ago.


I use the following to create a commit using the latest modification date of the staged files. If you put it in your PATH as git-tcommit, you can invoke it as `git tcommit`. If GIT_AUTHOR_DATE or GIT_COMMITTER_DATE is passed, it overrides the modified time, and it honors TZ. It works with GNU or BSD.

    #!/usr/bin/env bash
    set -eEuo pipefail

    # `git commit`, using the latest file modification time as the commit and author
    # dates, when GIT_AUTHOR_DATE or GIT_COMMITTER_DATE, respectively, is not set.

    # Use fractional seconds when available to display the newest file more
    # accurately, even though Git only uses seconds.
    if command -v gstat >/dev/null; then
      stat=(gstat --format='%.Y %n') # Detected g-prefixed GNU coreutils (e.g. Homebrew)
    elif stat --version 2>/dev/null | grep -q 'GNU coreutils'; then
      stat=(stat --format='%.Y %n') # Detected GNU coreutils
    else
      stat=(stat -f '%m %N') # Fallback to BSD-style, which only reports seconds
    fi

    # Select the latest modified time of all staged files, excluding deletions.
    modified="$(
      cd "$(git rev-parse --show-toplevel)" &&
      git diff --staged --diff-filter=d --name-only -z |
        xargs -0 "${stat[@]}" |
        sort -rn |
        head -1
    )"

    modified_seconds=
    if [[ -n $modified ]]; then
      modified_file="${modified#* }"
      # Git only uses whole seconds, so drop any fractional part.
      modified_seconds="${modified%% *}"
      modified_seconds="${modified_seconds%.*}"
      # BSD date takes epoch seconds with -r; GNU date's -r expects a file, so fall back to -d @SECONDS.
      modified_date="$(date -r "$modified_seconds" +'%Y-%m-%d %H:%M:%S %z' 2>/dev/null ||
        date -d "@$modified_seconds" +'%Y-%m-%d %H:%M:%S %z')"
      echo "Modify date: $modified_date ($modified_file)"
    fi
    author_seconds="$modified_seconds"
    committer_seconds="$modified_seconds"
    if [[ -n ${GIT_AUTHOR_DATE+.} ]]; then
      echo "Author date: ${GIT_AUTHOR_DATE:-now}"
    fi
    if [[ -n ${GIT_COMMITTER_DATE+.} ]]; then
      echo "Commit date: ${GIT_COMMITTER_DATE:-now}"
    else
      last_commit_seconds="$(git show -s --format=%at HEAD 2>/dev/null || echo 0)"
      # Compare numerically; [[ a < b ]] would compare the strings lexically.
      if (( ${modified_seconds:-0} < last_commit_seconds )); then
        committer_seconds=
        echo "Commit date: now (last commit: $(git show -s --format=%ai HEAD 2>/dev/null))"
      fi
    fi

    GIT_AUTHOR_DATE="${GIT_AUTHOR_DATE-"$author_seconds"}" \
    GIT_COMMITTER_DATE="${GIT_COMMITTER_DATE-"$committer_seconds"}" \
    exec git commit "$@"


You probably don't want to reset committer date like that. It tracks repo information, not file information, and is generally not visible in things like log and blame.


I first tried just using author date but that didn’t display correctly in GitHub, and the whole point of this exercise was to get the GitHub web UI to show “33 years ago”.
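A minimal sketch of the one-commit-per-file approach in shell (not the post's Python script): it walks the current directory in mtime order and sets both GIT_AUTHOR_DATE and GIT_COMMITTER_DATE to each file's mtime, since, per this thread, GitHub's UI displays the committer date. It assumes GNU `find -printf` and `sort -z`, an initialized repository with nothing else staged, and the function name and commit messages are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Commit every file under ".", oldest mtime first, one commit per file,
# with both author and committer dates set to that file's mtime.
commit_by_mtime() {
  local line ts path
  # %T@ = mtime in epoch seconds (with a fraction), %P = path relative to "."
  find . -path ./.git -prune -o -type f -printf '%T@ %P\0' |
    sort -z -n |
    while IFS= read -r -d '' line; do
      ts=${line%% *}
      ts=${ts%.*}                      # Git stores whole seconds
      path=${line#* }
      git add -- "$path"
      # Git's internal date format: "<unix-timestamp> <offset>"
      GIT_AUTHOR_DATE="$ts +0000" GIT_COMMITTER_DATE="$ts +0000" \
        git commit -q -m "Add $path"
    done
}
```

Usage: `cd old-files && git init && commit_by_mtime`. Timestamps are recorded as UTC here; the offset only changes how Git displays the date, not the instant.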


Weird. Wonder why they did that. This is what it looks like using git directly:

  $ git init foobar
  Initialized empty Git repository in /tmp/foobar/.git/
  $ cd foobar/
  $ vim Bar
  $ git add Bar 
  $ git commit --date '2021-01-01 01:01:01' -m'Bar'
  [master (root-commit) fd1d31c] Bar
   Date: Fri Jan 1 01:01:01 2021 -0600
   1 file changed, 1 insertion(+)
   create mode 100644 Bar
  
  $ git blame Bar
  ^fd1d31c (Izkata 2021-01-01 01:01:01 -0600 1) Bar
  
  $ git log
  commit fd1d31c2d29e3e8f6325d9abca883ce7e00d48e3 (HEAD -> master)
  Author: Izkata <***>
  Date:   Fri Jan 1 01:01:01 2021 -0600
  
      Bar
  
  $ git log --pretty=fuller
  commit fd1d31c2d29e3e8f6325d9abca883ce7e00d48e3 (HEAD -> master)
  Author:     Izkata <***>
  AuthorDate: Fri Jan 1 01:01:01 2021 -0600
  Commit:     Izkata <***>
  CommitDate: Sat Aug 3 12:50:45 2024 -0500
  
      Bar


I think using pathlib instead of the more old-school os.walk and os.path.getmtime would clean that code up substantially!


I agree, and if I’d written this code by hand I would have used pathlib. I basically stopped caring as soon as I ran Claude’s code and it did the thing I wanted it to do.



I had heard that git was incompatible with dates before the Unix epoch (1970). This makes it hard if you want to use git to show historical collaboration on documents. Has there been any progress toward general agreement on how to do this?

Ref: https://github.com/JesseKPhillips/USA-Constitution?tab=readm...


I did something similar from backup files for one of my projects, although I did it mostly manually. It's on GitHub now at https://github.com/jurakovic/Comets-Archive


When I saw the repo yesterday I thought how did Simon do that? Now I know. Great work mate.


Very nice. If one wants something more meaningful as the commit message, perhaps you could add a Claude-generated summary, like https://github.com/RomanHotsiy/commitgpt


Love that they included the Claude workflow they used to write the script.


I’ve used a similar technique to create a git history for configuration files in an AWS S3 bucket that has version control enabled.


Misread the title, thinking someone had invented a nerdy dating service on GitHub. But it's not that. Disappointed.


> clicking a link to a .m file triggers a download

Firefox can be convinced to simply show you any resource that you know is some flavor of plain text if you open it in View Source mode, i.e. by constructing a view-source: URI out of the original URL and accessing it that way. So to bypass the download nag screen for Anchor.m, you'd access it as <view-source:https://www.w3.org/History/1991-WWW-NeXT/Implementation/Anch...> instead. This even works on Firefox for Android, but not Chrome. (Of course you don't need to construct these by hand most of the time; if you're on a page linking to a bunch of files that otherwise prompt you to download them, like the Apache directory listing in this case, you just view the source of the directory listing page and click the links in the href attribute values there.)

This is pretty useful in multiple circumstances—which of course means that someone at Mozilla Corp will remove it from Firefox or find some other way to stop you from doing this any day now.


> which of course means that someone at Mozilla Corp will remove it from Firefox or find some other way to stop you from doing this any day now

Mozilla has its fatal flaws and I'm all for pointing them out (loudly), but it seems like time would be better invested in pointing out the real flaws.

For all their flaws, I don't think they are out there actively trying to piss people off, even if it can feel like they are. I don't see why they would remove this. It doesn't seem like supporting this use case causes any complexity in the code or in the UI / UX.

Hatred is not free, I would suggest spending it wisely.

(of course, I know you are probably being sarcastic)


[flagged]


Your response's logic applies to your response far more than to theirs.


[flagged]


"Don't be snarky."

https://news.ycombinator.com/newsguidelines.html

Edit: we've been having to ask you to stop breaking HN's guidelines for a long time:

https://news.ycombinator.com/item?id=40778889 (June 2024)

https://news.ycombinator.com/item?id=34194251 (Dec 2022)

https://news.ycombinator.com/item?id=29797582 (Jan 2022)

https://news.ycombinator.com/item?id=27241824 (May 2021)

https://news.ycombinator.com/item?id=19411598 (March 2019)

https://news.ycombinator.com/item?id=17641642 (July 2018)

https://news.ycombinator.com/item?id=16000997 (Dec 2017)

https://news.ycombinator.com/item?id=14745494 (July 2017)

https://news.ycombinator.com/item?id=14740536 (July 2017)

https://news.ycombinator.com/item?id=14482567 (June 2017)

If this keeps up, we're eventually going to have to ban your account. I don't want to do that since most of your comments are fine, but it's the bad ones that matter for moderation. If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules from now on, that would be good.


So people can literally boil the earth with this garbage and that gets boosted on this site but you want to ban me because I dared to be snarky over it, did I get this right? Snarky is the very least we can do. I haven't even started calling them a blight on humanity.

What about leading for once and banning all LLM from the site?

I stand by everything you linked: all crypto"currencies" are scams, JavaScript is garbage, "rugged individualism" defines American people and we could call that selfishness, we cannot be tolerant of other people's own definitions of words especially when those definitions deny someone's existence, AI art is stolen art, AI can only be used to generate bullshit, AI is adding to the climate suicide mankind is committing.

Go ahead and ban me if you want to, the truth hurts, I guess.


I know how powerful that argument feels, but since it can justify breaking any rule, I don't think it's convincing. Of course HN's rules are less important than the planet being boiled. What isn't?

HN mod comments are not about climate change, cryptocurrency, Javascript, individualism, AI, theft, suicide, or mankind, or truth. They're just about whether your comments are sticking to the intended spirit of the site, as expressed in the guidelines.


What else is there to do against this flood of LLM spam?


Flag and downvote, and let the mods do their job.


Huh? The fact that someone used an LLM to write part of their code doesn't make their blog post "LLM spam".

And it's complete nonsense to say the repository was "generated by an LLM". The code doesn't even touch the file contents, and it has straightforward functionality that was reviewed by a human.


Please don't respond by breaking the site guidelines. That only makes things worse.

(Your post would be just fine without the "Huh?" and "complete nonsense" bits.)

https://news.ycombinator.com/newsguidelines.html


"Huh" is not me being snarky here, it's because I'm unsure if I interpreted the comment correctly and I want to express that.

As far as "complete nonsense", okay, I'll keep in mind that that language is considered too rude no matter what the circumstances are. I'm not trying to respond to rule-breaking by also breaking the rules, I'm just trying to use a strong statement to argue against an attack like that.


> I'm unsure if I interpreted the comment correctly and I want to express that.

An unambiguous way to express that is to ask about the thing you find unclear.


In my experience when I think I probably understand but I'm not sure, asking for clarification and then waiting screws up the flow of conversation and half the time doesn't get an answer.

And more directly asking for clarification in a normal tone is usually just as open to negative interpretation as "Huh?" is. Sentences along the lines of "What do you mean?" can be taken quite badly!


Yelling 'Huh?' at people disrupts the flow of conversation a lot more by being hard to distinguish from something an online jerk would do.


If you interpret anything as yelling it's disruptive. But I'm not sure why that would be the interpretation. It's a short preamble to the next sentence.


This happens to me all. the. time. (with me, it's "wait" and "what?" not "huh?"). In a normal speaking conversation, it flows naturally. In writing, it doesn't come across as conversational punctuation so much as a kind of dunk. It might just be something you'll have to be vigilant about.


Alright, I appreciate the detail.


The problem is that the 'you' in 'if you interpret' is 'many (probably most) people reading online comments'. That's the point the mod comment is making. The effect counts rather than the intent.


You got good responses from fellow users and I don't want to pile on!

But in case it's of interest or helpful: "huh?" is one of the many markers of internet putdown, and in the weird hierarchy of these things, the punctuation matters: "Huh?" comes across as snarkier than "Huh." or "Huh," or "Huh!" would. I wouldn't have posted a mod reply for that alone, but since the comment also included the name-calling "complete nonsense", it tipped me over the threshold.

I totally believe you about your intent, but that's where things get dodgy because no one (else) has direct access to that. Hence these years-long sequences of modtalk:

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...



