Continuous Unix commit history from 1970 until today (github.com)
160 points by lelf 31 days ago | 43 comments


     int n;
     signal(SIGALRM, dingdong);

I haven't written C in a bit, so I'm a little confused about the second part of that snippet; is `setalarm` a macro? It doesn't look like a function declaration, as there's no return type, and the presence of `int n;` after `setalarm(n)` makes it seem like it would be shadowing `n` if `n` were a parameter rather than an argument (not to mention the fact that `int n;` is outside of the braces). Given that `setalarm` doesn't seem to be declared/defined anywhere in that file, can anyone who's a bit more familiar with the codebase point out to me what's going on here?

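It's a pre-ANSI (K&R-style) C function definition: the parameter types are declared between the parameter list and the opening brace, and the omitted return type defaults to `int`.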
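For anyone who hasn't seen the old form, here is a minimal sketch of the same function written both ways. The name `add_seconds` and its body are made up for illustration and are not from the Unix sources. The old-style version is kept in a comment because implicit `int` was dropped in C99 and old-style definitions were removed in C23, so a modern compiler may reject it.

```c
/*
 * Pre-ANSI (K&R) style: only the parameter names go inside the
 * parentheses; their types are declared between ")" and "{".
 * No return type is written, so the function implicitly returns int.
 *
 *     add_seconds(base, n)
 *     int base;
 *     int n;
 *     {
 *         return base + n;
 *     }
 */

/* The same function as an ANSI C (C89 and later) prototype-style definition. */
int add_seconds(int base, int n)
{
    return base + n;
}
```

Read that way, the `int n;` in the question is not shadowing anything: it is the type declaration for the parameter `n`.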

There was a time when "I haven't written C in a bit" was a reason for immediately recognizing code such as this. (-:

Interestingly, this way of defining functions is "still" how subroutines/functions are defined in Fortran, modern or otherwise (if one ignores the syntactic differences). Also, in "old" C, declarations may only come at the beginning of a block, just as is still the case in Fortran. Out of curiosity, is there any reason for the similarity?

At least for the declarations at the beginning of the block, that allows for very simple compiler design as it is very easy to split the code and data sections. Start off allocating space for all the variables it sees, and then when it reaches something not a declaration or definition, start emitting assembly.
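A small sketch of the difference being described, with hypothetical function names. In the first version every local is declared before the first statement, which is the layout the explanation above says makes a simple compiler's job easier; the second uses the C99 rule that lets declarations appear where they are first needed.

```c
/* C89 style: all declarations in a block come before the first statement. */
int sum_squares_c89(const int *v, int n)
{
    int i;      /* every local is declared up front */
    int total;

    total = 0;
    for (i = 0; i < n; i++)
        total += v[i] * v[i];
    return total;
}

/* C99 and later: declarations may be mixed with statements. */
int sum_squares_c99(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i] * v[i];
    return total;
}
```

Both functions compute the same result; only the placement of the declarations differs.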

The good news is that all commit times fit in the unix timestamp which started in 1970 :)

I mean, that's not a coincidence.


As a piece of history and source for research, this is simply amazing. They show an example of `git blame` on `pipe.c` that includes names like Ken Thompson and Bill Joy; pretty cool.

It is important to realize that it does have some gaping holes, though.

For example: I was recently unable to unearth from it exactly when the BSD ps command changed to using getopt() and minus signs for arguments. The entire history of the FreeBSD repository back to the original 4.4BSD commits has ps using getopt(), and minus signs have been the only option syntax documented in the ps(1) manual page for all of that time, too.

* https://svnweb.freebsd.org/base/head/bin/ps/ps.c?revision=15...

* https://svnweb.freebsd.org/base/head/bin/ps/ps.1?revision=15...

This repository, alas, does not yield very much further information.

For the curious: The furthest back that I have managed to trace it so far is 1986, using Usenet archives. A patch that was posted to mod.sources on 1986-08-01 (v06i083, Michael A. Callahan) apparently references getopt() argument parsing and mentions "ps -U", implying that 4.2 BSD ps was using getopt() by then. (Although there's no direct mention of getopt and this may be wrong.)

* https://groups.google.com/d/msg/mod.sources/H-shkbdVIbs/VJUI...

The repository unearthed the long lost source code of the original style and diction tools by Lorinda Cherry: https://github.com/dspinellis/unix-history-repo/tree/BSD-4_1... GNU style and diction are their faint copies.

Found it interesting someone has created a GitHub account for Dennis Ritchie: https://github.com/dmr-1941-2011

Note how the year selector on the right of "Contribution activity" is broken - seems it can't fit 1970-2019. Same happens with Ken Thompson's profile.

As linked in the repository: https://youtu.be/S7JB0mhrGCQ

For this alone, all the work paid off.

On a more serious note, I wonder what we can learn from this data. It's not a question for the field of computational social science, but for the social science of computation!

I wish someone would do this for legal code, starting with federal law, with hashes for all representatives.

(The way I envision it, the committees would be their own branches, along with the chambers of Congress.

Even though it's not part of the git protocol, the GUI element of the Pull Request feature could have continuous integration that showed when a threshold of votes was passed to get something to the next branch, like out of committee and onto the floor.

Then, after whichever path was taken to get something codified into law, it is merged into the master branch, where the agencies have their own process to update the Code of Federal Regulations.)

> Even though it's not part of the git protocol, the GUI element of the Pull Request feature could have continuous integration that showed when a threshold of votes was passed to get something to the next branch, like out of committee and onto the floor.

I'm wondering if Fossil would be more suited to this than Git.

It would be interesting to do this for one of the open source Solaris distros and for plan9 to then compare the three.

It’s missing those specifically, but http://fxr.watson.org has quite a few kernels

I can't believe they used git even in the 70s and they managed to do it all in just 4 commits. Incredible engineers and astounding geniuses.

The main product I work on started in RCS, moved to CVS then to git. I have continuous history going back to 1988.

What is it?

Closed source language product, though parts are open source, like this:


First commit 1991.

> This repository will be often automatically regenerated from scratch, so this is not a place to make contributions. To ensure replicability its users are encouraged to fork it or archive it.

I've run into this with source code archeology projects. Git is ill-suited to integrating newly discovered pre-history/missing intermediary history steps. Any new historical change, out of necessity, alters all subsequent commit hashes. This means collaboration and permalinks can't happen like with "normal" Git repos.

Does anyone know of any tooling or an alternate vcs which has the ability to integrate new pre-history or alternate history (e.g. original branch commits alongside a squash and merge commit) without requiring completely breaking/re-writing the entire tree?

Git has a mechanism to declare two commits equal and replace one with the other: man git-replace

This comes at the cost of intentionally having multiple histories and is not well-suited for complicated cases, but for the common case of "we want to stitch this old CVS history to this commit", it does a good job.

Usability-wise, replace refs are not cloned automatically and some web-based tools lack support for it.

You can also use the 'magic empty tree object' to make your repo more amenable to joining disparate histories of other trees. It requires both the source and destination repo to abide by this practice, but I do this on all of my repos...

Immediately after `git init`, do `git commit --allow-empty -m"initial empty commit"`. Now you have an empty commit, and any other repo which has this empty commit has some history in common with your repo.

The SHA-1 hash of the empty tree object is well-known, and there are plenty of articles you can find about it if you search for 4b825dc642cb6eb9a060e54bf8d69288fbee4904

Here's one for example:


I'm not sure this helps for projects which have a shared development history, but not a shared commit history. But within an organization, where you have projects which may split and/or merge, it can help to bridge some gaps.
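As a sketch of why 4b825dc642cb6eb9a060e54bf8d69288fbee4904 is the same everywhere: git names an object by the SHA-1 of a short header (`<type> <size>` plus a NUL byte) followed by the content, so an empty tree is just the hash of the seven bytes `tree 0\0`. The helper names below (`sha1_hex`, `empty_tree_id`) are made up for this example, and the SHA-1 routine is a plain textbook implementation, not git's code. Note that this is the id of the empty tree only; the id of an empty commit also covers the author, committer and timestamps, so it would generally differ from one repository to the next.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Mix one 64-byte block into the five-word SHA-1 state. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
    uint32_t w[80], a, b, c, d, e, f, k, t;
    int i;

    for (i = 0; i < 16; i++)
        w[i] = ((uint32_t)p[4 * i] << 24) | ((uint32_t)p[4 * i + 1] << 16) |
               ((uint32_t)p[4 * i + 2] << 8) | (uint32_t)p[4 * i + 3];
    for (i = 16; i < 80; i++)
        w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = rotl32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rotl32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Write the 40-character hex SHA-1 of msg[0..len) into out (41 bytes). */
static void sha1_hex(const unsigned char *msg, size_t len, char out[41])
{
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE,
                      0x10325476, 0xC3D2E1F0 };
    unsigned char tail[128] = { 0 };
    uint64_t bits = (uint64_t)len * 8;
    size_t full = len / 64, rem = len % 64, tail_len, i;

    for (i = 0; i < full; i++)
        sha1_block(h, msg + 64 * i);

    /* Padding: a 0x80 byte, zeros, then the bit length as 64-bit big-endian. */
    memcpy(tail, msg + 64 * full, rem);
    tail[rem] = 0x80;
    tail_len = (rem < 56) ? 64 : 128;
    for (i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (unsigned char)(bits >> (8 * i));
    for (i = 0; i < tail_len; i += 64)
        sha1_block(h, tail + i);

    for (i = 0; i < 5; i++)
        sprintf(out + 8 * i, "%08x", (unsigned)h[i]);
}

/* Object id of an empty tree: SHA-1 over the header "tree 0\0", no content. */
void empty_tree_id(char out[41])
{
    static const unsigned char object[] = { 't', 'r', 'e', 'e', ' ', '0', '\0' };
    sha1_hex(object, sizeof object, out);
}
```

Calling `empty_tree_id` with a 41-byte buffer and printing the result should give the hash quoted above.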

You can have unrelated commits and roots in a single git repo. No need for empty root commit.

I think the payoff is for merging unrelated histories, that you can then rebase one history on the other, and present a new unified history.

Is the only real reason people do this so that you can rewrite your initial working commit in a rebase? I think that might be it. (You can't easily rewrite the initial commit with a rebase.)

For some other github accounts, see https://unix.stackexchange.com/q/320133/5132 .

Linked from the README is the tooling: https://github.com/dspinellis/unix-history-make

Git wasn't released until 2005, so what did they use before then? I know it says they pulled the history from 24 snapshots.

I assumed this history was meticulously reconstructed with a lot of hard work, but apparently Unix was one of the first systems maintained in SCCS.

I guess the part from before SCCS was still meticulously reconstructed?

Poking around a bit in the repo, I think the history prior to SCCS is just release snapshots, e.g. there is no history for the cat command in the PDP-7 Unix: https://github.com/dspinellis/unix-history-repo/commits/Rese...

iirc, Linux used to live inside of BitKeeper which was a paid for solution. I believe Linus wrote git as a replacement because he got fed up with the commercial licensing (even though they gave free copies to linux devs).

Edit: this doesn't actually answer your question of where unix used to live, rather than linux though....

> I believe Linus wrote git as a replacement because he got fed up with the commercial licensing (even though they gave free copies to linux devs).

No - BitKeeper revoked the free license because some guy (unrelated to Linux) wrote a client that reverse engineered more features than BitKeeper liked.

And Linux actually used BitKeeper only between 2002 and 2005; prior to that time, it did not use a source control system at all.

The BitKeeper reverse engineering is a fun story to read: https://lwn.net/Articles/132938/ (and short!)

The funny thing is the "reverse engineering" involved telnetting into the server and typing "help".

And if I recall correctly it sent back SCCS-style data

They did use a system (tarballs and patches), but it was just a socially constructed system rather than one rigorously enforced using software. Linus has often said he would rather have gone back to that system than use CVS or SVN.

Fortunately instead he created Git. I'm a fan of Git, though I'm aware many aren't.

The "some guy" was Samba/rsync developer Andrew Tridgell.

Legacy code.
