
Continuous Unix commit history from 1970 until today - lelf
https://github.com/dspinellis/unix-history-repo
======
self_awareness
[https://github.com/dspinellis/unix-history-
repo/blob/386BSD-...](https://github.com/dspinellis/unix-history-
repo/blob/386BSD-0.0-Snapshot-Development/usr/src/sbin/reboot/reboot.c)

    
    
        void
        dingdong()
        {
         /* RRRIIINNNGGG RRRIIINNNGGG */
        }
        
        setalarm(n)
         int n;
        {
         signal(SIGALRM, dingdong);
         alarm(n);
        }
    

:)

~~~
saghm
I haven't written C in a bit, so I'm a little confused about the second part
of that snippet; is `setalarm` a macro? It doesn't look like a function
declaration, as there's no return type, and the presence of `int n;` after
`setalarm(n)` makes it seem like it would be shadowing `n` if `n` were a
parameter rather than an argument (not to mention the fact that `int n;` is
outside of the braces). Given that `setalarm` doesn't seem to be
declared/defined anywhere in that file, can anyone who's a bit more familiar
with the codebase point out to me what's going on here?

~~~
jcelerier
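It's a pre-ANSI (K&R) C function definition: the parameter types are declared
between the parameter list and the body, and an omitted return type defaults
to `int`.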

~~~
JdeBP
There was a time when "I haven't written C in a bit" was a reason for
_immediately recognizing_ code such as this. (-:

~~~
ktpsns
Interestingly, this way of function definitions is "still" the way
subroutines/functions are defined in (modern/all time) Fortran (if one ignores
the syntactic differences). Also, in "old" C, declarations may only come at
the beginning of a block -- just as it still is with Fortran. Out of
curiosity, is there any reason for the similarity?

~~~
ethhics
At least for the declarations at the beginning of the block, that allows for
very simple compiler design as it is very easy to split the code and data
sections. Start off allocating space for all the variables it sees, and then
when it reaches something not a declaration or definition, start emitting
assembly.
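
A toy sketch of that one-pass idea, for illustration only. The `compile_block`
helper, the `reserve`/`store` pseudo-instructions and the 4-byte `int` size are
all assumptions made up for this example, not how any historical C compiler
actually worked:

```python
def compile_block(lines):
    """One pass over a block: allocate slots for leading declarations, then emit code."""
    slots = {}              # variable name -> offset in the (hypothetical) frame
    code = []               # made-up pseudo-assembly emitted so far
    in_declarations = True  # True until the first non-declaration is seen

    for line in lines:
        text = line.strip().rstrip(";")
        if not text:
            continue
        if text.startswith("int "):
            if not in_declarations:
                # Old-style rule: declarations must precede all statements.
                raise SyntaxError(f"declaration after statement: {line!r}")
            name = text[len("int "):].strip()
            slots[name] = 4 * len(slots)  # assumption: every int takes 4 bytes
        else:
            if in_declarations:
                # First statement: the frame size is now known, so reserve it once.
                in_declarations = False
                code.append(f"reserve {4 * len(slots)}")
            target, value = (part.strip() for part in text.split("="))
            code.append(f"store {value} -> [frame+{slots[target]}]")
    return code


for instruction in compile_block(["int a;", "int b;", "a = 1;", "b = 2;"]):
    print(instruction)
```

Because every declaration has been seen before the first statement, the frame
size is fixed before any code is emitted, and the sketch never has to go back
and patch it.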

------
Aardwolf
The good news is that all commit times fit in the unix timestamp which started
in 1970 :)

~~~
ComputerGuru
I mean, that's not a coincidence.

~~~
eclipxe
thatsthejoke.jpg

------
cobbzilla
As a piece of history and source for research, this is simply amazing. They
show an example of `git blame` on `pipe.c` that includes names like Ken
Thompson and Bill Joy; pretty cool.

~~~
JdeBP
It is important to realize that it does have some gaping holes, though.

For example: I was recently unable to unearth from it exactly when the BSD ps
command changed to using getopt() and minus signs for arguments. The entire
history of the FreeBSD repository back to the original 4.4BSD commits has ps
using getopt(), and minus signs have been the only option syntax documented in
the ps(1) manual page for all of that time, too.

* [https://svnweb.freebsd.org/base/head/bin/ps/ps.c?revision=15...](https://svnweb.freebsd.org/base/head/bin/ps/ps.c?revision=1556&view=markup#l133)

* [https://svnweb.freebsd.org/base/head/bin/ps/ps.1?revision=15...](https://svnweb.freebsd.org/base/head/bin/ps/ps.1?revision=1556&view=markup#l42)

This repository, alas, does not yield very much further information.

~~~
JdeBP
For the curious: The furthest back that I have managed to trace it so far is
1986, using Usenet archives. A patch that was posted to mod.sources on
1986-08-01 (v06i083, Michael A. Callahan) apparently references getopt()
argument parsing and mentions "ps -U", implying that 4.2 BSD ps was using
getopt() by then. (Although there's no direct mention of getopt and this may
be wrong.)

* [https://groups.google.com/d/msg/mod.sources/H-shkbdVIbs/VJUI...](https://groups.google.com/d/msg/mod.sources/H-shkbdVIbs/VJUIw_L2hb8J)

------
mci
The repository unearthed the long lost source code of the original _style_ and
_diction_ tools by Lorinda Cherry: [https://github.com/dspinellis/unix-
history-repo/tree/BSD-4_1...](https://github.com/dspinellis/unix-history-
repo/tree/BSD-4_1_snap-Snapshot-Development/.ref-BSD-4/usr/src/cmd/diction)
GNU _style_ and _diction_ are their faint copies.

------
nes350
Found it interesting someone has created a GitHub account for Dennis Ritchie:
[https://github.com/dmr-1941-2011](https://github.com/dmr-1941-2011)

Note how the year selector on the right of "Contribution activity" is broken -
seems it can't fit 1970-2019. Same happens with Ken Thompson's profile.

------
ianbooker
As linked in the repository:
[https://youtu.be/S7JB0mhrGCQ](https://youtu.be/S7JB0mhrGCQ)

For this alone, all the work paid off.

On a more serious side, I wonder what we can learn from this data. It's not a
question for the field of computational social science, but social science of
computation!

------
caprese
I wish someone would do this for legal code, starting with federal law, with
hashes for all representatives.

(The way I envision it, the committees would be their own branches, along
with the chambers of Congress.

Even though it's not part of the git protocol, the GUI element of the Pull
Request feature could have continuous integration that showed when a threshold
of votes was passed to get something into the next branch, like out of
committee and onto the floor.

Then, after whichever path was taken to get something codified into law, it is
merged into the master branch, where the agencies have their own process to
update the Code of Federal Regulations.)

~~~
amyjess
> Even though it's not part of the git protocol, the GUI element of the Pull
> Request feature could have continuous integration that showed when a
> threshold of votes was passed to get something into the next branch, like
> out of committee and onto the floor.

I'm wondering if Fossil would be more suited to this than Git.

------
emidln
It would be interesting to do this for one of the open source Solaris distros
and for plan9 to then compare the three.

~~~
zeckalpha
It’s missing those specifically, but
[http://fxr.watson.org](http://fxr.watson.org) has quite a few kernels

------
e40
The main product I work on started in RCS, moved to CVS then to git. I have
continuous history going back to 1988.

~~~
atm0sphere
What is it?

~~~
e40
Closed source language product, though parts are open source, like this:

[https://github.com/franzinc/clim2](https://github.com/franzinc/clim2)

First commit 1991.

------
notpeter
_> This repository will be often automatically regenerated from scratch, so
this is not a place to make contributions. To ensure replicability its users
are encouraged to fork it or archive it._

I've run into this with source code archeology projects. Git is ill-suited to
integrating newly discovered pre-history/missing intermediary history steps.
Any new historical change, out of necessity, alters all subsequent commit
hashes. This means collaboration and permalinks can't happen like with
"normal" Git repos.

Does anyone know of any tooling or an alternate vcs which has the ability to
integrate new pre-history or alternate history (e.g. original branch commits
alongside a squash and merge commit) without requiring completely breaking/re-
writing the entire tree?
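
A minimal sketch of why that happens, assuming git's standard object format (a
`<type> <size>\0` header followed by the body, hashed with SHA-1). The helper
names, author string, timestamps and messages are hypothetical and chosen only
for illustration:

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 ID git assigns to an object with this type and body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = git_object_id("tree", b"")  # every toy commit points at the empty tree


def commit_id(message, parent=None):
    """Hash a commit object; the parent ID is part of the hashed bytes."""
    who = "A U Thor <author@example.com> 0 +0000"  # hypothetical identity and time
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return git_object_id("commit", "\n".join(lines).encode())


def chain(messages, root_parent=None):
    """Return the commit IDs of a linear history built from the given messages."""
    ids, parent = [], root_parent
    for message in messages:
        parent = commit_id(message, parent)
        ids.append(parent)
    return ids


original = chain(["v1", "v2", "v3"])
# Splice a newly discovered "v0" underneath the old root.
with_prehistory = chain(["v1", "v2", "v3"], root_parent=commit_id("v0"))

for old, new in zip(original, with_prehistory):
    print(old[:12], "->", new[:12])
```

Each commit's ID covers its parent's ID, so giving the old root a new parent
changes that commit's ID and, transitively, every ID after it, even though no
file content changed.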

~~~
Rondom
Git has a mechanism to declare two commits equal and replace one with the
other: man git-replace

This comes at the cost of having intentionally multiple histories and is not
well-suited for complicated cases, but for the common case of "we want to
stitch this old CVS-history to this commit", it does a good job.

Usability-wise, replace refs are not cloned automatically and some web-based
tools lack support for them.

~~~
yebyen
You can also use the 'magic empty tree object' to make your repo more amenable
to joining disparate histories of other trees. It requires both the source and
destination repo to abide by this practice, but I do this on all of my
repos...

Immediately after `git init`, do `git commit --allow-empty -m"initial empty
commit"`. Now you have an empty commit, and any other repo which has this
empty commit has some history in common with your repo.

The SHA-1 hash of the empty tree is well known, and there are plenty of
articles you can find about it if you search for
4b825dc642cb6eb9a060e54bf8d69288fbee4904.

Here's one for example:

[https://stackoverflow.com/questions/9765453/is-gits-semi-
sec...](https://stackoverflow.com/questions/9765453/is-gits-semi-secret-empty-
tree-object-reliable-and-why-is-there-not-a-symbolic)

I'm not sure this helps for projects which have a shared development history,
but not a shared commit history. But within an organization, where you have
projects which may split and/or merge, it can help to bridge some gaps.
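
A small sketch of where that constant comes from, assuming git's standard object
hashing (SHA-1 over a `<type> <size>\0` header plus the body). The
`empty_commit_id` helper and its author string are hypothetical. It also
suggests a caveat: the empty *tree* ID is the same everywhere, but an empty
*commit* only gets the same ID in two repos if its author, committer,
timestamps and message all match.

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 ID git assigns to an object with this type and body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# A tree with no entries has an empty body, so its ID is a fixed constant.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904


def empty_commit_id(timestamp, message="initial empty commit"):
    """ID of a parentless commit pointing at the empty tree (hypothetical author)."""
    who = f"A U Thor <author@example.com> {timestamp} +0000"
    body = (
        f"tree {EMPTY_TREE}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n"
        f"{message}\n"
    )
    return git_object_id("commit", body.encode())


# Same tree, same message, one second apart: the commit IDs differ.
print(empty_commit_id(0) == empty_commit_id(1))  # False
```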

~~~
dearrifling
You can have unrelated commits and roots in a single git repo. No need for
empty root commit.

~~~
yebyen
I think the payoff is for merging unrelated histories, that you can then
rebase one history on the other, and present a new unified history.

Is the only real reason people do this so that you can rewrite your
initial working commit in a rebase? I think that might be it. (You can't
easily rewrite the initial commit with a rebase.)

------
JdeBP
For some other GitHub accounts, see
[https://unix.stackexchange.com/q/320133/5132](https://unix.stackexchange.com/q/320133/5132).

------
pjc50
Linked from the README is the tooling: [https://github.com/dspinellis/unix-
history-make](https://github.com/dspinellis/unix-history-make)

------
beckler
Git wasn't released until 2005, so what did they use before then? I know it
says they pulled the history from 24 snapshots.

~~~
dboreham
RCS
[https://en.wikipedia.org/wiki/Revision_Control_System](https://en.wikipedia.org/wiki/Revision_Control_System)
then SCCS
[https://en.wikipedia.org/wiki/Source_Code_Control_System](https://en.wikipedia.org/wiki/Source_Code_Control_System)
Before 1972, I'm not sure.

~~~
mcv
I assumed this history was meticulously reconstructed with a lot of hard work,
but apparently Unix was one of the first systems maintained in SCCS.

I guess the part from before SCCS was still meticulously reconstructed?

~~~
dboreham
Poking around a bit in the repo, I think the history prior to SCCS is just
release snapshots. For example, there is no history for the cat command in the
PDP-7 Unix: [https://github.com/dspinellis/unix-history-
repo/commits/Rese...](https://github.com/dspinellis/unix-history-
repo/commits/Research-PDP7-Snapshot-Development/cat.s)

------
cagataygurturk
Legacy code.

------
nukeop
I can't believe they used git even in the 70s and they managed to do it all in
just 4 commits. Incredible engineers and astounding geniuses.

