The Earliest Unix Code: An Anniversary Source Code Release

bbanyc · on Oct 18, 2019

I've been reading up lately on the earlier OS's that influenced Unix - the Compatible Time-Sharing System, the Berkeley Time-Sharing System (Project Genie), and Multics. Bitsavers.org and Multicians.org both have a lot of information on that early era (and Bitsavers has a whole lot more).

Lots of the ideas in Unix came from those earlier systems. What Thompson and Ritchie contributed was synthesizing these ideas into a more coherent whole, and demonstrating that they could be made a whole lot smaller. (Both in terms of the PDP-7 and 11/20 being smaller computers than the mainframes previous systems were written for, and in that descriptive command names were reduced to cryptic abbreviations. CTSS's LISTF was shortened to ls, ARCHIV to ar, RUNOFF to roff and then nroff/troff...) And of course Unix ran on the most popular computer of the 1970s and got rewritten in a portable language to let it run on every popular computer since then, while just about every other OS was tied to its specific hardware platform and died off as the hardware did.

All really fascinating stuff.

msla · on Oct 18, 2019

Having used Multics and a farrago of other OSes (not CTSS or the Berkeley timesharing system) as part of my retrocomputing hobby, I think the single biggest thing Unix brought to the day-to-day experience of using an OS was the pipe, and the concomitant transformation of the command line from just being a way to enter program names with command line options (on OSes which even have command line options as we know them, of course) to being a programming language in its own right suitable for rapid prototyping and the creation of glue code.

Glue code isn't glamorous. It isn't something which seems to get a lot of research put into it. It is, however, important to get right, and part of getting it right is foregrounding the right thing: The stuff you're gluing together, as opposed to the glue itself. This is something the "replace shell with a Real Programming Language" projects get wrong, in that the Unix shell defaults to treating unknown barewords as external programs as opposed to syntax errors. This plays Hell with any kind of automated analysis, but it's essential for a language primarily intended to glue those external programs together. Typing isn't just about the type system, after all.

TL;DR: OSes prior to Unix had surprisingly weak scripting facilities, and attempts to "improve" scripting tend to miss the point.

larsbrinkhoff · on Oct 18, 2019

I understand TENEX is largely a port of the Berkeley system from SDS 940 to PDP-10. So that would make Unix and TENEX siblings.

beefhash · on Oct 18, 2019

People are working on transcribing it: https://minnie.tuhs.org/pipermail/pdp7-unix/2019-October/000...

Incidentally, wkt confirmed that's one of the two artifacts[1] which he teased a while ago[2].

[1] https://minnie.tuhs.org//pipermail/tuhs/2019-October/019106....

[2] https://minnie.tuhs.org//pipermail/tuhs/2019-September/01868...

saagarjha · on Oct 17, 2019

> PDP-7 assembly listing for “pd”

> Unidentified program.

> Might “pd” stand for “previous directory”?

I know approximately zero PDP-7 assembly, but this looks like it might be “print directory”, equivalent to pwd today? It seems to open its parent “dotdot” directory and write it out.

quicklime · on Oct 17, 2019

I don't know PDP7 assembly either, but I don't see how it could stand for "previous directory" (and be equivalent to today's `cd -`). Any command to change directory needs to be done in-process by the shell, so it would need to be implemented as a builtin, not a standalone executable.

So "print directory" or (print something about the) "parent directory" seem more likely to me.
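A minimal sketch of the in-process point, assuming a modern POSIX system with fork() (so it says nothing about PDP-7 Unix itself): a chdir() done in a child process leaves the parent's working directory alone, which is why a standalone `cd` executable can't work under a forking shell. The function name is made up for illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's working directory is unchanged after a
 * forked child calls chdir("/"), 0 if it changed, -1 on error.
 * Assumes a modern POSIX system; hypothetical helper for illustration. */
int parent_cwd_survives_child_chdir(void)
{
    char before[4096], after[4096];

    if (getcwd(before, sizeof before) == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this chdir only changes the child's own process state. */
        _exit(chdir("/") == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```

The child exits right after its chdir(), and the parent compares its own getcwd() result before and after.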

kragen · on Oct 18, 2019

In the earliest versions of Unix, the shell didn't fork to run programs. It would exec the program, and then the program would exec the shell instead of exiting. Makes sense, right? That's what you'd do if you didn't have an operating system at all, and it's how, for example, CP/M worked. So you could write "cd" as a non-built-in command.

But it sounds like the other commenters have figured out that it's "pack directory".

Pete_D · on Oct 18, 2019

After staring at the listing for a while, I think it is a kind of garbage collector for removing unlinked files from directory listings ("pack directory"?). As best as I can work out, the code does:

    open ..
    loop
        read a directory entry into tbuf
        if we read 0 bytes (eof presumably), break loop
        if tbuf[0] == '\0', go back to start of loop
        append tbuf to dir
    done
    close ..
    reopen .. with creat
    write the stuff we built up into dir to ..
    close ..
    exit

kps · on Oct 18, 2019

I think you're right. My C-ish translation (that ignores errors) is:

    int dst /*8*/, src /*9*/, c1, df, tbuf[8], dir[BIG];
    char dotdot[] = "..      ";

    df = open(dotdot, O_RDONLY);
    dst = dir - 1;
    while (read(df, tbuf, 8) != 0) {
      if (tbuf[0] != 0) {
        c1 = -8;
        src = tbuf - 1;
        do {
          *++dst = *++src;
        } while (++c1 != 0);
      }
    }
    close(df);

    df = creat(dotdot);
    write(df, dir, dst - dir + 1);
    close(df);
    exit();

That is, copy any directory entry with a non-zero inode into dir[], and then write that back. Pack Directory.

According to https://wiki.tuhs.org/doku.php?id=systems:pdp7_unix ‘..’ was the name of the current directory (i.e. what is now ‘.’), so this operated on the current directory.

saagarjha · on Oct 18, 2019

Looking at this again, your pseudocode seems to be pretty reasonable. However, I'm still left wondering why this command would be necessary: doesn't it just read the directory entry into memory and write it back out again? I'm also curious why "sys write" takes .. directly when all the other calls seem to need dotdot.

microtherion · on Oct 18, 2019

Maurice Bach, "The Design of the UNIX Operating System", on page 73 describes the format of directory entries: https://archive.org/details/DesignOfTheUnixOperatingSystemBy...

"Directory entries may be empty, indicated by an inode number of 0".

So if the pseudo-code is correct, this may be a utility to squeeze out those empty entries.

Pete_D · on Oct 18, 2019

If my reading is right, it filters out any with a leading null-byte before writing back out. My guess is that that is how they implemented unlink/rm - just write zeroes over the entry.
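A rough sketch of that guess, under stated assumptions: entries are 8 words each (the size the read loop in the listing uses), a zero first word marks a free slot, and removal just zeroes the entry. The names `clear_entry` and `pack_entries` are invented here. Unlike the listing, which builds a second buffer and rewrites the file, this packs an in-memory array in place.

```c
#include <stddef.h>
#include <string.h>

#define ENTRY_WORDS 8   /* entry size used by the listing's read loop */

/* Hypothetical model of removal: zero the whole entry so that
 * word 0 == 0 marks the slot as free (the guess made in this thread). */
void clear_entry(int *dir, size_t index)
{
    memset(dir + index * ENTRY_WORDS, 0, ENTRY_WORDS * sizeof dir[0]);
}

/* Slide live entries toward the front, preserving their order.
 * Returns the number of entries kept. */
size_t pack_entries(int *dir, size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        const int *src = dir + i * ENTRY_WORDS;
        if (src[0] == 0)
            continue;                       /* skip free slot */
        if (kept != i)
            memmove(dir + kept * ENTRY_WORDS, src,
                    ENTRY_WORDS * sizeof dir[0]);
        kept++;
    }
    return kept;
}
```

Writing back only the first `kept * ENTRY_WORDS` words would then give the shorter directory file.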

As for the sys write: my current hypothesis is that .. there is a placeholder for an argument which is written to by the preceding dac .+4. ('"If a program can't rewrite its own code", he asked, "what good is it?"'). But I can't make sense of what it's actually putting there - I'd guess length, but it looks like ~(dir - 2) + 8.

kps · on Oct 18, 2019

> but it looks like ~(dir - 2) + 8.

That took me a while. The ‘8’ in ‘tad 8’ is the contents of memory location 8, i.e. the destination pointer in the memory copy, so it's ~(dir - 2) + dst. And since -x = ~x + 1 (there being no negation instruction), that works out to the length, dst - dir + 1.
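The identity can be checked with a small sketch that models 18-bit words by masking a wider integer. The helper names are invented, the link bit is ignored, and the addresses used to try it out are arbitrary.

```c
#include <stdint.h>

#define WORD_MASK 0777777u   /* 18-bit PDP-7 word */

/* 18-bit two's-complement add, in the spirit of 'tad' (link bit ignored). */
uint32_t add18(uint32_t a, uint32_t b) { return (a + b) & WORD_MASK; }

/* 18-bit bitwise complement. */
uint32_t not18(uint32_t a) { return ~a & WORD_MASK; }

/* The expression as read from the listing: ~(dir - 2) + dst.
 * Since ~x == -x - 1, this equals dst - dir + 1. */
uint32_t length_via_complement(uint32_t dir, uint32_t dst)
{
    uint32_t dir_minus_2 = (dir - 2u) & WORD_MASK;
    return add18(not18(dir_minus_2), dst);
}
```

With dst starting at dir - 1 and bumped once per copied word, two 8-word entries leave dst at dir + 15, and the expression comes out as 16.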

kps · on Oct 17, 2019

It only opens dotdot once at the start, though, and calls creat(dotdot) later.

kazinator · on Oct 17, 2019

If dotdot can't be opened it fails. So I don't think creat was the same as what we now know.

It looks like the code opens dotdot, and then scans through it for something (match for dir?).

The creat call might mean "open for writing", and possibly the flags mean append. It appears dir is appended to the dotdot directory file.

So maybe, add this "dir" entry to dotdot if it isn't there already?

I can't guess what initializes dir; maybe it somehow comes as an argument from the command line or whatever.

Pete_D · on Oct 18, 2019

dir appears to be copied into from tbuf at some point. It puts dir-1 into memory location 8 at the start, and then the code around the 2: section I'm pretty sure is a memcpy-ish loop like:

    c1 = -8;
    *9 = tbuf - 1;
    do {
        *(++(*8)) = *(++(*9));
    } while(++c1 != 0);
    goto 1b; // b for backwards?

It seems that memory locations 8-15 are auto-indexing[0], so the lac/dac i increments the pointed-at location before use, and isz is "increment and skip next instruction if zero".

[0] PDP7 manual, big PDF: http://bitsavers.trailing-edge.com/pdf/dec/pdp7/F-75P_PDP7pr...
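A toy simulation of that reading, not taken from the listing: memory is a plain int array, and any indirect reference through locations 8..15 bumps the stored pointer before using it. The function names are invented, and the copy resets locations 8 and 9 on every call, whereas the listing seems to set location 8 once up front.

```c
#include <stddef.h>

/* Resolve an indirect reference through 'loc'. Locations 8..15 are
 * treated as auto-indexing: the pointer stored there is incremented
 * before it is used, as described in the comment above. */
size_t indirect(int *mem, size_t loc)
{
    if (loc >= 8 && loc <= 15)
        mem[loc]++;
    return (size_t)mem[loc];
}

/* Copy n (> 0) words from src to dst, using locations 8 and 9 as the
 * auto-incrementing destination and source pointers. */
void autoindex_copy(int *mem, int dst, int src, int n)
{
    int c1 = -n;                 /* negative count, stepped up toward zero */
    mem[8] = dst - 1;            /* pre-increment means starting one below */
    mem[9] = src - 1;
    do {
        int word = mem[indirect(mem, 9)];   /* indirect load through 9 */
        mem[indirect(mem, 8)] = word;       /* indirect store through 8 */
    } while (++c1 != 0);         /* like isz: bump, leave loop at zero */
}
```

After a copy, location 8 is left pointing at the last word written, which is what lets the next entry get appended right after it.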

saagarjha · on Oct 17, 2019

Maybe it just prints the parent directory, then. I’m not sure why it calls creat, though…

kps · on Oct 17, 2019

https://github.com/DoctorWkt/pdp7-unix/blob/master/src/sys/N...

I think ‘dotdot’ is what is called ‘dd’ there (the corresponding symbol in ls is still named ‘dd’). I don't think the concept of a parent directory existed yet. Maybe ‘pd’ is something like ‘prepare directory’, constructing the dd/dotdot entry.

Pete_D · on Oct 17, 2019

> Maybe ‘pd’ is something like ‘prepare directory’, constructing the dd/dotdot entry.

I think this is on the right track - it looks like the write call is to the newly created .., with df being used to hold the file descriptor.

kazinator · on Oct 17, 2019

But the program appears to fail if dotdot can't be opened for reading, and it appears to scan through it doing reads first. Then it closes and does a creat, which is expected to succeed also. I'm suspecting that at this stage of development, open may have been used for reading and creat for writing (including to an existing file).

kps · on Oct 18, 2019

From TFS¹, creat() on an existing file truncates it (i.e. to length zero, as O_TRUNC), and only the superuser can do this to a directory. open() has read/write flag bits as now.

I am leaning toward Pete_D's ‘pack directory’ idea.

¹ https://github.com/DoctorWkt/pdp7-unix/blob/e94417092a2980e7...
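For comparison, here is the truncating behaviour on a modern POSIX system, where creat(path, mode) is defined as open(path, O_WRONLY|O_CREAT|O_TRUNC, mode). This only illustrates today's call on a regular file; it makes no claim about the PDP-7 call's arguments or about directories. The function name and the scratch path are made up.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes a few bytes to 'path', then calls creat() on it again and
 * returns the resulting file size (expected 0), or -1 on error.
 * Modern POSIX behaviour only; hypothetical helper for illustration. */
long size_after_creat(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    close(fd);

    /* Equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600). */
    fd = creat(path, 0600);
    if (fd < 0)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    return rc == 0 ? (long)st.st_size : -1;
}
```

That matches the pattern in pd: read everything first, then creat the same name and write the packed contents into the now-empty file.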

auvi · on Oct 17, 2019

Who owns the copyright? Novell?

bsdimp · on Oct 18, 2019

Nobody! There were no copyright notices, and this is a work made before the US adopted the Berne Convention. Prior copyright law required copyright notices.

We learned from the AT&T vs Regents case that a judge ruled there was a large likelihood that AT&T couldn't establish it had a valid copyright on 32V because they never marked it properly.

beefhash · on Oct 18, 2019

> Nobody! There were no copyright notices, and this is a work made before the US adopted the Berne Convention. Prior copyright law required copyright notices.

...in the U.S.

This does not apply to other countries, especially continental European ones. They'll happily apply copyright to software retroactively, even when it wasn't explicitly protected, by means of their jurisdiction claiming that software was categorized as a work even before the convention. No license, no luck over there.

cf. https://virtuallyfun.com/wordpress/2018/11/26/why-bsd-os-is-...

If it's not deemed a work for hire (and given UNIX was a rogue operation at the time, that's not entirely unreasonable to question), then the copyright probably remains with Thompson and Ritchie themselves, or rather Thompson and Ritchie's family (or whichever way the inheritance process went). If it is, then probably Micro Focus via Attachmate via Novell via USL. Special considerations may also apply because it wasn't "published" in any sense of the word until past Ritchie's death.

bbanyc · on Oct 17, 2019

Either Micro Focus (through its acquisition of Attachmate/Novell) or Nokia (through its ownership of Bell Labs), depending on whether or not the Research Unix copyrights were part of the Unix business that AT&T sold to Novell. Bell Labs and USL were distinct business units within AT&T.

That's assuming it's deemed a work-for-hire (which if it isn't makes it the property of Ken Thompson and the Ritchie estate) and that it wasn't somehow "published" before 1989 (which if it was without a copyright notice, makes it public domain - but works written before 1989 but not published until after are under copyright regardless). It's confusing stuff.

fernly · on Oct 18, 2019

45 pages in before I saw the first comment. Whoever wrote section 8, "what may be a simulation or game for billiards or pool", put actual one-line descriptions before each section. Later on he gets real chatty, e.g.

    fsin: 0203 " sine of the fine rotation angle
    mfsin: -0203 " negative of fsin

Seriously, I bet there is a lot to be learned figuring out how he was approximating trigonometry using 12-bit int constants.

boomlinde · on Oct 18, 2019

It looks like there are symbols named "sin" and "cos" not declared in the program. These are possibly system provided tables.

Accesses to "sin" and "cos" are preceded by what looks like a self modifying store ("dac .+3", interpreted to mean "deposit accumulator at current program counter + 3"), possibly to modify the "lac sin" and "lac cos" instructions to index the tables, but I'm not familiar with the instruction encoding.

If so, this is a relatively straight forward, non-magic way of implementing trigonometric functions.
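Table-driven trig in that spirit might look like the sketch below. Everything about it is an assumption: the table size, the scale factor of 100, and the names. The listing's actual tables aren't visible in this thread, and plain array indexing stands in for the self-modifying store.

```c
/* Hypothetical 8-step table: round(100 * sin(k * 45 degrees)).
 * Not the listing's table; size and scaling are assumptions. */
static const int sin_table[8] = { 0, 71, 100, 71, 0, -71, -100, -71 };

/* Sine of (step * 45 degrees), scaled by 100; step wraps modulo 8. */
int table_sin(unsigned step) { return sin_table[step & 7u]; }

/* Cosine via the same table, shifted a quarter turn (2 steps). */
int table_cos(unsigned step) { return sin_table[(step + 2u) & 7u]; }
```

Computing an index and reading a word is all the lookup needs, which fits the "non-magic" description: no series expansion, just a precomputed table.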

6c696e7578 · on Oct 18, 2019

> Later on he gets real chatty

At least it wasn't one of those people who write comments that don't align with what the code is doing.

bbanyc · on Oct 18, 2019

>pp. 145−169

>PDP-7 assembly listings for “t1,” “t2,” “t3,” “t4,” “t5,” “t6,” “t7,” and “t8”

>Unidentified program.

>Perhaps an interpreter for a programming language? B?

Could be TMG, the language that B was originally written in. See https://www.bell-labs.com/usr/dmr/www/chist.html

bsdimp · on Oct 18, 2019

SIMH now has a reason to upgrade to support Graphics2: Looks like the spacewar source uses it!

larsbrinkhoff · on Oct 18, 2019

I'm doing this.

Ididntdothis · on Oct 18, 2019

Is this written in assembly? I thought Unix was written in C and you had to only write a C compiler to port it to other platforms. I remember that was the story I was told.

beefhash · on Oct 18, 2019

UNIX was written in assembly at first. The C compiler itself was introduced in the Third Edition, and by the Fourth Edition the kernel had been rewritten in a version of C (that won't go through a modern C compiler anymore); both editions date from 1973. Parts of userland would still be in assembly even until the 7th Edition (1979), such as roff(1) (nroff/troff were in C), parts of as(1) or chess(6).

pjmlp · on Oct 18, 2019

Not at all, UNIX had a couple of releases in Assembly before the C rewrite came to be.

Likewise there were other high level systems programming languages being used since 1961.

The most notable one being ESPOL, replaced a couple of years later by NEWP, both Algol derivatives for systems programming. The OS, written for the Burroughs B5000, used compiler intrinsics with zero Assembly and is still being sold nowadays by Unisys, as ClearPath MCP.

There are other notable examples out there to discover: by then there was already a decade's worth of other OSes and systems languages.