
Dennis Ritchie's first C compiler on Github - jnord
https://github.com/mortdeus/legacy-cc
======
a_bonobo
Coincidences... I thought "How come Warren Toomey (one of the guys of the Unix
Heritage Society [1]), has never posted this?"

Turns out, this GitHub repo is just a mirror/copy of his work, but with
attribution [2]. Still worth reading through there; TUHS also stores some
extremely old UNIX versions.

[1] www.tuhs.org

[2] <http://cm.bell-labs.com/cm/cs/who/dmr/primevalC.html>

Edit: Warren has written a paper on restoring ancient UNIX versions and
C-compilers, you might like it [3]

[3] <http://epublications.bond.edu.au/infotech_pubs/146/>

Edit2: Now that I've thought a little bit about it, I'm not happy that the
sources are on GitHub in this form. This is Warren's work - he did a lot of
work in getting these tapes to work again, and "mortdeus" just copied the work
and didn't even change the folder-names - "last1120c" is the first tape,
"prestruct" the second. And you still need Warren's Apout emulator to get
these files to work.

~~~
shabda
> Edit2: Now that I've thought a little bit about it, I'm not happy that the
> sources are on GitHub in this form.

If one releases code under an open license and people use/copy/fork it, why
should you or the original author be unhappy? (As long as the license terms
are not being broken.)

If we had to worry about this each time we forked/cloned someone's work, it
would make code reuse very hard.

[Edit]

I am working on some Python charting. I found Pychart, which I found
interesting: <http://home.gna.org/pychart/> . Because it is in bzr which I
don't work with, I put it on github. Should I be worried that someone will
feel offended with it?

~~~
yoklov
If you're within the terms of the license, you're fine. The developer should
say so in the license (or choose another license) if they don't want it
reproduced anywhere. A link to the original is common courtesy, however.

In the case of the C compiler, I think the modifications are under the same
license as the original, but I'm not totally certain. If they aren't it would
be cause for concern.

------
rmrfrmrf
Reading Dennis Ritchie's code is as close to reading a religious text as I'll
ever come. The straightforward elegance of it is so inspiring!

~~~
anonymous
The lack of type noise makes it easier to read too.

~~~
yoklov
I can't disagree more! Everything is an int -- Basically untyped, in C terms!
Maybe less noisy, but hard to figure out what's going on.

------
lholden
The declaration of printf is both scary and pretty cool.

What happens when you have more than 9 substitutions specified in the string?
:D

edit: decided the code was a bit long to have pasted into my post. Can find it
at the bottom of <http://cm.bell-labs.com/cm/cs/who/dmr/last1120c/c03.c>

~~~
grandpa
This is the compiler's printf(), not the one that gets linked in with compiled
code. As long as the compiler itself doesn't have more than 9 substitutions,
it's OK. Your code will link with a different one.
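
For readers wondering what a nine-slot printf looks like, here is a rough sketch in modern C. It is not Ritchie's code: `slot_format`, `MAX_SLOTS`, the output buffer, and the `?` placeholder are all made up for illustration, and it only understands `%d`. It assumes, as the comments above suggest, that the function is declared with a format string plus nine fixed argument slots, so a tenth substitution has nothing left to read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { MAX_SLOTS = 9 };  /* assumption: nine fixed argument slots */

/* Hypothetical helper: expands each "%d" in fmt from the next slot and
 * writes the result to out (at most cap bytes, always NUL-terminated).
 * Returns the number of characters written, excluding the terminator. */
size_t slot_format(char *out, size_t cap, const char *fmt,
                   long x1, long x2, long x3, long x4, long x5,
                   long x6, long x7, long x8, long x9)
{
    const long slot[MAX_SLOTS] = { x1, x2, x3, x4, x5, x6, x7, x8, x9 };
    size_t used = 0;  /* slots consumed so far */
    size_t len = 0;   /* characters written so far */

    assert(out != NULL && fmt != NULL && cap > 0);

    for (; *fmt != '\0' && len + 1 < cap; fmt++) {
        if (fmt[0] != '%' || fmt[1] != 'd') {
            out[len++] = *fmt;          /* ordinary character */
            continue;
        }
        fmt++;                          /* skip the 'd' */
        if (used == MAX_SLOTS) {
            out[len++] = '?';           /* tenth "%d": no slot left */
            continue;
        }
        int n = snprintf(out + len, cap - len, "%ld", slot[used++]);
        if (n < 0)
            break;                      /* encoding error: stop here */
        /* snprintf reports the untruncated length; clamp to the buffer */
        len = ((size_t)n < cap - len) ? len + (size_t)n : cap - 1;
    }
    out[len] = '\0';
    return len;
}
```

Calling it as `slot_format(buf, sizeof buf, "%d+%d", 1, 2, 0, 0, 0, 0, 0, 0, 0)` leaves `1+2` in `buf`. Because the function is prototyped, a caller has to pass all nine slots whether the format uses them or not.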

------
androidb
I'm ashamed for having to google his name but for others like me here's a
glimpse:

"Dennis MacAlistair Ritchie (born September 9, 1941; found dead October 12,
2011) was an American computer scientist who "helped shape the digital era."
He created the C programming language and, with long-time colleague Ken
Thompson, the Unix operating system"

<http://en.wikipedia.org/wiki/Dennis_Ritchie>

~~~
jychang
You should have seen HN when he died. It was that hell week when a bunch of
early computer pioneers all passed away...

~~~
chimeracoder
To be honest, I don't remember much of a hullabaloo when either Dennis Ritchie
or John McCarthy died. There was a post or two on each event, but nothing like
when Steve Jobs died (when literally the entire front page was posts about his
death).

(All three events were in October 2011 - Jobs, then Ritchie, then McCarthy)

------
mseepgood
He already used the right brace style (hanging braces, cuddled else) and the
right indentation (tabs, not spaces).

~~~
comex
Somehow over the years learning C on my own, perhaps because it seemed more
consistent to me, I got used to writing

    
    
        if(foo) {
    

omitting the space after control statements' names, which is almost never done
- except in very early C code. (It's not just this: V7 Unix often omits the
space as well.) I should probably settle on a less idiosyncratic personal
style, but I can't help but take a little heart from seeing "my" style in such
a famous codebase. :)

~~~
brianberns

        if(foo)
    

I avoid doing this because it looks like a function call.

~~~
kybernetikos
In an appropriately powerful language, it could be a function call.

~~~
TheBoff
This is all very well if you have a friendly, high level scripting language
like Ruby, but I'm definitely glad I don't have to write a C compiler where
functions can take arbitrary blocks of code.

~~~
rat87
> In an appropriately powerful language, it could be a function call.

> This is all very well if you have a friendly, high level scripting language
> like Ruby, but I'm definitely glad I don't have to write a C compiler where
> functions can take arbitrary blocks of code.

That's a surprisingly good setup, because in Smalltalk (one of Ruby's main
ancestor languages) if/else is a method which takes a block closure.

    
    
        a ifTrue: [ l log: 'a is true' ] ifFalse: [ l log: 'a is false' ]
    

while in Ruby, if/else is a syntactic construct

    
    
        if a
            l.log ('a is true')
        else
            l.log ('a is false')
        end
    

probably more for perceived clarity/comfort than for speed's sake.
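
To make the contrast concrete in C terms, here is a minimal sketch of an "if" written as an ordinary function, in the spirit of the Smalltalk ifTrue:ifFalse: above. C has no block closures, so the two branches are passed as function pointers plus a context pointer. `if_true_if_false`, `branch_fn`, and the two callbacks are invented names, not anything from the compiler source.

```c
#include <assert.h>
#include <stddef.h>

/* A "block": a function plus a caller-supplied context pointer. */
typedef void (*branch_fn)(void *ctx);

/* Hypothetical if-as-a-function: runs exactly one of the two branches. */
void if_true_if_false(int cond, branch_fn on_true, branch_fn on_false, void *ctx)
{
    assert(on_true != NULL && on_false != NULL);
    if (cond)
        on_true(ctx);
    else
        on_false(ctx);
}

/* Example branches: record a message in the const char * that ctx points to. */
void say_true(void *ctx)  { *(const char **)ctx = "a is true"; }
void say_false(void *ctx) { *(const char **)ctx = "a is false"; }
```

The function still needs the built-in `if` internally, and each branch has to be a separate named function, which is part of why this style is awkward in C.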

------
habosa
Is this written in C? If so, what compiled this? Sorry for the noob question,
I'm just a little lost.

~~~
tcas
Looks like a very early dialect. C assumes everything is an int unless
specified otherwise. You can declare parameter types after the function name.
So:

    
    
      init(s, t)
      char s[]; {
    

would be equivalent to:

    
    
      int init(char s[], int t) {
    

This still works with modern compilers.
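
As a small check of that equivalence, here is the prototype form with a made-up body. The original body is not shown in this thread, so the string-length-plus-t logic is purely illustrative. The old-style form is kept in a comment because C23 removed old-style definitions, so newer compilers may reject it in their default mode.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Old-style form quoted above:
 *
 *     init(s, t)
 *     char s[]; {
 *         ...
 *     }
 *
 * The same interface with the implicit ints written out.
 */
int init(char s[], int t)
{
    int n = 0;

    assert(s != NULL);
    while (s[n] != '\0')   /* illustrative body: length of s ... */
        n++;
    return n + t;          /* ... plus t */
}
```

With this definition, `init("abc", 2)` evaluates to 5. The return type and the type of `t` are exactly the ones the old form leaves implicit.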

I'd be interested if anyone has any more info about this:

    
    
      waste()		/* waste space */
      {
      	waste(waste(waste),waste(waste),waste(waste));
      	waste(waste(waste),waste(waste),waste(waste));
      	waste(waste(waste),waste(waste),waste(waste));
      	waste(waste(waste),waste(waste),waste(waste));
      	waste(waste(waste),waste(waste),waste(waste));
      	waste(waste(waste),waste(waste),waste(waste));
      	waste(waste(waste),waste(waste),waste(waste));
      	waste(waste(waste),waste(waste),waste(waste));
      }
    

Found in last1120c/c10.c

~~~
cwzwarich
From the linked description (<http://www.cs.bell-labs.com/who/dmr/primevalC.html>):

A second, less noticeable, but astonishing peculiarity is the space
allocation: temporary storage is allocated that deliberately overwrites the
beginning of the program, smashing its initialization code to save space. The
two compilers differ in the details in how they cope with this. In the earlier
one, the start is found by naming a function; in the later, the start is
simply taken to be 0. This indicates that the first compiler was written
before we had a machine with memory mapping, so the origin of the program was
not at location 0, whereas by the time of the second, we had a PDP-11 that did
provide mapping. (See the Unix History paper). In one of the files
(prestruct-c/c10.c) the kludgery is especially evident.

~~~
tcas
Doh, I completely glossed over the readme and went straight to the code. That
makes sense -- Thanks!

Cool to think that that waste function can still compile with today's
compilers. From a quick disassembly, it seems to take up 751 bytes when
compiled for x64 using clang at -O0.

------
aap_
Also see my B compiler that was inspired by these early C compilers:
<https://github.com/aap/abc>

~~~
willvarfar
Gorgeous! Have good fun, thx for the link

------
ch
Someone should check that thing for back doors!

<http://cm.bell-labs.com/who/ken/trust.html>

~~~
Mr_T_
Too late. Ken Thompson already put the back door in his B compiler.

------
atesti
If you like C compilers, also check out <http://bellard.org/tcc/> and
<http://bellard.org/otcc/>

The latter is a C compiler in about 1kb of source code! It's quite functional
and can compile itself.

The first link is what came out of it: a compiler so fast that it can boot
Linux from source code in a few seconds: <http://bellard.org/tcc/tccboot.html>

------
orangethirty
It felt weird being able to read it and understand (on a high level) what was
happening.

~~~
Kiro
I didn't understand anything but I want to. Where do you start reading to
follow the flow?

------
areddy
Can anyone explain the naming of files? c00.c, c01.c etc

~~~
aap_
c0 is the first pass (c to intermediate), c1 is the second pass (intermediate
to assembly), c2 is the optimizer. c0 is built from files beginning with 'c0'
and so on.

~~~
areddy
Thanks for the explanation

------
csense
Can anyone get this to compile and run?

Does anyone know what hardware the assembly language files are for?

Maybe you could produce a modified version with the archaic features removed,
compile it with a modern compiler, then use the binary produced to compile an
unmodified version. Or maybe there are still binaries of really old compilers
that can understand this code floating around out there.

Any ideas?

------
Tyr42
I don't understand this

    
    
        main(argc, argv)
        int argv[]; {
    

Is that still valid today?

~~~
rmrfrmrf
It's called a K&R style function definition, which was the way to do it _back
in the day_. Basically, you define your parameter names first, then you define
the parameter types immediately after the function but before the opening
curly brace. It's definitely not recommended today and can result in undefined
behavior if your compiler doesn't recognize it. If you're working with legacy
code, though, I'm pretty sure you can set some C compilers to allow for it.

To explain further:

    
    
        main(argc, argv)
        int argv[]; {
    

is equivalent to:

    
    
        int main(int argc, int argv[]) {
    

The old style definition works because C had a default type of int, so the
type specifications for the function main and the parameter argc could be
omitted.

As for int argv[]? What that actually represents is an array of memory
addresses that hold the command line arguments given. Obviously this becomes a
problem if you're on a 64-bit system, where int and (void *) are two
different sizes. However, I checked this out on my 64-bit machine and it works
just fine:

    
    
        int main(int argc, unsigned long long argv[]) {
          char *firstarg = (void *)(argv[1]);
          printf("%s", firstarg);
        }
    

which, given "./a.out pickles" will print "pickles" (argv[0] gives the memory
address of the cstring "./a.out"). I'm guessing that, in the case of a
compiler, the memory addresses of arguments are more relevant to have than the
arguments themselves.

~~~
_kst_
The 1999 ISO C standard dropped the "implicit int" rule, so this:

    
    
        main(argc, argv)
        char *argv[];
        {
            /* ... */
        }
    

is illegal (strictly speaking, it's a "constraint violation"). Note that it's

    
    
        char *argv[]
    

not

    
    
        int argv[]
    

But this:

    
    
        int main(argc, argv)
        int argc;
        char *argv[];
        {
            /* ... */
        }
    

is still perfectly valid.

As for this:

    
    
        int main(int argc, unsigned long long argv[]) {
          char *firstarg = (void *)(argv[1]);
          printf("%s", firstarg);
        }
    

it's not a constraint violation, but its behavior is undefined (unless your
compiler specifically supports and documents that particular form as an
extension).

~~~
rmrfrmrf
We're talking about the code from the actual source files, not the standard.

Look here (lines 22 and 23):
<https://github.com/mortdeus/legacy-cc/blob/master/prestruct/c00.c>

The compiler code states int argv[], not char *argv[] (I assumed this is why
the OP asked for clarification in the first place, since char *argv[] is much
more common).

You're right, in theory this is undefined behavior, but in practice on a
32-bit system, sizeof(int) will almost always be equal to sizeof(void *). I
was just demonstrating how one could recreate the code in the compiler while
on a 64-bit architecture.
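
For anyone recreating this on a 64-bit machine, a less fragile route is an integer type that is meant to be wide enough for a pointer. The sketch below uses `uintptr_t` from `<stdint.h>` (an optional type in the standard, but available on mainstream platforms). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips a pointer through an integer wide enough to hold it. */
const char *through_integer(const char *p)
{
    uintptr_t bits = (uintptr_t)(const void *)p;  /* pointer -> integer */

    assert(sizeof bits >= sizeof p);              /* sanity check on widths */
    return (const char *)(const void *)bits;      /* integer -> pointer */
}

/* Nonzero when a plain int is at least as wide as a pointer here,
 * which is the assumption the old "int argv[]" declaration relies on. */
int int_holds_pointer(void)
{
    return sizeof(int) >= sizeof(void *);
}
```

On a typical 64-bit Linux or macOS build, `int_holds_pointer()` returns 0, which is why the `int argv[]` trick needs a wider type there.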

