Hacker News
Ask HN: If Unix is written in C, how did they run C before Unix?
53 points by v7engine on April 4, 2023 | 48 comments
C is a programming language and Unix is an OS. So, how did they run C before Unix era?



V1: (PDP-7 Unix) Kernel is written in assembly. C does not exist yet.

V2: (PDP-11 Unix) Kernel is written in assembly. C compiler is written in assembly.

V3: Kernel is written in assembly. C compiler is written in C.

V4: Kernel is written in C. C compiler is written in C.

https://www.tuhs.org/cgi-bin/utree.pl?file=PDP7-Unix

https://www.tuhs.org/cgi-bin/utree.pl?file=V2

https://www.tuhs.org/cgi-bin/utree.pl?file=V3

https://www.tuhs.org/cgi-bin/utree.pl?file=V4


On top of that, C came about specifically because Ken Thompson and Dennis Ritchie wanted a better language for implementing Unix utilities and, later, Unix itself. https://en.wikipedia.org/wiki/C_(programming_language)#Histo...


Part of the reason register widths settled on powers of 2 is the runaway popularity of C and Unix. The PDP-11 had 16-bit registers. Some of the programming conventions we still use today originated with a humble PDP-11 at Bell Labs.


I would put most of the popularity for word lengths that are powers of 2 on the IBM System/360 (1964), rather than the PDP-11 (1970). The enormous popularity of the System/360 is what made computers standardize on bytes, along with words that were multiples of bytes. Prior to that, word sizes were all over the map: 6, 12, 36, as well as really random values such as 23, 27, 29, or 35 bit words.

This isn't to deny the popularity of the PDP-11, of course, but its use of 16-bit words was more of a consequence of the System/360. DEC's use of 12-bit words goes back to the LINC (1962) but by 1970, bytes were obviously the way to go.


What does C have to do with power of 2-bit registers? CHAR_BIT isn’t restricted and could be 7.


Doesn't the standard state that CHAR_BIT has to be >= 8? It could still be, say, 11, of course.


And to extend that a bit for the OP also read: https://en.wikipedia.org/wiki/Computer_programming_in_the_pu... and https://en.wikipedia.org/wiki/Punched_tape and similar literature for the "interface" with a computer. No "terminal" required :)


And then later versions of the C compiler were compiled with earlier versions of the C compiler...


We had a few pretty cool courses at university touching on this. It depends a bit on the work you want to do, but:

- Pick a language that's simple enough. A subset of ML would be good, but if you want to actually finish, I'd recommend a simple LISP. This is your new language. This is your C.

- Use a language you know and like to implement a compiler for this new language. This is your bootstrap language. Compile into, for example, C--, ASM, or LLVM IR, depending on what you know; this is the target language. As a recommendation, keep this compiler as simple as possible so you have a reference for the next step. For C, both the bootstrap and the target language were ASM.

- And now iterate on extending your language's stdlib until you can implement a compiler for your new language in your new language. Again, keep this compiler simple, without optimization or passes; just generate the most trivial machine code possible. This usually takes a bit of back-and-forth. You'll need some expression evaluation first, then function evaluation (this is where a Lisp can be an advantage, as those are the same), then function definitions, then filesystem interactions, and so on. You kinda discover what you need as you implement.

- Once you have all of that, (i) compile the compiler for the new language with your bootstrap compiler, then (ii) compile the compiler for the new language using the result of (i). If you want to verify the results, (iii) compile the compiler again with the output of (ii) and check whether the outputs of (ii) and (iii) differ.

- Your new language is now self-hosted.

This was fun, because it was accompanied by other courses on how processor microcode implements processor instructions, how different kinds of assembly are mapped onto processor instructions, and then how higher-level languages are compiled down into assembly. All of this across 4-6 semesters resulted in a pretty good understanding of how Java ends up as flip-flop operations.

EDIT - got target & bootstrap mixed up in first part.


Do you have any books/online courses recommendations for learning what you just described?


A lot, if not all, of this was based on the Dragon Book [1].

[1] https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniq...


A lot of people build toy languages too -- to the point that some become self hosting.

But there are also an absolute ton of parser generators out there that let you define a grammar and parse the language into an AST (Abstract Syntax Tree).

Once you have an AST, you can build an interpreter that walks the tree and runs the code directly (on a JVM, say), or you can build a compiler that translates the AST into LLVM IR.



They didn't. There was no C before Unix. The first Unix was written in Assembly, and was used to develop the first C compiler.


Computerphile on youtube has a pretty good series on bootstrapping a compiler:

https://www.youtube.com/watch?v=lJf2i87jgFA&list=RDLVlJf2i87...


see Dennis Ritchie's The Development of the C Language: https://www.bell-labs.com/usr/dmr/www/chist.html

pre-C Unix was written in assembly


I swore at one point Unix was written in a dead language called "B". Perhaps that was a joke that went over my head.


The start of C and the start of Unix came at approximately the same time, through a stepwise process. The first Unix (IIRC; someone can correct me) was written in assembly. Also written at that time was a simple compiled language. That compiled language was used to write a more complicated compiler. There was another such cycle (I think) before there was a C compiler - and even then it wasn't full C, but a subset. Unix was then rewritten in C.


I think it was BCPL then B then C.

That is why there was a nerd joke that the successor language to C should have been P (the next letter in BCPL), not C++.


BCPL is actually the abbreviation of Basic CPL.

Originally BCPL wasn't going to be used for anything beyond bootstrapping the CPL compiler; eventually it took on a life of its own.

https://en.m.wikipedia.org/wiki/CPL_(programming_language)

There is some irony that for UNIX workloads we are stuck with the evolution of a language whose main purpose was only to bootstrap a compiler and be done with it.


The first Unix was written in asm. Then Ken came up with B to write some userland stuff more easily. B was not portable; it was BCPL-like, but smaller and simpler.

Then Dennis came up with C. C was essentially portable B (it added types largely to get around the fact that word sizes differed between archs). The B code was then recompiled with C and modified as needed. C stayed somewhat backward-compatible with B (hence the old rule that a missing type implies int).


Redneck dog meme: If we wuz descended from wolves, how are there still wolves?

(I'm just teasing, as you were.)


Bootstrapping is a fun thing. You already have some good answers.

Now imagine how most things around you were made, how higher tech was made with lower tech. How did they make high-precision tools when only lower-precision tools were available? For example: how do you make a 0.001 mm precise caliper when all you have is a 0.1 mm one? There have been a lot of challenges like that, and we still run into new ones. I just wonder what general term is used for things like that.


C does not depend on Unix in any way. Hell, 8 bit micros like Commodore 64 and ZX Spectrum had C compilers.


That is technically true, but historically false (C was created for Unix).


You don't need an operating system to run a program. Nothing about C requires Unix (or any other OS).


> Nothing about C requires Unix (or any other OS).

That is not quite true: the C specification defines the standard headers, and many of their facilities don't make much sense outside of an OS (environment, filesystem, dynamic memory allocation, threads, ...)

There is a somewhat restricted subset of C called "freestanding", which is not required to accept every "strictly conforming program", specifically the standard only requires a small subset of the standard headers to be implemented: <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h>. Everything else is optional, and may be implementation-defined rather than standard-conforming.


Support for operating system facilities in no way implies those facilities are necessary.

> filesystem, dynamic memory allocation, threads

All of which can be (and have been) done without an OS. Besides, there is no native C language support for any of those things. They are usually (but certainly not always) done through library calls, not language primitives.

My underlying point is that the premise of the question is mistaken. It's assuming that an operating system is required in order to have a programming language. That is simply untrue, and is particularly untrue for C, which is very commonly used to program microcontrollers that have no operating system. I'm working on one such system right now.


All true; it is also the reason why POSIX takes on the role of the parts of a "C standard library" that didn't fit into ISO C.


C existed as a cross platform language long before there was a C standard specification or standard headers. Specifically, C was introduced around 1972 and the first standard was 1989. (Wikipedia). There were C compilers for non-Unix systems during that time (e.g., Borland Turbo C for MS-DOS around 1987). (Wikipedia).


I was using C compilers for micros well before those upstarts like Borland and MS (or even DOS) came around. The first micro I used C on was a Cromemco System III, running CP/M. That machine was built in 1977, I think, although I was using it a few years later than that.


The K&R C dialects were anything but portable, especially on non-UNIX platforms.

Hardly anything was portable across RatC, Small-C, and the other K&R C subsets.


Any self-hosting language compiler has to start from somewhere.

Generally it's an incremental process where the compiler for an early/subset version of the language is written in another, existing language (absent one, in assembly).

Once it's possible to rewrite the compiler in its own subset language, it becomes self-hosting. Then you can add a feature to the language, and once it works, enhance the compiler to use it, and so on.

Eventually the language and compiler go hand in hand: the only way to compile the language is with its compiler, and the only way to compile the compiler is with itself. This leads to interesting thought experiments such as:

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...



As the question has already been answered, I'll just add that the general principle is known as self-hosting:

https://en.wikipedia.org/wiki/Self-hosting_(compilers)


The question is malformed because there is no need for Unix to run C programs. In theory they could have created C without an OS. In practice, however, C was written to rewrite Unix in a high-level language (it was previously written in assembly).


They ran it in UNICS.

UNIX is just a UNICS rewrite in C, done by the authors of UNICS and C.


A related question: was an operating system first known as a "time sharing system"? Most of what I'd now call an OS was termed that in this era, and I also read that the term OS was coined for CP/M.


IBM had OS/360 in 1966, about a decade before CP/M. Not sure where "OS" first appeared but it was before that.

Related: early OSes were not timesharing systems. They were batch oriented - you submit your stack of punch cards to the operator, and pick up your output later once they've had a chance to run it.


Operating systems abstracted the hardware (and other things) rather than having everything embedded in the program (as is still typical in many embedded applications today). They don't have to offer time-sharing (i.e., a scheduler), but without one they are constrained to running a single process to completion before beginning the next. There were OSes before there were time-sharing OSes.


It would be useful to see whether the software architecture of Whirlwind I [1] had a piece that was like an OS. It predates the IBM System/360 by about a decade, and it was a realtime system that tracked incoming airborne objects, which means paradigms like "submit a batch job" were probably not an important part of using it.

[1] https://en.wikipedia.org/wiki/Whirlwind_I


Lots of C implementations don't/didn't use Unix as the underlying OS.

There were Cs for MSDOS, Cs for CP/M, even Cs for Windows, etc, etc.


You don't need an OS written in a language to use that language, as evidenced by Python/JavaScript/BASIC/Go/...


As I understand it, the first versions of Unix were written in PDP-7 assembly. Thompson and Ritchie's C work took place on that.


>If UNIX is written ...

Doesn't that already imply that C precedes UNIX? The question doesn't make sense.


you've always been able to run C on bare metal without an OS


Not before there were C compilers.


Compiling C can be done by a human with pen and paper.



