Author here. Just for context, tinyc.c was created in 2000 (I found the file in my archives and the last modification date is January 12, 2001). I was not aware at the time of Fabrice Bellard's work, which after all won the IOCCC in 2001, so the confusion with TCC was not intentional. My tinyc.c was meant to teach the basics of compilers in a relatively accessible way, from parsing to AST to code generation to bytecode interpreter. And yes, it is the subset of C that is tiny, not a tiny compiler for the full C language.
Haven't looked at what it would take to support C89 (or other) fully. Certainly it is possible to extend the compiler in small increments to implement more stuff: all the operators, local variables, arrays, pointers, functions, etc. The hardest part I would say is implementing types (starting with the C syntax for types!) and also handling the memory allocation for them. Nothing too complex when you know what you are doing.
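To give a flavour of the kind of small increment I mean, here is a sketch (not tinyc.c itself, and a direct evaluator rather than a code generator, with an illustrative grammar): adding "*" to a recursive-descent expression parser is just a matter of inserting one precedence level between the sum and the term.

#include <ctype.h>
#include <stdio.h>

const char *p;              /* cursor into the source string */

int term(void);
int product(void);

/* <sum> ::= <product> { "+" <product> } */
int sum(void) {
    int v = product();
    while (*p == '+') { p++; v += product(); }
    return v;
}

/* the new level: <product> ::= <term> { "*" <term> } */
int product(void) {
    int v = term();
    while (*p == '*') { p++; v *= term(); }
    return v;
}

/* <term> ::= <int> */
int term(void) {
    int v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

int main(void) {
    p = "2+3*4";
    printf("%d\n", sum());  /* prints 14: "*" binds tighter than "+" */
    return 0;
}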
My personal interest is with higher-level languages and I have worked on various Scheme systems over the last 30 years. If you like small language implementations then take a look at Ribbit which is feature-full with a tiny VM. Also check out sectorlisp (not my work) which is a fascinating tour de force.
tinyc.c was designed to illustrate three things at once in a second year course on concepts of programming languages. The course had 4 parts: imperative programming (using C), functional programming (using Scheme), logic programming (using Prolog), and programming language implementation. For the imperative programming part it was important to show enough of the C language for the following operating systems course, so we needed to show C manual memory management and pointers in a rather detailed way. So tinyc.c was principally an example of programming with pointers, including pointer arithmetic such as *pc++. It was indirectly an example of a compiler and an interpreter, subjects we also covered in the course. I have also used parts of the compiler in a third year compiler course, but not as the basis of a project. I have always asked (forced?) my students to use Scheme to implement their compiler projects... a much friendlier language than C for compiler writing.
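For readers who haven't seen the idiom: *pc++ reads the value pc points at and then advances the pointer. The same pattern gives the classic one-line string copy:

#include <stdio.h>

/* copy src to dst using the *p++ idiom: read, store, advance */
void copy(char *dst, const char *src) {
    while ((*dst++ = *src++))
        ;
}

int main(void) {
    char buf[16];
    copy(buf, "tinyc");
    printf("%s\n", buf);    /* prints tinyc */
    return 0;
}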
I'm not sure what you mean by "minimal set of instructions". What is more important for bootstrapping is to restrict the programming style used to write the compiler. So if that was the goal, I would implement pointers and maybe arrays, and not implement other types such as structs (which can be useful but add complexity to the compiler, i.e. it is no longer the case that a value fits in a machine word). Functions and function calls would obviously be useful to add. But that's about it.
What I'd love to do someday is to write a compiler for a fairly complete C subset in POSIX shell. The goal would be to use only a POSIX shell and this compiler to compile TCC, and then use TCC to bootstrap gcc, all from source. This would be a great tool for reproducible builds from source. If someone here finds this interesting and would like to help out, please reach out to me.
I use tcc for all of my small C "scripts" for doing ioctls, etc. Less bloat, suckless. I imagine most software would be better off using tcc than gcc/clang. Performance isn't that important in most cases.
Since they say "scripts", note that tcc supports being invoked in the shebang line. E.g.
#!/usr/bin/tcc -run
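A complete example (assuming tcc is installed at that path):

#!/usr/bin/tcc -run
#include <stdio.h>
int main(void) { printf("hello from a C script\n"); return 0; }

Mark the file executable and run it directly; tcc compiles and runs it in one step.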
You can do that with gcc/clang too (e.g. wrapping a block of shell script in #if 0 ... #endif that compiles the current file and executes the result), but a primary value of tcc is that it compiles fast.
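For the curious, a minimal sketch of that trick (assuming cc and mktemp are available): the shell treats the preprocessor lines as comments and executes the commands inside the block, while the C compiler skips everything between #if 0 and #endif.

#if 0
tmp=$(mktemp) && cc -o "$tmp" "$0" && exec "$tmp" "$@"
exit 1
#endif
#include <stdio.h>
int main(void) { printf("hello\n"); return 0; }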
On a more philosophical note, the suckless approach is to optimise for simplicity, not storage. It's perfectly valid to disagree with that of course, but if simplicity of the system as a whole is a consideration, gcc and clang don't really fit.
You can only sort of do that with gcc/clang. The #if 0 trick relies on funny behavior found in a few common shells. When you try to execve(2) a script without a proper #! shebang, the kernel will return ENOEXEC. Bash will check for ENOEXEC, then apply a few heuristics to see if the file looks like text, and if it does, it will try to run it as a shell script.
This means that your script will work when run from a shell, but won't work when exec()ed from a non-shell program, which is a weird foot-gun.
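A small demo of the foot-gun (the script path is hypothetical; point it at any shebang-less text file marked executable):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char *argv[] = { "./script-without-shebang", NULL };
    char *envp[] = { NULL };
    execve("./script-without-shebang", argv, envp);
    /* only reached on failure; for a plain text file expect ENOEXEC */
    printf("execve failed: %s\n", strerror(errno));
    return 1;
}

Run from bash, the script works; run through execve(2), it fails with "Exec format error".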
That makes sense. When I checked the tcc command line, I was mildly surprised to see you could do the #if 0 hack at all, so I'm not surprised to hear it has severe limitations.
I feel like a lot of software written in C is written in C for performance reasons. Obviously that's not always the case and TCC is useful, but I wouldn't say that most software should use it.
C is usually the path of least resistance on *nix if you're trying to interact with the OS. The docs give you C prototypes and C examples, the source is mostly C, and several standard directories are filled with C code and libraries compiled from C.
I think you are confusing the work of Fabrice Bellard with this one.
The former is a C-language compiler. This one is a compiler for a language called "Tiny C".
Understandable confusion, though.
It is sad that tcc is unmaintained as it would be really useful in small embedded systems. I just tried it on Debian and compilation fails without #undefining CONFIG_TCC_MALLOC_HOOKS in lib/bcheck.c. After compilation it passes tests, but they warn that it could be unreliable.
While Fabrice Bellard is no longer working on TCC [0] and an official release tarball hasn't been packaged since version 0.9.27 (5 years ago), the project is by no means unmaintained.
For details, check their current working repository [1] and mailing list [2].
This is an interpreter for a super restricted subset of C and it looks well written from a pedagogical standpoint (keeps things pretty simple, fairly easy to read). But it's slightly awkward to strip down a language (what features do you keep, what do you lose?). I think it's more fun to build an interpreter for an actual tiny language. In my next book I have interpreters for Brainfuck [0], an obfuscated kind of joke of a language, and Tiny BASIC [1], a real tiny language that was used on early personal computers. These are pretty common first projects for folks interested in doing an interpreter.
Here's why real languages are better than stripped down languages:
Anyone with programming knowledge can implement a Brainfuck interpreter in a few hours and run any Brainfuck program. Anyone with a tiny bit of CS knowledge can implement a Tiny BASIC interpreter in just a day and then you can run any real Tiny BASIC program from the late 70s. It's cool to run real programs people actually used. With this stripped down C, there are no pre-made real programs...
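To give a sense of how little is involved, here is a minimal sketch of a Brainfuck interpreter (ignoring input ',' and unmatched brackets for brevity):

#include <stdio.h>

void bf(const char *p) {
    static unsigned char tape[30000];    /* zero-initialized cells */
    unsigned char *t = tape;
    for (; *p; p++) {
        switch (*p) {
        case '>': t++; break;
        case '<': t--; break;
        case '+': (*t)++; break;
        case '-': (*t)--; break;
        case '.': putchar(*t); break;
        case '[':   /* if cell is 0, skip forward to the matching ']' */
            if (!*t) { int d = 1; while (d) { p++; if (*p == '[') d++; if (*p == ']') d--; } }
            break;
        case ']':   /* if cell is nonzero, jump back to the matching '[' */
            if (*t) { int d = 1; while (d) { p--; if (*p == ']') d++; if (*p == '[') d--; } }
            break;
        }
    }
}

int main(void) {
    bf("++++++++[>++++++++<-]>+.");       /* prints 'A' (65) */
    return 0;
}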
Another language that's more modern and currently useful, yet still very small to write an interpreter for, is Lua [0][1]. Currently the official Lua interpreter has around 30k LOC, which I find pretty amusing for a language used so widely in games and for scripting purposes [2]. Of course it's still at least an order of magnitude larger than a small Tiny BASIC interpreter, but the fact that it's a current language used in so many places makes it even more interesting to make your for-fun implementation.
Also related to small language implementations, I find PicoC [3] notable: a C interpreter written in around 3k LOC of C. Past discussion about it here 13 years ago [4].
1. You use Tiny BASIC as an example of a 'real language', and I don't see how Tiny BASIC is a 'real language' but Tiny C is a stripped-down language.
2. You can build on this to make a full C implementation. A minimal C implementation that can potentially bootstrap a full C environment is more useful than a Brainfuck interpreter.
1. I think I explained why. You can Google real programs people used to do their day-to-day computing tasks and games and run them unmodified. You can't do that with this dialect of C. It's cool to run real programs.
2. Good point, but it's several orders of magnitude more work to go from this to C, so it depends on whether you're doing a one-off project or have much bigger ambitions. My point is that you get more bang for the buck out of Tiny BASIC if it's a one-off project.
FORTH is another language that's quick and easy to write from scratch, where you need a couple of dozen words written in assembler and then the rest of FORTH can be written in FORTH.
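A rough sketch of that layering in C (the word set here is illustrative): a handful of primitives in the host language, on top of which the rest could be defined in the language itself.

#include <stdio.h>

int stack[256], sp = 0;                  /* the data stack */
void push(int v) { stack[sp++] = v; }
int  pop(void)   { return stack[--sp]; }

/* primitive words, the kind you'd write in assembler */
void add(void)  { int b = pop(); push(pop() + b); }
void dup_(void) { int a = pop(); push(a); push(a); }
void dot(void)  { printf("%d\n", pop()); }

int main(void) {
    /* "2 DUP + ." */
    push(2); dup_(); add(); dot();       /* prints 4 */
    return 0;
}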
It's unfortunately not self-compiling, but has a structure which is very reminiscent of C4 --- another tiny C-subset compiler + stack-based VM which is self-compiling:
It's probably better to call it an interpreter, since it will also run the program and print the values of all non-zero variables afterward.
Calling it a compiler is (to me) really stretching things; I can't see any code to emit any other form of the code, it's all aimed at evaluating (executing) it.
Edit: oops, I didn't read the code closely enough, it does emit code but only internally, that code is what gets executed. Thanks for the corrections!
It is a compiler rather than a direct evaluator, since it generates bytecode for a stack VM --- and also includes the interpreter for that (look at the bottom).
I think this is a question of interface vs. implementation.
Python, JavaScript, and other languages which are traditionally considered interpreted but may do (JIT) compilation in their implementation are used as if they were interpreters: to the user, there's no separate compilation step. You run python somefile.py or node somefile.js (or refresh a browser holding a page), and editing the source code causes the next invocation to take those changes immediately. Contrast this with C/C++ and Java where there is definitely an explicit compilation step in nearly all implementations.
The program in this article thus is an implementation of a compiler, but has the interface of an interpreter.
In contrast, Java also did that, and I doubt most people think of Java as interpreted. So, using a byte-code interpreter may not be the criterion most people are using to decide on this. Truthfully, I think it is all a bit arbitrary.
That is definitely a compiler and anyone with a CS degree would call it that if they were discussing its functionality, because that's technically what it is. (Referring specifically to the part which compiles Python to bytecode)
Your SQL database also has a compiler. SQL is compiled to an execution plan. Compile doesn't only mean "create a machine code executable file".
> That is definitely a compiler and anyone with a CS degree would call it that if they were discussing its functionality because that's technically what it is.
None of these assertions is correct.
> (Referring specifically to the part which compiles Python to bytecode)
So referring specifically to something different than what I explicitly specified, it's called something else.
By that reasoning, a cow is a muscle and you are an acid.
> Your SQL database also has a compiler.
"Has a" and "is a" are rather different relationships.
> Compile doesn't only mean "create a machine code executable file".
I did my CS degree at umontreal and this was an assignment in a second year class. It was a pretty interesting introduction to compilers, and even though this is a toy subset of C, it was challenging, at least for me. We would get 0 if there was any memory leak, so we were pretty paranoid about it.
The second assignment was writing a Scheme interpreter.
It's kind of surprising that they cared so much about memory use; a lot of one-shot C programs, such as compilers, don't bother freeing memory and just let the OS clean up after them once they exit.
Recently I've been working on a toy C compiler and x86 assembler in TypeScript [1], and I can confirm that the amount of work that has to be done to compile and print a simple Hello World is astronomically huge (as is the satisfaction).
There was also a "Small C"; see DDJ #45 (May 1980), "A Small C Compiler for the 8080s" by Ron Cain [4]. Cain references buying a copy of "the Tiny-C interpreter from Tiny-C Associates" and finding it too slow, so he bootstrapped his own C compiler: he wrote it in C, compiled it on a UNIX system, then used it to compile itself to get the 8080 machine code.
See also DDJ #69 (July 1982) p. 66, "Small C for the 9900" by Matthew Halfant[5], porting Cain's compiler to another platform.
To be clear, I'm not claiming to be the first to have used the name Tiny-C, just that the confusion with TCC is not intentional. Your links to DDJ are great and bring back memories of the good old days when there was much to learn from reading DDJ and Byte magazine!
I hacked in support for ">", ">=", and "<=" to match the "<" support, but I just noticed that ints are truncated, so the maximum value stored in a variable is 127.
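My guess (an assumption, I haven't traced it fully) is that the emitted bytecode lives in a char array, so integer immediates wrap at the signed 8-bit range, e.g.:

#include <stdio.h>

int main(void) {
    char code[4];                /* stand-in for a char bytecode buffer */
    int n = 200;
    code[0] = n;                 /* implicit int -> char truncation */
    printf("%d\n", code[0]);     /* typically prints -56 where char is signed */
    return 0;
}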
This doesn't have types, functions, arrays, or much error checking. It has one-char identifiers. I don't think we should read into this any more than a tiny example or experiment by the author.
That's not on my TODO! But Gambit does have support for TCC. For example you can use TCC to compile a file to a dynamically loadable object file (aka shared library). The compilation is faster than gcc and the code size is typically smaller too:
$ cat hello.scm
(display "hello!\n")
$ gsc hello.scm
$ gsi hello.o1
hello!
$ ls -l hello.o1 # this is generated by gcc
-rwxrwxr-x 1 feeley feeley 18152 Mar 13 17:16 hello.o1
$ rm hello.o1
$ gsc -cc "tcc -shared" hello.scm
$ gsi hello.o1
hello!
$ ls -l hello.o1 # this is generated by tcc
-rwxrwxr-x 1 feeley feeley 4432 Mar 13 17:17 hello.o1
Sigh. I wish people would teach compilers using Oberon as an example. One can write a small yet complete compiler for what turns out to be a not-so-tiny language.
The function runs through the program by incrementing the program counter (*pc++) and dispatching on the instruction it sees. It's a stack-based VM, so values are pushed onto and popped from the stack depending on the operation. Is there anything specific you don't grok? Happy to help.
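A minimal sketch of that kind of loop (opcode names are illustrative, not necessarily tinyc.c's):

#include <stdio.h>

enum { PUSH, ADD, PRINT, HALT };

void run(int *pc) {
    int stack[64], sp = 0;
    for (;;) {
        switch (*pc++) {                        /* fetch opcode, advance pc */
        case PUSH:  stack[sp++] = *pc++; break; /* immediate operand follows */
        case ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case PRINT: printf("%d\n", stack[--sp]); break;
        case HALT:  return;
        }
    }
}

int main(void) {
    int prog[] = { PUSH, 2, PUSH, 3, ADD, PRINT, HALT };
    run(prog);                                  /* prints 5 */
    return 0;
}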