Hacker News new | past | comments | ask | show | jobs | submit login
Tiny-C Compiler (2001) (umontreal.ca)
229 points by swatson741 on March 13, 2023 | hide | past | favorite | 65 comments



Author here. Just for context tinyc.c was created in 2000 (I found the file in my archives and the last modification date is January 12, 2001). I was not aware at the time of Fabrice Bellard's work which after all won the IOCCC in 2001, so the confusion with TCC was not intentional. My tinyc.c was meant to teach the basics of compilers in a relatively accessible way, from parsing to AST to code generation to bytecode interpreter. And yes it is the subset of C that is tiny, not a tiny compiler for the full C language.


I wish I had time to make a list what would be required to bootstrap this.

Either by adding complexity (more features to the compiler) or dropping complexity (fewer C features in the implementation).

Did you ever look at that?

Edit: functions, enum, struct, arrays and maybe make all variables/functions a-z?

Edit2: https://joyofsource.com/projects/bootstrappable-tcc.html

Edit3: https://news.ycombinator.com/item?id=35135384


Haven't looked at what it would take to support C89 (or other) fully. Certainly it is possible to extend the compiler in small increments to implement more stuff: all the operators, local variables, arrays, pointers, functions, etc. The hardest part I would say is implementing types (starting with the C syntax for types!) and also handling the memory allocation for them. Nothing too complex when you know what you are doing. My personal interest is with higher-level languages and I have worked on various Scheme systems over the last 30 years. If you like small language implementations then take a look at Ribbit which is feature-full with a tiny VM. Also check out sectorlisp (not my work) which is a fascinating tour de force.


I want to thank you for this pedagogical tool, I have used a couple times to learn a new language by porting this exercise.

How did you use this in your teaching? It seems like it could the basis for a longer term project that students could take in many directions.


tinyc.c was designed to illustrate three things at once in a second year course on concepts of programming languages. The course had 4 parts: imperative programming (using C), functional programming (using Scheme), and logic programming (using Prolog), and programming language implementation. For the imperative programming part it was important to show enough of the C language for the following operating systems course, so we needed to show C manual memory management and pointers in a rather detailed way. So tinyc.c was principally an example of programming with pointers, including pointer arithmetic such as *pc++. It was indirectly an example of compiler and interpreter, subjects we also covered in the course. I have also used parts of the compiler in a third year compiler course but not as the basis of a project. I have always asked (forced?) my students to use Scheme to implement their compiler projects... a much friendlier language than C for compiler writing.


Would you think rewriting this to generate a minimal set of instructions could be benefitial to the compiler bootstrapping?


I'm not sure what you mean by "minimal set of instructions". What is more important for bootstrapping is to restrict the programming style used to write the compiler. So if that was the goal I would implement pointers and maybe arrays, and not implement other types such as structs (which can be useful but add complexity to the compiler, i.e. it is no longer the case that a value fits in a machine word). Functions and function call would obviously be useful to add. But that's about it.

What I'd love to do someday is to write a compiler for a fairly complete C subset in POSIX shell. The goal would be to use only a POSIX shell and this compiler to compile TCC, and then use TCC to bootstrap gcc, all from source. This would be a great tool for reproducible builds from source. If someone here finds this interesting and would like to help out, please reach out to me.


Not to be confused with https://bellard.org/tcc/, which is a tiny compiler for the C language.


I use tcc for all of my small C "scripts" for doing ioctls, etc. Less bloat, suckless. I imagine most software would be better off using tcc than gcc/clang. Performance isn't that important in most cases.


> Performance isn't that important in most cases.

Optimizing for storage space is…better?


Since they say "scripts", note that tcc supports being invoked in the shebang line. E.g.

    #!/usr/bin/tcc -run
You can do that with gcc/clang too (e.g. #if 0, #endif to wrap a block of shell script to compile the current file and execute the result) but a primary value of tcc is that it compiles fast.

On a more philosophical note, the suckless approach is to optimise for simplicity not storage. It's perfectly valid to disagree with that of course, but if simplicitly of the system as a whole is a consideration gcc and clang doesn't really fit.


You can only sort of do that with gcc/clang. The #if 0 trick relies on funny behavior that is in a few common shells. When you try to execve(2) a script without a proper #! shebang, the kernel will return ENOEXEC. Bash will check for ENOEXEC then check a few heuristics to see if it looks like a text file, and if it does, then it will try to run it as a shell script.

This means that your script will work when run from a shell, but won't work when exec()ed from a non-shell program, which is a weird foot-gun.


That makes sense. When I checked the tcc command line I was mildly surprised to see you could do the #if 0 hack at all, so not surprised to hear it has severe limitations.


Thanks for sharing! I've yet to go through my C phase, but see it on the horizon, and will remember this and the shebang trick.


This is a recommended practice for scripting with Nim if you want a batteries-included language.


I feel like a lot of software written in C is written in C for performance reasons. Obviously that’s not always the case and TCC is useful but I wouldn’t say that that most software should use it


C is usually the path of least resistance on *nix, if you're trying to interact with the OS. The docs give you C prototypes and C examples, and the source is mostly C, several standard directories are filled with C code and libraries compiled from C.


I think you are confusing the work of Frabrice Bellard with this very one. The former is a C-language compiler. This once is a compiler for a language called "Tiny C". Understandable confusion, though.


I'm quite confused, not the same project at all? To me tiny c compiler always meant the bellard page. Super useful stuff for micro hacky projects.


I understand the confusion: it is more about "syntax associativity"

(tiny C) compiler --> "This is a compiler for the Tiny-C language"

vs

Tiny (C compiler) --> "TinyCC [...] is a small but hyper fast C compiler"

That's it! ;-)


Now obviously the next step is to make a tiny tiny c compiler compiler.


One could say that the one from this submission is Tiny-C Compiler and Bellard's is Tiny C-Compiler.


This is a compiler for a language called Tiny-C.


It is sad that tcc is unmaintained as it would be really useful in small embedded systems. I just tried it on Debian and compilation fails without #undefining CONFIG_TCC_MALLOC_HOOKS in lib/bcheck.c. After compilation it passes tests, but they warn that it could be unreliable.


While Fabrice Bellard is no longer working on TCC [0] and an official release tarball hasn't been packaged since version 0.9.27 (5 years ago) the project is by no means unmaintained.

For details, check their current working repository [1] and mailing list [2].

[0]: https://bellard.org/tcc/

[1]: https://repo.or.cz/tinycc.git

[2]: https://lists.nongnu.org/archive/html/tinycc-devel/


Try chibicc. It's x86_64 native and so much more readable as a codebase than TCC.


This is an interpreter for a super restricted subset of C and it looks well written from a pedagogical standpoint (keeps thing pretty simple, fairly easy to read). But it's slightly awk to strip-down a language (what features do you keep, what do you lose?). I think it's more fun to build an interpreter for an actual tiny language. In my next book I have interpreters for Brainfuck [0], an obfuscated kind of joke of a language, and Tiny BASIC[1] a real tiny language that was used on early personal computers. These are pretty common first projects for folks interested in doing an interpreter.

Here's why real languages are better than stripped down languages: Anyone with programming knowledge can implement a Brainfuck interpreter in a few hours and run any Brainfuck program. Anyone with a tiny bit of CS knowledge can implement a Tiny BASIC interpreter in just a day and then you can run any real Tiny BASIC program from the late 70s. It's cool to run real programs people actually used. With this stripped down C, there are no pre-made real programs...

0:https://en.wikipedia.org/wiki/Brainfuck 1:https://en.wikipedia.org/wiki/Tiny_BASIC


Another language that's more modern and currently useful but which is very tiny to write an interpreter for is Lua [0][1]. Currently the official Lua interpreter has around 30k LOC which I find pretty amusing for a language used so widely in games and for scripting purposes [2]. Of course it's still at least an order of magnitude larger than a small Tiny BASIC interpreter but the fact it's a current language used in so many places makes it even more interesting to make your for-fun implementation.

Also related to small language implementations I find notable PicoC [3] which is a C interpreter written in around 3k LOC of C. Past discussion about it here 13 years ago [4].

[0]: https://www.lua.org/about.html

[1]: https://www.lua.org/spe.html

[2]: https://en.wikipedia.org/wiki/Lua_(programming_language)

[3]: https://gitlab.com/zsaleeba/picoc

[4]: https://news.ycombinator.com/item?id=1658890


While I appreciate your point.

1. You use the example of a tiny basic of a 'real language' and I don't see how tiny basic is a 'real language', but tiny C is a stripped down language.

2. You can build on this to make a full c implementation. A minimal c implementation that can potentially bootstrap a full c environment is more useful than a brainfuck interpreter.


1. I think I explained why. You can Google real programs people used to do their day to day computing tasks and games and run them unmodified. You can’t do that with this dialect of C. It’s cool to run real programs.

2. Good point, but it’s several orders of magnitude more work to go from this to C, so it depends if you’re doing a one off project or have much bigger ambitions. I think you get more bang for the buck out of Tiny BASIC if it’s a one-off project is my point.


FORTH is another language that's quick and easy to write from scratch, where you need a couple of dozen words written in assembler and then the rest of FORTH can be written in FORTH.


It's unfortunately not self-compiling, but has a structure which is very reminiscent of C4 --- another tiny C-subset compiler + stack-based VM which is self-compiling:

https://news.ycombinator.com/item?id=8558822

The 26 predefined integer variables make this look like a variant of minimal BASIC, except with structured control flow instead of only GOTO.


Aha, this is interesting!


It’s worth noting that this is a compiler for the Tiny-C language, and not as one might think a tiny compiler for the C language.


It's probably better to call it an interpreter, since it will also run the program and print the values of all non-zero variables afterward.

Calling it a compiler is (to me) really stretching things, I can't see any code to emit any other form of the code, it's all aimed at evaluating (executing) it.

Edit: oops, I didn't read the code closely enough, it does emit code but only internally, that code is what gets executed. Thanks for the corrections!


It is a compiler rather than a direct evaluator, since it generates bytecode for a stack VM --- and also includes the interpreter for that (look at the bottom).


That’s more or less every interpreter. CPython compiles to bytecode before interpreting that, yet nobody would call it a compiler.


I think this is a question of interface vs. implementation.

Python, JavaScript, and other languages which are traditionally considered interpreted but may do (JIT) compilation in their implementation are used as if they were interpreters: to the user, there's no separate compilation step. You run python somefile.py or node somefile.js (or refresh a browser holding a page), and editing the source code causes the next invocation to take those changes immediately. Contrast this with C/C++ and Java where there is definitely an explicit compilation step in nearly all implementations.

The program in this article thus is an implementation of a compiler, but has the interface of an interpreter.


In contrast, Java also did that and I doubt if most people think of Java as interpreted. So, using a byte-code interpreter may not be the criteria most people are using to decide on this. Truthfully, I think it is all a bit arbitrary.


That is definitely a compiler and anyone with a CS degree would call it that if they were discussing its functionality, because that's technically what it is. (Referring specifically to the part which compiles Python to bytecode)

Your SQL database also has a compiler. SQL is compiled to an execution plan. Compile doesn't only mean "create a machine code executable file".


> That is definitely a compiler and anyone with a CS degree would call it that if they were discussing its functionality because that's technically what it is.

None of these assertions is correct.

> (Referring specifically to the part which compiles Python to bytecode)

So referring specifically to something different than what I explicitly specified, it's called something else.

By that reasoning, a cow is a muscle and you are an acid.

> Your SQL database also has a compiler.

"Has a" and "is a" are rather different relationships.

> Compile doesn't only mean "create a machine code executable file".

You're the only person who made that assertion.


> None of these assertions is correct.

You should fix the Wikipedia article:

https://en.wikipedia.org/wiki/CPython

"CPython can be defined as both an interpreter and a compiler as it compiles Python code into bytecode before interpreting it."


It compiles to a sort of byte code that is executed by a stack based virtual machine.


Yes, a better title would be:

Compiler for the Tiny-C Language (2001)

In fact, that is exactly how the source code describes itself in the comments.


I did my CS degree at umontreal and this was an assignment in a second year class. This was a pretty interesting introduction to compilers, and even if this is a toy subset of C, this was challenging, at least for me. We would get 0 if there were any memory leak, so we were pretty paranoid about it.

The second assignment was writing a Scheme interpreter.


That's kind of surprising they cared so much about memory use, a lot of one-shot C programs such as compilers don't bother freeing memory and let the OS clean up after them once they exit.


I was about to comment and say the same thing, but as a graded learning exercise there is certainly value in that approach.


Recently I'm working on toy C compiler and x86 Assembler in TypeScript[1] and I can confirm that the amount of work that have to be done to compile and print simple Hello World is astronomically huge (as the satisfaction)

[1] https://github.com/Mati365/ts-c-compiler


This isn't a C compiler though. It's a compiler for a language called Tiny-C.


Um, excuse me, but there existed a Tiny-C in 1979. Whatever you are talking about creating in 2000 is in no way an original idea.

References:

Dr. Dobb's Journal #32 (Feb 1979) page 41, review of Tiny-C User Manual by Ted Shapin [0]

Dr. Dobb's Journal #35 (May 1979) page 37, "Tiny-C Interpreter on C-Dos" by Ray Duncan[1]

Tiny-C Associates incorporated in Holmdel, NJ, March 1978 [2]

"Tiny C" trademark application filed 1979, cancelled 1987 [3]

There was also a "Small C", see DDJ #45 (May 1980), "A Small C Compiler for the 8080s" by Ron Cain[4]. Cain references buying a copy of "the Tiny-C interpreter from Tiny-C Associates" and finding it too slow, so he bootstraps his own C compiler, writing it in C, compiles it on a UNIX system, then using it to compile itself to get the 8080 machine code.

See also DDJ #69 (July 1982) p. 66, "Small C for the 9900" by Matthew Halfant[5], porting Cain's compiler to another platform.

[0] https://archive.org/details/dr_dobbs_journal_vol_04_201803/p...

[1] https://archive.org/details/dr_dobbs_journal_vol_04_201803/p...

[2] https://www.bizapedia.com/nj/tiny-c-associates.html

[3] https://alter.com/trademarks/tiny-c-73219160

[4]https://archive.org/details/dr_dobbs_journal_vol_05_201803/p...

[5] https://archive.org/details/dr_dobbs_journal_vol_07_201803/p...


To be clear, I'm not claiming to be the first to have used the name Tiny-C, just that the confusion with TCC is not intentional. Your links to DDJ are great an bring back memories of the good old days when there was much to learn from reading DDJ and Byte magazine!


That's a cute project, thanks for sharing.

I hacked in support for ">", ">=", and "<=" to match the "<" support, but I just noticed that ints are truncated, so the maximum value stored in a variable is 127.


This doesn't have types, functions, arrays or much error checking. It has one char identifiers. I don't think we should read into this any more than a tiny example or experiment by the author.


So, it's the C equivalent of Tiny BASIC?

So, a Tiny C?


Not even that as it doesn't have function calls or even print!

See Feeley's response for the proper context.


Oh, Marc Feeley. Wonder if we'll see a Tiny-C target for Gambit?


That's not on my TODO! But Gambit does have support for TCC. For example you can use TCC to compile a file to a dynamically loadable object file (aka shared library). The compilation is faster than gcc and the code size is typically smaller too:

  $ cat hello.scm
  (display "hello!\n")
  $ gsc hello.scm
  $ gsi hello.o1
  hello!
  $ ls -l hello.o1   # this is generated by gcc
  -rwxrwxr-x 1 feeley feeley 18152 Mar 13 17:16 hello.o1
  $ rm hello.o1
  $ gsc -cc "tcc -shared" hello.scm
  $ gsi hello.o1
  hello!
  $ ls -l hello.o1   # this is generated by tcc
  -rwxrwxr-x 1 feeley feeley 4432 Mar 13 17:17 hello.o1


Sigh. I wish people would teach compilers using Oberon as an example. One can write a small yet complete compiler for (what turns out to be not-so-tiny) a language.


Best to pick languages anybody has heard of.


I liked it. I like those kind of short useful examples but it's an interpreter. That makes a huge difference. So, the title is misleading.


not sure i understand how enums work here. but interesting.


first assignment would be to add the multiply and divide operators...

I admit I have trouble understanding how the VM run() function works... anybody can give some insight?


The function runs through the program by incrementing the program counter (*pc++) and dispatching what instruction it sees. It's a stack-based VM so individual instructions are pushed onto and popped from the stack depending on the operation. Is there anything specific you don't grok? Happy to help.


How long would it take to write a toy C compiler?


neat




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: