
Tiny C Compiler - Koshkin
https://bellard.org/tcc/
======
fwsgonzo
It's not just a tiny C compiler - it's a compiler you can use as a static
library that can compile your C code directly to memory. And then you can call
into it, if you dare.

I used it as a scripting backend for a long time, but eventually you realize
that you are not the only one who might write scripts, and that you really
need a trust boundary (or just a different address space), so that errors in
a script don't drag the whole thing down.
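
To make the "different address space" point concrete, here is a minimal POSIX sketch, not taken from TCC or any scripting engine. The names `run_isolated`, `well_behaved_script` and `crashing_script` are hypothetical; the "script" is just a C function run in a forked child, so a crash there is reported to the parent instead of killing it. This only contains crashes; it is not a security sandbox by itself.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical "script" entry point: any function returning an exit code. */
typedef int (*script_fn)(void);

/* Run fn in a child process. Returns its exit code (0..255),
 * or -1 if the child crashed or could not be started. */
int run_isolated(script_fn fn) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0)
        _exit(fn() & 0xff);        /* child: run the script, then leave */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                     /* child was killed by a signal */
}

/* Two stand-in scripts for illustration. */
int well_behaved_script(void) { return 3; }
int crashing_script(void) { abort(); return 0; /* not reached */ }
```

Calling `run_isolated(crashing_script)` leaves the host process running and just hands back an error code, which is the property the in-process approach lacks.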

That's where Lua and LuaJIT shine: they're simple, and they're sandboxed. However,
there is still one thing missing. These days, several programming languages
have several targets that they can compile to, and so imagine if you could
emulate some platform at high performance with low overhead, in a sandbox. You
would then be able to script in whatever language you so desire, provided it
has that target.

Unfortunately, those languages tend to be system languages and not the
absolute best for scripting. With one big exception: Compile-time programming
support.

~~~
haberman
LuaJIT is not sandboxed.

Mike Pall told me this in no uncertain terms:
[http://lua-users.org/lists/lua-l/2011-02/msg01582.html](http://lua-users.org/lists/lua-l/2011-02/msg01582.html)

> The only reasonably safe way to run untrusted/malicious Lua scripts is to
> sandbox it at the process level. Everything else has far too many loopholes.

~~~
wahern
The issue isn't that Lua can't nominally provide a sandboxed environment--it
can, better than almost any other language. The central issue is whether
LuaJIT, PUC Lua, or any other particular piece of software can be made
sufficiently free of bugs that you can trust such a sandbox to run potentially
malicious code.

The answer in the case of LuaJIT is definitely no, because the JIT engine is
sufficiently complex that exploits are inevitable. Note that this is _also_
the case with JavaScript. Many browser exploits start with some codegen bug in
V8 or SpiderMonkey. And there are many more eyeballs looking to fix bugs in V8
and SpiderMonkey than LuaJIT, so in the case of LuaJIT the prudent answer is
that you should never trust it to run potentially malicious code.

The case for PUC Lua is more nuanced. Lua used to have a bytecode verifier,
but it was removed for 5.2 because too many bugs were found in the verifier,
and because the VM relied heavily on the verifier to filter bad opcode
patterns, that led to sandbox breakouts. This made the PUC developers believe
that the verifier made Lua worse off compared to a more holistic emphasis
on correctness and robustness, so they dropped the verifier. They also dropped
any pretense that you could safely sandbox bytecode (i.e. precompiled
scripts); if you want a sandbox in Lua you should only load untrusted code as
plain Lua scripts into the sandboxed environment. To that end Lua 5.2 added a
parameter to all APIs for loading code that specified whether to accept text
scripts or binary bytecode. In other words, the bytecode verifier was
considered a third wheel, so they removed it and redirected attention to the
compiler and the rest of the VM.

So for PUC Lua the issue really comes down to how prudent it is to draw a
trust boundary around a pure Lua sandbox; or rather, what your adversarial
model is, precisely. PUC Lua is committed to Lua's sandboxing features, but
many developers are firmly of the opinion that the only way to run untrusted
code, if you're to run untrusted code at all, is using either a hardware VM or
a very strict seccomp jail. If you're of the latter opinion, the language is
irrelevant--you shouldn't trust Lua, JavaScript, Java, or any other language
environment, period. In practice, however, even people of the latter opinion
generally apply the principle of defense in depth. That's why browser
JavaScript APIs and capabilities are still relatively limited, even though
browsers execute JavaScript in OS-based sandboxes. In most practical contexts
Lua's sandboxing features still provide great value; it just needs to be
understood that they're a complement to rather than substitute for process
sandboxing.

The same analysis applies to WebAssembly, FWIW, especially JIT'ing WASM
environments. Anybody who thinks WASM is a magic cure-all for running untrusted
code is mistaken.

~~~
Touche
These types of comments are why I still read HN.

------
laurensr
The author is a genius: a while ago he implemented an x86 virtual machine in
JavaScript:
[https://bellard.org/jslinux/vm.html?url=buildroot-x86.cfg](https://bellard.org/jslinux/vm.html?url=buildroot-x86.cfg)

~~~
rramadass
>The author is a genius

Understatement of the century! :-) The breadth of his work, all non-trivial,
is truly humbling. All done without fanfare or self-aggrandizement. I can't
even find a good interview of him to understand his mind and thought-process.

~~~
kzrdude
Is there a good magazine, news outlet or blog we could tip off, and try to get
them to make an interview?

~~~
rramadass
The usual suspects: Reddit, Slashdot, ArsTechnica and of course HN.

And the interview should be purely on the technical side, i.e. his educational
background, how he approaches design and programming, how he learns new
technical domains, his thoughts on languages, advice to young programmers etc.
Given his off-the-charts productivity, I suspect he has a highly efficient way
of learning and doing things which I would like to copy :-)

------
jonathonf
Want to "skip" the compile step?

    
    
      //usr/bin/tcc -run "$0" "$@"; exit
      #include <stdio.h>
      int main(void) { printf("Hello\n"); return 0; }
    

Just `chmod +x` and run it...

~~~
mmastrac
How does this work? Is there an alternate form for #! (aka shebang)?

EDIT: hah, I should have realized that "//" is the C comment and
"//usr/bin/tcc" is equivalent to "/usr/bin/tcc". Clever!

~~~
dfabulich
"C comment," you mean.

So it's running the file as a shell script, where the first line runs tcc on
the current file and quits.

~~~
mmastrac
Fixed typo, thanks. Yeah for some reason it didn't click when I saw it.

------
stevefan1999
I studied the source code of TCC 4 years ago and the experience was...pretty
awful. Until then, I never knew that compilers often aren't reentrant and are
expected to follow the Unix philosophy: use global state, keep your job as
minimal as you can, and let people spawn your program as a process so the lack
of reentrancy doesn't matter. This means, however, that you can't embed TCC
in your program without TLS or a global lock around each compilation. I think
I have already lost my reentrancy patch, because the maintainers said they
wouldn't accept it :(

Now I carry on and would like to make a TCC rival. I also wondered if I could
make a TCCPP, until I learnt that not only is C++ syntax ambiguous (so we need
some form of context sensitivity, and GLR is one good technique for handling
that kind of thing), but C++ templates are also Turing-complete, which
effectively means it would be hellishly hard. I abandoned the idea thereafter.

~~~
efferifick
"I also wondered if I can make a TCCPP until I learnt not only C++ syntax is
ambiguous (and so we need some form of context sensitivity)"

I believe a C parser would also need some form of context sensitivity. "T *
x;" can be a declaration or a statement. However, C++'s syntax is larger than
C's and I believe that there may be more cases where context sensitivity is
needed to resolve ambiguities.

[https://eli.thegreenplace.net/2007/11/24/the-context-sensiti...](https://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs-grammar)

~~~
terminalcommand
Another great piece by Eli on how Clang solves this problem and the nested
functions inside C++ classes is
[https://eli.thegreenplace.net/2012/07/05/how-clang-handles-t...](https://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc).

Clang basically moves the semantic processing to another pass, and tokenizes
both types and variables as ‘identifiers’.

I’d agree that C/C++ have some sort of context sensitivity. However, I don't
think this is an ambiguity. You can deduce a single logical solution when you
see `T *x;`. If the T in scope is a variable you do a multiplication; if the T
in scope is a type definition you make a variable declaration.
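
A small C sketch of that point: the same tokens `T * x;` mean different things depending on what `T` names in the current scope. The function names are made up for illustration.

```c
typedef int T;   /* at file scope, T names a type */

/* Here T is the typedef, so "T * x;" declares x as a pointer to int. */
int as_declaration(void) {
    int value = 7;
    T * x;
    x = &value;
    return *x;
}

/* Here a local variable named T shadows the typedef,
 * so "T * x" is an ordinary multiplication. */
int as_expression(void) {
    int T = 6, x = 7;
    return T * x;
}
```

A parser cannot tell the two apart from the token stream alone; it has to consult the symbol table (or defer the decision to a later pass, as described above for Clang).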

Real ambiguity happens when there are two logical solutions to the same
statement and the language specification has to prefer one to remove ambiguity
and undefined behaviour.

For example, a C# code example from the spec that is ambiguous and therefore
doesn’t compile:

    static void F(bool a, bool b) { Console.WriteLine($"{a} and {b}"); }

    static void Main(string[] args)
    {
        int G = 2; int A = 1; int B = 2;
        F(true, false);
        F(G<A, B>(7));
    }

The compiler prefers to interpret the last line as a function call with one
argument, which is a call to a generic method G (that doesn’t exist in the
code above) with two type arguments and one regular argument, instead of as
one function call with two boolean arguments.

~~~
int_19h
It can be complicated even when it's not ambiguous. Consider:

    
    
       #include <cstddef>  // std::size_t
       template<std::size_t N = sizeof(void*)> struct a;
    
       template<> struct a<4> {
           enum { b };
       };
    
       template<> struct a<8> {
           template<int> struct b {};
       };
    
       enum { c, d };
    
       int main() {
           a<>::b<c>d;
       }
    

This is not ambiguous, but the meaning of the line inside main() depends on
the compiler and the target - it can be either a declaration:

    
    
       a<>::b<c> d;
    

or an expression:

    
    
       a<>::b < c > d;
    

depending on which b gets selected. This in turn affects future references to
d etc.

~~~
steerablesafe
Note that this is only an issue in non-generic context. For dependent names
you have to disambiguate with "typename" and "template".

But yeah, it looks like a C++ compiler has to instantiate templates in lock-
step with parsing to make this work, but possibly there are some tricks that
are applicable here. In any case, this looks like a hard problem.

------
mmastrac
It's worth reading through all the other bellard projects. He is an amazingly
productive developer.

[https://news.ycombinator.com/from?site=bellard.org](https://news.ycombinator.com/from?site=bellard.org)

~~~
LeoPanthera
Perhaps most famously, the original creator of both ffmpeg and qemu.

------
dang
See also

2016
[https://news.ycombinator.com/item?id=13249851](https://news.ycombinator.com/item?id=13249851)

2017
[https://news.ycombinator.com/item?id=15272894](https://news.ycombinator.com/item?id=15272894)

2018 - obfuscated!
[https://news.ycombinator.com/item?id=17335856](https://news.ycombinator.com/item?id=17335856)

------
pronoyc
The author is incredible:

> On 31 December 2009 he claimed the world record for calculations of pi,
> having calculated it to nearly 2.7 trillion places in 90 days. Slashdot
> wrote: "While the improvement may seem small, it is an outstanding
> achievement because only a single desktop PC, costing less than US$3,000,
> was used—instead of a multi-million dollar supercomputer as in the previous
> records."[10][11] On 2 August 2010 this record was eclipsed by Shigeru Kondo
> who computed 5 trillion digits, although this was done using a server-class
> machine running dual Intel Xeon processors, equipped with 96 GB of RAM.

------
hawski
I still wonder about the feasibility of Rob Landley's idea of QCC - TCC with
TCG backend - the code generator used by qemu which comes originally from TCC
itself. That would make it fast to compile and with quite an extensive
repertoire of architectures covered.

[https://landley.net/qcc/](https://landley.net/qcc/)

------
kriro
On a somewhat related note, I always thought dietlibc was a pretty cool
project (basically trying to get libc to be very tiny). It's been a pretty
long time since I browsed the code but I remember it being quite elegant.

[https://www.fefe.de/dietlibc/](https://www.fefe.de/dietlibc/)

~~~
Conlectus
From my experience musl libc[1] is a more popular project in this space

[1] [https://musl.libc.org/](https://musl.libc.org/)

~~~
acqq
The creator of the Zig language wrote in praise of musl:

[https://andrewkelley.me/post/why-donating-to-musl-libc-
proje...](https://andrewkelley.me/post/why-donating-to-musl-libc-project.html)

------
Conlectus
Heads up: Though amazing, the compiler makes use of the deprecated POSIX
setcontext API[1], which means you may have trouble running it on some modern
distributions.

[1]
[https://en.wikipedia.org/wiki/Setcontext](https://en.wikipedia.org/wiki/Setcontext)

~~~
aidenn0
Do modern distros really not have setcontext? If they don't it shouldn't be
too hard to implement in assembly.

~~~
fwsgonzo
It's a very tiny API, and requires little assembly, so I agree.
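
For readers who have not seen the API, here is a minimal sketch of the `<ucontext.h>` calls (`getcontext`, `makecontext`, `swapcontext`) as provided by glibc. It is unrelated to how TCC itself uses them; `task`, `run_task_demo` and the 64 KiB stack size are choices made for this illustration.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* stack for the second context */
static int steps;

/* Runs on its own stack; yields once in the middle. */
static void task(void) {
    steps = 1;
    swapcontext(&task_ctx, &main_ctx);   /* yield back to the caller */
    steps = 2;
    /* returning here resumes the context named by uc_link */
}

/* Returns 12 if control flowed caller -> task -> caller -> task -> caller. */
int run_task_demo(void) {
    int seen_after_first_switch;

    steps = 0;
    getcontext(&task_ctx);                     /* initialise the structure */
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;              /* where to go when task returns */
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);         /* run task until it yields */
    seen_after_first_switch = steps;
    swapcontext(&main_ctx, &task_ctx);         /* resume task until it returns */
    return seen_after_first_switch * 10 + steps;
}
```

Saving and restoring registers and the stack pointer is essentially all these functions do, which is why a replacement in assembly is plausible on a libc that lacks them.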

------
stephc_int13
Anyone impressed by the compile times of modern languages such as Swift,
Rust or Go should try TCC at least once in their lifetime.

Both compilation and linking are generally finished before your fingers have
released the enter key...

~~~
anta40
You should not mention "Rust" and "fast compile time" in the same sentence,
unfortunately.

[https://prev.rust-lang.org/en-US/faq.html#why-is-rustc-
slow](https://prev.rust-lang.org/en-US/faq.html#why-is-rustc-slow)

~~~
stephc_int13
That was sarcasm on my side ^^

------
Teknoman117
My main experience with tcc was that it was the only C compiler available for
Damn Small Linux (a <50 MB Linux distro with X and Firefox).

~~~
ExtremisAndy
I loved DSL! I can’t remember the specs, but I had an ancient machine lying
around in about 2004, that would have been otherwise worthless, and put DSL on
it just to check it out. I remember being so impressed with how productive I
could be and how zippy that old machine could be with such a lightweight OS.
Fond memories :)

~~~
hexagone
How does DSL compare with Alpine?

~~~
Teknoman117
I wouldn't use DSL these days, it's very old :)

DSL's goal was to provide most of the tools you might need in a tiny
footprint. The platform target was the old "thin-client" style machines which
might pack a 500 MHz AMD Geode or a VIA C3 processor and have either no
internal storage or maybe a few hundred megabytes for preboot environments.

They were able to fit X (Xfree86, iirc) w/ Fluxbox, Firefox, Ted (a word
processor), and a few other things into a 50 MB "bizcard" image (PCMCIA cards
& compact flash).

Another thing it was designed to do was allow you to install most of the
common things to a small internal storage device, while running the rest off
of optical media.

It dates to the time before common support for USB booting. Also kinda crazy
to realize you could fit Firefox, a windowing environment, and a kernel into
50 MiB while today the standard vim install is 33 MiB...

~~~
Teknoman117
*it used Xvesa, not Xfree86.

------
tankfeeder
It's not dead; I've been using it from git for years.

[https://repo.or.cz/w/tinycc.git](https://repo.or.cz/w/tinycc.git)

~~~
slezyr
wtf are those tags?

~~~
nadavami
Looks like anyone can add them right next to the tag cloud...

~~~
jjice
Classic mistake to allow anyone to add content to a public site anonymously

~~~
hvdijk
Yeah, I raised this on the mailing list almost a year ago. The people
maintaining TinyCC know about it but don't care enough to take action.

------
buserror
How funny, I use tcc /all the time/ and I was just tinkering with one 'script'
[0] the other day. Basically more often than not for me it's easier/quicker to
write stuff in C than fighting with another script language that will break
next time I want to use that 'script'. C works.

 _Of course_ I could compile it, but really, I don't have to, and for
developing/trying that sort of small C tool, tcc is just unbeatable.

[0]:
[https://gist.github.com/buserror/227a7e64c92acece821ec8ee587...](https://gist.github.com/buserror/227a7e64c92acece821ec8ee58776276)

------
e19293001
For anyone who wants to read more about Fabrice Bellard:

[https://web.archive.org/web/20110726063943/http://www.freear...](https://web.archive.org/web/20110726063943/http://www.freearchive.org/o/55dfc9935a719fc36ab1d16567972732c2db1fd7d7e3826fd73ee07e4c3c7d0b)

------
jason_slack
More info about Fabrice: [https://smartbear.com/blog/test-and-monitor/fabrice-
bellard-...](https://smartbear.com/blog/test-and-monitor/fabrice-bellard-
portrait-of-a-super-productive-pro/)

------
mywittyname
> Measures were done on a 2.4 GHz Pentium 4. Real time is measured.
> Compilation time includes compilation, assembly and linking.

How old is this project?

~~~
eeereerews
> (Nov 18, 2001) TCC version 0.9 is out. First public version.

------
FpUser
This has already been discussed numerous times, but I never miss a chance to
sing praises to Bellard. One of the very few real programming geniuses.

------
foobar_
It offers bounds checking! The documentation links to

[http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html](http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html)

Someone needs to add this to llvm.

~~~
shakna
Whilst TCC's bounds check was rare for quite some time, the bigger compilers
do now actually offer it.

Clang supports bounds checking as part of -fsanitize=address, (though with a
few more flags you can _just_ have bounds checking instead of the other
sanitisation options). (Since around 2015?)

GCC supports bounds checking and others, depending on which frontend you're
using the options can change. (Like -fbounds-checking for C, and -fcheck=all
for gfortran). (Since around 2013? GCC 4.71)

Even Intel has -check=bounds. (Though not under macOS). (Since around 2015?)
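
As a rough illustration of what such instrumentation does, here is a hand-written check in C. It is only a sketch of the idea, not how TCC, Clang or GCC implement it; `checked_store` is a hypothetical helper. An instrumenting compiler effectively inserts the equivalent test before each array or pointer access and aborts or reports when it fails.

```c
#include <stddef.h>

/* Store value at buf[index] only if index lies inside [0, len).
 * Returns 0 on success, -1 if the access was rejected. */
int checked_store(int *buf, size_t len, size_t index, int value) {
    if (buf == NULL || index >= len)
        return -1;            /* out of bounds: refuse instead of corrupting memory */
    buf[index] = value;
    return 0;
}
```

The extra comparison on every access is where the runtime overhead comes from.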

~~~
foobar_
Wow did not know that.

Is there a distribution that offers bounds checking for all Linux software? I
wonder how slow a typical LAMP stack would be. My guess is no more than 5x,
which I think is an acceptable tradeoff. I'm guessing there's a way to add
global compiler flags in source distributions like Arch Linux / BSD.

~~~
shakna
Gentoo has a few ways to set up global C flags [0], because the default is to
compile all your software yourself.

(Arch isn't actually a source distro, it's binary, though rolling release.
Gentoo is, however).

[0]
[https://wiki.gentoo.org/wiki/USE_flag](https://wiki.gentoo.org/wiki/USE_flag)

------
throw_m239339
I tried to use that for toy C programs on Windows 10, but for some reason I
cannot run these programs from the command prompt (CMD) afterwards. It only
works if I use tcc -run or something like that.

------
_bxg1
I wanted to comb through the code without cloning, but can't tell if that's
possible with the git web UI they're using. Anybody know if it is?

~~~
smlckz
They're using gitweb. You can read the code for any commit, just click the
'tree' link associated with it. Alternatively, just click the 'tree' link at
the top for the tree of the latest commit in their default 'mob' branch.

------
winrid
As a side note, I'm surprised how big Links is!

------
fanf2
tcc is part of the guix bootstrap project
[https://guix.gnu.org/blog/2019/guix-reduces-bootstrap-
seed-b...](https://guix.gnu.org/blog/2019/guix-reduces-bootstrap-seed-by-50/)
which aims to be able to rebuild everything from source code plus the smallest
possible binary seed

~~~
mhd
So that's currently scheme-blob -> cc-in-scheme -> tcc -> ancient-gcc ->
proper-gcc? Wow, that's something that really deserves the term
_bootstrapping_.

------
sandbags
It looks like there is a real effort now to get TCC working on macOS which
would be amazing.

------
ngcc_hk
Can’t use it on iOS, but I guess maybe on ARM Macs ...

------
algorithm314
For some time now it has had a riscv64 backend.

------
person_of_color
What is the author up to now?

~~~
roywiggins
[https://bellard.org/quickjs/](https://bellard.org/quickjs/), for one

------
agumonkey
Anyone up for reviving tccboot?

