
SmallerC – Small, Simple, Self-Compiling, Single Pass C Compiler - peter_d_sherman
https://github.com/alexfru/SmallerC
======
hbbio
Fabrice Bellard (who also had some spare time to create ffmpeg, QEMU, and a
few others like a software implementation of a 4G-LTE base station) wrote tcc,
the tiny C compiler: [https://bellard.org/tcc/](https://bellard.org/tcc/)

Small, fast and of course self-compiling! The project grew out of an earlier
project named OTCC, which won the IOCCC (yes, the obfuscated C programming
contest, in less than 2k bytes).
[https://bellard.org/otcc/](https://bellard.org/otcc/)

~~~
masklinn
And from it grew tccboot
([https://bellard.org/tcc/tccboot.html](https://bellard.org/tcc/tccboot.html)),
a boot loader which compiles the Linux kernel on the fly from source then runs
it, the entire process taking only a few seconds ("15 seconds on a 2.4 GHz
Pentium 4.")

~~~
tbodt
About 3 seconds on my i5 from 2015.

------
cpeterso
What are some current best practices for writing (C) compilers that were not
common when, say, gcc 1.0 was written? Would someone writing a basic C
compiler today approach it differently than Stallman did with gcc 1.0?

~~~
delhanty
Supplementary to that, and also a bit more specific, I'd be grateful for an
answer to the following:

Is it considered best practice today to use Flex & Bison for the front-end of
(for example) a basic C compiler?

~~~
WalterBright
Lexing and parsing are the simplest, by far, parts of a compiler. (That's why
it is possible for Flex & Bison to exist.)

99% of the time spent writing a compiler will be elsewhere.

If you want a correct to the last drop C preprocessor, prepare to spend 6
months on that, minimum. Or just use Warp:

[https://github.com/facebookarchive/warp](https://github.com/facebookarchive/warp)

and then you can have fun just working on the compiler bit.

~~~
delhanty
Thank you - that's informative.

>99% of the time spent writing a compiler will be elsewhere.

Out of interest, has the distribution of labor in that remaining 99% changed
at all since you first started out?

For example, do things like LLVM change anything?

~~~
WalterBright
I re-used the Digital Mars C++ optimizer and back end for the dmd D compiler,
so that would be like using LLVM for an optimizer and back end.

I still wind up spending nearly all my time on the D front end, and not the
lexer/parser. The time spent on the lexer/parser is a rounding error.

Besides, build the lexer/parser by hand. The mystery of how to do them will
evaporate, and you'll find you can apply that knowledge to all sorts of other
tasks.
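
To make the "by hand" suggestion concrete, here is a minimal sketch of a hand-written lexer and recursive descent parser. It is not taken from dmd, SmallerC, or any compiler mentioned in this thread. The toy grammar, the `eval_expr` name, and the choice to evaluate directly instead of building a tree are all assumptions made for illustration, and error reporting is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical toy grammar, chosen only for illustration:
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')'
 */

static const char *cur; /* current position in the input (the "lexer" state) */

static void skip_spaces(void) {
    while (isspace((unsigned char)*cur)) cur++;
}

static long parse_expr(void);

/* factor := NUMBER | '(' expr ')' */
static long parse_factor(void) {
    skip_spaces();
    if (*cur == '(') {
        cur++;
        long v = parse_expr();
        skip_spaces();
        if (*cur == ')') cur++; /* a real parser would report a missing ')' */
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*cur)) v = v * 10 + (*cur++ - '0');
    return v;
}

/* term := factor (('*' | '/') factor)* */
static long parse_term(void) {
    long v = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cur == '*') {
            cur++;
            v *= parse_factor();
        } else if (*cur == '/') {
            cur++;
            long d = parse_factor();
            v = d ? v / d : 0; /* simplification: division by zero yields 0 */
        } else {
            return v;
        }
    }
}

/* expr := term (('+' | '-') term)* */
static long parse_expr(void) {
    long v = parse_term();
    for (;;) {
        skip_spaces();
        if (*cur == '+') { cur++; v += parse_term(); }
        else if (*cur == '-') { cur++; v -= parse_term(); }
        else return v;
    }
}

/* Entry point: evaluate an arithmetic expression held in a string. */
long eval_expr(const char *src) {
    assert(src != NULL);
    cur = src;
    return parse_expr();
}
```

Each grammar rule maps to one function, and operator precedence falls out of which function calls which. Calling `eval_expr("1 + 2 * 3")` multiplies before adding, and `eval_expr("(1 + 2) * 3")` follows the parentheses.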

------
bootcat
Guys, I just started reading through the source code of 8cc, and now this new
project :). [https://github.com/rui314/8cc](https://github.com/rui314/8cc)

------
throwaway2016a
Nice. This is a really ambitious project for just 4 contributors (one of whom
looks to have done most of the work). I'm curious why ASM and not LLVM?

Edit (comment above unaltered): I got downvoted by someone, which tells me I
missed an obvious answer to my question. Did I miss something?

I've noticed that a lot of compiler projects lately tend to default to LLVM.
I'm curious about the strengths and weaknesses of LLVM vs ASM, since the
author here chose to compile to ASM.

~~~
bitwalker
My impression (as someone working on a compiler of my own, using LLVM), is
that while you get a _lot_ out of building on LLVM (portability across
platforms, a collection of "free" optimizations, and a well defined low-level
IR), it comes at the cost of a huge toolchain (takes about 5-10 minutes to
compile from scratch on my laptop), and it is not stable across releases (the
extent to which you are affected depends on the features your compiler depends
on though). I think for toy projects it is probably more interesting to write
the code gen yourself; and for projects focused on a specific platform, it is
perhaps desirable to handle it yourself. Some people probably just don't want
the dependency. My perspective is that I don't want to reinvent the work on
cross platform support, or the optimization passes, and would rather take
advantage of those efforts by depending on LLVM. I don't know that this makes
sense for this project though.

------
linopolus
> Currently it generates 16-bit and 32-bit 80386+ assembly code for NASM that
> can then be assembled and linked into DOS, Windows and Linux programs.

I wonder why not 64-bit? Hardly anyone uses 32-bit processors anymore, and even
in the Windows world 64-bit systems are slowly taking over, so it would seem
more logical to me to compile for modern processors instead of a 16-bit
architecture.

~~~
colejohnson66
The x86_64 instruction set is a _lot_ more complicated than the x86 one.

~~~
mikebenfield
What do you have in mind here? What x86 code generated by a C compiler is not
readily translatable to x86-64 code?

~~~
kevin_thibedeau
The additional registers would be useful. Even if sticking to the x32 ABI.

------
jokoon
It seems there are C interpreters. I wonder how convenient it would be to use
a C interpreter versus using Lua, for example.

Although I've never tried using an interpreter in a program, bridging calls
from a script inside your compiled program seems difficult.

------
derblitzmann
Hold on... I thought due to the C preprocessor, you couldn't have just a
single pass. Unless they aren't counting it as a pass?

~~~
boardwaalk
I don't see any reason the preprocessor couldn't be integrated with the main
parsing pass. It's all top-down (things defined before they're used) just like
C.
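
A rough sketch of that idea, under heavy assumptions: the single loop below handles a `#define NAME NUMBER` directive at the moment it is encountered and expands known names as it goes, so there is no separate preprocessing pass. This is not how SmallerC does it. The `sum_in_one_pass` function, the fixed-size macro table, and the "sum all integer tokens" stand-in for a real parser are hypothetical and only there to show the define-before-use ordering.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_MACROS 16
#define MAX_NAME   32

/* Hypothetical macro table: object-like macros with integer bodies only. */
struct macro { char name[MAX_NAME]; long value; };
static struct macro macros[MAX_MACROS];
static int macro_count;

/* Copy an identifier-like word into out (truncated to fit); return the position after it. */
static const char *read_word(const char *p, char *out) {
    size_t n = 0;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n < MAX_NAME - 1) out[n++] = *p;
        p++;
    }
    out[n] = '\0';
    return p;
}

/* Read a decimal number; return the position after it. */
static const char *read_number(const char *p, long *out) {
    long v = 0;
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    *out = v;
    return p;
}

static const char *skip_blanks(const char *p) {
    while (*p == ' ' || *p == '\t') p++;
    return p;
}

/* One pass over src: sum every integer token, expanding macros inline.
 * Names that are not (yet) defined contribute nothing. */
long sum_in_one_pass(const char *src) {
    assert(src != NULL);
    const char *p = src;
    long total = 0;
    char word[MAX_NAME];
    macro_count = 0;
    while (*p) {
        if (*p == '#') {
            /* Directive: handled right here, in the same loop as ordinary tokens. */
            p = read_word(p + 1, word);
            if (strcmp(word, "define") == 0 && macro_count < MAX_MACROS) {
                p = skip_blanks(p);
                p = read_word(p, macros[macro_count].name);
                p = skip_blanks(p);
                p = read_number(p, &macros[macro_count].value);
                macro_count++;
            }
            while (*p && *p != '\n') p++; /* ignore the rest of the directive line */
        } else if (isdigit((unsigned char)*p)) {
            long v;
            p = read_number(p, &v);
            total += v;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            p = read_word(p, word);
            /* Search newest first so a later definition shadows an earlier one. */
            for (int i = macro_count - 1; i >= 0; i--) {
                if (strcmp(macros[i].name, word) == 0) {
                    total += macros[i].value;
                    break;
                }
            }
        } else {
            p++; /* operators, whitespace, newlines: skipped in this toy */
        }
    }
    return total;
}
```

Because a macro only exists once its `#define` has been read, a use that appears before the definition is simply not expanded, which is the same top-down ordering the comment describes. A real C preprocessor also needs function-like macros, conditionals, and `#include`, none of which are attempted here.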

------
ryanpcmcquen
How does it compare to pcc and tcc?

~~~
dane-pgp
From a bootstrapping or Diverse Double-Compiling point of view, the real
question is "Can compiler Y compile compiler X?"

As of a few months ago, it is possible to compile (an old, non-C++ version of)
gcc with tcc, which is a good position to be in:

[https://lists.gnu.org/archive/html/tinycc-devel/2017-05/msg0...](https://lists.gnu.org/archive/html/tinycc-devel/2017-05/msg00103.html)

~~~
earenndil
Can you then compile a new version of gcc with that old gcc? Or are there more
steps in between?

~~~
dane-pgp
According to:

[http://gcc.gnu.org/install/prerequisites.html](http://gcc.gnu.org/install/prerequisites.html)

"versions of GCC prior to 4.8 also allow bootstrapping with a ISO C89 compiler
and versions of GCC prior to 3.4 also allow bootstrapping with a traditional
(K&R) C compiler."

So you might need to build tcc -> gcc 3.3 -> gcc 4.7 -> gcc 7.2

~~~
ptspts
Can't we skip gcc 3.3?

~~~
earenndil
You mean tcc -> gcc 4.7 -> gcc 7.2?

I don't think that would work because gcc 4.7 is written in C++.

------
feelin_googley
This was posted before, right?

I like that they allow FASM.

And that there is no "standard library".

~~~
megous
There is a standard library.

~~~
feelin_googley
Yes, my mistake. I think this has been posted to HN before. I seem to remember
trying it out.

