SectorC: A C Compiler in 512 bytes (xorvoid.com)
465 points by xorvoid on May 24, 2023 | 80 comments



I saw the repeating 'A' at the end of the base64 text and thought "it's not even 512 bytes; it's smaller!"

That said, the title is just a little clickbaity --- it's a C-subset compiler, and more accurately a JIT interpreter. There also appears to be no attempt at operator precedence. Nonetheless, it's still an impressive technical achievement and shows the value of questioning common assumptions.

Finally, I feel tempted to offer a small size optimisation:

    sub ax,2
is 3 bytes whereas

    dec ax
    dec ax
is 2 bytes.

You may be able to use single-byte xchg's with ax instead of movs, and the other thing that helps code density a lot in 16-bit code is taking advantage of the addressing modes and LEA to do 3-operand adds with immediates where possible.


Good tip! Yeah, there’s ~20 bytes unused at the end. I kept finding ways to squeeze out a few more and had to tell myself to stop and just publish it already. You could take this further if you really wanted. But it’s already sufficiently absurd.


C has a bit of a history with interpreters and reduced implementations (not to devalue SectorC which is absolutely cool). I'm thinking Small C Interpreter for CP/M: http://www.cpm.z80.de/small_c.html an interpreted version of https://en.wikipedia.org/wiki/Small-C


The book that had the Cain/Hendrix "Small C" compiler, runtime library, assembler and tools was a fantastic resource that taught me C and compiler construction at the same time.

In general, reading (lots of) source code is a good way to learn how to do things in a new language, i.e. to move from the lexical and syntactical level to the level of idioms for problem-solving. On reflection, I find it strange that in programming teaching, larger pieces of existing well-written source code are never discussed/explained/critiqued.


Small C was the inspiration for SubC, a project of mine. It is pretty much a cleaned-up and slightly extended rewrite of Small C that compiles with few warnings on modern compilers. Also comes with a book.

http://t3x.org/subc/index.html (source code)

http://t3x.org/reload/index.html (book)


What does Just In Time mean for an interpreter?


Compiling to machine instructions and then executing the compiled output, instead of executing the AST directly.


That's a just-in-time compiler.


An interpreter with a JIT compiler is able to do more optimizations because it has the runtime context to make decisions, while an AOT (ahead-of-time) compiler will not know anything about what happens at runtime.

This is why some JIT'd languages (like JavaScript) can sometimes be faster than C.


The correct term for this is "JIT compiler."

(JIT interpreter is not a phrase I'm familiar with.)

You can also have an optimizing JIT compiler. Or further, a profiling JIT compiler.


Can you give some simple example for the folks in the back of how JIT'd languages can be faster than C? I think most people are under the impression that statically compiled languages are "always faster."


> Can you give some simple example for the folks in the back of how JIT'd languages can be faster than C?

If the JIT has instrumentation that analyzes execution traces, then it can notice that a call through a function pointer always goes to the same function. It can then recompile that code to use a static function call instead, which is considerably faster.

Basically, it can perform a similar set of optimizations to a static compiler + profiling information in basic cases. In more advanced scenarios, it specializes the program based on the runtime input, which profiling can't do for all possible inputs; e.g., say the above function-pointer call only happens for input B but not for input C.
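
To make that concrete, here's a minimal C sketch of the transformation (the names are illustrative; a real JIT emits machine code rather than C, and guards the specialized path so it can deoptimize if the assumption ever breaks):

    typedef int (*op_fn)(int);

    int square(int x) { return x * x; }

    /* What the interpreter executes: a fully generic indirect call. */
    int apply(op_fn f, int x) {
        return f(x);
    }

    /* What a tracing JIT might produce after observing f == square on
       every recorded trace: a cheap guard plus a direct, inlinable call,
       with the generic path kept as the fallback. */
    int apply_specialized(op_fn f, int x) {
        if (f == square)
            return x * x;   /* devirtualized and inlined */
        return f(x);        /* deoptimization fallback   */
    }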


In theory, some execution sequences are not knowable except at runtime, and these could be optimized after the code has already been running for a while.

In practice, static AOT compilation is essentially always faster for a couple reasons. The various types of overhead associated with supporting dynamic re-compilation usually aren't offset by the gains. Re-compiling code at runtime is expensive, so it is virtually always done at a lower optimization level than AOT compilation to minimize side-effects. CPU silicon is also quite good at efficiently detecting and optimizing execution of many of these cases in static AOT code. You can also do static optimization based on profiling runtime execution, which is almost (but not quite) the same thing with more steps.


Honestly, it always depends on what "faster" means for you. For one crowd, faster means "fast number crunching" (e.g. anything AI these days). There, statically compiled code reigns supreme, because it is mostly about how fast your very specialized code (e.g. matrix multiplications) runs, and it does not hurt to just ship a specialized, statically compiled version for all possible targets. (IIRC GCC does something like that when building generic code: it emits variants for different instruction sets (SSE, AVX, etc.) and dispatches to whichever is available at runtime.)

For another crowd, "fast" means that the code they haphazardly threw together in an interpreted language runs fast enough that nobody is negatively affected (which is a completely valid use case, not judging here).

And to answer your question for examples:

An interpreter with a JIT compiler might, for example, notice that a for loop always runs for the same number of iterations, unroll the loop, and at the same time vectorize the instructions for an immediate 4x gain in execution speed.
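
As a hedged illustration in C (hand-written here, standing in for what the JIT would emit as machine code), the rewrite might look like this, assuming the observed trip count is always a multiple of 4:

    /* Before: the loop as the interpreter sees it. */
    int sum(const int *xs, int n) {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += xs[i];
        return acc;
    }

    /* After: unrolled 4x, which also exposes the adds to SIMD. A real
       JIT would guard on n % 4 == 0 and fall back to sum() otherwise. */
    int sum_unrolled(const int *xs, int n) {
        int acc = 0;
        for (int i = 0; i < n; i += 4)
            acc += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3];
        return acc;
    }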

OTOH, Java's HotSpot JIT compiler tracked how often code was called and, once a "hotspot" was identified, compiled that part of the program.

Last example: if you are using an interpreted language (say Python), every roundtrip through "python-land" costs you ... A simple for loop that just runs a simple instruction (say: acc = 0; for x in xs: acc += x) will be orders of magnitude slower than calling a dedicated function (numpy.sum(xs)); JITing that code (e.g. with numba) will remove the roundtrip through Python and achieve similar speeds.


This is all in theory. Everyone says this like it's already here, but it's really not (in the sense that these fast JITs are still mostly worse than well-written C).

But what it is is mostly choosing optimizations based on runtime characteristics. It's a dynamic analogue to profile-guided optimization. Like you might have an optimization that trades code size for CPU cycles, which you could choose not to do at runtime if you're low on memory bandwidth instead of CPU time. Stuff like that.


https://www.youtube.com/watch?v=WLwTlC1R2sY -- How slow is Javascript really? Javascript vs C++ (Data Structures & Optimization)


The sufficiently smart JS compiler might be faster.



I think in this case that it executes the code as it's being parsed, in a single pass.
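
As a hedged sketch of that idea in C (the helper names are invented for illustration, not SectorC's actual routines), a single-pass compiler emits machine code the moment it recognizes a construct, with no AST in between:

    /* Hypothetical helpers. */
    int  peek_token(void);
    void next_token(void);
    void emit_byte(unsigned char b);
    void parse_term(void);   /* emits code leaving the term's value in AX */

    /* "Compile as you parse": each '+' immediately emits the x86-16
       bytes for push ax / pop bx / add ax, bx around the next term. */
    void parse_expr(void) {
        parse_term();
        while (peek_token() == '+') {
            next_token();
            emit_byte(0x50);   /* push ax    */
            parse_term();
            emit_byte(0x5B);   /* pop bx     */
            emit_byte(0x01);   /* add ax, bx */
            emit_byte(0xD8);
        }
    }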


This reminded me of the idea of compiler bootstrapping (https://news.ycombinator.com/item?id=35714194). That is, you could now write in SectorC some slightly more advanced version of C capable of compiling TCC (https://bellard.org/tcc/), and then with TCC you can go forward to GCC and so on.


Did you read about the Guix full-source bootstrap the other day? They've shrunk the bootstrap seed down to a 357-byte program:

https://guix.gnu.org/blog/2023/the-full-source-bootstrap-bui...


The bootstrap seed, https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX..., is a tiny interpreter that takes a much larger program written in a special-purpose, bytecode-based language. This proceeds in turn once or twice more (each special-purpose program generating another interpreter for another special-purpose language) until you end up with a minimal Scheme interpreter, which can then be used to execute a C compiler program.

All of this is incredible work, but a minimal C-subset compiler in under 512 bytes of x86 assembly seems like a unique achievement as it includes non-trivial parsing and linking phases not required for that hex0 interpreter.


From that article:

> Even more recently (2018), the GNU C Library glibc-2.28 adds Python as a build requirement

I’m surprised they went with Python and not GNU software (e.g. Guile).

Edit: clicking through the link it sounds like this might be intended to replace other accumulated dependencies (Perl?) and stop supporting old versions of Python.


Is Nix incorporating this work as well? I fully support and want Guix to survive, but it seems well behind in mind share, and I would love if Nix could become similarly repeatable and reproducible.


I’ve considered that. I’ve long been interested in bootstrapping. Lots of unpublished projects in the archive that I should also write up.

I considered writing a self-hosting compiler. It should be doable without too much work. I was going to do one in BarelyC originally.

SectorC -> TCC -> GCC sounds fun. C all the way down, haha


Perhaps my favorite thing in all of computing is the authors of Lisp bootstrapping by writing an (academic) interpreter for Lisp in Lisp, and then realizing they could just compile that by hand.


Paper?

I once read a lisp paper that kept defining lisp primitives in terms of simpler lisp primitives until they had only 4-10 primitives. Then the paper started defining those in terms of assembly language (but with parens). In the final step, they just removed the parens. It was so elegant. I can't find the paper though :( Any ideas?


The paper is the original paper on LISP by McCarthy "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I". It has the original definition of LISP in LISP.


The TCC step seems unnecessary. If you've got a SectorC C compiler sufficient to compile TCC, it can probably compile the bootstrap compiler used in GCC's 3 stage build process. The bigger issue is probably getting the ancillary tools up and running (a shell, make, coreutils, ...)

https://gcc.gnu.org/install/build.html


The method described in the page you reference does not exclude the possibility that the compiler you start with contains malicious code and recognizes that it is compiling the GCC code base. It is not true bootstrapping as described in [0]. If you read through [1] you will see that even getting a fairly recent version of TCC to compile is not that simple.

[0] https://bootstrappable.org/ [1] https://bootstrapping.miraheze.org/wiki/Live-bootstrap


At that level of paranoia, there's no guarantee that the CPU doesn't contain a malicious method for detecting and modifying compiler builds, so you'd better build your own CPU out of TTL chips ;)


I wonder if anybody has tried to maintain TCC.


Yes! There's a public git repo here: https://repo.or.cz/tinycc.git "Public" in the sense that anyone can just push to the "mob" branch. You don't even need an account.

Such anarchy has obvious security implications but has worked remarkably well in practice. There's a mailing list too.


Ah well, I would never have found out. Thanks.


Now they just need to port something like oneKpaq [0] to 16-bit, or maybe something from the extremely tiny decompressor thread [1]. Just to test the compression level and get an idea: oneKpaq on its quickest setting (taking minutes instead of what could be days on its highest) reduced SectorC to 82.81% of its size; of course, adding the 128-bit stub knocked it up to 677 bytes. It would be interesting to try it on the slowest, takes-days-to-bruteforce setting, but I'm not going to attempt that.

Some of the compressors in that forum thread, since they are only 32 bytes and such, might make it easier to get net gains.

[0] https://github.com/temisu/oneKpaq

[1] https://encode.su/threads/3387-(Extremely)-tiny-decompressor...


LZ decompressors are tiny, as your second link discusses, but it is unlikely that they'll find much redundancy that could easily be removed from the uncompressed program itself, thus removing the need to use them.


I once tried making a 1 KB binary to see what I could fit in. Once you've got to the point of considering compression, there isn't much ordered data left to compress; I worked it out to be a ~10 byte saving.


This is fascinating, I really did not think it was possible to implement even a tiny subset of C in just 512 bytes of x86 code. Using atoi() as a generic hash function is a brilliantly awful hack!


Yeah, this was great:

Hashes are perhaps the holy-grail of computer-science. With a good hash, we can just side-step all the hard problems by trading them for an even harder problem (hash collisions), and then we just ignore that harder problem. Brilliant. (sticks fingers in ears)
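
For anyone who hasn't read the article: the trick is that the same accumulate loop that parses decimal integers also produces a usable (if collision-prone) hash when fed identifier characters. A minimal sketch of the idea in C (SectorC's real version is hand-written assembly):

    /* One routine, two jobs: for "123" it returns 123 (plain atoi);
       for "while" it returns a fixed garbage value that serves as a
       hash key, as long as no two keywords happen to collide. */
    unsigned tok_value(const char *s) {
        unsigned v = 0;
        while (*s)
            v = v * 10 + ((unsigned char)*s++ - '0');
        return v;
    }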


wow, this is impressive.

I wrote a similar x86-16 assembler in < 512 B of x86-16 assembly, and this seems much more difficult <https://github.com/kvakil/0asm/>. I did find a lot of similar tricks were helpful: using gadgets and hashes. One trick I don't see in sectorc, which shaved quite a bit off of 0asm, was self-modifying code, which 0asm uses to "change" into the second pass of the assembler. (I wrote up some other techniques here: <https://kvakil.me/posts/asmkoan.html>.)

bootOS (<https://github.com/nanochess/bootOS>) and other tools by the author are also amazing works of assembly golf.


I considered self-modifying code, but somehow I kept finding more ways to squeeze bytes out. I’m half convinced that you could condense it another 50 ish bytes and add operator precedence or even local vars. But.. frankly.. I was ready to switch my attention to a new project.


Pretty nifty, nice work!

I'll point out to any passerby that this C doesn't support structs, so it's unlikely you'd actually want to build anything in it.


The C Star (C*) language from the selfie project also does not support structs, yet in 12 KLOC they implemented a C Star compiler that can compile selfie (and outputs ELF files), an emulator that runs RISC-U (a RISC-V subset), and a hypervisor.


There was also Small-C[1], which initially only did integers, characters and simple arrays. Formed the basis for quite a few derivates.

[1]: https://en.wikipedia.org/wiki/Small-C


Amazing!

I think this, from the conclusion, is the real takeaway:

> Things that seem impossible often aren’t and we should Just Do It anyway

I certainly would never have tried to get a C compiler (even a subset) so small, since my instinct would have been that it was not possible.


See the bootstrap project: https://bootstrappable.org/projects.html


I'm wondering if you can build an actual "Linux from scratch" with this as the lowest level, without the need to use a host system at all.


Not using this, but tangentially related is (full disclosure, i am a maintainer of this project) live-bootstrap, which uses about a KB of binary to do a full "Linux from scratch" style thing - read https://github.com/fosslinux/live-bootstrap/blob/master/part... for all 143 steps you have to go through to get there.


Wow, just reading through the list of steps was fascinating!


I can't help but wonder if this is written in x86-16 bit mode to implicitly use Real Mode BIOS functions and platform interfaces as well.

There's something to be said for taking advantage of that existing code; but if that's a dependency it should be part of the environment manifest. Everything has (most things have) a context where it might be useful and that should be included in the explanation so a tool isn't misused and so incorrect expectations aren't set.


It’s written in x86-16 real mode because all immediates are smaller and switching to 32-bit mode takes a fair amount of code.

I would have preferred 32-bit mode because real mode is awkward with its seg:off addressing scheme.

Using the BIOS functions just makes it easy to have "not-boring" examples.


That's true, but it's still a sort of library dependency. Even for very tiny computers with a lot of hardware assistance, the basic ROM for communicating with the world in a useful way is bigger.


Without structs? Good luck...


LFS already has several "stepping stones" where it walks through a whole set of C compilers from very old ones compiling slightly newer / more capable ones and so on.

Perhaps with a few more "layers of compilers" on top you can get a very early GCC going.


>"Perhaps with a few more "layers of compilers" on top you can get a very early GCC going."

Applause. Best comment ;)


wonder how many extra bytes for structs?


In my experience, probably at least 200-400 lines of C (another 512 bytes minimum? I don't know the conversion).

The problem is not just declaring structs but accessing them, referencing them, accessing/referencing fields and subfields, etc. That has to all be in the syntax. For such a minimal project you can ignore initialization.
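
To give a sense of why it is mostly a parsing problem, these are roughly the forms even a minimal struct implementation has to handle (illustrative C, not from any particular compiler):

    struct point { int x, y; };

    void demo(void) {
        struct point p;
        struct point *q = &p;
        p.x = 1;            /* direct member access              */
        q->y = 2;           /* member access through a pointer   */
        int *r = &p.x;      /* taking a member's address         */
        *r += q->y;         /* members inside larger expressions */
    }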

I did all this while adding maybe 500 lines of C to GitHub.com/vitiral/fngi. Never tried for this level of minimalism though.


Once upon a time, when += was spelled =+, struct members in C were global names whose "values" were the offsets from the beginning of the struct[1]; a.b is simply a[ptrtab[b]] and a->b is a[0][ptrtab[b]].

[1]: This is why all of the names of struct members in unix are "needlessly" prefixed; i.e. struct stat.st_mode is not struct stat.mode because that could conflict with struct foo.mode until the early 1980s.
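
A hedged illustration of the consequence (a modern compiler accepts the un-prefixed version just fine, since member names are now scoped to their struct):

    /* In pre-standard C, member names lived in one global offset table,
       so the prefixes were load-bearing, not stylistic: */
    struct stat { int st_mode; };
    struct foo  { int fo_mode; };
    /* A bare "mode" member in both structs would have clashed, since
       p->mode compiled to "the word at the global offset of `mode`"
       regardless of p's type. */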


Ah that explains it, I wondered about those old struct defs.


I started reading the source... and digging for the part that allocates space for variables.... only to realize variable declarations are ignored and unnecessary... wow... what a breathlessly reckless hack! I love it!

It's like using an M18A1 Claymore mine and hoping it actually is aimed (and stays aimed) in the right direction.


The conclusion table was resume-building material in and of itself.


> I will call it the Barely C Programming Language

Or BCPL, for short.

> The C programming language was devised in the early 1970s as a system implementation language for the nascent Unix operating system. Derived from the typeless language BCPL, it evolved a type structure; created on a tiny machine as a tool to improve a meager programming environment, it has become one of the dominant languages of today. This paper studies its evolution. [1]

[1] https://www.bell-labs.com/usr/dmr/www/chist.html


Reminds me of the META II metacompiler http://hcs64.com/files/pd1-3-schorre.pdf


Great writeup!

Especially liked this nugget:

> (NOTE: This grammar is 704 bytes in ascii, 38% larger than it's implementation!)


That is insane, congrats.

I would have wished for some explanation of where function calls like vga_init and vga_set_pixel come from; I'm not a graybeard yet.


They're from the runtime, which is just concatenated with the program to be run: https://github.com/xorvoid/sectorc/blob/main/rt/lib.c


Yup. See “rt/lib.c” A bunch of inline machine code. The assembly mnemonics are in the comments.


something like this could be interesting for deep-space applications where you only have a bare metal environment with hardened processor and limited memory & of course ping time of days (to earth).

or alternatively for embedding a C compiler inside a LLM to use the LLM as a form of virtual machine.


You mean like for retrofitting an existing satellite? It seems like we're beyond the era of extremely constrained embedded environments, even for space hardware. Avionics computers based on several-hundred-MHz PowerPCs with hundreds of MB of RAM seemed pretty common 15 years ago.

What would a C interpreter do that wouldn't be done better by simply uploading a new compiled binary, though? The source code is most likely less space efficient than machine code.


A 512-byte memory restriction and deploying an LLM could not be on more opposite ends of the spectrum. :D


LLMs currently have fairly tight token restrictions for prompting. That said, ChatGPT could probably compile some C. Never tried, though.


really interesting write-up. thanks for sharing!

do you think there are any lessons that can be applied to a "normal" interpreter/compiler written in standard C? i'm always interested in learning how to reduce the size of my interpreter binaries


Hard to say. I’m fairly sure that all of modern software could easily be 2-3 orders of magnitude smaller. But, the world has decided (and I think rightfully so) that it doesn’t matter. We have massive memories and storage systems. Unless you have a very constrained system (power, etc) then I think the big bloated, lots of features approach is the winner (sadly).


It depends on what you mean by "winning".

You can't have the goal of the code generating revenue at equal priority with it being easy for lots of inexperienced programmers to work on (i.e. maximising costs), because the only time you're going to choose "readability" and "unit tests" (and other things with direct costs) is when you have doubts about your ability to achieve the former without hedging your bets a little. If you disagree, you did not understand what I said.

And so they ask: what is the next great software unicorn? Who knows, but being able to rapidly put stuff in front of consumers and ask "is this it?" has proven to be an effective way of finding out. And if you go into a product not knowing how (exactly) to generate revenue, you can't very well design a system to do it.

Do you see that? Software is small when it does what it is supposed to by design; Software is big when it does so through emergence. Programmers can design software; non-programmers cannot. And there are a lot more non-programmers. And in aggregate, they can outcompete programmers "simply" by hiring. This is why they "win" in your mind.

But this is also true: programmers who know what exactly makes money can make as much money as they want. And so, if someone has a piece of tiny software that makes them a ton of money, they're not going to publish it; nobody would ever know. 2-3 orders of magnitude could be taking a software product built by 100 people and doing it with 1-2. The software must be smaller, and that software I think "wins" because that's two people sharing that revenue instead of 100+management.

To that end, I think learning how to make tiny C compilers is good practice for seeing 2-3 orders of magnitude improvement in software design elsewhere, and you've got to get good at that if you want the kind of "wins" I am talking about.


> sure that all of modern software could easily be 2-3 orders of magnitude smaller

niklaus wirth thought similarly... in 1995![0]

i enjoy implementing array langs (k primarily) so small binaries come with the territory. really appreciated your write-up. i may try something similar for an array lang.

you might also appreciate (or detest) 'b'[1][2] - a small implementation of a "fast c compiler (called b: isomorphic to c)". it has some similarities to SectorC.

[0] https://www.computer.org/csdl/magazine/co/1995/02/r2064/13rR... [1] https://web.archive.org/web/20230117082148/https://kparc.com... [2] https://github.com/kparc/bcc (more legible)


Great read and awesome achievement. Could see this being useful for smaller microcontrollers.


It was funny to read. Thx! :))


If you don't use this... are you even suckless?


Bravo! This was a wonderful read, xorvoid.


Amazing.



