Now if the program were divided into basic blocks, then any jump based on flags would necessarily be the last instruction of the current basic block. Suppose the program has the property that flags are always consumed in the same basic block that sets them. Then control-flow info is enough: whenever we replace movl $0, %eax with xorl, we just check whether we are in a basic block that ends with a jump conditional on the Z flag.
If that doesn't hold, we need data-flow info. For each instruction, data-flow analysis can tell us whether any flags are live: CPU flags whose current values are used later in the program.
E.g. say we have:
    movl $0, %eax
    movl $0, %ecx
    movl abc, %esi
    clz
Here, data flow analysis tells us that the Z flag is dead at the movl, because the only possible execution path reaches the clz (read that as shorthand for any instruction that overwrites the flags without reading them), and nothing consumes the Z flag before then. Thus we are free to clobber it. Moreover, when we do so, the data flow picture doesn't change: the Z flag is dead at the movl, and is still dead when that is replaced by xorl. The optimization doesn't require recalculating the data flow info.
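To make that concrete, here is a minimal sketch of the per-block deadness check in C; the types and names are invented for illustration, not TCC's internals:

    #include <stdbool.h>

    /* Hypothetical IR: each instruction records whether it reads or
       writes the Z flag (a conditional jump on Z counts as a reader). */
    typedef struct {
        bool reads_zf;
        bool writes_zf;
    } Insn;

    typedef struct {
        Insn *insns;    /* block terminator included as the last entry */
        int   n_insns;
    } BasicBlock;

    /* Assuming flags never survive past a basic block, the Z flag is
       dead right after insns[i] iff no later instruction in the block
       reads it before some instruction overwrites it. If it is dead,
       movl $0,%eax at position i can safely become xorl %eax,%eax. */
    static bool zf_dead_after(const BasicBlock *bb, int i)
    {
        for (int j = i + 1; j < bb->n_insns; j++) {
            if (bb->insns[j].reads_zf)  return false; /* consumed: live  */
            if (bb->insns[j].writes_zf) return true;  /* clobbered: dead */
        }
        return true; /* fell off the block without a reader */
    }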
Sure -- the compiler is small enough to include -- and fast enough.
Bellard did an experiment where TCC was used to compile the Linux kernel on boot. Remarkable.
Yes, the source may be verbose. But the API exposed would be C, and most systems have a C FFI.
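For a sense of scale, embedding it is a few calls via libtcc. This sketch uses the classic API (newer TCC releases changed tcc_relocate's signature, so treat the details as approximate):

    #include <stdio.h>
    #include <libtcc.h>

    int main(void)
    {
        TCCState *s = tcc_new();
        tcc_set_output_type(s, TCC_OUTPUT_MEMORY);

        /* Compile a C snippet straight into memory... */
        tcc_compile_string(s, "int add(int a, int b) { return a + b; }");
        tcc_relocate(s, TCC_RELOCATE_AUTO);

        /* ...and call it through a plain function pointer. */
        int (*add)(int, int) = (int (*)(int, int))tcc_get_symbol(s, "add");
        printf("%d\n", add(2, 3));   /* prints 5 */

        tcc_delete(s);
        return 0;
    }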
Instead of TCC, you may wish to consider QuickJS (Bellard's JavaScript interpreter).
The reason I use TCC is to validate C code: TCC evaluates function arguments left to right, whereas GCC does right to left, so compiling with both flushes out code that accidentally depends on the (unspecified) evaluation order...
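A contrived example of the kind of code this catches (names invented):

    #include <stdio.h>

    static int trace(int x) { printf("evaluated arg %d\n", x); return x; }
    static void pair(int a, int b) { printf("pair(%d, %d)\n", a, b); }

    int main(void)
    {
        /* C leaves the evaluation order of function arguments
           unspecified; TCC evaluates left to right, GCC typically
           right to left, so the two "evaluated arg" lines swap. */
        pair(trace(1), trace(2));
        return 0;
    }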
Really, GCC -O1 is usually plenty fast if you are just precompiling things at startup that your main executable would dynamically load. The bigger question is: would you really want to be using a language with manual memory management for writing extensions to a program? Maybe including the Boehm GC with it would not be terrible.
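A rough sketch of that precompile-and-load flow, with the paths and the extension's entry point (ext_init) invented for illustration:

    #include <dlfcn.h>   /* may need -ldl on older glibc */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Precompile the extension at startup with a cheap -O1 build... */
        if (system("gcc -O1 -shared -fPIC -o /tmp/ext.so ext.c") != 0)
            return 1;

        /* ...then load it like any other shared object. */
        void *h = dlopen("/tmp/ext.so", RTLD_NOW);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        void (*ext_init)(void) = (void (*)(void))dlsym(h, "ext_init");
        if (ext_init)
            ext_init();
        return 0;
    }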
A wonderful and fun read! I've been writing my own compiler for my own language, and this may just be the motivation I needed to add optimizations to it! I was always thinking about really deep semantic analysis and optimization based on that, but this seems effective, simple, and like great fun!
The generated function also has a "subtract $0 from %rsp" which could just be eliminated outright. It's probably rarer in general code, since it only appears when a function has no local variables, but it looks like there is still some low-hanging fruit in there :-)
It looks like a generic prologue. Ideally the code generator would detect that no stack frame is required whatsoever and just emit the barebones return. It's possible that the prologue is generated ahead of time and then patched as the codegen determines how many locals it needs.
You could do a second pass if you detect no locals (you'd need to anyway, since you'd be computing new offsets for relative branches/loads/etc.), or potentially generate two versions of each function as you compile and only commit the smaller one, but those are some pretty big tradeoffs.
I don't have any idea how TCC is organized internally but my money would be on a generic prologue getting fixed up after-the-fact.
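If that guess is right, the mechanism might look like this sketch: emit the sub with a 32-bit immediate placeholder, remember where the immediate lives, and patch it once the frame size is known (buffer and names invented; 48 81 ec is x86-64 "sub $imm32, %rsp"):

    #include <stdint.h>
    #include <string.h>

    static uint8_t code[4096];  /* output buffer for generated code */
    static int     pos;

    /* Emit "sub $imm32, %rsp" with a zero placeholder and return the
       offset of the immediate so it can be patched later. */
    static int emit_sub_rsp_placeholder(void)
    {
        static const uint8_t sub[] = { 0x48, 0x81, 0xec, 0, 0, 0, 0 };
        memcpy(code + pos, sub, sizeof sub);
        pos += sizeof sub;
        return pos - 4;
    }

    /* Once codegen knows how many locals the function needs, overwrite
       the placeholder; a frame size of 0 leaves the "sub $0, %rsp". */
    static void patch_frame_size(int imm_off, uint32_t frame_size)
    {
        memcpy(code + imm_off, &frame_size, 4);  /* x86 is little-endian */
    }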
There isn't really any value in doing that kind of stuff honestly unless you just like really boring and annoying work. Like, if you are designing your own ISA, you can punt that down to basically the end of the line, because all the actual important stuff comes first.
There's a kind of great irony to compiler work like this where you need some outside thing to drive the design, but if the design being driven ultimately isn't very bespoke, you eventually realize everyone else has done all the work already so everything you have to do is boring (e.g. wire up a new ABI to some existing codegen... how riveting...) Alternatively you have something unique, and find out the compiler is actually the hard part of the whole problem, because now you're on your own.
If you want a tool for early programming your own ISA I recommend using Lisp or something else as a macro assembler to exercise the system. If you simply insist on arbitrary C code, something like WebAssembly might actually be better than patching some random 3rd party code generator (i.e. just write a simple one-shot one-pass compiler that does no advanced regalloc or optimization and just turns it into a binary. This is relatively easy and does not require advanced algorithms and there are already a bunch of WASM compilers available.)
> If you want a tool for early programming your own ISA I recommend using Lisp or something else as a macro assembler to exercise the system.
That something else should be, and hear me out, Forth.
Yes I'm perfectly serious, if you look around you can find some interactive AVR Forth assemblers which are in the 2-4K installed range.
This is Forth's niche: it's a surprisingly capable language for many things, but interactively programming a bespoke ISA is where Forth comes from.
Firm recommendation to at least read A Problem Oriented Language if working in this space; it's a combination of memoir and a distinctive philosophy of software development.
But this is TCC's strength. It IS a simple one-shot, one-pass compiler that does no advanced regalloc or optimization, with a single-file code generator that has already been ported. It goes straight from C source to executable binary in memory.
The main problem is that TCC has been "extended" to take on other roles (like linking ELF).
Sure but if you're working on a custom softcore processor, there's a lot to do, and a lot to learn, and it's very easy to stall out on boring parts, because in a production setting there is a lot of boring stuff. A C compiler is "boring stuff." Unless your ISA design is non-traditional, for sequential scalar machines I feel you're mostly just doing tedium to emit slightly different instructions from an existing codebase. It's not that it isn't valuable to have a C compiler, it's that for the purposes of learning and absorbing information and new things, someone else has done most of the "learning" parts for you. So you could probably just... do something else.
My opinion here is broadly informed by two things, which is that 1) if you want to write a compiler, you should write a compiler, not just tweak an instruction emitter (yes I know that's actually a very common part of the job for working compiler devs, but that's not what you're doing.) And 2) people are far too eager to fall into their old ways even when it isn't productive. A C compiler is comfortable. For someone experienced, it would be easy to port an existing one -- but they probably wouldn't have been asking about it in the first place. For those other people, it's boring and a lot of work when a macro assembler like Forth or Lisp can get you going and writing higher-level code in a few hours. When you're doing things like bringing up silicon you wrote, that's way more important and much more valuable.
You shouldn't eschew things like a C compiler forever if you plan to really get serious. Sometimes you can get one for free (MIPS, RISC-V) and that's super valuable. For a production toolset, it would be absolutely essential. But when you're learning it's easy to sink a lot of time into things like that, when that's not where the real meat and potatoes are.
I've been porting subc https://www.t3x.org/subc/ to my 36-bit RISC machine arch. It's by no means 'full' C, however:
1. It is Public Domain. I like that.
2. There is a book ($) that explains it in detail (and it's good).
I respect the design choices made, and the compiler can compile itself once bootstrapped. It's really fast, but code gen is only 'ok'. That's a tradeoff that works for me.
It is halfway in complexity between TCC and GCC or LLVM and was created to be educational. I think there are major differences between more recent versions and what is described in the book.
Have you considered adding a backend for LLVM? Perhaps a bit heavyweight, but it could be a good way to get C/C++, Fortran, Rust, etc., if that's something you'd like!
Seems like an interpreter? But maybe that could be tweaked... A bit messy code, though.
> have you considered adding a backend for LLVM?
Just to the extent that I kinda dismissed it out of the gate, since it seemed to be a fairly complex and dynamic target. I haven't used it personally, but a good buddy maintains a backend.
Are you making your own softcores? Do you have a recommendation for getting started with that? I've recently gotten interested in making CPUs from logic gates, but I have no idea how to transfer that knowledge to FPGAs.
I'm fairly fresh to this field, so hopefully someone else can come along and give a better answer. I just picked up an iCE40[1] and Cyclone IV dev board[2] and started hacking Verilog.
I know Verilog isn't the best option around, and I'm planning on moving on to something better, but for now it was easy to get started, and from what I can see many of those better tools end up generating Verilog, so it would be nice to be able to debug that.
If you know how to think in logic-gate-terms then FPGAs won't be that difficult I suspect. Essentially either you have plain logic gates where the output is purely a function of the inputs, or you have edge-triggered logic thanks to flip-flops.
Just think of it as writing code that specifies how to wire up a set of logic gates, multiplexers, and flip-flops.
As a brief example, here's an excerpt from my very first ALU:
    wire [8:0] res_and;
    wire [8:0] res_xor;
    wire [8:0] res_add;

    assign res_and = (bus_a & bus_b);
    assign res_xor = (bus_a ^ bus_b);
    assign res_add = (bus_a + bus_b);

    always @(posedge clk)
      case (op_sel)
        OP_AND:  result <= res_and;
        OP_XOR:  result <= res_xor;
        OP_ADD:  result <= res_add;
        default: result <= 0;
      endcase
Here "bus_a" and "bus_b" are wires (which may come from registers), while "result" is a register.
The wires don't hold state, while registers can. The <= means to latch the right side into the left side, in this case on a positive edge of clk. The case statement is just a multiplexer.
At least to me, it feels very much like "writing a circuit".
He designs a Z-80 from scratch, giving Verilog hints / good practice along the way. The design isn't 'complete' -- you have to learn what he's saying and add in the other parts.
It's not a 'trivial' book but if you want to Get Good -- recommended.
Thanks! I feel like I actually understand that Verilog code! Do you ever see the "rendered" state on the FPGA, or is that all abstracted away?
Would you recommend one of those FPGAs over another? I know iCE40 from this amazing project (https://github.com/nickmqb/fpga_craft) so maybe I'll just get that!
I also want to refer you to http://fpgacpu.org/, especially the XSOC links. Jan builds a complete SoC-on-an-FPGA in really 'ancient' fabric (4000-series Xilinx) for Circuit Cellar (articles available on this site). He also hacks a C compiler to emit code for his 16-bit RISC machine (which is very clever).
This is historical, but a 'must read' if you're new to this stuff.
I use the iCE40 with 8K LUTs. It's 'old', but the open-source toolchain is nice, the vendor-supplied one is fine, and it's 'big enough' to do fairly serious stuff.
Like the subc compiler I mentioned, the tradeoffs working with an older FPGA like that are ok to me.
GL! P.S. I mostly used the TinyFPGA BX to learn about the iCE40, but it doesn't have enough pins to make even a semi-decent full machine (with a wide memory bus), so I use the Lattice HX 8K dev board. It's also easy to use.
You can certainly inspect the synthesized primitives, either graphically post-place&route, or in a netlist, which is usually a tangled mess.
Typically you do so either because you're curious, you're debugging a timing problem, or you're trying to make sure the toolchain inferred a particular primitive you were trying to use on purpose. It's not terrifically informative otherwise.