So the chibicc book is not available yet. I'm busy working on another project (the mold linker) and don't have time to work on it. That being said, I believe the repo is still very valuable for those who want to learn how easy it is to implement a simple C compiler. Each chibicc commit was carefully written so that you can read one commit at a time. I'd recommend starting from the initial commit and observing how each feature (`if`, `for`, local variables, global variables, etc.) is implemented by following each commit.
I'm charmed by projects like this, thanks for sharing.
What I want to see is a program like this, with very simple assembly/object code and very fast compilation, that uses DynASM to build the executable in memory.
Then use a tracing JIT to make it fast.
Having good, tiny, reference C compilers is a prerequisite for that, so I'm always happy to see work in that field.
I am interested in this too. I would make different tradeoffs. I am more interested in optional garbage collection and in the parallelism and async story of the language, such as threading, coroutines, or both together.
I suspect combining garbage collection, exceptions, closures, tail call optimisation, parallelism, JIT compilation and coroutines is difficult to do orthogonally.
On eatonphil's discord someone recently shared this link: This is a framework for building high performance language runtimes
> It didn't work. If it is available as open-source, they don't _need_ a different license.
Don't they? I'm surprised to hear that. I have sold a few proprietary versions of my AGPL code (which were identical to the original, but with the license stripped). Not enough to make a living, but I can buy some fancy bikes with the money.
In some cases, they even paid --separately-- for support and a few features of the software that were of particular interest to them. For some reason, many companies are extremely frightened of the AGPL, but a dual AGPL/commercial licensing seems to fit them very well. This is a nice model for free software distribution, but it only suits small projects that do not get external contributors.
I could buy bikes too, but as you wrote, that's not enough to make a living. This is my full-time project. I could earn a mid-six-figure salary if I worked for a big tech company, and I think I'm creating a more valuable program than I did when I was working for one, and in return I make money that is counted in "fancy bicycle" units... I think it's not wrong to say it didn't work quite well.
I might get shit on for saying this here, but I agree with you that it's very difficult to make a decent living on fully "free" software.
I've had orgs not bat an eye at paying a few thousand for a software license if there are justifiable productivity returns and the license is required for us to continue to use the software.
If that license is not required, we would never give you a dime. It's sad, but true. Those few thousand, to many companies, are pennies on the dollar. Just price it graciously so you don't leave the little guys out.
You'll get grumbles but do what you need to survive if you want to do this full time and for a living. God knows you, of all people, have earned it.
I think I recall you saying that you elided the bugs, yeah? (Am I under-caffeinated this morning?) If so, you could write a second book about the bugs and issues you encountered while implementing chibicc and, uh, that would be awesome too! (I'd buy a copy.)
The book covers the vast topic with an incremental approach; in the first chapter, readers will implement a "compiler" that accepts just a single number as a "language", which will then gain one feature at a time in each section of the book until the language that the compiler accepts matches what the C11 spec specifies. I took this incremental approach from the paper by Abdulaziz Ghuloum.
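To make the "single number as a language" step concrete, here is a minimal sketch of what such a first-chapter compiler could look like. This is my own illustration, not code from the repo or the book: the function name `compile_number` and the choice to write into a caller-supplied buffer are assumptions, and the output assumes x86-64 with AT&T syntax.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: "compiles" a program that consists of a single
// integer into x86-64 assembly (AT&T syntax) whose main() returns that
// integer. Returns the number of characters written, or -1 if the input
// is not a plain integer or the buffer is too small.
int compile_number(const char *src, char *out, size_t cap) {
  char *end;
  long val = strtol(src, &end, 10);
  if (end == src || *end != '\0')
    return -1;  // empty input or trailing garbage: not in our "language"

  int n = snprintf(out, cap,
                   "  .globl main\n"
                   "main:\n"
                   "  mov $%ld, %%rax\n"
                   "  ret\n",
                   val);
  if (n < 0 || (size_t)n >= cap)
    return -1;  // output did not fit
  return n;
}
```

A driver would pass `argv[1]` to `compile_number` and print the buffer; feeding the result to an assembler gives a program that exits with that number. Every later feature then grows from this skeleton one step at a time.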
Crenshaw's "Let's Build a Compiler" tutorial series --- from the late 80s/early 90s --- uses exactly the same approach:
Thing I missed. No memory manager. allocate() but never free().
This makes the stats a little bit darker.
You can't use it for anything other than this compiler. And adding a memory manager would possibly have added 3x more code and complexity than what's here.
You can consider sponsoring the author (rui314) on GitHub[1]. They're the one behind the mold linker and are considering switching the license[2] because of lack of funding.
> Last but not least, chibicc allocates memory using calloc but never calls free. Allocated heap memory is not freed until the process exits. I'm sure that this memory management policy (or lack thereof) looks very odd, but it makes sense for short-lived programs such as compilers.
I’ve thought about this – why isn’t this a more common thing to do for short-lived programs such as cli programs? Or is it common – could anyone give some examples of well-used programs that do this?
The reasons not to do it that I could think of are:
1. “It’s just bad practice”
2. You may suddenly find yourself having written some kind of malloc bomb, more easily than you think
In my experience it is about code reuse and/or maintainability: what if this code becomes a library and is integrated as part of a long-lived program? It would be risky to retrofit proper memory management after the fact.
And perhaps, there's something about programming habits. We hear often enough about C having not enough safeties, and one way to mitigate the issue is having "safe" habits. Kind of like how you activate your blinker when you turn even when there's nobody around; if you don't, you might forget to activate it when it matters.
It's considerably more work to get a piece of software in shape for use as a library. Matching all callocs with frees is not a huge amount of work, but if you know that there's no requirement to do it, you might as well save the typing.
Kind of like how, I admit, I often don't activate the blinker when nobody is around, because it is a little inconvenient and requires you to take one hand off the steering wheel, which might be the only one holding it.
There are code patterns that let you code like this in a reusable way. Look up linear allocators. You can allocate resources inside a group and release the group in one fell swoop.
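For anyone who hasn't seen the pattern, here is a minimal sketch of a group ("arena") allocator in C11. The names `Arena`, `arena_init`, `arena_alloc` and `arena_release` are made up for illustration, and a real one would probably grow by chaining blocks instead of failing when the fixed buffer is full.

```c
#include <stddef.h>
#include <stdlib.h>

// Hypothetical fixed-size arena: individual allocations are never freed;
// the whole group is released at once.
typedef struct Arena {
  char *base;   // start of the backing buffer
  size_t cap;   // total size of the buffer in bytes
  size_t used;  // bytes handed out so far
} Arena;

// Returns 0 on success, -1 if the backing buffer could not be allocated.
int arena_init(Arena *a, size_t cap) {
  a->base = malloc(cap);
  a->cap = a->base ? cap : 0;
  a->used = 0;
  return a->base ? 0 : -1;
}

// Bump-pointer allocation, aligned for any object type.
// Returns NULL when the arena is exhausted.
void *arena_alloc(Arena *a, size_t size) {
  size_t align = _Alignof(max_align_t);
  size_t start = (a->used + align - 1) & ~(align - 1);
  if (start > a->cap || size > a->cap - start)
    return NULL;
  a->used = start + size;
  return a->base + start;
}

// Releases every allocation made from the arena in one call.
void arena_release(Arena *a) {
  free(a->base);
  a->base = NULL;
  a->cap = a->used = 0;
}
```

The calling code still never frees individual objects, so it stays as simple as the "never free" style, but the owner of the arena can drop everything deterministically if the code ends up in a long-lived process.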
> what if this code becomes a library and is integrated as part of a long-lived program
Cue one of my favourite comments[1]:
/*
* We divy out chunks of memory rather than call malloc each time so
* we don't have to worry about leaking memory. It's probably
* not a big deal if all this memory was wasted but if this ever
* goes into a library that would probably not be a good idea.
*
* XXX - this *is* in a library....
*/
(Of course, this consideration should be appropriately downweighted by YAGNI, as threading memory management through prototype or internal utility code can by itself easily force it into very non-prototype territory wrt effort.)
If you know it will always be short-lived, that's fine. But even compilers are not strictly short-lived nowadays. Both Swift and Rust (I think) maintain a daemon process to assist incremental compilation.
If you track and free all allocations diligently, you can also use tools to automatically find accidental memory leaks. If everything is a memory leak then useful tools become less so.
With ”short-lived”, I rather meant something like: ”does one job and then exits”, as opposed to something more dynamic – of course, that it takes a long or short time to execute isn’t by itself relevant.
As for C++, memory is not the only resource you can leak. When you leak memory, you leak the objects in it, and they can hold on to other resources that you might want to clean up deterministically.
Having said that, nothing stops you from replacing the global allocation functions so that deallocation is a no-op. But your program, and especially libraries, should still match up malloc/free, new/delete and allocate/deallocate.
Also your default malloc probably does a bunch of bookkeeping that is eventually only used by free. Once you decide that you won't call free or use a noop free, then there are potentially better candidate implementations for malloc as well.
It can be quite painful to add them afterwards, and while the program might be short-lived now, this might not hold true in the future. And just maybe, if one doesn't want to deal with manual memory management, pick a tool/language that doesn't require it?
Maybe a practical alternative to this, one that would still reap the performance benefits, would be to have an allocator that postpones frees until a certain amount of time has passed (or, e.g., a certain number of bytes has been allocated) since the start of the program, and after that point works like normal alloc/free.
Actually this is not far from how garbage-collected languages work.
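A rough sketch of the byte-count variant, just to show the shape of the idea. Everything here is hypothetical (`lazy_malloc`, `lazy_free`, `lazy_free_threshold`, and the 64 MiB default); it is not thread-safe and it simply wraps the system `malloc`/`free`.

```c
#include <stddef.h>
#include <stdlib.h>

// Total bytes requested through lazy_malloc so far.
static size_t lazy_bytes_allocated = 0;

// Hypothetical tuning knob: below this many allocated bytes,
// lazy_free does nothing. 64 MiB is an arbitrary default.
static size_t lazy_free_threshold = (size_t)64 * 1024 * 1024;

// Allocates like malloc and keeps a running total of requested bytes.
void *lazy_malloc(size_t size) {
  void *p = malloc(size);
  if (p)
    lazy_bytes_allocated += size;
  return p;
}

// Skips the free while the program is still "small"; behaves like
// free() once the threshold has been crossed.
// Returns 1 if the block was actually freed, 0 if it was left alone.
int lazy_free(void *p) {
  if (lazy_bytes_allocated < lazy_free_threshold)
    return 0;
  free(p);
  return 1;
}
```

Note that blocks skipped before the threshold are never reclaimed, so this only bounds how much memory a run can waste; it does not recover it later.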
I have known about this project for a long time, and in my opinion it is an excellent educational resource. As an exercise I re-targeted chibicc to the Z80 without any problem, and it was lots of fun. Now I am trying to write an optimizer on top of it.
`static` variables are file-scoped, similar to module-scoped in other languages. Truly global variables are accessible from any file. Those should be very rare.
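A tiny illustration of the difference, using made-up names (`call_count`, `total_errors`, `next_id`) in what you can imagine as one source file of a larger program:

```c
// File-scoped: internal linkage, invisible to every other source file.
static int call_count = 0;

// Truly global: external linkage, any other file can reach it with
// `extern int total_errors;`.
int total_errors = 0;

// Returns 1, 2, 3, ... using state that only this file can touch.
int next_id(void) {
  return ++call_count;
}
```

Another file could declare its own `static int call_count` without any clash, whereas a second definition of `total_errors` elsewhere would collide at link time.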
> portability is not my goal at this moment. It may or may not work on systems other than Ubuntu 20.04.
... that is not good. On the contrary, you should make it so that portability is not an _issue_. IIUC, this should not depend on much beyond libc, or even just libc, so - why should it be Ubuntu-specific?
CRT object files which contain process startup routines are not in a fixed directory. Unfortunately compilers have to be hard-coded to contain an appropriate directory name.
Well, can't that be made a build parameter, that can be determined during build configuration? CMake'ifying the project might help you get there perhaps?
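Something along these lines is what I have in mind, as a sketch only. The macro name `CRT_DIR` and the helper `crt_path` are hypothetical, and the fallback directory is just the usual location on x86-64 Ubuntu; none of this is taken from the chibicc sources.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical build parameter: the build system can override it,
// e.g. cc -DCRT_DIR='"/usr/lib64"' ...
// The fallback is the usual location on x86-64 Ubuntu.
#ifndef CRT_DIR
#define CRT_DIR "/usr/lib/x86_64-linux-gnu"
#endif

// Builds the full path of a CRT object file such as "crt1.o".
// Returns the path length, or -1 if the buffer is too small.
int crt_path(char *out, size_t cap, const char *file) {
  int n = snprintf(out, cap, "%s/%s", CRT_DIR, file);
  if (n < 0 || (size_t)n >= cap)
    return -1;
  return n;
}
```

The configure step (CMake or a plain Makefile) would then probe for the directory that actually contains `crt1.o` and pass it in, so the compiler source no longer hard-codes one distribution's layout.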