Hacker News
Zig is now self–hosted by default (github.com/ziglang)
387 points by BratishkaErik on Aug 20, 2022 | 111 comments



To clarify what this means, since things can get confusing with subtle differences in wordings:

* stage1 is when zig is built from C++ code[1] (the "bootstrap" compiler) using the system C/C++ compiler toolchain.

* stage2 is when zig is built from Zig code[2] using stage1.

* stage3 is when zig is rebuilt from the same Zig code[2] using stage2.

Before today, zig would give you stage1 by default, and you could opt in to stage2 using `-fno-stage1`. After today, zig gives you stage3 by default, and you can opt in to stage1 using `-fstage1`.

In all three cases, LLVM is being used. Although Zig has started to fully self-host by providing backends that have no dependency on LLVM, none of these fully self-hosted backends are complete.

Why bother self-hosting? Because:

* Zig produces smaller, faster binaries than C++, and they use less memory[4]. Case in point: the new self-hosted compiler is 1.5x faster than the C++ implementation and uses 3x less peak RAM.

* Development velocity in Zig is much faster than C++, and debugging is a breeze in comparison due to Zig being much safer than C++ and having better debug tooling. Similarly, comptime features let us add more assertions and debug checks that are not possible in C++.

* Zig compiles much faster than C++.

* Zig cross compiles better than C++ making it easier to create builds of the compiler for every target.

* The self-hosted compiler does not need a softfloat library dependency to support f16 or f128 operations.

* Zig's std lib data structures are incredibly useful, especially compared to the C++ STL.

There are still many known bugs; not every project will be able to upgrade immediately. See the full upgrade guide [3] for help deciding when and how to upgrade.

[1]: https://github.com/ziglang/zig/tree/master/src/stage1

[2]: https://github.com/ziglang/zig/tree/master/src

[3]: https://github.com/ziglang/zig/wiki/Self-Hosted-Compiler-Upg...

[4]: https://media.handmade-seattle.com/practical-data-oriented-d...


> Case in point: the new self-hosted compiler is 1.5x faster than the C++ implementation and uses 3x less peak RAM.

I am not sure it is comparable: the self-hosted compiler uses a different architecture [1].

[1] https://vimeo.com/649009599/7f1fda43ff


This is true, but I don't think your point is entirely fair either. Zig's design lends itself to writing this style of code. In contrast, the usual way(s) C++ is used encourage code that makes these kinds of optimizations hard or impossible. In practical terms, if you want to write fast software, how easily the language lets you use the desired architecture matters a lot.

(Also, you're replying to the primary author of Zig and presenter of that talk, if you weren't aware.)


While ergonomics matter, I don’t believe the question is settled at all: for very high-performance applications, C++ will remain the answer for a long time to come.


Hi Andrew, sorry for the unrelated question but where can we find the financial reports of the Zig Software Foundation for 2022? The Finances spreadsheet hasn't been updated for the past half a year:

https://docs.google.com/spreadsheets/d/14_ljFHGFXY5NhBhlfjgk...


Isn’t it quite common to do financial reports only every quarter, every 6 months, or yearly?


> having better debug tooling

What debug tooling are you referring to? It seems like people just use lldb to debug zig programs?

I'm honestly interested, and searching doesn't turn up good tools as far as I can tell.


Here are some features of Zig that make debugging much easier than C++:

* safety checks (equivalent to UBSan), when triggered, print stack traces that include source code lines and point to the relevant line/column

* one of the safety checks catches use of the wrong field of an untagged union. This one is so underappreciated. Wrong union field access costs so much time to debug in C++, but it is a breeze in Zig: you get a crash explaining the problem with a stack trace immediately, not a corrupt value that causes problems down the road. Untagged unions are a core part of the strategy that makes Zig fast and keeps its memory footprint small.

* valgrind client request integration[1]. Valgrind "just works" better with Zig than with C/C++ because Zig communicates more information about memory regions that are "undefined".

* error return tracing [2]

* segfaults print stack traces

* std.debug has features to collect stack traces in a compact manner and then dump them at a relevant time [3]

* std.heap.GeneralPurposeAllocator detects leaks and prevents memory corruption from Use-After-Free/Double-Free, making debugging easier

[1]: https://valgrind.org/docs/manual/manual-core-adv.html#manual...

[2]: https://ziglang.org/documentation/master/#Error-Return-Trace...

[3]: https://github.com/ziglang/zig/blob/4a98385b0aa3808ab05a1ebf...
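Here is a rough C++ analogy for the untagged-union check mentioned above. A plain C++ union records nothing about which member is active, so reading the wrong member is undefined behavior and produces no diagnostic. `std::variant` (C++17) stores a tag and checks it on every `std::get`. This sketch illustrates the idea in C++ only. It does not show how Zig implements its safety check, and the type and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Plain union: no record of the active member. Writing `real` and then
// reading `integer` is undefined behavior in C++, and in practice it
// silently yields a garbage value instead of a diagnostic.
union RawValue {
    std::int64_t integer;
    double real;
};

// std::variant keeps a hidden tag and checks it on every std::get.
using CheckedValue = std::variant<std::int64_t, double>;

// Returns true if reading the int64_t alternative was rejected at runtime
// (std::get throws std::bad_variant_access when the tag does not match).
bool wrong_field_is_caught(const CheckedValue& v) {
    try {
        (void)std::get<std::int64_t>(v);
        return false;
    } catch (const std::bad_variant_access&) {
        return true;
    }
}
```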


Thanks!


Maybe the runtime safety in debug builds? Like for out of bounds, int overflows, etc. In my zig projects those checks have saved me from using a debugger or valgrind in many cases.


I guess I don't call runtime safety features "debug tooling"? But maybe that's the disconnect here


I think it's funny that I got downvoted for this! For full clarity: I wasn't suggesting that they _shouldn't_ be called debug tooling, just that it's not what I would call that class of feature, and it seems that other people do call it that.


I'd call tools like ASAN and friends debug tooling. I'm not sure if zig has tooling that doesn't have an analogue in the c++ world, but having them built in to the compiler by default is a notable feature.


Right, defaults matter. The fact sanitizers exist is much less of a game changer than the default behaviour being sanitized.

Because C++ defaults are notoriously all wrong, in programming language design you could usually do worse than consider, "Is there an alternative to what C++ does here by default?" and if there is one, choose that alternative as your default because it's more likely to be correct.

Sometimes C++ helps you out, it might have a keyword "do_it_right" and you can just default to that, if you feel people might want the C++ default behaviour, feel free to reserve "do_it_wrong" in case people wanted that, but in many cases they will never ask you to implement the C++ behaviour because it's just wrong. Examples: "const" in C++ doesn't mean "constant" it means "immutable", so, make immutable your default and offer "mutable" as the option. "explicit" in C++ makes constructors need an explicit cast to be used for conversion, once again of course you want that by default but feel free to reserve "implicit" in case some people are sure they need the C++ behaviour.
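This minimal C++ sketch shows the `explicit` default described above. The type and function names are made up for illustration. Without `explicit`, a single-argument constructor also acts as a silent conversion. With it, the conversion has to be written out.

```cpp
#include <cassert>
#include <type_traits>

// C++ default: a single-argument constructor is also an implicit
// conversion, so any int silently becomes a Meters where one is expected.
struct Meters {
    Meters(int v) : value(v) {}
    int value;
};

// Opt-in fix: `explicit` means callers must write Seconds{5}.
struct Seconds {
    explicit Seconds(int v) : value(v) {}
    int value;
};

// Checked at compile time: only Meters accepts an int implicitly.
static_assert(std::is_convertible_v<int, Meters>, "implicit by default");
static_assert(!std::is_convertible_v<int, Seconds>, "explicit opts out");

int add_meters(Meters a, Meters b) { return a.value + b.value; }

// add_seconds(1, 2) would not compile; add_seconds(Seconds{1}, Seconds{2}) does.
int add_seconds(Seconds a, Seconds b) { return a.value + b.value; }
```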


Pre-standard C++ frameworks used to have better defaults. Unfortunately, a majority of the people voting would rather have the wrong defaults, so here we are, and given how ISO works, it won't change.

Those that care know where the knobs are to re-enable those proper defaults, given that for some domains, C++ is going to stay around for a while, regardless of the alternatives.


> * Zig's std lib data structures are incredibly useful, especially compared to the C++ STL.

I think elaborating on this could make for a very interesting article. :-)

(I have a guess already at one item, which is MultiArrayList, something that kind of blew my mind when I first learned about it, and is in my opinion a very impressive testament to both the power and ease of use of Zig's approach to compile-time computation.)


I didn't even know about it until today, very impressive. Generic SoA in less than 500 lines of userland code...

https://github.com/ziglang/zig/blob/master/lib/std/multi_arr...


SoA in C++ isn't that long; here's my version (it certainly doesn't have the utility functions defined there, but it's useful enough for me): https://github.com/celtera/ahsohtoa/blob/main/include/ahsoht...
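To make the array-of-structs vs struct-of-arrays distinction concrete, here is a small hand-written C++ sketch of the SoA layout that std.MultiArrayList produces from a struct type at comptime. The `Monster` names are hypothetical. This is not a port of MultiArrayList or of the library linked above. A generic version derives the per-field arrays from the struct automatically instead of spelling them out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Array-of-structs (AoS): each element's fields sit next to each other,
// and any padding after `alive` is repeated for every element.
struct MonsterAoS {
    std::uint32_t hp;
    float x, y;
    bool alive;
};

// Struct-of-arrays (SoA): one contiguous array per field, so a loop that
// reads only `hp` streams through tightly packed data with no padding
// between elements.
struct MonstersSoA {
    std::vector<std::uint32_t> hp;
    std::vector<float> x, y;
    std::vector<bool> alive;

    void push_back(const MonsterAoS& m) {
        hp.push_back(m.hp);
        x.push_back(m.x);
        y.push_back(m.y);
        alive.push_back(m.alive);
    }

    std::size_t size() const {
        // Invariant: every field array has one entry per element.
        assert(hp.size() == x.size() && x.size() == y.size() && y.size() == alive.size());
        return hp.size();
    }

    // Reassemble one element from its separate field arrays.
    MonsterAoS get(std::size_t i) const { return {hp[i], x[i], y[i], alive[i]}; }
};

// A hot loop over a single field touches only that field's array.
std::uint64_t total_hp(const MonstersSoA& ms) {
    std::uint64_t sum = 0;
    for (std::uint32_t h : ms.hp) sum += h;
    return sum;
}
```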


That's pretty sweet, I didn't know C++ could do this now!

Zig's still got it beat handily in simplicity though -- I think what's really cool about Zig's idea of comp-time evaluation is you get something like the power of C++ with something like the simplicity of C.


> Zig produces smaller, faster binaries than C++

This seems fairly significant to me and makes me wonder how this is possible, given all the effort that has gone into optimizing C++.


Because they can't make breaking changes and must live with legacy baggage.

Zig, meanwhile, can use modern techniques without such issues.


This is wrong, because you can strip out C++ code you don't need. Greenfield code doesn't need to worry about legacy baggage.


And how do you identify the unneeded code in a compiler?

Which is used all over the world?


A smaller binary will not necessarily be faster - specialization can bloat code size, but having a specific version for a given subtype can be much faster. It’s a tradeoff.
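This small C++ example illustrates the trade-off. The function names are hypothetical. A type-erased loop that takes a function pointer exists once in the binary, but it usually pays an indirect call per element. A template is instantiated once per operation type, which grows the binary but usually lets the compiler inline the operation.

```cpp
#include <cassert>
#include <cstddef>

// One shared copy of the loop: the operation arrives as a function pointer,
// so the compiler usually cannot inline it (less code, indirect calls).
long reduce_erased(const int* data, std::size_t n, long init,
                   long (*op)(long, int)) {
    for (std::size_t i = 0; i < n; ++i) init = op(init, data[i]);
    return init;
}

// One instantiation per operation type: each copy can inline `op`
// (more machine code in the binary, but typically a faster loop).
template <typename Op>
long reduce_specialized(const int* data, std::size_t n, long init, Op op) {
    for (std::size_t i = 0; i < n; ++i) init = op(init, data[i]);
    return init;
}

long add(long acc, int x) { return acc + x; }
```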


right after that sentence there's a link to a talk that explains one way this is true


We can hardly be expected to watch a 46 minute video on "A Practical Guide to Applying Data-Oriented Design" to find our answer. It seems unpromising on the surface.


Data oriented design is why the thing is faster.


I guess I'll wait for the benchmarks comparing C++ and Zig over a variety of problem solutions to see which one is faster


Congratulations. This is an important milestone.


For Andrew it must feel he's now on the home stretch towards 1.0, finally!


In a previous presentation I think he mentions they are still aiming at something like 2025.


Thanks for the context!

What's the difference between the stage2 and stage3 binary? Does stage1 produce different binaries for the same input compared to stage2/stage3?


Ideally there should be no difference and building stage 3 is basically a sanity check to ensure the compiler is working correctly.


Not quite - what you said is true for a hypothetical "stage4", but there is a distinct difference between stage2 and stage3. While they are built from the same source code, and therefore have the same logic, they are lowered by different backends, so they can have drastically different performance characteristics depending on the differences between the stage1 and stage2 backends, respectively.

Related: https://github.com/ziglang/zig/issues/12183


> Memory usage is improved by a factor of about 3x. For Zig, building itself went from using 9.1 GiB to 2.7 GiB.

Pretty impressive.


Andrew did a nice talk on the kinds of modifications that were necessary to get those improvements https://vimeo.com/649009599


Not bad. For comparison Nim builds itself using just 668MiB.


Nice work!

I suspect the difference mainly comes down to the fact that Zig compiles everything into a single compilation unit. As you can see, this has tradeoffs. It produces better code (single-compilation-unit is what "LTO" approximates) but it requires more memory and more code needs to be rebuilt on changes.


How many processes does this involve, i.e. `make -jX` for which X?


That is stunning


I never understood why "self-hosting" is considered a good thing. Sure, the compiler developers can write in their favorite language and I guess it means the language is stable enough to be used in a complex project; however requiring an (older) compiler to build the compiler seems a significant complication for software distributions.

You either have to set up an ever-increasing chain of compilers, as the complexity of the language and the features required to build the compiler grows, or rely on pre-existing binaries. Either way, it seems like a nightmare compared to keeping it in plain old C and building with any compiler of your choice. Just imagine if any project did this, like, building Firefox now requires an existing Firefox installation.


I believe Andy noted that it's way easier to find contributors for the self hosted compiler. If someone is passionate about the language, they want to write the language, not C++. And specifically for languages like Rust and Zig, the explicit orthodoxy is that the language is a better successor to C++/C. If you're spending your time telling people that they shouldn't write C++/C and should instead write your language, you probably should put your money where your mouth is and switch the compiler to your own language. Note that this is significantly less common with high level languages. Ruby, Python, even Swift are not bootstrapped. That's because none of these are claiming to take over the systems software niche and therefore don't need to prove anything by bootstrapping.


PyPy is written in pure python and transpiled to C++


PyPy is mostly written in a restricted dialect of Python (RPython), not pure Python. They're substantively different.


Better educate yourself on Swift: not only does Apple clearly state in its documentation that Swift is its replacement for C, C++ and Objective-C, it has also started planning efforts to bootstrap it.

Ruby and Python are scripting languages and, as such, are seldom bootstrapped.


There's a third option that you missed which is what Zig does. It has a "bootstrap" compiler, written in C, which is kept in sync with the self-hosted compiler. Features are implemented twice, once in the bootstrap compiler, once in the self-hosted compiler. So the build process involves a fixed set of steps - stage1, stage2, stage3 - and the chain never grows more than this. No pre-existing binaries are required.


> No pre-existing binaries are required.

Except, of course, for a C compiler.


Which is something mescc and friends solve. Zig doesn't need to solve the full bootstrap from nothing chain, just the entry.


Currently. Alternatively, they could cross-compile.


Which is…self-hosted.


Need a compiler to compile it, though.


> Either way, it seems like a nightmare compared to keeping it in plain old C and building with any compiler of your choice. Just Imagine if any project did this, like, building Firefox now requires an existing Firefox installation.

But this is exactly the same for C! Building C compiler requires an existing C compiler. C compilers are admittedly more widely available at the moment, but that probably won't be the case forever. Languages like zig aim to supplant C. How are they supposed to do that when they're directly dependent on C?


Because a large motivation of developing a new language is wanting to program in that new language. Being forced to keep programming in C can become very frustrating.


Although if you're Jonathan Blow you can stream yourself writing C++ and ranting about how terrible C++ is, while implementing your new Jai language.

[I watched a few hours of Jon doing this, but annoyingly I can't find a recording of Jon actually finding and fixing the subtle Heisenbug that he realises, during those hours, must be in his Jai code. He resolves to investigate "later", and I can't find "later". I think this would be insightful to watch, because this is exactly the sort of bug Jon says he doesn't have much trouble with in his C++ and so doesn't need to prevent in Jai. If it took five minutes to fix, that shows Jon is correct; if it took hours of difficult searching, or was eventually resolved by unrelated changes to the code, not so much.]


It was quite a pleasure for me when the D compiler was converted 100% into D. I immediately set about refactoring it (a process that continues today).

One consequence is that pointer bugs in the compiler have become extremely rare.


One time I needed an AVR programmer for a test stand that didn't suck[1] and didn't backfeed the device under test. So I designed one using an AVR. And programmed it with the crappy one. Then I programmed the next one with the first one.

Feels good.

[1] Atmel's cheap USB programmer would lose its mind if the programming pins weren't all connected[2]. The tech had to reset it manually by unplugging the USB cable. If it happened a couple of times in a row, Windows would disable the USB port.

[2] Board house liked to put their inspection stamp on top of the programming port pads.


If Zig is built on a C compiler, and (I assume) the C compiler self-hosts, then if Zig starts to self-host, it doesn't _introduce_ the dependency on a chain of previously built binaries, it just makes it a step more severe while removing a dependency.

Would you prefer if Zig not only didn't self-host, but didn't have any self-hosting dependencies? For your ideal compiler, would a full build from scratch require making a platform-specific assembler from machine code, then a few stages of successively higher-level languages?


That would be a possibility, another one is to cross-compile.


You don’t know if the product you’re building is any good unless you’re using it yourself.


Also known as dogfooding.


Once upon a time C was bootstrapped as well.

A great outcome of bootstrapped compilers is that most contributors only need to know one language instead of two, and, better still, it kills the myth that C is the only game in town for writing compilers.


> it kills the myth that C is the only game in town for writing compilers

A compiler is a pretty pure, high-level operation. Makes no sense to use a low-level language for it.


Agreed, but that is how things go: bad teachers propagate the myth of UNIX being the genesis, language runtimes happen to use C more out of convenience than anything else, and around goes the myth.

Hence why stuff like MaxineVM, Jikes and now Graal are so relevant.


C is a wretchedly bad language for writing compilers.


Indeed; however, since that is what many see, and plenty of them place the genesis of high-level languages at the birth of UNIX, the myth persists.


The fact that constant-folding is hard because the C language doesn't have portable semantics for basic things like integer addition (with overflow) is just braindead.


Yes, there are lots of optimizations that are hard in C and only possible thanks to UB abuse.
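Here is a minimal sketch of the constant-folding problem in C++ terms (C has the same rule). Suppose a compiler written in C or C++ wants to fold a target language's wrapping 32-bit addition. It cannot simply write `a + b` on `int32_t`, because signed overflow is undefined behavior in the host language. The helper names and the "target language" are hypothetical. The sketch routes the sum through unsigned arithmetic, where wraparound is well defined. It also shows a checked variant for languages that reject overflow instead of wrapping.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

using i32 = std::int32_t;

// Fold `a + b` for a hypothetical target language whose 32-bit addition
// wraps. Computing `a + b` on int32_t would be undefined behavior on
// overflow, so add as uint32_t, where wraparound is well defined.
i32 fold_add_wrapping(i32 a, i32 b) {
    std::uint32_t sum = static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
    // Map back to the signed range without relying on the unsigned-to-signed
    // conversion, which was implementation-defined before C++20.
    if (sum <= static_cast<std::uint32_t>(std::numeric_limits<i32>::max())) {
        return static_cast<i32>(sum);
    }
    return static_cast<i32>(sum - 2147483648u) + std::numeric_limits<i32>::min();
}

// Fold `a + b` for a language that treats overflow as a compile error:
// widen to 64 bits (which cannot overflow here) and report failure.
std::optional<i32> fold_add_checked(i32 a, i32 b) {
    std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide < std::numeric_limits<i32>::min() || wide > std::numeric_limits<i32>::max()) {
        return std::nullopt;
    }
    return static_cast<i32>(wide);
}
```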


I feel like C screwed up cross-compilation so badly that people think it is hard for some reason. Which it obviously isn't. The Virgil compiler just includes all of its backends in builds for all of its hosts. There's nothing special about the host machine.


I understand that C is the same way and that there are benefits for language developers in using their own language, so I can't complain too loudly about languages that do this, but I can attest to the "nightmare" that this creates. I have personally ported Java to an operating system that didn't previously have a Java compiler. A colleague recently ported Rust. Neither had usable cross-compilers. Both require a compiler of their own language to build their own compilers. It's a _giant_ pain in the neck.

I wish that all language devs would provide a way of compiling the build tools with something that nearly everybody already has, like C or C++. Having language support baked into a commonly used compiler like GCC is also nice. I think there are good efforts to get a Rust compiler into GCC, and I believe it already works for Go and D and a few others. Such things make adoption outside the standard Windows/Linux world much, much easier.


I believe the long-term goal is to implement a C backend for the compiler (compile Zig code to C). Then compile the Zig compiler (which is written in Zig) to C, so you can compile that on your target system which only has a C compiler. Then you can use that to rebuild the native Zig compiler on that system.


Since February, the mrustc project [0] has been capable of bootstrapping a relatively-recent Rust compiler from C++. (It currently takes another 8 Rust-to-Rust steps to reach the latest stable version.) I've personally tested it on my x86_64-linux-gnu machine, but I don't know how tough it would be to get it to support new targets.

[0] https://github.com/thepowersgang/mrustc


Zig in particular stands out for having strong support for cross-compiling. I wonder how much simpler that makes adding a new platform to a self-hosted language, and what challenges persist?


I actually wouldn't mind porting Zig to our OS; right now it seems it would be pretty easy, and it looks like they're going to make the bootstrapping process pretty straightforward in the future as well. I just wish other languages would do the same... Unfortunately for me, we don't have any users that need Zig right now, and all my other side projects are higher priority. I'll probably give it a try eventually, though.


In case of Java, wouldn't you only need to bootstrap JVM? Bytecode for the initial compiler can be compiled on another platform and then brought over.


For me, compilers tend to be complex so it's a way to show that it can be done. Also, dogfooding[1].

[1]: https://en.m.wikipedia.org/wiki/Eating_your_own_dog_food


Virgil is completely self-hosted. The git repository just has stable binaries checked in that are updated periodically (15-20 times over the past 12 years or so). It bootstrapped first from an interpreter written in Java, which is long obsolete.

Because of this, Virgil has no dependencies on other compilers or languages, other than for testing (bash and a little C).


Very exciting. Are there plans to maintain the bootstrap path over the longer term? It always makes me sad when I hear about compilers which must bootstrap from magic binary blobs.



I believe the goal is to replace the C++ with a C bootstrap that is initially auto-generated from the Zig code by Zig, but then manually cleaned up and maintained to match.


Starting from auto-generated code isn't considered a proper bootstrap process by the Bootstrappable Builds project. You need to be able to start without any binaries or auto-generated code from the project itself. Usually that would mean starting with a basic implementation in another language, but starting with an older version of the same project that was written in another language and then going through several older versions of different milestones is often easier than reimplementing the language from scratch in another language.

https://bootstrappable.org/


> Starting from auto-generated code isn't considered a proper bootstrap process by the Bootstrappable Builds project.

Is there a reason we should weigh the Bootstrappable Builds project's opinion highly here?

There's nothing on their benefits page that is subverted by an auto-generated compiler that's hand maintained.

Actually, there's no mention of auto-generated code in their best-practices section. Do you have a link to where they even say this?


I am quite sure bootstrapping as a concept precedes that website by several decades.


Please read comments before responding


Why?


Probably because they'll start adding features to the Zig version that might not be trivially portable to the C++ version due to architectural differences.


I think because Zig compilers can compile C but not C++.


Zig can compile C++ just fine. Which is no surprise, since it's Clang doing the heavy lifting - Zig provides the convenient CLI.


I hope so, but it seems like even if they didn't maintain it you could build today's stage1 using LLVM 14 and then work from the resulting binary instead of a binary you downloaded from online.


They have a project for maintaining this: https://github.com/ziglang/zig-bootstrap


C++ stage1 compiler is a magic binary blob :)


It's possible to bootstrap GCC starting from only a 357-byte binary seed: https://github.com/fosslinux/live-bootstrap


This work is impressive, but ... that's still a binary blob, and isn't the system it has to be run on kind of one as well?


Yeah, it's a binary blob, but it's small enough to be easily auditable. Anyone with some knowledge of x86 assembly can read the annotated version [1] and verify that it does what it claims (which is to convert ASCII hex with comments into binary).

You're right, it also requires a Linux kernel, and of course, you also have to trust the hardware you're running it on. Still, it reduces the amount of stuff we have to take for granted as trusted, which I think is a good thing. (I'm not involved in the project, just an admirer).

[1]: https://github.com/oriansj/bootstrap-seeds/blob/b09a8b8cbcb6...
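The annotated seed is described above as converting ASCII hex with comments into binary. Here is a rough C++ illustration of that job. It is not a transcription of the seed, and the function names are hypothetical. The comment rules are an assumption based on the hex0 convention: `#` or `;` starts a comment that runs to the end of the line. Every other non-hex character is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Returns the value of a hex digit, or -1 for any other character.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Converts hex text to bytes: each pair of hex digits becomes one byte,
// whitespace and other characters are skipped, and '#' or ';' starts a
// comment that lasts until the next newline (assumed hex0-style rules).
std::vector<std::uint8_t> hex_to_bytes(const std::string& text) {
    std::vector<std::uint8_t> out;
    int high = -1;            // pending high nibble, or -1 if none yet
    bool in_comment = false;
    for (char c : text) {
        if (in_comment) {
            if (c == '\n') in_comment = false;
            continue;
        }
        if (c == '#' || c == ';') { in_comment = true; continue; }
        int v = hex_value(c);
        if (v < 0) continue;
        if (high < 0) {
            high = v;
        } else {
            out.push_back(static_cast<std::uint8_t>(high * 16 + v));
            high = -1;
        }
    }
    return out;
}
```

The comment handling matters because ordinary comment text contains hex-digit letters (for example `E`, `F`, `a`, `c`) that would otherwise be emitted as bytes.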


New programming languages are so much fun. I think it maybe scratches the same itch as home renovation.

It also makes me nostalgic for the first time I compiled a program (on the c64).


Oh boi, oh boi, oh boi, oh boi, this is a historic moment. I think at the current rate I will start focusing my energy on learning and using Zig when version 0.12.0 is out :)

Awesome work guys. I am so looking forward to this language!


You can use it today!


Is there any IDE support? Any debugger?


VSCode works well. There's a handful of plugins that integrate the Zig language server, and any gdb/lldb frontend plugin works for debugging (not sure about the MSVC debugger that's used on Windows in the MS C/C++ extension).


I found this IntelliJ IDEA plugin, the list of features looks pretty good already: https://plugins.jetbrains.com/plugin/10560-zig


That one hasn't been updated in a while, but this one is under active development: https://plugins.jetbrains.com/plugin/18062-zig-support


I saw that one as well, but there are hardly any features. I'm mostly looking for "go to declaration/definition", "find usages" and others which turn codebase into basically a browsable hypertext.


Warning: Pure speculation ahead.

Based on this issue [1] I have a feeling JetBrains is keeping their eye on Zig for an official plugin.

Zig + high powered IDE support would be magical.

[1] https://youtrack.jetbrains.com/issue/FL-10463


I use VSCode with the C/C++ extension on Windows, the debugger works great.


There's a Language Server: https://github.com/zigtools/zls


There's a language server. It seems to work pretty well.


Any debugger that supports DWARF should work just fine.

I imagine it's the same on Windows: anything that can handle a PDB file should work.


Congrats to the team, that's a big milestone!


Congratulations Zig! That's a pretty monumental moment.

Can't wait to see what comes down the pipeline for such a fantastic language!


Congrats to the zig team! This is a big milestone and they must have put an enormous amount of work into making it happen.

At the same time, the reason this milestone seems so significant is because the language is already quite complex. It seems likely to get even more complex over time. Some metrics of complexity that jump out are the amount of compiler source code, compiler compilation time and required resources (cpu/memory). There are other languages that have simpler and more efficient bootstrap compilers.

Ideally, a self hosting compiler contains only the minimum amount of information that is strictly necessary to compile itself. It is possible to write one for a simple, imperative, javascript style language in at most a few thousand lines of code in pretty much any reasonable turing complete language (something like lisp with barely any syntax can be done even more succinctly). Because the simple language is so small, the implementation can also be written in a high level language that targets that same high level language, i.e. the bootstrap compiler could be a transpiler to javascript written in javascript. There is little point in compiling to native assembly during the bootstrap process because the language is so small that a good javascript implementation should be capable of compiling the bootstrap source in under 200ms on a decent laptop. Once the compiler can compile itself, then one can add more features to it, such as a more robust type system, a native backend or automatic memory management without gc. A positive feedback loop emerges because these features are implemented as optimizations to the compiler itself. It gets faster as more features are added to it but it is always fast because it was fast from the beginning. With good benchmarking in place, it should never get slower at compiling itself.

Such a compiler/language would be very minimal by design but a more batteries included language can be built on top of it, much like how an os kernel is extended by user space programs.

A historical example of this kind of self-hosting compiler is Forth. It is easy to bootstrap (see e.g. JonesForth). Derived programs are written by extending the compiler with a new vocabulary specific to the problem at hand. Development is often done in a REPL environment for fast feedback. REPL sessions can be saved as source files once the program works as expected. Forth syntax is unfortunately inscrutable to most and the stack based design is not great for every problem so I wouldn't recommend actually using it, but it contains important ideas that can be adopted into more modern languages.

What the zig team has done strikes me as very, very difficult, so I tip my cap for the effort it must have required. In the long run though, it feels almost inevitable that a simpler language with the more desirable high level properties described above will eat its lunch. As painful as it would be, I believe the best thing the zig team could do to ensure the language's long term survival would be to completely rewrite the language from scratch (zag?) using the knowledge that they've gained during their initial bootstrap process to distill zig to its essence.



Originally it was meant to be https://github.com/ziglang/zig/issues/89#issuecomment-122118..., but somehow the last part was stripped. But that doesn't matter, thank you so much! The new URL is definitely better.



