Zig self hosted compiler is now capable of building itself (github.com/ziglang)
752 points by marcthe12 on April 16, 2022 | 274 comments

Hello HN! Here is some context to decorate this announcement:

The Zig self-hosted compiler codebase consists of 197,549 lines of code.

There are several different backends, each at varying levels of completion. Here is how many behavior tests are passing:

       LLVM: 1101/1138 (97%)
       WASM:  919/1138 (81%)
          C:  740/1138 (65%)
     x86_64:  725/1138 (64%)
        arm:  490/1138 (43%)
    aarch64:  411/1138 (36%)
As you might guess, the one that this milestone is celebrating is the LLVM backend, which is now able to compile the compiler itself, despite 3% of the behavior tests not yet passing.

The new compiler codebase, which is written in Zig instead of C++, uses significantly less memory, and represents a modest performance improvement. There are 5 more upcoming compiler milestones which will have a more significant impact on compilation speed. I talked about this in detail last weekend at the Zig meetup in Milan, Italy[1].

There are 3 main things needed before we can ship this new compiler to everyone.

1. bug fixes

2. improved compile errors

3. implement the remaining runtime safety checks

If you're looking forward to giving it a spin, subscribe to [this issue](https://github.com/ziglang/zig/issues/89) to be notified when it lands in the master branch.

Edit: The talk recording about upcoming compiler milestones is uploaded now [1]

[1]: https://www.youtube.com/watch?v=AqDdWEiSwMM

Congratulations on the milestone!

Does using Zig over C++ lead to "less memory, and represents a modest performance"? Or was the C++ implementation a bit sloppy? (lacking data oriented design for instance)

Also, what specifically are you most excited using Zig for?

Thanks :)

The new Zig implementation is certainly more well designed than the C++ implementation, for several reasons:

* It's the second implementation of the language

* It did not have to survive as much evolution and language churn

* I leveled up as a programmer over the last 7 years

* The Zig language safety and debugging features make it possible to do things I would never dream of attempting in C++ for fear of footguns. A lot of the data-oriented design stuff, for example, makes use of untagged unions, which are nightmarish to debug in C++ but trivial in Zig. In C++, accessing the wrong union field means you later end up trying to figure out where your memory became corrupted; in Zig it immediately crashes with a stack trace. This is just one example.

* Zig makes certain data structures comfortable and ergonomic such as MultiArrayList. In C++ it's too hard, you end up making performance compromises to keep your sanity.

Generally, I would say that C++ and Zig are in the same performance ballpark, but my (obviously biased) position is that the Zig language guides you away from bad habits whereas C++ encourages them (such as smart pointers and reference counting).

As for less memory, I think this is simply a clear win for Zig. No other modern languages compete with the tiny memory footprints of Zig software.

Some of the projects I am excited to use Zig for:

* rewriting Groove Basin (a music player server) in zig and adding more features

* a local multiplayer arcade game that runs bare metal on a raspberry pi

* a Digital Audio Workstation

> * a Digital Audio Workstation


A little bit ago I went to write a little tool that mucked with the FL Studio FLP format. It was easy enough to guess out the bits that I cared about, so I pretty much just did that using a couple quick projects with specific things in them. However, I did check to see if anyone else had mucked around with the FLP format, and couldn’t help but notice your name. Was pretty surprised as a very curious onlooker to the Zig programming language. You certainly seem to get around :)

That’s a little tangential, but I guess I mention it because I was actually wondering if this was ever something you planned on doing given the fact that it was clear you had dabbled with DAW stuff (forgive me for not knowing if you have a more rich connection to music production than just that; I never bothered to check.)

A DAW in Zig sounds like a kick-ass idea. I tried to write a sort-of DAW toy with friends in Rust and it was a lot of fun even if it mostly made me realize how challenging it could be. (And also how bad at math I am. It took me so much effort to feel like I could understand FFTs enough to actually implement them.) It makes me wonder if your Zig DAW would be open source? It would be a fun project to attempt to contribute to, if or when it ever came to fruition.

Exciting stuff. Congrats on the Zig milestone.

> It took me so much effort to feel like I could understand FFTs enough to actually implement them.)

Do you understand how little DSP is involved in writing a DAW? It has almost nothing to do with DSP and everything to do with application architecture, data management, threading and more.

I think you may be reading too much into what I said; I was working on a DAW-like toy, not a full DAW that uses VST plugins, and because I chose to write DSP code, I had to understand it. I don’t know what a real DAW looks like because I am not a subject expert.

That said, Rust seemed pretty promising for writing the audio engine of a DAW due to the memory ownership model. It was relatively easy to come up with a way to architect a very basic lock-free audio thread and feel sure that it was at least memory correct. I have no idea how a real DAW avoids certain pitfalls in the audio thread; lock-free isn’t too hard, but avoiding allocations in all circumstances seems tricky.

Most DAWs do not avoid locks in RT context.

Take a listen to my interview with Justin Frankel of Reaper - somehow in our 3hr+ epic chat, he acknowledges that they don't avoid locks entirely. And Ardour, despite trying, also fails to do so. Anecdotally, from my conversations with other DAW developers, they also do not manage it 100%. As Justin put it, it's more about contention avoidance than lock avoidance.

Avoiding actual stack-based allocation is pretty easy in C++. You just have to want to do it, and remember to do it.

Memory correctness is really easy in the RT threads of a DAW precisely because they do (should) not allocate. You're dealing with pre-allocated memory blocks that do not change over time. It's almost the simplest case of memory mgmt that I know of within a DAW. I have seen several new-to-DAW developers adopting ill-advised schemes for memory mgmt in an RT context. The "list of blocks" is one pattern that I consider something to avoid (and unnecessary). Single-reader/single-writer lock-free (circular) FIFOs get you almost all the way there, for everything.
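For anyone who has not seen one, here is a minimal sketch of a single-reader/single-writer circular FIFO, written in Rust because that is the language of the other snippets in this thread. The names (`SpscRing`, `stream_in_order`) are made up for illustration, and samples are stored as `f32` bit patterns in atomics purely to keep the sketch free of `unsafe`; it is not how any particular DAW does it.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Fixed-capacity single-producer/single-consumer ring of f32 samples.
/// All storage is allocated once in `new`; push/pop never allocate or lock.
struct SpscRing {
    slots: Vec<AtomicU32>, // f32 bit patterns
    head: AtomicUsize,     // next index to read, advanced only by the consumer
    tail: AtomicUsize,     // next index to write, advanced only by the producer
}

impl SpscRing {
    fn new(capacity: usize) -> Self {
        // One slot stays empty so that "full" and "empty" are distinguishable.
        let slots = (0..capacity + 1).map(|_| AtomicU32::new(0)).collect();
        SpscRing { slots, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Producer side. Returns false without storing anything when full.
    fn push(&self, sample: f32) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let next = (tail + 1) % self.slots.len();
        if next == self.head.load(Ordering::Acquire) {
            return false; // full
        }
        self.slots[tail].store(sample.to_bits(), Ordering::Relaxed);
        self.tail.store(next, Ordering::Release); // publish the slot
        true
    }

    /// Consumer side. Returns None when empty.
    fn pop(&self) -> Option<f32> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let sample = f32::from_bits(self.slots[head].load(Ordering::Relaxed));
        self.head.store((head + 1) % self.slots.len(), Ordering::Release);
        Some(sample)
    }
}

/// Usage sketch: one producer thread, one consumer (the caller).
/// Returns how many samples arrived in the order they were sent.
fn stream_in_order(count: u32) -> usize {
    let ring = Arc::new(SpscRing::new(64));
    let producer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || {
            for i in 0..count {
                while !ring.push(i as f32) {
                    std::hint::spin_loop(); // a real audio thread would not spin
                }
            }
        })
    };
    let mut in_order = 0usize;
    let mut expected = 0u32;
    while expected < count {
        if let Some(sample) = ring.pop() {
            if sample == expected as f32 {
                in_order += 1;
            }
            expected += 1;
        }
    }
    producer.join().unwrap();
    in_order
}
```

Calling `stream_in_order(1000)` pushes a thousand samples from one thread and pops them on another. The only synchronization is the pair of indices, each written by exactly one side.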

Interesting. I actually thought this was the case, but when I made a similar argument in a chatroom I got pretty severely roasted by people who were sure real audio engines never used locks in the audio thread. It seems like whichever way I go I’m wrong with audio :P but that’s OK, because I really am not pretending to be knowledgeable here, for me it was only ever for fun.

The main reason I felt Rust was nice was simply the re-assurance that I could not, even if I wanted to, accidentally cause a data race within the confines of safe Rust. It’s not that the actual code was hard, but it did force me to ensure that what I was doing with the data model was actually safe. It wound up guiding how data flow worked in the playback engine.

Anyway, thanks for the pointer to the interview. I don’t expect myself to be writing the next Reaper or Ardour so it is probably OK if I don’t quite grok what you’re getting at. But, I do find this stuff rather interesting, so I would love to have a listen when I get a chance.

It's a fun exercise to implement FFT and other DSP algorithms, but for real-world software please stick to true and tested libraries like FFTW or FFTPACK instead of your own naïve implementation.

I feel like there’s some kind of PTSD in the audio field. I was only talking about a fun toy project among friends and I already have two replies critiquing it presumably by subject matter experts. Just an observation, I’m not really that sore over it. But still, even with cryptography, where it is well-known you never roll your own code (and yet people do anyways) I would not expect people to get all flustered if I had said “toy EC-SRP implementation” because the word “toy” is meant to signify “I know that’s not what you would do in the real world.”

Thanks for the detailed answer! I have more questions, if you'll indulge me:

If I understood it correctly, you think smart pointers and reference counting are bad habits. Why? Especially the smart pointers bit.

Why does Zig use less memory than other languages? Is it inherent to Zig, or can it be reproduced in other languages?

I think Zig prefers explicit memory management, because allocations may fail and should be handled explicitly, and because automatic deallocations lead to hard-to-predict lifetimes (excess memory usage, and bugs for resource handles that are destructed at hard to predict moments).
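To make "allocations may fail and should be handled explicitly" concrete, here is a small sketch in Rust (used only because the other snippets in this thread are Rust). It relies on the standard library's `Vec::try_reserve_exact`, which returns an error instead of aborting; the function name `zeroed_buffer` is hypothetical, and this is an analogue of the idea, not Zig's allocator API.

```rust
use std::collections::TryReserveError;

/// Build a zero-filled buffer, reporting allocation failure to the caller
/// instead of aborting the process.
fn zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?; // the only point where memory is requested
    buf.resize(len, 0); // does not reallocate: capacity is already >= len
    Ok(buf)
}
```

The caller has to decide what an out-of-memory condition means for them, for example `match zeroed_buffer(n) { Ok(b) => ..., Err(e) => ... }`, rather than having the decision made implicitly.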

These are things that a "systems language" programmer should put in the work to do correctly/near-optimally, and not ask the compiler to just do something "good enough", like Python would.

> because automatic deallocations lead to hard-to-predict lifetimes (excess memory usage, and bugs for resource handles that are destructed at hard to predict moments).

I don't really feel this is the case in Rust.

But it is. If you have a String on the stack, its memory is only reclaimed at the end of the scope, while it often could be freed before. This is especially bad in async code around await points, since it means the memory needs to be kept alive longer than needed.

I disagree. It is substantially and unequivocally better to hold onto memory until the end of scope than to leak memory by default. How can anyone argue that leaking memory is a better default? That’s a ticking time bomb. Maybe someone is so optimistic as to believe they’ll catch every leak before shipping new code?

You can easily add a manual “drop” call in Rust at any point if you want to force an allocation to be freed sooner, but I speak from years of experience using Rust for work when I say that Rust’s RAII model is not problematic in practice. I’m not simply speculating or theorizing, and I have professional experience with a variety of languages at all levels of the stack. I personally don’t mind garbage collectors most of the time, but Rust is great when you need more control.

In C++, RAII can absolutely be problematic because you are able to easily do things that cause undefined behavior by accident, which is arguably worse than either leaking by default or the mere act of holding onto memory for the duration of a scope.

If you can propose a system which cannot be contrived to have any downside, that would be fantastic! In the real world, Rust’s approach to memory management is extremely pragmatic and beneficial. I’m sure someone will eventually improve on Rust’s approach, but “leak by default” isn’t it.

I honestly do enjoy following Zig… it is a fascinating language taking a really interesting approach to solving many problems, but its memory safety story is not where I want it to be yet. Leaking memory by default is technically safe, but it's not exactly endearing.

I’ve recently been writing personal code that leaks like a sieve. It’s just not worth my time to find every leak when the lifetime of the process is finite and short and it will only ever run on a machine with gigs of memory. I haven’t thought through your question enough but maybe a situation where memory usage would be super high if waiting until the end of scope? I’m probably trying too hard to come up with a situation but I have a gut feeling that freeing mid scope is important under certain circumstances to keep the code simple and understandable.

> I’m probably trying too hard to come up with a situation but I have a gut feeling that freeing mid scope is important under certain circumstances to keep the code simple and understandable.

I explained in my previous comment that you can explicitly "drop" any value at any time in Rust, if you choose.[0] But if you don't, it will still be dropped at the end of the scope. The developer has control, but the language will watch your back.

[0]: https://doc.rust-lang.org/std/mem/fn.drop.html
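A minimal sketch of both behaviours side by side. `Tracked` and `run` are hypothetical names; the only real API used is `drop` from the standard prelude.

```rust
use std::cell::RefCell;

/// Hypothetical resource that records the moment it is released.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the order in which things happened.
fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let early = Tracked { name: "early", log: &log };
        let _late = Tracked { name: "late", log: &log };
        drop(early); // released here, mid-scope, because we asked for it
        log.borrow_mut().push("work".to_string());
    } // `_late` was never dropped by hand, so it is released here
    log.into_inner()
}
```

`run()` yields `["drop early", "work", "drop late"]`: the explicit call frees one value before the work happens, and the other is still cleaned up at the closing brace.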

Heck, this is the entire memory management model of PHP, which I found shocking when I learned it, but makes sense given that the language is intended to generate web pages: just allocate, never reclaim the memory, then kill the process when you’re done.

Do you have any sources? Maybe some truly ancient version? PHP has done garbage collection (seemingly mixed with some reference counting, like Python) for at least 10 or 15 years... I didn't bother to keep searching for even older information, but nothing I saw indicated that PHP only released memory when the process for a request exited.

I don't even think most mainstream uses of PHP have done the process-per-request model for decades, but I could be wrong.

Correct, a garbage collector was added in PHP 5.3; prior to 5.3, PHP did only reference counting. PHP 5.3 was released in June of 2009.

Here is a series of articles that describes it in detail


Especially look at this example where the same script is executed with and without GC


Of course you can call drop() manually, but almost nobody does or even thinks about it because that's not the way you program in a language with RAII.

Don't get me wrong. I do think that rust and c++ RAII is much more convenient and safe than the C or Zig way.

(I'd even prefer if you could annotate a given struct in Rust so the compiler could drop it as soon as it's no longer used, but that's not that simple)

I definitely wish that Non-Lexical Lifetimes would eagerly drop anything that doesn't implement Drop.

It would probably be a breaking change to automatically call an explicit Drop implementation anywhere other than the end of the current scope, so I think that would have to be left as-is. String doesn't implement Drop, so it could easily be dropped eagerly within the scope as soon as it won't be referenced again. Such a change would be roughly equivalent to any of the compiler optimizations that reorder statements in ways that should be unobservable.

100%. I would actually go further and say every value should be dropped immediately after its last use, including temporaries in the middle of statements, whether or not it implements Drop. Reuse the same rules that NLL uses. Breaking change yes, so do it next edition. It would lower memory usage in general and solve so many little pain points, not least of which is holding strings across await points.

    fn foo(xx: &Mutex<String>) {
        let mut lock = xx.lock().unwrap();
        let ptr = &mut *lock as *mut String;
        unsafe { use_from_c(ptr) }
    }

You get the idea: you don't want the lock or the string to be dropped before the unsafe code, even if the actual string is no longer used. That's the breaking change. It's hard to detect automatically, so hard to justify even in an edition.

I generally agree with you (which is why I made an exception for values with a manual Drop implementation), but to advocate on behalf of the idea… I don’t think it would be too crazy to make a rule that any function that invokes “unsafe” (or is itself defined as an unsafe function) would fall back to the old scope-based drop rules. Worst case scenario is that you’d be dealing with the current level of memory usage efficiency, and you might need to manually call drop if you want something dropped earlier, but in most cases things would be eagerly dropped even sooner.

Outside of uses of unsafe, are there any other serious problems with eagerly dropping values that manually implement Drop? Maybe this fallback mechanism would also have to be invoked simply for casting to a raw pointer anywhere in the function. Either way, a list of exceptions to eager drop would arguably be better than not having eager drop, as long as the list was sound.

It would still be a breaking change, and would definitely require at least being restricted to a new edition. Some people currently use Drop as a kind of scoped “defer”, especially for things like instrumentation, so maybe it would be time for the language to introduce a proper “defer” statement that exists for that purpose instead of making Drop so lazy for everyone.

I still don't get why one should do that in the first place

This doesn't match the experience of GC'd languages. I've never heard of problems in practice that arose from treating all values present on the stack/registers as GC roots, even if those values are dead in the data flow sense.

Stacks are a fixed size. Unless you are running out of stack space there is no point in “freeing” memory from the stack. Where in the stack the stack pointer points has no bearing on stack size for any OS I’m aware of.

I’m assuming by string you mean a stack allocated array of char and not a std::string.

I meant a std::string on the stack, which is just a handle for more memory on the heap

You can just put a single pair of brackets around it and that’s it.

For locks it is seriously the best way to manage them.
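The same brace trick works for Rust lock guards. Here is a small sketch (the function names are hypothetical) that uses `Mutex::try_lock` to observe whether the lock is still held at the end of each function:

```rust
use std::sync::Mutex;

/// The guard lives in an inner block, so the lock is released at its
/// closing brace, before the function is done.
fn append_scoped(shared: &Mutex<String>) -> bool {
    {
        let mut guard = shared.lock().unwrap();
        guard.push_str("hello");
    } // guard dropped here, lock released
    let free = shared.try_lock().is_ok();
    free
}

/// No inner block: the guard is last used on the second line, but it is
/// only dropped at the end of the function, so the lock is still held.
fn append_unscoped(shared: &Mutex<String>) -> bool {
    let mut guard = shared.lock().unwrap();
    guard.push_str("hello");
    let free = shared.try_lock().is_ok();
    free
}
```

`append_scoped` reports the mutex as free and `append_unscoped` reports it as held. That is the scope-based behaviour the earlier `Mutex` example in this thread relies on.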

And as we know from C, every developer is quite capable of taking care of use after free possible bugs.

The point of C and Zig is that you can solve any kind of problems with them. In particular there is much more to memory reclamation than garbage collectors or Rust's borrow checker. Some of the reclamation schemes offer wait-freedom, or efficiently support linearizable operations.

Any kind of problem?

So how do you solve SIMD with C, without language extensions?

C is not a special snowflake.

SIMD is not a problem, it's hardware. Compilers tend to take advantage of the CPU-specific SIMD registers and instructions, or you can use them explicitly.

Writing generic SIMD code is more portable. C has libraries, and Zig also has vector primitives available through built-in functions.

Any language can have libraries, C is not special here, nor is Zig.

In fact, a few GC enabled languages have explicit SIMD support.

Are there languages that "solve" SIMD?

What can C and Zig do that Rust can't?

Nothing if unsafe Rust is considered, though Zig does not require you to fight the language nearly as much. This is especially obvious with embedded. Zig's philosophy of no hidden control flow or allocation makes things simple. Simplicity is power.

For certain problems you would want a TLA+ specification for safety and especially liveness either way. It's not like Rust absolutely guarantees correctness in all cases.

Rust sits in a sweet spot between C/Zig and languages like Java, but it's not an appropriate replacement for either of them.

Certainly Rust sits where C++ is. Zig and Rust are definitely best used for different things, but I think writing Rust off as sitting in between C and Java is inaccurate.

This is kind of a trick question because they are all Turing Complete, but so is Assembly. The best way to interpret your question, then, is not “what is possible” but rather “what is simple to express correctly” and “what is simple to get correct eventually”. Those are questions about how tricky it is to get something to compile in the language and which tools exist to help determine whether a compiled program does what’s expected and does not do what’s not expected.

It wasn't a trick question. I'm not asking it in the same tone as someone would ask 'What can C do that Java can't?'.

I believe that Rust can be as low level as you want. If you have to fight the language to accomplish something, it's because you're probably trying to do something unsafe, and Rust will let you do that if you pay the price.

I was looking for examples of real cases where Zig is a better option than Rust. And while Zig is a lovely language, its main points against Rust are that it lets you do unsafe things more easily, and it's easier to learn.

The general purpose allocator in the zig standard library has protections against use after free bugs.

By quarantining all memory forever. This is not a scalable solution because keeping one allocation in a 4kB page alive will leak the whole rest of the page. And if you don't quarantine all memory forever then use after free comes back. If it were that easy to solve UAF then C++ would have solved it by now.

There is a scalable solution for UAF that doesn't involve introducing a lifetime/region system: garbage collection. Of course, that comes with its own set of tradeoffs.

The problem of a single allocation keeping a 4 KB page alive is something that you might commonly find with the C++ or Rust way of programming that encourages allocating many individual objects on the heap, but in Zig land this is pretty rare. In fact, compare a Zig implementation of a given application with an equivalent in any other language and you will find there is no contest with respect to the size of the memory footprint.

There are many cool possibilities that C++ has never explored and frankly I find your argument unimaginative.

An allocator that can leak (4kB - epsilon) of memory for each allocation is broken. It's not a question of whether the language makes such allocations unusual: free() not actually allowing memory to be freed is a violation of the contract of a memory allocator.

From the man page (which I believe quotes ISO C): "The free() function shall cause the space pointed to by ptr to be deallocated; that is, made available for further allocation." If free doesn't do that because another allocation is pointing into that page, that's a violation of the contract. Standard quarantine doesn't violate the spec because the space will eventually become available no matter what, but perpetual quarantine does.

One way to fix the problem would be to round up all allocations to 4kB, but that's very wasteful and slow due to cache and page table traffic; you'd be better off from a performance point of view with a garbage collector. More promising is ARM MTE, though the tag is currently too small to really be called a solution to UAF as opposed to a mitigation. A 64-bit tag would be enough, but I'm not sure what the performance costs of that would be--I wouldn't be surprised if the increased memory traffic makes it slower than a GC.

> way of programming that encourages *allocating many individual objects* on the heap

Quoting and bolding. I think this is the thing that I need to change most about my own style of programming.

Just like C and C++ debugging allocators, so what is the improvement here?

The lifetime of the memory a smart pointer manages is very predictable. It will have the same lifetime as the smart pointer itself (this is ignoring moves).

I think what he's saying is there is a way to carelessly use smart pointers in rust.

    pub enum List { Empty, Elem(i32, Box<List>) }

instead of:

    pub struct List { head: Link }

    enum Link { Empty, Some(Box<Node>) }

    struct Node { elem: i32, next: Link }

Can you explain the difference between these approaches? Is it just that the first example allocates an extra u16 (tag of the tagged union) (ignoring any overhead)?

Notice that if I make this alleged "List" with a single data item in it, my data lives in the List object I just made (probably on the stack), but an empty List gets allocated on the heap.
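To make that concrete, here is a sketch that builds a one-element list in both layouts and counts the heap allocations, and how many of them hold no element at all. The first enum is renamed `BadList` so both definitions fit in one file; the helper names are hypothetical.

```rust
/// First layout: the first element lives inline in the `BadList` value,
/// and the terminating `Empty` ends up boxed on the heap.
pub enum BadList {
    Empty,
    Elem(i32, Box<BadList>),
}

/// Second layout: the handle is just a link, every element lives in a
/// heap-allocated `Node`, and the end of the list is a plain `Link::Empty`.
pub struct List {
    head: Link,
}

enum Link {
    Empty,
    Some(Box<Node>),
}

struct Node {
    elem: i32,
    next: Link,
}

fn bad_single(x: i32) -> BadList {
    BadList::Elem(x, Box::new(BadList::Empty))
}

fn good_single(x: i32) -> List {
    List { head: Link::Some(Box::new(Node { elem: x, next: Link::Empty })) }
}

/// Returns (heap allocations, heap allocations that hold no element).
fn bad_heap_stats(list: &BadList) -> (usize, usize) {
    let (mut boxes, mut wasted) = (0, 0);
    let mut cur = list;
    while let BadList::Elem(_, next) = cur {
        boxes += 1;
        if matches!(**next, BadList::Empty) {
            wasted += 1;
        }
        cur = &**next;
    }
    (boxes, wasted)
}

/// Same statistics for the second layout; no allocation is ever wasted.
fn good_heap_stats(list: &List) -> (usize, usize) {
    let mut boxes = 0;
    let mut cur = &list.head;
    while let Link::Some(node) = cur {
        let _ = node.elem; // every box carries a real element
        boxes += 1;
        cur = &node.next;
    }
    (boxes, 0)
}
```

For a single element, the first layout reports `(1, 1)`: one allocation, and it stores nothing but the `Empty` marker. The second reports `(1, 0)`: one allocation, holding the element.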

I thought Aria's "Entirely Too Many Lists" tutorial actually tries to build this, but it actually doesn't, she draws you the resulting "list" and then is like, OK, that's clearly not what we want, let's build an actual (bad, slow, wasteful) linked list as our first attempt.

I don’t really get your example. What does it have to do with Rust? It’s just a bad linked list, isn’t it?

Are you talking about the example by RustyConsul? Because I'm not RustyConsul so I can't tell you what specifically RustyConsul intended with this example. Yes this is a bad data structure in Rust, or C, or presumably Zig. In the context of this thread I'm sure Zig's proponents will say that you'd never make this mistake in Zig.

My original point was referencing the use of smart pointers. They are indeed very smart, but can be used stupidly as a bandaid kind of like .clone().

The difference really lies in the fact that I now have some data stored on the stack (the element) and some data stored on the heap because it's recursive. It was just a random example of where it's poor practice. As others have noted, linked lists are a terrible data structure to begin with.

Great, thanks for those details! I primarily develop using C++ and avoiding pitfalls (smart pointers, exceptions, unintended memory allocations) takes a lot of effort.

I enjoy synthesizers (including Eurorack) and am looking forward to playing with a Zig DAW!

If the union is untagged, how can it be determined (at runtime) that you've accessed the wrong field?

The compiler adds a tag in debug modes, but not in release modes.
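A rough emulation of that idea in Rust, for readers who want to see the shape of it. This is a hand-written sketch, not what the Zig compiler generates: `Checked`, `Active` and `Raw` are made-up names, and the hidden tag is switched on and off with `cfg(debug_assertions)`.

```rust
/// Which field of `Raw` was written last. Only stored in debug builds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Active {
    Int,
    Float,
}

/// Plain untagged union: 4 bytes, no discriminant.
#[derive(Clone, Copy)]
union Raw {
    int: u32,
    float: f32,
}

/// Hypothetical wrapper that carries a hidden tag only in debug builds.
struct Checked {
    raw: Raw,
    #[cfg(debug_assertions)]
    active: Active,
}

impl Checked {
    #[cfg(debug_assertions)]
    fn new(raw: Raw, active: Active) -> Self {
        Checked { raw, active }
    }

    #[cfg(not(debug_assertions))]
    fn new(raw: Raw, _active: Active) -> Self {
        Checked { raw }
    }

    /// Debug builds: panic right at the bad access. Release builds: no check.
    #[cfg(debug_assertions)]
    fn check(&self, expected: Active) {
        assert_eq!(self.active, expected, "access of inactive union field");
    }

    #[cfg(not(debug_assertions))]
    fn check(&self, _expected: Active) {}

    fn from_int(v: u32) -> Self {
        Checked::new(Raw { int: v }, Active::Int)
    }

    fn from_float(v: f32) -> Self {
        Checked::new(Raw { float: v }, Active::Float)
    }

    fn int(&self) -> u32 {
        self.check(Active::Int);
        // SAFETY: both fields are plain 4-byte values; any bit pattern is valid.
        unsafe { self.raw.int }
    }

    fn float(&self) -> f32 {
        self.check(Active::Float);
        // SAFETY: as above.
        unsafe { self.raw.float }
    }
}
```

In a debug build, `Checked::from_int(7).float()` panics at the call site; in a release build the tag and the check disappear and the value is just the 4-byte union.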

Neat, that's somewhere in between a `union` and a `std::variant`. You could build your own, but it's cool that it's a first class member of the language.

Can you clarify the union part? Did you mean that Zig allowed you to replace what would be an untagged union in C++ with a tagged union in Zig? Or does the Zig compiler have some kind of debug sanitizer mode which automatically turns untagged unions into tagged unions with checks?

In the past they've talked about the speed and memory improvements they've gotten from using MultiArrayList in the compiler, storing tags separately from the unions themselves. If you have a union with a size of 16 bytes and you add a tag to that which is 1 byte, a lot of space is wasted due to padding. If you keep the tags in a separate array, both arrays are individually densely packed. Less memory wasted due to padding == less memory use overall and better utilization of cache.

But in terms of the implementation, this means working with untagged unions, because the tags are maintained externally.

In debug builds, untagged unions become tagged behind the scenes as you described.
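The padding argument is easy to check with a sketch. Below, a 16-byte payload plus a 1-byte tag is stored once as an array of structs and once as two parallel arrays. `Payload`, `Tagged` and `TaggedList` are hypothetical stand-ins, `#[repr(C)]` is used so the layout is predictable, and the 24-byte figure assumes a typical 64-bit target where `u64` is 8-byte aligned.

```rust
use std::mem::size_of;

/// 16-byte payload with the alignment of u64, standing in for an
/// untagged union.
#[repr(C)]
#[derive(Clone, Copy)]
union Payload {
    pair: [u64; 2],
    bytes: [u8; 16],
}

/// Array-of-structs element: tag, then padding, then the payload.
#[repr(C)]
#[derive(Clone, Copy)]
struct Tagged {
    tag: u8,
    payload: Payload,
}

/// Struct-of-arrays storage: tags and payloads live in two dense arrays
/// and are linked only by index.
struct TaggedList {
    tags: Vec<u8>,
    payloads: Vec<Payload>,
}

impl TaggedList {
    fn new() -> Self {
        TaggedList { tags: Vec::new(), payloads: Vec::new() }
    }

    fn push(&mut self, tag: u8, payload: Payload) {
        self.tags.push(tag);
        self.payloads.push(payload);
    }

    /// Bytes of element storage in use (ignores spare capacity).
    fn bytes_used(&self) -> usize {
        self.tags.len() * size_of::<u8>() + self.payloads.len() * size_of::<Payload>()
    }
}

/// Bytes needed to store `n` elements as an array of `Tagged` structs.
fn aos_bytes(n: usize) -> usize {
    n * size_of::<Tagged>()
}
```

With 1000 elements the parallel arrays use 17,000 bytes, while the array of structs needs 24,000 on a typical 64-bit target, because each 1-byte tag drags 7 bytes of padding along with it.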

Thanks for all the context. Curious to know more about the concept/design of the DAW.


> * a Digital Audio Workstation


Maybe you can put together a better, faster team (like Presonus managed to do for Studio One), but most current DAWs have been in existence for 20 years or more. Catching up with that is a challenge, and if you don't catch up then it's an interesting toy or half-a-DAW.

So why?

ps. obviously I am biased.

This argument applies equally to programming languages, and Zig seemed to go OK?

Oh yes, it absolutely applies to all programming languages.

Impressive work getting Zig self-hosting. However:

> As for less memory, I think this is simply a clear win for Zig. No other modern languages compete with the tiny memory footprints of Zig software.

Is not true at all. There are several other modern languages that compete with Zig for small memory footprints.

Would you tell us about those languages, or are they left as an exercise to readers?

Sort of left it to the readers and hopefully encourage people to investigate for themselves. It's far too easy to get into "this benchmark vs this benchmark", etc. Memory usage is overall a complicated topic, which I think Andrew's comment doesn't do justice to. Granted the Zig team has done some impressive work, much of the memory usage & bloat in the C++ world comes from real world programs and libraries and the inevitable drift in program architecture over time.

However, I could believe Zig's stdlib and culture encourages low memory footprints, but there isn't anything novel in the language that makes it inherently lower memory footprint. Though to name a few languages I'd say that can be directly comparable are Rust, D (esp. BetterC mode), and Nim. Even Julia & Go in the right context. Though honestly I often prefer wasting a few hundred bytes of RAM here and there, even on a microcontroller, for pure convenience.

edit: forgot Odin. Another comment mentioned it, though I've never used it. It looks like it's used in production.

Zig has a few extra tricks up its sleeve compared to most languages. The ones that come to mind are:

* MultiArrayList

* arbitrary-size integers that make it easy to pack data very tightly

* easy access to several special purpose allocators which reduce the need for strategies like reference counting
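As an illustration of the tight-packing point: in Zig, two small fields can be declared as `u3` and `u5` inside a packed struct. Rust has no such integer types, so the sketch below does the same packing by hand with shifts and masks. `Packed` and its field names are hypothetical.

```rust
/// Two small fields packed into one byte: a 3-bit kind (high bits) and
/// a 5-bit index (low bits).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Packed(u8);

impl Packed {
    /// Returns None if either value does not fit in its bit width.
    fn new(kind: u8, index: u8) -> Option<Packed> {
        if kind >= (1 << 3) || index >= (1 << 5) {
            return None;
        }
        Some(Packed((kind << 5) | index))
    }

    fn kind(self) -> u8 {
        self.0 >> 5
    }

    fn index(self) -> u8 {
        self.0 & 0b0001_1111
    }
}
```

`Packed::new(5, 17)` round-trips both fields through a single byte. The range checks in `new` are what a language with native small integers would enforce through the type system.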

Perhaps, though nothing there that's not been available for a while. To be fair Zig has integrated some of these features well.

* MultiArrayList is convenient and easy to do in Rust, D, Nim. Odin seems built around it.

* Arbitrary bit-size integers seem more gimmicky than anything and similar to struct bitfields in other languages (except Rust oddly). Most compilers don't even pack uint8/uint16's for performance reasons.

* Special purpose allocators are a bit more interesting. Still they don't provide actual memory safety AFAICT and can have fragmentation and performance penalties.

It'll be interesting to see larger Zig code bases emerge in different fields and see how the memory footprint compares in practice.

What? 8- and 16-bit are never 32-bit aligned nor sized.

To clarify I meant comparing how int8/int16's are packed in structs vs struct bitfields. Can't recall about the stack rules. Here's more discussion:



Also ARM for example doesn't have 8/16 bit registers so int8 or int16 will use a 32bit register:


Curiosity got to me; perhaps Zig had improved significantly. So I ran the first benchmark I found (kostya/benchmarks/bf) with both Zig and Nim. For the smaller input (bench.b) Zig did run with ~22% less RAM (about 20kB less).

However, for the larger input (mandel.b) Nim+ARC used ~25% less RAM in safe mode (put the other way, Zig used ~33% more): Nim 2.163 MB with -d:release; Zig 2.884 MB with -O ReleaseSafe; Zig 2.687 MB with -O ReleaseFast. The Nim version requires at least 0.5 MB less RAM and the code is ~40% shorter. I don't have time to try out the Rust or Go versions though.

edit: grammar

Are the two using the exact same algorithm?

Yes, as far as I can tell [1]. It's a simple Tape algorithm. Neither had any crazy SIMD, threads, no custom containers, etc. They use almost the same function names (and look the same as the Go version too). I used `time -l` https://stackoverflow.com/a/30124747 for memory usage.

Note I ran the benchmarks locally (MacBook Air M1) because the reported benchmark uses the older (default) Nim GC while I only use Nim+ARC. I also had to fix the Zig code and it took a few tries to get the signed/unsigned int conversions working. I tried tweaking flags for both a bit as well to see how stable they were. Zig's memory usage was pretty constant. Nim had a small speed vs memory tradeoff that could be tweak-able, but the defaults used the least memory.

Overall I'd expect exact memory usage by language(s) to vary some by benchmark and one random benchmark isn't conclusive. Still I didn't find anything to indicate Zig is clearly better than other new generation languages. Manual memory management might actually be worse than letting a compiler manage it in some cases.

1: https://github.com/kostya/benchmarks/tree/master/brainfuck

In that category (of C/C++ alternatives) is also Vlang (https://vlang.io/). It would be closer to Odin, than Julia, Go, or Nim. Although Vlang and Odin have a strong Go influence.

Odin doesn't work on 32-bit ARM, which was a disappointment to me.

Bummer, looks like an intriguing language.

> written in Zig instead of C++, uses significantly less memory, and represents a modest performance improvement

That's particularly interesting considering the rust compiler in rust has never been as fast as the original OCaml one

Huh? That's not true at all. It took over 30 minutes to compile the self-hosted Rust compiler with the OCaml compiler, when rustc was far smaller than it is today. rustboot was agonizingly slow, and one of the main reasons why I was so anxious to switch to rustc back in those days was compilation speed.

I was there and had to suffer through this more than virtually anyone else :)

My source: https://pingcap.com/blog/rust-compilation-model-calamity#boo...

> 7 femto-bunnies - rustboot building Rust prior to being retired

> 49 kilo-hamsters - rustc building Rust immediately after rustboot's retirement

> 188 giga-sloths - rustc building Rust in 2020

Well, OCaml Rust compiler also didn't use LLVM and used its own lightweight code generator and I think self-hosted Rust compiler frontend was in fact faster than OCaml Rust compiler frontend.

With both projects, how much of the improvement is simply building for the second time?

For Rust, I think improvement was almost entirely due to LLVM producing faster code. That's not applicable to Zig case, since both old and new compiler use LLVM. I don't know enough about Zig to answer.

The original OCaml compiler didn't have essentially any of the static analysis that Rust would eventually be known for. Rust in 2011 (when rustc bootstrapped) was dramatically different from what would later stabilize in 2015.

I wonder how much this statement still holds. I've never used the OCaml bootstrap compiler but performance wise, the rust compiler has improved incredibly since the 1.0 release.

An apples-to-apples comparison is impossible because rustboot compiled a very different language. But I suspect a suitably updated rustboot would still be faster, because compilation time is dominated by LLVM.

Rustboot's code generator was generally slower than LLVM. I think in some small test cases it might have been faster, but when implementing stuff like structure copies rustboot's codegen was horrendously slow because it would overload the register allocator.

> uses significantly less memory, and represents a modest performance improvement.

The reduced memory has significant value. Being able to do the same build on less expensive hardware or do more with the same hardware is a significant financial performance improvement

yup 18x memory reduction improvement 8.5GB -> 0.5GB according to the vid..

Happy to see the C backend coming along. LLVM is a major barrier to use on esoteric embedded devices.

You can also target C through Wasm.


My genuine question is what sort of code-size and/or performance impact the translation imposes.

The simple example in the README.md seems straightforward enough, but I wonder if there are any pathological explosions in practice.

That would be an interesting undergraduate paper. Perf, Size, by primary language, by toolchain, linker, post processor (dead code elimination, etc).

For a pathological explosion, you mean something like a Zip Bomb? Wasm and C are pretty close together in their semantics, Wasm hides the stack and prevents jumping into the middle of a function (CFI, Control Flow Integrity). I think the code bloat should be on the order of some multiple of the smallest interpreter.

I just did a quick scan of Wasm interpreters (3 in Rust, 1 in C)

    yblein/rust-wasm ~4kloc
    rhysd/wain ~16kloc
    paritytech/wasmi ~25kloc
    wasm3/wasm3 ~22kloc (C)

My hunch was that the expanded code would be approximately (2x-5x interpreter + bin.wasm). I just did a spot check with doom.wasm and I was wrong: the resulting expanded C code, when compiled to Arm, is about 2x the wasm binary size.

    4.1M wasidoom.o
    1.8M wasidoom.wasm


* What kind of tests are these "behavior tests"?

* Is that a list of compilation targets?

* If not all behavior tests pass, does that not mean that the compiler fails to compile programs correctly?

Please indulge those of us who are not familiar with self-hosting compiler engineering.

> What kind of tests are these "behavior tests"?

Snippets of zig code that use language features and then make sure those features did the right thing. You can find them here: https://github.com/ziglang/zig/tree/master/test/behavior

> Is that a list of compilation targets?

Mostly. Pedantically, it's a list of code generation backends, each of which may have multiple compilation targets. So for example the LLVM backend can target many architectures. The ones that are architecture specific are currently debug-only and cannot do optimization.

> If not all behavior tests pass, does that not mean that the compiler fails to compile programs correctly?

Some tests are not passing because they cause an incorrect compile error, others compile but have incorrect behavior (miscompilation). Don't use Zig in production yet ;)

(edit: fix formatting)

To add to what Spex said: also some of those tests check language features that the compiler code doesn't exercise, like async/await. This means that the compiler is able to build itself, but is not able to build every possible valid Zig program. We're getting there though :^)

it's so surprising to hear there was a Zig Meetup in Milan, I'd not expect a large enough community to exist there, pretty cool!

Probably a significant chunk of the larger European community was represented there too - don't forget that traveling to EU countries is relatively easy for EU citizens


I think Zig’s compatibility with C is such a valuable feature.

I also wish we could rewrite everything in a modern language, but the reality is that we can’t and that if we could, it would take a LONG time. The ability to start new projects, or extend existing ones, with a modern and more ergonomic language—Zig—and be able to seamlessly work with C is incredible.

I look forward to the self-hosted compiler being complete, and hopefully a package manager in the future. I’d really like to start using Zig for more projects.

Zig as the “Kotlin of C” makes it very appealing. Kotlin has seen fantastic adoption in JVM projects because you can convert files one at a time from .java to .kt, with only a modicum of one-time build system shenanigans up front. Then your team can gain experience gradually, fill in missing pieces like linters over time, all without redesigning your software.

What zig offers is even better - because zig includes a C compiler, you can actually reduce complexity with zig by getting a single compiler version for all platforms, rather than a fixed zig + each platform’s cc. And with it, trivially easy cross-platform builds - even with existing C code. That’s cool! Go has excellent cross-compilation, but Go with C, not so much.

Rust is a powerful tool, but it’s a complex ecosystem unto itself - both conceptually and practically. It has great interoperability frameworks, but the whole set of systems comes at a substantial learning cost. Plus, porting an existing software design to Rust can be a challenge. It’s more like the “Scala of C” if we’re trying to stretch the analogy past the breaking point.

Kotlin has seen fantastic adoption in Android projects, because of the way Google pushes it, while stagnating Android Java on purpose.

On the JVM world not really.


Are they pushing? Most documentation for Android dev is still Java. Or by default Java. It's only the IntelliJ guys pushing for Kotlin by creating a lock-in in their IDE. One reason I refuse to use it.

You have missed all the Kotlin only Jetpack libraries, NDK documentation now using Kotlin, Jetpack libraries originally released in Java now rewritten in Kotlin.

They are still using Android Java on the system layers, because they aren't rewriting the whole stack.

Even the update to a Java 11 LTS subset on Android 13 is most likely caused by the Java ecosystem moving forward, rather than by the willingness of the Android team to support anything other than what can be desugared into Java 8 / DEX somehow.

You are right in relation to JetBrains, they even acknowledge it on their blog post introducing Kotlin.


"And while the development tools for Kotlin itself are going to be free and open-source, the support for the enterprise development frameworks and tools will remain part of IntelliJ IDEA Ultimate, the commercial version of the IDE. And of course the framework support will be fully integrated with Kotlin."

> You have missed all the Kotlin only Jetpack libraries, NDK documentation now using Kotlin, Jetpack libraries originally released in Java now rewritten in Kotlin.

Also, Oracle's lawsuit against Google for copying Java APIs.

This doesn't compute, because Kotlin is heavily dependent on the Java ecosystem, regardless of how Google screwed over Sun and the Java community with their Android Java.

I still see it as a passive aggressive move for Google to get back at Oracle. They can't back away from Java API completely but they can hurt Oracle by discrediting Java language.

As an Android Dev: none of us wants to do Java projects anymore. If we had support for some recent versions maybe, but as it is, there's no going back.

0 of our recent or current projects still use java.

Google is either moving/extending libs to natively integrate with kotlin (numerous -ktx libs) or they are kotlin-only (Compose) anyway.

I don't really see the Jetbrains lock-in thing, because: Android Studio is free, you can use any other IDE with syntax highlighting and the terminal to run tests & to compile.

If you want to blame someone for locking Android devs into Android Studio, it would be Google, because they build the previews into Android Studio afaik. But you would have the same criticism of Apple/Xcode. Supporting one IDE is already tough I guess.

As someone not familiar with android development, this is somewhat confusing because I always thought Google was pushing dart/flutter for mobile these days?

Or is it both at once?

Dart/Flutter is their React Native competitor. It’s in the same space as Xamarin - a secondary language and UI toolkit whose selling point is rapid development for multiple platforms.

Kotlin is to Java (on Android) as Swift is to Objective-C (on Apple) - the successor primary platform language.

Thanks, that's a solid summary.

Feels like an environment that moves so quickly (to someone like me anyway). Can barely keep up.

I think it is better to compare Flutter with Apache Cordova / Ionic Capacitor, as Flutter actually has a DOM tree behind the scenes, but instead of using a Web View to render the application Flutter renders the app directly to a GPU Accelerated surface using Skia.

From talking to a Googler the reason why Google devotes resources to Kotlin is that there was and still is a large external demand from the Android development community for Kotlin.

Kotlin adoption was triggered from inside, with some anti-Java attitude.


My information was from someone from the team working on jetpack compose. So maybe the answer I was given comes from a different context.

Not sure where you got that impression from, the Android team has never pushed, let alone mentioned, Dart/Flutter, ever. All the Flutter advertising you hear is from the Flutter team.

Kotlin and Java are the main languages supported on Android.

Huh. I read about this the other day as well - https://news.ycombinator.com/item?id=30842602:

> If only [Oracle] hadn't sued Google, Java would still have been the pre-eminent language for Android development. Sadly Android is stuck at legacy Java 8 permanently now. So, modern Java is stuck as a server-side language with dozens of competitors.

A reply argued that Android is on Java 11 now, and then you noted (hi!) that it's "a subset". Huh.

I'm trying to get a handle on understanding the ramifications of the legal/licensing situation, and the actual concrete impact on Java's use in Android. The subject seems somewhat murky and opaque. Is there possibly a high-level disambiguation about what's going on published anywhere?

The whole Oracle google lawsuit had nothing to do with modern Android’s use of Java — java has been open-sourced since and for a long time now (by oracle themselves).

The lawsuit was about Sun’s license that explicitly demanded a purchase for use on mobile devices for their Java programming language (as that was the area they wanted to get money from). Google instead copied most of the APIs and called it a day, and Oracle bought Sun and went after the lawsuit.

But since the license changed in the meantime so that OpenJDK is completely open-source and has the same license as the Linux kernel, it was all about an older state of things.

The set of Java 11 LTS features and standard library missing from Android 13 is left as exercise for the reader, they are relatively easy to find out, hence subset.

You can check the Javadoc for the standard library, and the JVM specification. Then compare with DEX/ART and the Android documentation for Java APIs.

That poll pours a cold bucket of water on “fantastic adoption” among all the respondents, but compare adoption of Kotlin and Java releases after Kotlin’s release. 1 in 6 respondents using language versions after 2016 are using Kotlin. I don’t think that’s too shabby.

Java is still Java, regardless of the version, otherwise we should add Kotlin versions to the discussion as well.

The "Kotlin of C", what a beautiful metaphor!

> I also wish we could rewrite everything in a modern language, but the reality is that we can’t and that if we could, it would take a LONG time. The ability to start new projects, or extend existing ones, with a modern and more ergonomic language—Zig—and be able to seamlessly work with C is incredible.

That's the Maintain it With Zig approach :^)


Sounds compelling. Is there a list of projects following this advice anywhere?

Zig is not the only one. Other newer languages like Vlang, Odin, Rust, Nim... offer strong C interop.

Can’t speak about the others, but Rust’s C interop is nothing like Zig’s, not to mention that Zig can also compile C.

For reference:

Tracking issue for overall progress on the self-hosted compiler: https://github.com/ziglang/zig/issues/89

Zig's New Relationship with LLVM: https://kristoff.it/blog/zig-new-relationship-llvm/

Are there any languages out there that can only be compiled by a compiler written in their own language? Presumably because the original pre-dogfood compiler stopped being maintained years ago. So that if we somehow lost all binaries of the current compiler, that language would effectively be lost?

Most modern languages cannot be compiled anymore with their original pre-dogfood compiler, but we have the sources of the older versions so you can bootstrap them in a sequence.

The Haskell GHC compiler is written mostly in Haskell. Rust's compiler is also written in Rust, these days. TypeScript's compiler is in TypeScript.

It's a pretty common state of affairs, actually. Often arises out of the second or third implementation of the compiler being much better than the first attempts, probably coupled with the momentum of people using the language who can contribute to the tooling because it's in the same language.

If your language (call it X) can only be compiled by a compiler written in X, then you can always create an X-to-C transpiler (it doesn't need to be efficient, and it can even leak memory, as long as it can complete the bootstrapping process).

GHC is nearly impossible to bootstrap if you don't consider the vendored transpiled C code to be source. Versions of GHC not dependent on GHC were never public AFAIK.



A lot of popular C compilers are actually written (at least partially) in C++ these days.

In this case, C++?


The Guix project likes bootstrappability very much. They basically host a tiny C compiler written in assembly (supporting only a subset of C), which can compile a C compiler written in that subset, which in turn can bootstrap the whole ecosystem.

This link is better: https://savannah.nongnu.org/projects/stage0: bits to asm to C, and then everything else follows.

You're commenting on an article about Zig which became self-hosted and can compile C. (There's also lots of other C compilers available)

Zig compiles C/C++ by deferring the vast majority of the work to libclang, which is written in C++. Also note Zig is self-hosted when using the LLVM backend, which means deferring to C++ for much of the code generation. There is no "end-to-end Zig" self-hosted compiler yet, because the Zig native backends are not as near completion. See the creator's comment about the breakdown: https://news.ycombinator.com/item?id=31052234. (I'm excited about this progress, so this is not meant as any kind of knock on Zig, which I think is quite impressive)

But you're right that C is not a good example.


I was thinking of learning Rust but it seems a bit overkill due to manual memory management as compared to languages with similar speed like Nim, Zig, and Crystal. How would one compare these languages?

Is it worth learning Rust or Zig and dealing with the borrow checker or manual memory management in general, or are GC languages like Nim or Crystal good enough? I'm not doing any embedded programming by the way, just interested in command line apps and their speed as compared to, say, TypeScript, which is what I usually write command line apps in.

It's funny i came to Rust from Go, Python, NodeJS, etc after a combined .. 15 years or so. I've been using Rust full time (work & home) for ~2 years now.

Obviously i'm biased, but i quite enjoy it. I find i am more efficient now than before, because it manages to give me the ease of the "easier" languages quite often with a rich set of tooling when i need to go deeper.

Personally i feel the concern over the borrow checker is way overblown. 95% of the time the borrow checker is very simple. The common worst case i experience is "oh, i'm borrowing all of `foo` when i really want `foo.bar`" - which is quite easily solved by borrowing just the `.bar` in a higher scope.
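To make the `foo` versus `foo.bar` case concrete, here is a small Rust sketch. The `Foo` type, its fields, and the `record` function are invented for illustration. The commented-out lines show the version the borrow checker rejects, and the live code borrows only the field.

```rust
struct Foo {
    bar: Vec<u32>,
    count: usize,
}

impl Foo {
    // A getter like this borrows ALL of `self` for as long as the returned
    // reference is alive, even though it only hands out `bar`.
    fn bar_mut(&mut self) -> &mut Vec<u32> {
        &mut self.bar
    }
}

fn record(foo: &mut Foo, value: u32) {
    // Rejected by the compiler: `bar` keeps the whole of `foo` mutably
    // borrowed, so `foo.count` cannot be touched until `bar` is done.
    //
    //     let bar = foo.bar_mut();
    //     foo.count += 1;
    //     bar.push(value);

    // Accepted: borrowing just the field lets the compiler see that
    // `foo.bar` and `foo.count` are disjoint.
    let bar = &mut foo.bar;
    foo.count += 1;
    bar.push(value);
}

fn main() {
    let mut foo = Foo { bar: Vec::new(), count: 0 };
    record(&mut foo, 7);
    // The getter is fine on its own, when nothing else needs `foo` meanwhile.
    foo.bar_mut().push(8);
    println!("{:?} {}", foo.bar, foo.count);
}
```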

The lifetime jazz is rarely even worth using when compared to "easier" languages. Throw a reference count around the data and you get similar behavior to that of them and you never had to worry about a single lifetime. Same goes for clones/etc.

I often advocate this. For beginners to use the language like it was a GC'd language. A clone or reference count is often not a big deal and can significantly simplify your life. Then, when you're deeper into the language you can test the waters of lifetimes.
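As a rough sketch of what "throw a reference count around the data" can look like in single-threaded code, the example below shares one value between two owners with `Rc<RefCell<...>>` and never writes a lifetime annotation. `Config`, `Worker`, and `shared_retries` are hypothetical names used only for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Config {
    retries: u32,
}

// Each worker holds its own handle to the same shared, mutable config.
struct Worker {
    config: Rc<RefCell<Config>>,
}

impl Worker {
    fn bump_retries(&self) {
        // The borrow is checked at runtime instead of at compile time.
        self.config.borrow_mut().retries += 1;
    }
}

fn shared_retries() -> u32 {
    let config = Rc::new(RefCell::new(Config { retries: 0 }));

    // Cloning an `Rc` copies the pointer and bumps a counter;
    // the `Config` itself is not duplicated.
    let a = Worker { config: Rc::clone(&config) };
    let b = Worker { config: Rc::clone(&config) };

    a.bump_retries();
    b.bump_retries();

    let retries = config.borrow().retries;
    retries
}

fn main() {
    println!("retries = {}", shared_retries());
}
```

The cost is a reference count and a runtime borrow check, which is usually negligible next to what a garbage-collected language would do for the same data.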

So yea. Your mileage will vary i'm sure. But i find Rust to be closer to GC'd languages than actual manual languages, in UX at least. You won't screw up and leak or introduce undefined behavior, which is quite a big UX concern imo.

The perspective of someone who has been learning Rust (but not professionally) during the last few months:

- The borrow checker is one of the easier parts of Rust to grok, it's just as you say, not that complicated in the end.

- Traits are more annoying to understand and find in source code, since they can get added from anywhere: suddenly your code gets extra functionality, or it's missing the right one unless you import the right crate, but there is no "back-reference", so it's not clear what crate the code actually comes from.

- Crates/libraries are harder to grok with their mod.rs/lib.rs files and whatnot, in order to structure your application over many files.

- Macros are truly horrible in Rust, both to write and debug, but then my only experience with macros are with Clojure, where writing macros is basically just writing normal code and works basically the same way

- Compilation times when you start using it for larger projects tend to be kind of awful. Some crates make this even worse (like tauri, bevy) and you need to be careful about what you add, as some can have a dramatic impact on compilation speed

- The async ecosystem is not as mature as it seems on first look. I'm really looking forward to it all stabilizing in the future. Some libraries support only sync code, others only async but only via tokio, others using other async libraries. Read somewhere that eventually it'll be possible to interop between them, time will tell.

- Control flow with Option/Result and everything that comes with it is surprisingly nice and something I'm replicating when using other languages (mainly a Clojure developer by day) now.
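On the last point in the list above, a short sketch of what Option/Result control flow tends to look like in practice. The functions `parse_port` and `first_port` are made up for this example; the `?` operator returns early with the error (or `None`) so the happy path reads top to bottom.

```rust
use std::num::ParseIntError;

// Returns the parsed port, or the parse error for the caller to handle.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    // `?` returns early with the error if parsing fails.
    let port = s.trim().parse::<u16>()?;
    Ok(port)
}

// Returns the first argument as a port, if there is one and it parses.
fn first_port(args: &[&str]) -> Option<u16> {
    // `?` works on Option too: no first element means an early `None`.
    let first = args.first()?;
    // `.ok()` converts Result<T, E> into Option<T>, discarding the error.
    parse_port(first).ok()
}

fn main() {
    match parse_port("8080") {
        Ok(port) => println!("port {port}"),
        Err(err) => println!("bad port: {err}"),
    }
    println!("{:?}", first_port(&["443", "ignored"]));
}
```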

My development journey was PHP -> JavaScript -> Ruby -> Golang -> Clojure, using them all in the capacity of backend/frontend/infrastructure/everything in-between, freelancing/consulting/working full-time at long term companies, so take my perspective with that in mind. Rust would be the first "low-level" language that I've used in a larger capacity.

`rust-analyzer` lets you find the trait/impl that provides a method, if you don't have it in your IDE you should get it.

True, that does help when you want to dig into things. I think my main pain-point here is that without extra tooling, the reference is not surfaced. Compare this to Clojure, where every reference is explicit, things normally don't get "magically" created in your scope. Just by searching for a var in the current file upwards (up to the `ns` declaration), you can find out where code is coming from, which is not always possible in Rust.

It takes you to the trait definition but not the specific implementation. There is an open issue; it looks like a hard problem to solve, but IntelliJ does it, so it must be doable.

> Personally i feel the concern over the borrow checker is way overblown. 95% of the time the borrow checker is very simple.

I have been using Rust professionally as well and had a different experience. For anything singlethreaded I agree with you. For any asynchronous behavior, whether it's threads or async or callbacks, the borrow checker gets very stubborn. As a response, people just Arc-Mutex everything, at which point you might as well have used a GC language and saved yourself the headache.
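For context, the "Arc-Mutex everything" pattern mentioned here looks roughly like the sketch below: each thread gets its own `Arc` handle, and every access goes through the `Mutex`. The function name `parallel_count` and its parameters are invented for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawns `workers` threads that each increment a shared counter
// `per_worker` times, then returns the final total.
fn parallel_count(workers: usize, per_worker: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread owns its own reference-counted handle.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // Lock, increment, and unlock at the end of the statement.
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```

It compiles without any lifetime annotations, which is why people reach for it, but it trades compile-time checking for locking and reference counting at runtime.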

However, enums with pattern matching and the trait system are still unbeatable and I miss them in other languages.

The problem is that when lifetimes cause you problems, it can force the entire feature development to stop until the problem is fixed. There is no reasonable escape hatch or alternative (clone doesn't always work).

+1. My programming style normally consists of trial-and-error style prototyping for a couple of iterations, and then later refining that into something that's solid and robust. I find Rust's unofficial "prototyping mode" difficult to combine with its regular "production grade mode" for practical purposes.

Rc/Arc is usually the escape hatch. I presume you put raw pointers under the "no reasonable" caveat, but if the borrow checker is wrong and you're right, they can be the right tool too. They're not unreasonable if the alternative was to use a different language that couldn't guarantee this safety either.

I know dealing with the borrow checker can feel like a dead-end sometimes, but dealing with it is something that you can learn. Things that it can't handle fall into a handful of common patterns (like self-referential structs). Once you learn what not to do, you can recognize and avoid the issues even before you write the code.
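One common way around the self-referential struct pattern is to store indices or ranges into the owned data instead of references to it. The sketch below is only one possible approach, with a made-up `Words` type: it owns a string and remembers where each word is as a byte range, rather than holding `&str` slices of its own field.

```rust
use std::ops::Range;

// Owns a piece of text and remembers where its words are.
// Storing `&str` slices of `text` in the same struct would make it
// self-referential; byte ranges carry the same information without a borrow.
struct Words {
    text: String,
    spans: Vec<Range<usize>>,
}

impl Words {
    fn new(text: String) -> Self {
        let mut spans = Vec::new();
        let mut start = None;
        for (i, c) in text.char_indices() {
            match (c.is_whitespace(), start) {
                // First non-whitespace character of a word.
                (false, None) => start = Some(i),
                // Whitespace right after a word: close the span.
                (true, Some(s)) => {
                    spans.push(s..i);
                    start = None;
                }
                _ => {}
            }
        }
        // A word that runs to the end of the text.
        if let Some(s) = start {
            spans.push(s..text.len());
        }
        Words { text, spans }
    }

    // The borrow is created on demand and tied to `&self`.
    fn get(&self, n: usize) -> Option<&str> {
        let span = self.spans.get(n)?;
        Some(&self.text[span.clone()])
    }
}

fn main() {
    let words = Words::new("zig  and rust".to_string());
    println!("{:?}", words.get(2));
}
```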

That sounds like good advice. I'll keep that in mind next time I attempt to use Rust on something.

Ada is another option without a GC. I wrote a search tool for large codebases with it (https://github.com/pyjarrett/septum), and the easy and built-in tasking and pinning to CPUs allows you to easily go wide if the problem you're solving supports it.

There's very little allocation since it supports returning VLAs (like strings) from functions via a secondary stack. Its Alire tool does the toolchain install and provides package management, so trying the language out is super easy. I've done a few bindings to things in C with it, which is ridiculously easy.

Of the ones you mentioned, Zig is the only one that has explicit memory management.

> Is it worth learning

Languages are easy to pick up once you understand fundamentals. The borrow checker is intuitive if you have an understanding of stack frames, heap/data segment, references, moved types, shared memory.

You then should be asking "Is it worth using?", then evaluate use cases.. pros/cons.. etc.

For CLI, Rust is likely the easiest given its macros, but if you struggle with the borrow checker then it won't be. You will be fighting the compiler instead of developing something.

Depending what your CLI program is doing, you might want to evaluate what libraries are available, how they handle I/O, and parallelism.

JavaScript has incredibly easy and fast concurrent I/O thanks to libuv and v8.

>> Is it worth learning

> Languages are easy to pick up once you understand fundamentals. The borrow checker is intuitive if you have an understanding of stack frames, heap/data segment, references, moved types, shared memory.

I see this sentiment often. In the last 10 years, I have come up a level in raw C, learned Kotlin, Swift, Python, Elixir/Erlang, and a smattering of JavaScript, all coming from a background that included Fortran and Smalltalk.

My problem with the dialogue is what is meant by “learn.” I have architected, implemented, and maintain different components of our products in all these languages currently. I think that demonstrates I have “learned” these languages, at least at this level of “picked up.” But I can’t write Python the way Brett Cannon does. Or Elixir the way José Valim does. Or any of their peers. And in that regard I still very much feel I have not “learned”.

I spent a couple days playing with Zig a month or so ago. I became familiar with the way it worked. I could spend another month or so in that phase, and then could probably comfortably accomplish things with it. But I don’t think I’d feel like I’d “learned” it.

It reminds me of my experience learning Norwegian. I lived in Norway for 2 years and did my best to speak as much Norwegian as I could. At six months I could definitely get by. At 13 months, as I embraced the northern dialect, I was beginning to surprise Norwegians that I was from the states. I started dreaming in Norwegian at some point after that. But even at 24 months, able to carry on a fluid conversation, I realized I still could “learn” the language better than I currently knew it.

So I guess, it always seems there needs to be more context, from both the asker and the answerer, when this “should I learn X” discussion is had. Learning is not a merit badge.

I've found that Go is not elegant enough for me and Rust is too difficult to write (I started using Rust in 2015 and after years of trying I eventually realized Rust doesn't make sense for most apps), so I'm all in on Crystal. Despite not having much prior Ruby experience, I absolutely love the language.

Crystal doesn't have built-in support for parallelism, let alone production-grade support. This is a significant lack for a modern language.

For a language that is around 8 years old, this may be a serious problem, since the surrounding ecosystem has been probably written without parallelism in mind, and it may take a very long time to be updated (if ever).

> Crystal doesn't have built-in support for parallelism

They do, but it is hidden behind a compiler flag: if you compile your project with `-Dpreview_mt`, it will come with multi-threaded support. This has been an experimental feature for a few years though, and there has not been much improvement since it was first introduced.

Personally I don't use crystal for this kind of feature, and it runs stable enough when I use it for some cpu intensive tasks when I rarely need it.

Crystal really shines when you need something that you would usually write a python/ruby script to do, especially for tasks that run for hours. Converting a script from ruby to crystal and running it in production mode typically reduces the time consumed to 1/5 or even 1/10 of the original, depending on the job. As someone who has to read gigabytes of text files regularly, Crystal is currently the best one for the task.

The compilation time for release binaries is something that needs much improvement though. And I'm not sure if they can even achieve incremental compilation.

> They do, but it is hidden inside a compiler flag, if you compile your project with `-Dpreview_mt` then it will come with multi-threaded support

It depends on the domain. From a production perspective, a flag-gated functionality that has been experimental for two or more years is not "built-in". Plus, as explained, the ecosystem (I think I've read even the stdlib) doesn't give guarantees about thread safety.

For small-scale scripting, then sure, it could be useful - but anything will do. I've evaluated for use at my company, and discarded it, because of the lack of libraries. Sadly, this is a chicken-and-egg situation. I've also evaluated contributing to it, but I won't until multithreading is stable.

> I've evaluated for use at my company

Well this might be the problem. In corporate environment you can't afford to be too adventurous.

Personally I solve the "lack of libraries" problem by using more than one language, then connect them via child process call or some persistent storage like database or plain text files.

But it's an entirely different matter when the code needs to be used by a lot of people.

I use PyPy for such cases. Is Crystal better than PyPy?

I think Crystal is better than Python in terms of language design. Unlike Ruby and Python, which are way older, crystal is relatively new, so they learned from other languages' mistakes and tried to improve on them, resulting in a cleaner language.

For the cases mentioned, I think crystal is immensely helpful:

- Reading/writing files is easy, usually a single method will give you the result you want.

- Working with directories is nice, things like `Dir.mkdir_p`, `Dir.each_child`, `File.exists?`... all exist to make your life easier.

- Like ruby, you can invoke shell commands easily using backticks.

- There are some useful libraries for console apps, like `colorize` or `option_parser`. Crystal is a batteries-included language, so the standard library is filled with useful libraries.

- Working with lists and hashmaps is a breeze, since the Enumerable and Iterable modules are filled with useful methods, mostly inspired by ruby land.

- Concurrency is built in, so you can trivially write performant IO-bound tasks like web crawlers.

For a project that is made by a handful of people, I just can't praise the dev team enough for making a language this practical.

Modern Rust is much more straightforward than it was in 2015. It's effectively two different languages, albeit maintaining backward compatibility (i.e. code written for Rust 1.0 should still compile today, with proper edition settings).

Suppose I wanted to try learning Rust again; is there a resource for someone with a lot of (hobbyist) programming experience, and experience with low level languages and memory management (e.g. C), but not complicated low-level languages, like C++?

When I tried to work with Rust a few years ago I found it utterly impenetrable. I just had no idea what the borrow checker was doing, did not understand what the error messages meant, and honestly couldn't even understand the documentation or the tutorials on the subject. Understanding what is happening in C or Zig is pretty easy; in Rust it's always been a nightmare for me. I just really don't grok the "lifetime" concept at all, it feels like I'm trying to learn academic computer science instead of a programming language.

Rust feels to me like a powerful, expressive language for professional programmers at the top of their game. That's a compliment for any language. But it comes at the cost of mind-numbing complexity for anyone who's not an expert.

> Suppose I wanted to try learning Rust again; is there a resource for someone with a lot of (hobbyist) programming experience, and experience with low level languages and memory management (e.g. C), but not complicated low-level languages, like C++?

The official Rust book is targeted at novices with some programming experience. There's also Rustlings https://github.com/rust-lang/rustlings for a more practical approach.

> When I tried to work with Rust a few years ago I found it utterly impenetrable. I just had no idea what the borrow checker was doing, did not understand what the error messages meant, and honestly couldn't even understand the documentation or the tutorials on the subject

The compiler diagnostics have improved a lot over time. It's quite possible that some of the examples you have in mind return better error messages.

> in Rust it's always been a nightmare for me. I just really don't grok the "lifetime" concept at all, it feels like I'm trying to learn academic computer science instead of a programming language.

Academic computer science calls lifetimes "regions", which is perhaps a clearer term. It's a fairly clean extension of the notion of scope that you'd also find in languages like C or Zig. It's really not that complex, even though the Rust community sometimes finds it difficult to convey the right intuitions.

Fair enough, I do need to have a look at the book again, although that was one of the sources I found impossible to understand a few years back. I think there's a temptation to talk about lifetimes in extremely abstract terms under the assumption that the reader already understands and appreciates the abstraction. I, however, was never able to build up an intuition for it, and so tutorials that didn't explain what was happening in detail sailed over my head.

I second zozbot234's statement about it being far better than it was in those days.

The language team has done a great job rounding rough edges, and this next roadmap is slated for even more polishing. They heavily prioritize dev experience, which is why I think people like myself (a GC'd language person historically) use and love Rust so much.

Zig uses manual memory management too (even more manual than Rust), so that's a bit of a strange question.

It's really easy in Zig to be honest. Just put `defer thing.deinit()` in the right scope and you're done. You gain explicitness and know exactly what's going on in your code. Everything is obvious. That's the reason Zig is so incredibly simple and easy to read. Zig also has a GPA (`GeneralPurposeAllocator`) that will tell you about memory leaks or anything.

And in rust you just put `` in the right scope and you're done. This is perfectly explicit and you know exactly what's going on in your code. Everything is obvious.

You can execute arbitrary code on drop operations by implementing `std::ops::Drop` for a type.

A `.deinit()` function could also run some arbitrary code that does weird and unexpected things. The point is that reasonable people don't abuse functions like that — if they do abuse, don't use their code. Neither Rust nor Zig is a sandbox that could stop actively stupid/malicious code.

Rust's ownership rules are no less explicit. The object dies when the owner of the data goes out of scope.

Interesting, based on their code samples it looked to me like a GC language since (at least from what I saw) I didn't see anything regarding memory management.

It’s easy, IMHO, to mistake Zig as a GC’ed language or more broadly as a memory safe systems language. It’s neither but it is a nicer C.

I am not sure what code samples you looked at, but https://ziglearn.org/chapter-2/ should give you an idea.

> or are GC languages like Nim or Crystal good enough?

Any programming language is good enough for its own use cases :) It's a matter of understanding what the use cases are.

I'm a big Rust fan, but I nonetheless believe that the use cases for programming languages with manual memory management are comparatively small, in particular, since GC has been improved a lot in the last decade.

For undecided people, I conventionally suggest Golang. Those who at some point need deep control, will recognize it and change accordingly.

Why Go? It is quite terrible in expressiveness and if you do commit to a GC you have plenty of better choices. But to each their own, otherwise I agree that systems programming is a niche and a good GC is an overwhelmingly good tradeoff in almost every case.

Crystal: compilation speed is just too slow, sadly. Nim and Zig: I'd definitely just go with Zig. It's an extremely simple language, has no macros (but something much better than macros), is explicit, and in the long run it's just going to be worth it much more than Nim.

Memory has to be managed by something. The more decisions that are made for you in how that happens the less flexibility there is for certain situations.

Sure but my use cases would be stuff I'd normally write in TypeScript or Python that already have garbage collection. Like I said I'm not doing embedded programming so I don't have too much of a need to manage memory.

My question could be further constrained then to be, is learning Rust or Zig despite its manual memory management worth it for applications that are normally already garbage collected in their current implementations? Or are languages like Nim and Crystal enough? Do Rust and Zig have other benefits despite manual memory management?

The way you describe your use case I think you are fine with a language with garbage collection like Nim (which has a syntax a bit like Python) or Crystal. I would also throw Go in the ring or if you are interested to learn a bit of functional programming then you also could look at OCaml.

Zig has no garbage collection btw, but makes it easier than C to handle that. Another language without garbage collection that helps a lot to avoid memory issues is Ada (Looks a bit like Pascal). So there are alternatives to Rust.

imo, most code can do just fine with GC. modern GCs can be relatively low overhead even with guaranteed small pauses (10ms). furthermore, most code that can't handle pauses can be written to not allocate any memory (so GC can be temporarily turned off). as such, the only two places where you need manual allocation are for OS development, and hard real time requirements.

When tail latency (high-percentile latency) is important GC is not a good choice. Wait-free (threads progress independently) concurrent algorithms also need wait-free memory reclamation with bounded memory usage to be able to guarantee progress.

But most software is throughput-oriented.

Additionally, not all GCs are made alike, and languages like D, F#, C#, Nim, Swift, among others, also offer value types and manual memory management, if desired.

Also Swift and Nim w/ ARC use reference counting, which generally gives much better latency and lower memory overhead. Reference counting is part of the reason iOS devices don’t need as much RAM.

Nim’s ARC system also doesn’t use atomics or locks, which means its runtime overhead is very low. I use it successfully on embedded devices for microsecond timed events with no large tail latencies.
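
To make the "no atomics or locks" point concrete, here is a minimal C sketch of a plain, non-atomic reference count. It is not Nim's implementation; the names (`RcBuf`, `rcbuf_new`, `rcbuf_retain`, `rcbuf_release`) are hypothetical, and the sketch assumes all owners live on a single thread, which is exactly what lets the counter be an ordinary integer.

```c
#include <assert.h>
#include <stdlib.h>

/* A heap buffer with a plain (non-atomic) reference count.
 * Assumption: every owner runs on the same thread, so no atomics or locks. */
struct RcBuf {
    size_t refs;          /* number of live owners */
    size_t len;           /* usable bytes in `bytes` */
    unsigned char *bytes; /* zero-initialized payload */
};

/* Allocate a buffer with one owner. Returns NULL on allocation failure. */
static struct RcBuf *rcbuf_new(size_t len) {
    struct RcBuf *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->bytes = calloc(len ? len : 1, 1);
    if (!b->bytes) { free(b); return NULL; }
    b->refs = 1;
    b->len = len;
    return b;
}

/* Register one more owner: a single ordinary increment. */
static struct RcBuf *rcbuf_retain(struct RcBuf *b) {
    b->refs++;
    return b;
}

/* Drop one owner. Frees the buffer and returns 1 when the last owner
 * releases it; otherwise returns 0. */
static int rcbuf_release(struct RcBuf *b) {
    assert(b->refs > 0);
    if (--b->refs > 0) return 0;
    free(b->bytes);
    free(b);
    return 1;
}
```

The cost per retain or release is one increment or decrement and a branch, and memory is returned at a deterministic point, which is where the low tail latency comes from. Sharing such an object across threads would need atomic counters or some other synchronization.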

Reference counting is a GC algorithm.

I wouldn't buy into much Apple marketing regarding its performance though,


It makes sense in the context of tracing GC having been a failure in Objective-C due to its C semantics, while automating Cocoa's retain/release calls was much safer approach. Swift naturally built on top of that due to interoperability with Objective-C frameworks.

Nim has taken other optimizations into consideration, however only in the new ORC implementation.

Still, all of them are much better than managing memory manually.

> I wouldn't buy into much Apple marketing regarding its performance though,

I wouldn’t make claims on Swift’s overall performance, just its memory usage (really Obj-C’s), particularly for GUI development. Java’s GCs have always been very memory hungry, usually to the tune of 2x. Same with .NET. Though to be fair Go’s and Erlang’s GCs have much better memory footprints. Erlang’s actor model benefits it there.

Agreed, they’re all better than manual memory management.

OSes and hard real-time systems can also be written in managed languages. There are many research OSes written in managed languages (with a bit of assembly, but you also need that for C, as it is not low-level either), like Midori, and there are even hard-real-time JVMs used in military settings, like JamaicaVM.

Rust's memory management is "manual" but it feels automatic for most uses.

Nim strikes a great balance. No need for a low level language for cli and general software. I liked crystal but the lack of support on windows and lackluster dev experience made me stick to Nim.

Nim also can double as a web language by transpiling to JS.

Nim's super power is being ridiculously productive (at least for me). Hack stuff out like a Python script, yet it runs really fast and is a tiny self contained executable, so you can just use it as is and move on to the next task. If you want manual memory management, that's easy too. Want to use a C/C++ library? No worries you have ABI compatibility. As you mention compiling to JS lets you use it as a web language and share code and types between front and back end.

Then you can automate code generation with the sublime macros, which are just standard Nim code to create Nim code. No new syntax or special rules required - any Nim code can be run at compile time or run time, so you can use standard/3rd party libraries at compile time to write macros and give the user a slick syntax whilst removing boilerplate.

After using Nim, I really miss straightforward metaprogramming in languages that lack it. It's something that multiplies the power of a language, rather than just adds to it.

I haven't properly looked into Nim yet, and the sibling comments here make for some interesting signalling.

I would not recommend Nim.

Thank you so much for such a profoundly insightful comment. I'm now even thinking of changing professions thanks to it.

I like Nim.

> just interested in command line apps and their speed as compared to, say, TypeScript

There are no fast languages, only fast language implementations.

NodeJS/V8 is pretty fast (even faster than you probably think)—particularly if you're already doing things like making the sort of compromises where you limit yourself to writing only programs that can be expressed under the TypeScript regime. It's usually the case that it's not the NodeJS runtime that is the problem but rather the NodeJS programming style that is the source of the discomfort with "speed" that you will have experienced.

I haven’t tried Zig yet, but I liked Common Lisp better than Go and Rust for fast CLI apps/scripts. You need to dump an image with all your deps already loaded, and then you can run small scripts with instant compilation very fast. It’s an ergonomic language with good IDE support in emacs. There are some rough edges around (inconsistent API design, the package manager not using HTTPS, …), but tolerable.

Common Lisp doesn't offer features like sum types right? I am under the impression that it is a dynamically typed language like most Lisps, but I could be wrong.

You might peek at Coalton.


I don’t think it has sum types, but it offers a rather sophisticated and very functionally oriented type system that compiles back to SBCL. It looks neat, haven’t found time to play yet…

It has optional typing, but as it’s fast without it, I haven’t dug into them. I don’t know about a union type though.

If you're doing things like command-line apps you'll probably find Rust much easier to work with than Zig. You can write high-level code that looks basically like TypeScript in Rust, whereas Zig is more manual. Nim and Crystal are easier from a language perspective, but they have much smaller ecosystems.

Besides Rust and Zig, you might want to check out Vlang and Odin, which are in the same category.

I remember reading a lot of controversy about V back when I first heard about it so I decided not to look into it further [0], but I'll take a look again. Odin looks interesting, Pony does as well.

[0] https://news.ycombinator.com/item?id=25511556

A lot of the so-called Vlang controversy appears to have been disguised allegiance and competition between the newer languages. Looks more like various people defending their interests. Languages like Odin, Zig, Nim, Crystal are far older than Vlang. When Vlang came along and got a lot of sudden popularity and funding, looks like various competitors sought to bash it and hoped it would disappear.

In addition, such detractors were acting like a brand new programming language would be a finished polished product from day 1, when that was not the case for their far older languages. For instance, Odin and Crystal are older than Vlang, but have been surpassed by it in various respects.

And don't get me wrong, I like Odin. That's because Odin is among the newer breed of languages that have continued the trend Go started of non-class-based OOP and more generalized OO, and that are contenders to be alternatives to C/C++ (like Zig).

In the case of Vlang, it has clearly been developing rapidly and consistently, and continually gaining in popularity. Simply looking at their releases (and release schedule) along with their documentation (which various newer contenders are lacking in even that), will show a lot of the controversy is without merit or distortions of language development reality.

https://github.com/vlang/v/releases https://github.com/vlang/v/blob/master/doc/docs.md https://modules.vlang.io/

It doesn’t help that plenty of claims on the V website were simply ridiculous, like transpiling C to V and making it memory safe (or some insane compile speed claim transpiling Doom, if I remember well?). Also, memory safe without a runtime and without a Rust-like borrow checker... these are simply impossible lies. Not sure about its current state, hopefully these are all removed and the sane claims are in progress.

It is better to actually use the language, particularly as it is presently, versus refer to old disputable controversies from years ago. There are no false claims on their website (in my opinion), though of course there will be things people can dispute and argue about, as there is with all developing programming languages.

If a person does their research, then they would have clarity on the subjects. This is partly why I think it is better to refer people to the website or advise them to do their own evaluation, as opposed to referencing critics or 3rd party websites who may have hidden motives and are advocates for a competing language. I think the competition between the newer languages has come to that point.

> There are no false claims on their website

Are you sure about this? Let's look at their first feature, "No null". Ironically every Vlang project, including the compiler itself, is full of `isnil`[0] checks.

This is just a trailer for their lies; once you start digging, you'll find misrepresentation in every major feature they advertise.

0: https://github.com/vlang/v/search?q=isnil

Then could you please tell me the actual memory model of the language? Because that is the numero uno thing I want to know about a language, yet I constantly read contradicting things on Vlang’s model. (Is it Boehm GC?)

Vlang has multiple methods of optional memory management. This should not be so hard to grasp, as this is the case for other languages too. Take Nim, for example. It has several: gc:refc, gc:markAndSweep, gc:boehm, gc:go, gc:arc, etc...

Vlang has both the options of Autofree and GC, or you can do it manually. As the language is still developing (like other languages such as Crystal, Odin, and Zig), users should know these memory management options are experimental and still being refined.

I recommend C. Once you get the hang of it, it's as fast as writing Rust code (and you don't have to think about borrow checks).

Yeah, I sure love debugging segfaults. I've used C before and it was cool to learn but I haven't used it since.

Once you've used C enough, you get the skill to avoid writing segfaults, and when a segfault happens, they're a lot easier to debug.

However, since you're just interested in writing a fast command-line app, what about the JVM or .NET? Those have a startup time issue, but once they're running they're very fast, less than an order of magnitude slower than C/Rust/Zig/etc.

My biggest gripe with C is not even the manual memory management, since with Valgrind and friends they are not that hard to debug. But how do you solve such a basic issue like having a vector-like data type for different types? Its “macro” system is just a disgusting hack, I could just as well write a sed script instead. And the other option is runtime-overhead or copy-pasting code..

> vector-like data type for different types

When using C, if you want something from a higher-level language, usually you'd just do what that language's runtime does internally. So for a vector of heterogeneous objects, you could do a linked-list of pointers to the data objects. Something like:

    struct VecItm { struct VecItm *next; void *data; size_t data_len; char objTypeId; };
Is it more work? Yes. But at this point I would stop and consider if I really need a heterogeneous collection. Do we really need to store a list of objects with arbitrary sizes? What if we have 3 types of objects, in 3 arrays, and we store index into those arrays?

    struct Foo { unsigned char x; };
    struct Bar { unsigned short x; };
    struct Baz { unsigned long x; };

    struct Foo foos[256];
    struct Bar bars[256];
    struct Baz bazzes[256];

    // objTypeId: 0 = foo, 1 = bar, 2 = baz
    struct VecItm { struct VecItm *next; unsigned int idx; char objTypeId; };
Or what if we can do something even better? What if Foo, Bar, and Baz all have a max size that isn't too wildly different? Can we store them all in an array directly?

    // objTypeId: 0 = foo (x contains 8 bits), 1 = bar (x contains 16 bits), 2 = baz (x contains 32 bits)
    struct VecItm { unsigned long x; char objTypeId; };
    struct VecItm vecItms[256];
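
A tagged union is one way to write that last layout while keeping each variant's own type. This is a minimal sketch under assumed names (`ObjType`, `vec_itm_value`, `vec_itm_sum` are hypothetical and not from the snippets above); the tag says which union member is valid, and every read goes through a `switch` on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which member of the union is currently valid (hypothetical names). */
enum ObjType { OBJ_FOO, OBJ_BAR, OBJ_BAZ };

struct Foo { uint8_t  x; };
struct Bar { uint16_t x; };
struct Baz { uint32_t x; };

/* One array slot: a type tag plus storage large enough for any variant. */
struct VecItm {
    enum ObjType type;
    union {
        struct Foo foo;
        struct Bar bar;
        struct Baz baz;
    } as;
};

/* Read the payload as a 32-bit value, dispatching on the tag. */
static uint32_t vec_itm_value(const struct VecItm *it) {
    switch (it->type) {
    case OBJ_FOO: return it->as.foo.x;
    case OBJ_BAR: return it->as.bar.x;
    case OBJ_BAZ: return it->as.baz.x;
    }
    return 0; /* only reached if the tag is invalid */
}

/* Sum the payloads of the first n items in a contiguous array. */
static uint64_t vec_itm_sum(const struct VecItm *items, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += vec_itm_value(&items[i]);
    return total;
}
```

Items sit contiguously in an array such as `struct VecItm items[256];`, so there is no pointer chasing. The trade-off is that every slot is as large as the biggest variant plus the tag.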
> Its “macro” system is just a disgusting hack

It's really not that bad. It can result in spaghetti-code where your macros are using other macros and they're spread all over a header file. But if you use them surgically, only when needed, they don't cause much trouble.

> I could just as well write a sed script

The benefit of C macros over sed is if you use some language-aware tooling like an IDE (such as CLion), it will syntax-check and type-check your macros.

I don’t want a heterogeneous “array”, I want a generic one. Your first example is the overhead I talked about, most high-level languages don’t do this at all. Pointer chasing is really pricey on this level.

Your second example misses the auto-growing part. Sure, I want an array, but if that becomes too small, I automatically want to realloc the underlying data to a larger array. And I want to use it for Foo, for int and for any sized objects, generally.

C++ can create the proper zero-overhead abstraction over it by generating code for each type I use it for, C to my knowledge can’t do so without copy-pasting code or doing things with runtime overhead, resulting in (often) much slower code.

In that case copy-pasting is the way, and macros make the copy-pasting much less onerous. This could be messy if there are a lot of types, but I have yet to see a program where having auto-growing vectors of many different types was absolutely essential.
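
For what it's worth, the "macros do the copy-pasting" approach can be sketched like this. It is a minimal example with hypothetical names (`DEFINE_VEC`, `IntVec`, `FooVec`, `vec_demo`); each `DEFINE_VEC` line stamps out a separate auto-growing vector type with its own push and free functions, so elements are stored inline with no `void *` indirection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* DEFINE_VEC(T, Name) generates a growable array type `Name` of T, plus
 * Name_push (returns 0 on success, -1 if realloc fails; capacity doubles
 * when full) and Name_free. A zero-initialized Name is an empty vector. */
#define DEFINE_VEC(T, Name)                                         \
    typedef struct { T *data; size_t len, cap; } Name;              \
    static int Name##_push(Name *v, T item) {                       \
        if (v->len == v->cap) {                                     \
            size_t new_cap = v->cap ? v->cap * 2 : 8;               \
            T *grown = realloc(v->data, new_cap * sizeof *grown);   \
            if (!grown) return -1;                                  \
            v->data = grown;                                        \
            v->cap = new_cap;                                       \
        }                                                           \
        v->data[v->len++] = item;                                   \
        return 0;                                                   \
    }                                                               \
    static void Name##_free(Name *v) {                              \
        free(v->data);                                              \
        v->data = NULL;                                             \
        v->len = v->cap = 0;                                        \
    }

struct Foo { unsigned char x; };

/* One line per element type; the preprocessor does the copy-pasting. */
DEFINE_VEC(int, IntVec)
DEFINE_VEC(struct Foo, FooVec)

/* Push 0..99 into both vectors and return the combined sum of all
 * elements, or -1 if an allocation failed. */
static long vec_demo(void) {
    IntVec ints = {0};
    FooVec foos = {0};
    long sum = -1;
    for (int i = 0; i < 100; i++) {
        struct Foo f = { (unsigned char)i };
        if (IntVec_push(&ints, i) != 0 || FooVec_push(&foos, f) != 0)
            goto done;
    }
    sum = 0;
    for (size_t i = 0; i < ints.len; i++)
        sum += ints.data[i] + foos.data[i].x;
done:
    IntVec_free(&ints);
    FooVec_free(&foos);
    return sum;
}
```

The generated code is type-checked per instantiation, much like a C++ template. The costs are the ones already mentioned in this thread: errors inside the macro are reported at the expansion site, and every element type needs its own `DEFINE_VEC` line.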

Andrew how will the self hosted compiler maintain a known trust back to assembly?

i.e., how can someone look at a self-hosted Zig compiler and build it themselves from source, never needing to download blobs from the internet?

Otherwise you lose the ability to trust anything you build.

Zig specifics aside… build it from source using what? Another compiler. Okay, so you can compile that one from source. But what does that? Another compiler.

Unless you’re recording memory contents and executing instructions by hand, you’ve just discovered the Ken Thompson hack. At some point the pragmatic thing is to trust some bits from a trusted source (e.g. downloaded from an official repo w/ a known cert, etc.).

Rice's Theorem makes the Ken Thompson hack impossible in general. He only executed his hack in a single short-term demonstration against a specific target, but it's not possible to make a "long-con" against an open-source project with lots of activity and lots of people building it even if you find a way to infect nearly all of those people.

You just need 1 compiler written in assembly, and work your way up from there.

C17 -> assembly of your choice, written in assembly

Zig -> C17 “transpiler”, written in C17

Why do you trust the assembler you got from somewhere any more than a compiler?

Because hand written assembly is readable!

I believe GP used assembler to refer to the program which reads your hand-written assembly and produces a binary. That program was presumably given to you so you need to trust it.

No, the only way to solve this problem is to start with a computer entirely devoid of code and bit-bang your assembler into the machine with switches, the way the first users of the MITS Altair did it.

You can also bootstrap the way Lisp did and write the first compiler in the language and get a bunch of grad students to hand-compile it.

But, yeah if you don't have a bunch of grad students at the ready, an assembler hand-written in machine code is the only option if you want to trust the entire stack. Though I'm not sure what that would get you. I don't know of any higher language compilers that are written directly in assembly these days, so you'd never be able to compile your C/C++ compiler.

There was a C compiler written in assembly posted on HN 4 years ago [1]!

So yeah, if you wanted to, you could bit-bang an assembler capable of assembling a simple C compiler, then in your simple subset of C you could implement a full C compiler, and from there you can do anything you want!

The grad student approach also sounds interesting. A basic Lisp interpreter can easily fit on a single letter-sized sheet of paper. A single person could hand-compile that, it would just take longer. But, if you're living alone in a cabin in the woods with your own hand-built computer and a personal library of computer books in hard copy, that would be a totally feasible project.

[1] https://news.ycombinator.com/item?id=17851311

I wasn't aware that there were any existing C compilers made in asm for modern architectures. I guess I shouldn't be too surprised.

That's cool, thanks for showing me.

It might be, but have you also validated the microcode executing it?

Or the Verilog/VHDL for the logic gates used by the CPU, for that matter?

Security is multifaceted.

Ideally, yes. But you still will be more secure than having multiple security holes.

Check out the video Andrew linked in this thread. He talks about this in detail.

A big achievement for any decently sized language. Nice work devs :)

Probably a stupid question, but is LLVM a sort of cheat for "self compiling"? Shouldn't you need to compile LLVM as well?

They’re working towards having native backends for x86-64, aarch64, etc., that are all written in Zig, making LLVM an optional dependency, and eliminating literally all uses of non-Zig code (including no mandatory use of C standard library, for example) for those builds. https://news.ycombinator.com/item?id=31052234

I’m assuming these will primarily be for super fast debug builds (at least to start), and LLVM (and maybe C backend too) will still be the favored backend(s) for release builds.

It used to be the case that a compiler compiled itself to bare metal machine instructions, but now with so many CPU instruction set targets that's no longer the case. LLVM uses an intermediate language, much like an assembly language. Another way to look at it is that the TypeScript compiler, which is written in TypeScript, compiles itself, but the intermediate language is JavaScript. V8 handles the translation of JavaScript to machine code, similar to LLVM translating LLVM IR to machine code.

Okay, so we're treating LLVM intermediate language or javascript as if "assembly" in the bare metal days. Makes sense.

> It used to be the case that a compiler compiled itself to bare metal machine instructions, but now with so many cpu instruction set targets that's no longer the case.

Tons of compilers emit machine code.

Sure, I didn't say they didn't. Are you refuting what I said and implying that a compiler is not self-hosted if it doesn't compile to machine code?

> I didn't say they didn't

You said 'that's no longer the case'. Isn't that the same as saying they don't?

In the context of the question, it is no longer the case that a compiler has to compile to machine code to be considered self-hosting. I thought this was clear, but I guess it wasn't

Quick question, which maybe someone here can answer: I noticed that GitHub supports syntax highlighting in the Zig repo. How does that work for a new language such as Zig? Can you somehow upload a file which tells GitHub how you'd like programs written in your language to be displayed?

I believe this is the syntax highlighter’s repo: https://github.com/github/linguist/blob/master/CONTRIBUTING....

Thanks, this is what I was wondering! Seems a new language needs to have 200 repos which use it before GitHub will consider adding syntax highlighting for it..

I’ve spent the last few weeks building an 8080 emulator in Zig to learn both emulator programming and the language. Gotta say, it’s been a pretty pleasant experience. My only issue was with dynamic dispatch, which led me down quite a rabbit hole which I didn’t ever fully come out of. Seems that the current situation is to build your own using compiler functions and pointer casts.

I have been thinking about creating a little higher level language that targets server side web assembly. Zig looks very attractive, but I'm concerned with how to handle things like strings and datetimes.

How suitable is zig for such a task compared to say rust?

I may be misunderstanding you, but it feels like strings and datetimes are library related more than language related.

I know that higher level languages will have primitives that aim to represent strings, but if you need to get into the weeds with Unicode then you'll be leaning on a library regardless

> I may be misunderstanding you, but it feels like strings and datetimes are library related more than language related.

Yes would agree and see them as platform related. It's just too large a task to create from scratch. Like say on JVM you can compile to bytecode and have strings already built into platform, and java.time, and ability to access an ecosystem of libraries.

With Zig, could one use C or Rust libraries?
