Zig, parser combinators, and why they're awesome (hexops.com)
351 points by todsacerdoti on March 10, 2021 | 131 comments

Interesting read, especially as someone poking at writing a parser in zig for fun :)

One area of improvement worth mentioning here is that currently zig errors cannot be tagged with extra information, which tends to be important for surfacing things like line numbers in parsing errors. There is a proposal[0] to improve this, and it's not the end of the world for me anyway (as a workaround, I'm toying with returning error tokens, similar to what tree-sitter does)

On a different note, there's another cool zig thing I found recently that is mildly related to parsing: a common task in parsers is mapping a parsed string to some pre-specified token type (e.g. the string `"if"` maps to an enum value `.if` or some such, so you can later pattern match tokens efficiently). The normal way to avoid an O(n) linear search over the keyword space is to use a hashmap (naively, one would use a runtime std.StringHashMap in zig). But I found an article from Andrew[1] about a comptime hashmap, where a perfect hashing strategy is computed at comptime since we already know the search space ahead of time! Really neat stuff.

[0] https://github.com/ziglang/zig/issues/2647

[1] https://andrewkelley.me/post/string-matching-comptime-perfec...

The comptime switch idea has been expanded into a full-fledged implementation in the standard library!



Oh, very cool, I somehow missed that! Thanks!

The zig standard library also has a ComptimeStringMap type for this use case which is used by the self hosted tokenizer for example.
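For anyone curious, usage looks roughly like this (a sketch against the 2021-era std API; the enum and keyword set here are made up):

```zig
const std = @import("std");

const Keyword = enum { kw_if, kw_else, kw_while };

// The table is built entirely at compile time, so lookups avoid a
// linear scan over the keyword list at runtime.
const keywords = std.ComptimeStringMap(Keyword, .{
    .{ "if", .kw_if },
    .{ "else", .kw_else },
    .{ "while", .kw_while },
});

pub fn main() void {
    // .get returns an optional: null for non-keywords
    const tag = keywords.get("if") orelse unreachable;
    std.debug.print("matched kw_if: {}\n", .{tag == .kw_if});
}
```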


Beat you by 3 full minutes :P

I'd be curious to hear your thoughts on Zig so far. I have a lot of respect for your design taste based on mithril.js, particularly when it comes to tradeoffs between functionality and simplicity.

I'll get the bads out of the way first: there are areas where the language isn't quite there yet (e.g. the error thing I mentioned). I ran into an issue where you can't do async recursive trampolines yet (think implementing client-side http redirect handling in terms of a recursive call).

The io_mode global switch plus colorblind async combo is something I'm a bit wary of since it's a fairly novel/unproven approach and there are meaningful distinctions between the modes (e.g. whether naked recursion is a compile error).

Another big thing (for me, as a web person) is the lack of https in stdlib, which realistically means you'd have to set up cross compilation of C source for curl or bearssl or whatever. There's a project called iguana-tls making strides in this area though.

With all that said, there's a pretty high ratio of "huh, that's a cool approach" moments. There are neat data structures that take advantage of comptime elegantly. There's a thing called copy elision to avoid returning pointers. The noreturn expression type lets you write expressive control flows, such as early-returning an optional none (called null in zig lingo) from the middle of a switch expression. Catch and its cousin orelse feel like golang error tuples done right. Treeshaking granularity is insanely good ("methods" are treeshaken by default; so are enums' string representations, etc). The lead dev has a strong YAGNI philosophy wrt language features, which is something I really value.

Overall there's a lot of decisions that gel with my preferences for what an ideal language should do (as well as what it should avoid)

> Another big thing (for me, as a web person) is lack of https in stdlib, which means realistically that you'd have to setup cross compilation of C source for curl or bearssl or whatever.

But that should be very easy in zig, as zig can compile c?


Linking to the system curl is indeed very easy, just pass `--library curl`, but cross-compiling means you can't just do that (since e.g. a windows dll is not going to work on macos). Instead you need either source code or a `.o` file.

Compiling C source is "easy" in the sense that the compiler can do it without huffing and puffing, but it comes with a bit of yak shaving (namely, setting up build.zig or fiddling w/ the respective CLI flags, and obviously you also need the actual source code files to be in the right place, etc.) It also means that maybe the memory allocation scheme will not be quite what you want (e.g. you can't pass your arena allocator to curl_easy_cleanup)
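For what it's worth, the build.zig side of that yak shave is fairly short once the C sources are in place. Something like this (file names are hypothetical; API as of early 2021):

```zig
// build.zig sketch: compile a vendored C file into a zig executable
const Builder = @import("std").build.Builder;

pub fn build(b: *Builder) void {
    const exe = b.addExecutable("app", "src/main.zig");
    exe.linkLibC(); // the C code needs libc
    // each C translation unit gets its own compiler flags
    exe.addCSourceFile("vendor/foo.c", &[_][]const u8{"-std=c99"});
    exe.install();
}
```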

Awesome, thanks!

Super interesting, thanks for sharing! I'd be curious to learn more about how you work around the lack of extra info in error types in practice? Are you just returning e.g. a struct with additional info?

Yes, e.g. `const Token = struct { kind: Kind, otherStuff: ... }`, where Kind is an enum where one of the values is Kind.error. Then since switch is exhaustive, you can just pattern match on kind as you iterate over tokens to handle the error case at whatever syntactic context is most appropriate.

The nice side-effect about this approach is that rather than following the standard error flow of bailing early and unwinding stack, you can keep parsing and collecting errors as you go.
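A minimal sketch of the idea (all names made up; note that `error` is a keyword in zig, so the enum tag needs a different spelling):

```zig
const std = @import("std");

const Kind = enum { ident, number, err };

const Token = struct {
    kind: Kind,
    line: usize, // the contextual info plain zig errors can't carry
};

// Exhaustive switch: forgetting to handle .err is a compile error.
fn describe(t: Token) []const u8 {
    return switch (t.kind) {
        .ident => "identifier",
        .number => "number",
        .err => "parse error",
    };
}

pub fn main() void {
    const bad = Token{ .kind = .err, .line = 42 };
    std.debug.assert(std.mem.eql(u8, describe(bad), "parse error"));
    std.debug.print("error at line {}\n", .{bad.line});
}
```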

There are some features exclusive to errors though (errdefer, stack traces, implicit error unions). Did you find yourself missing any of these by doing it this way? I'm partially asking because I was just making this decision the other day, and I went with errors for now.

For parsing specifically, I haven't felt the need for them (errdefer is not really relevant since I don't typically need to clean up resources half-way through parsing, and likewise, zig stack traces aren't necessarily as useful to an end user as contextual parsing metadata).

I do use errors for other stuff, and I wish I could, for example, attach actionable error messages to errors, to be dispatched to whatever logging mechanism is setup. Bubbling up an error from stdlib and printing it from main makes for poor end user experience, and pattern matching an entire application's worth of an error union in order to map an error to a descriptive message is not as ideal as writing the messages where they come from as you would w/ e.g. golang's `errors.New(message)`.

I absolutely love parser combinators, and years ago I did a huge rewrite of my compiler-compiler so that it'd have the easiest possible parser combinators: simple string concatenation. grammarA + grammarB = grammarC. You can play with that here (https://jtree.treenotation.org/designer/). You still need to make a line change or two (good design, after all, requires some integration), but it just works. Haven't seen anything beat it, and not sure that's possible. (Note: I do apologize for the crappy UX; giving the app a refresh is high on the priority list)

Awesome app. Do you plan on using it for anything in particular? Or are you just creating it as a passion project. It's totally cool.


Learning about https://treenotation.org/ (linking this for other people, not for you, Breck :P), and I like what I see. My first impression was: "tree notation is like lisp, but with python indenting".

But then looking at it more, I see it's much more like YAML or CSV or whatever. But then I read:

> We no longer need to store our data in error prone CSV, XML, or JSON. Tree Notation gives us a simpler, more powerful encoding for data with lots of new advanced features

And I didn't understand! Tree notation seems equivalent to these. Like at a certain level, it's all just data. Now, the major benefit is that you're supposed to think differently about what you're doing when using tree notation. Would love to hear your opinion about this conjecture.


Not getting into the beautiful and amazing work that's been done WITH tree notation (yet), that's a whole other conversation!

> Do you plan on using it for anything in particular?

It's hard to believe, but I use that thing every day, haha. To me every problem is a nail solvable with my DSL hammer.

> are you just creating it as a passion project.

I want other people to use it, but it took a while for me to be truly confident that it will work and that the underlying math is sound (that 2-d languages are better than 1-d langs). So just in the past week I formed a corporation (https://publicdomaincompany.com/), and we're growing the team and are actually going to try to make a good user experience and help people use these technologies to solve their problems.

> Would love to hear your opinion about this conjecture.

I always think of code now in 2 and 3 dimensions. To me after many years this is just second nature but it's not an obvious thing and not sure I've ever met anyone else that does this. I gave an early talk about it in 2017 (https://www.youtube.com/watch?v=ldVtDlbOUMA) at ForwardJS (there should be an actual recording out there on the web somewhere but I can't find it).

Traditionally all programming languages are 1-D, read by a single read head that moves linearly from start to finish (with some backtracking, but always just along 1 axis in order), building up an AST and then going from there. There's no reason it has to be that way, it's just the way things developed with our 1-D register machines.

Human beings do not process language this way at all. Not even close. If you have a physical newspaper near you, pick it up and pay attention to the way your eyes parse things. You'll likely notice a random access pattern, with your eyes moving constantly across both the x and y axes, and you parsing the semantics by things like position, font size, layout, etc. To me it's so obvious that this is the way computer languages should work. There shouldn't be lots of physical transformations of the code, using cryptic syntax characters like ( and " and [ and { as hacks to provide instructions to the parser. Code should be written in accordance with a strict grammar, yes, but it should also be written in a way pleasing to the human eye and the way human brains work.

We should make human languages that machines can understand instead of making machine languages that humans can understand.

Anyway, I'm rambling on and on, but this is the bigger idea than simply the tree notation implementation, which is really just a subset of a whole new world of possibilities in 2d and 3d languages. (here's another recent one showcasing the new possibilities: https://www.youtube.com/watch?t=145&v=vn2aJA5ANUc&feature=yo... — when I write Tree Code I see spreadsheets and vice versa)

Also, when I play with legos now I see code, and when I write code I see legos. That's a hard thing to communicate; an early tool I wrote called Ohayo shows it off a bit, but I need to write a single function that takes a tree program and spits out a lego vis (something like LDraw), and maybe that will help explain the idea https://github.com/treenotation/research/issues/33

I see a lot of articles posted on HN about Zig. What's so special about Zig, and how does it fill its own niche alongside languages like Go and Rust?

As far as I know, zig is the only language that is able to output binaries on par in size w/ C (a hello world in zig is about the same size as a C one, whereas most other languages produce binaries at least an order of magnitude bigger, sometimes several orders of magnitude). Zig interops cleanly w/ the C ABI and C toolchains, can also cross compile, AND can cross compile C proper. You can even drop all the way down to asm (standard C technically doesn't support this).

I often refer to zig as a "very sharp knife": it's "cool" for new languages to have more safeguards to protect you from yourself, but Zig feels a bit like it goes in the opposite direction, in the sense that it exposes the underlying plumbing more than most languages. For example, Go and Rust memory allocations and memory layout are fairly opaque; in zig you can control them idiomatically with obsessive precision.

But unlike C, zig offers a host of safety features, like integer overflow checks, compiler-checked optionals and exhaustive switch, as well as a well behaved compile-time system, and a bunch of syntactical sugar ("method" syntax, if/while/block/return expressions, etc).

And unlike C++, zig is a very small language.

It's possible to make very small Rust binaries: https://github.com/johnthagen/min-sized-rust 8kB for that Rust example, compared to 16kB for my optimized C hello world compiled with GCC. (Not that I think it makes a difference to anything.)

Furthermore, how are Rust memory allocations and layout "fairly opaque"?

For example, usually it's not obvious when memory is being freed. I recall a thread about esbuild where the author said he got better performance from Go because garbage collection happens on another thread, whereas in Rust the frees happened on the main thread and it was not clear how to move them off it, or whether that was even possible.

In zig, you can use an arena allocator, you can free things piecemeal while using an arena, you can free the arena on program exit or in a thread, you can enforce a stack allocator, have multiple allocators, etc and because of the orthogonality of the Allocator interface, most of this knowledge is something you can probably pick up within the first 2 hours of zig (assuming you know C).
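To illustrate, a toy arena sketch (`&arena.allocator` is the early-2021 way to get the `*Allocator`; later versions use `arena.allocator()`):

```zig
const std = @import("std");

pub fn main() !void {
    // Every allocation below is freed in one shot by deinit:
    // no piecemeal frees required.
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    const allocator = &arena.allocator;
    const buf = try allocator.alloc(u8, 64);
    const more = try allocator.alloc(u8, 128);
    std.debug.print("allocated {} bytes\n", .{buf.len + more.len});
}
```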

Re: edit about rust binary size. When I say zig is on par w/ C in terms of binary size, I mean I get the small binary with the basic `zig build-exe hello.zig -O ReleaseFast --strip` command, without any attempts at optimization. Do you know if this 8kB number includes the UPX optimization? If so, I'm calling shenanigans :)

You can use arena allocators in Rust: bumpalo is popular. And it's easy to send objects off to a separate thread to be destroyed if that's what you desire: just send them over a channel to another thread (though this is really something that the malloc implementation should do internally if it's a win). There's an RFC for multiple allocators that has had some implementation work done lately: https://rust-lang.github.io/rfcs/1398-kinds-of-allocators.ht...
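The "send them over a channel" part can be sketched in plain std Rust; ownership of the allocation moves to the worker, so the actual free happens off the sending thread (a toy example, not a tuned pattern):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // Worker thread: receives owned values and drops them,
    // so the deallocation cost is paid here, not on the hot path.
    let dropper = thread::spawn(move || {
        for v in rx {
            drop(v); // freed on this thread
        }
    });

    let big = vec![0u8; 1024 * 1024];
    tx.send(big).unwrap(); // main thread no longer pays for the free
    drop(tx); // close the channel so the worker's loop ends
    dropper.join().unwrap();
    println!("done");
}
```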

I don't believe that that 8kB number is with UPX, but again, even if Rust produces smaller binaries than C, it doesn't matter. Most users would prefer faster compilation times with codegen-units over a few kB smaller binaries, for instance, as long as there's a reasonable way to get the smaller binaries for those who need them.

I mean, I don't want to speak for the esbuild author, but he did mention that his rust implementation ran into a number of issues[0], one of them being related to making lifetime stuff happy with concurrency. Judging from other recent rust threads, I'm not super convinced that "easy" is necessarily a fair representation of how it feels to do not-quite-bog-standard-functional-stuff w/ rust.

[0] https://news.ycombinator.com/item?id=22336284

"A stripped binary built this way is around 8KB."

It's before UPX.

> I mean I get the small binary with the basic `zig build-exe hello.zig -O ReleaseFast --strip` command, without any attempts at optimization.

You could set up all these things in just a few minutes, so it's not a very big deal. If you don't mind 20KB of string formatting code then you don't have to make "any attempts at optimization" either.

> If you don't mind 20KB of string formatting code

So this here is sort of why I brought up binary size. I know hello world size doesn't really say anything about real world, but as I was looking at different languages, a common theme was this defensive "well what d'ya expect from a thing that has [list of features]" tone when the question of binary size got brought up and people didn't know how to make it smaller. A normal rust build has a similar feel. The 8kB binary link above reads a bit like a hacking adventure to unravel the mysteries of yonder.

When I read stuff like this

    By using a C entry point (by adding the #![no_main] attribute), managing stdio manually, and carefully analyzing which chunks of code you or your dependencies include, you can sometimes make use of libstd while avoiding bloated core::fmt.

    Expect the code to be hacky and unportable, with more unsafe{}s than usual. It feels like no_std, but with libstd.

I'm undoubtedly awed at the cleverness of it all, but it's clearly way off the beaten path.

I should note that in addition to the zig build command being guessable by a newbie reading through --help, the hello world code itself is also the idiomatic thing a newbie would've reasonably written. And there's really no need for "air quotes": I really put no effort whatsoever into optimizing; I didn't even use `-O ReleaseSmall`

Among all the languages I looked at, Zig was somewhat unique in taking a default stance of proactively using compile time as a tool to take stuff away from the binary (in a sort of Saint-Exupéry way).

I'm sure with enough effort it's possible to bring down binary sizes for most languages, but not many languages do this with the relentlessness that zig does out of the box.

> So this here is sort of why I brought up binary size. I know hello world size doesn't really say anything about real world, but as I was looking at different languages, a common theme was this defensive "well what d'ya expect from a thing that has [list of features]" tone when the question of binary size got brought up and people didn't know how to make it smaller.

Well, what do you expect? How much size does printf add to a Zig binary?

> When I read stuff like this

You can get the vast majority of the size savings without any of that stuff. This is someone trying to find the absolute minimum, starting with very practical suggestions before getting into itty bitty weeds. I'd suggest pretending the no_main and no_std sections don't exist.

Getting down to 50KB or 30KB is just a matter of telling the compiler to optimize for size and stripping unwind information.
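For reference, the size-oriented settings in question mostly boil down to a few Cargo.toml lines, plus running `strip` on the result (a sketch of the commonly recommended profile as of 2021; measure for your own program):

```toml
# Cargo.toml — size-oriented release profile
[profile.release]
opt-level = "z"     # optimize for size rather than speed
lto = true          # whole-program link-time optimization
codegen-units = 1   # fewer, larger codegen units optimize better
panic = "abort"     # drop unwinding machinery and its tables
```

Then build with `cargo build --release` and `strip` the resulting binary.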

If you want to keep all the unwind code and information I'm not sure how much space it takes.

> with enough effort

I will note again that the compiler settings in this repo, while they may be in multiple places, are still a very small one-time effort for a program.

Well, it's not so much about expectations about byte counts per se. YMMV, but in my experience, for whatever reason, the relative size of a language's baseline program happens to correlate with how big and complex projects get as they grow. For example, the Factor distribution is some 700MB, and unsurprisingly a hello world image is also huge. A while back I also saw an analysis of jquery/vue/angular/react sites, and curiously, framework size was also a good predictor of average app size in the wild.

So I use the size metric as a coarse proxy for "complexity required to accomplish the task at hand". And if one is to believe the mantra "to be fast, do less stuff", then my instinct is to look for technologies whose proxy metric (i.e. idiomatic hello world size) is small.

Maybe there are better ways to measure this, but so far this heuristic has worked for me.

> How much size does printf add to a Zig binary?

It depends, because comptime. Basically, it emits different code depending on the types of the params being passed in. For hello world, it wouldn't include itoa logic, for example. The irony here of course is that zig could in theory compile to a very very large binary if it had a very very large number of printf calls, all with different permutations of types. But the thing is that to a zig developer, it's clear that this is explicit static polymorphism getting compiled into a plethora of monomorphic callsites. One could implement the equivalent of go's `interface {}` underlying {type, value} tuple and have a single polymorphic function at runtime if that's what they wanted. But this same argument about how it would require someone manually implementing such a thing goes both ways: it also means that most other possible things are likely not going to be hanging around quietly adding complexity under the rug (not saying this is the case w/ rust, but it is with various other systems).

> But the thing is that to a zig developer, it's clear that this is explicit static polymorphism getting compiled into a plethora of monomorphic callsites

To a Rust programmer it is clear that the toolchain defaults are set up to satisfy the 99.99999999% of programmers whose goal isn't to "write the smallest hello world possible".

You can change these defaults in your system config if you don't like them.

Like you mention, one size doesn't fit all.

To me that's one of the main differences between Zig and Rust.

It's also not entirely about clarity of facts per se, but rather which facts. It's clear to a javascript developer that doing an NPM install is going to bring in half the universe worth of transitive deps, and that they can opt out by writing things by hand.

Most js developers' goals are indeed not to write the smallest hello world, and here we are in a world where people complain of news websites pulling MBs of scripts. As I mentioned, one could conceivably address this by merely picking jquery over react in many cases, for example. There's a predictable sliding scale of performance between using electron, nodegui and qt proper via c++. Etc. All of this is to say: Defaults and base choices matter.

I think in terms of servicing end users, rust and c++ are somewhat unique in that they are relatively complex languages targeting ambitious performance goals, at the cost of more language complexity. That has its own appeal for sure, but I echo esbuild author's sentiment about a language having to be fun for side projects, and for me personally, I like the simple "very sharp knife" variety of technologies

It's not clear to me.

I'm not a Rust programmer, but I've written several projects where I'm pulling in a large Rust dependency, because the best available library for something is implemented in Rust, and it offers a C interface.

These libraries I've used have been very good, but also, extremely large. Like, over 100 MB .a files. They slow down compilation noticeably just by linking, and while a lot of the size is stripped out during linking, the resulting sizes tend to still be uncomfortably big.

So to me it seems Rust binaries tend to be huge at both the small and large ends of the spectrum, and they are large to the point where it becomes a real problem.

You should open an issue with whoever packages that library.

Nobody expects library consumers to be experts in how to build libraries.

But this isn't any different than depending on a C library, and discovering that the Makefile it uses compiles it with -O0, making it super slow.

To a non-C programmer, flipping -O0 to -O3 is wizardry.


Point being, you are not a rust programmer, and the defaults that are good for "development" are not good for "shipping" and therefore no good for you. If your library doesn't do this right, open a bug. The library can override this, and a package that builds it should override this.

No, this is a fully optimised release build, built according to official instructions.

Rust code is just very big. This seems to come with the territory.

> Rust code is just very big. This seems to come with the territory.

As someone that uses Rust every day to program STM32s with 128KB of space... i... disagree...

Rust is the only high level language I know that lets me customize the language run-time at will, allowing me to provide my own unwinding, threading, memory allocation, ... runtimes, and to program tiny machines as if I were just programming any normal one.

With other languages either I can use the standard library implementation, or I can't use the standard library and some language features at all.

With Rust, I can't use the standard library implementation for desktops, but I can just swap its backend by an embedded one, and am still able to re-use all the high level code, libraries from crates.io that use it, etc.

for zig it's 1.8 kB on x86_64-linux with minimal fiddling. i386-windows comes out to 2.5kB.

    $ cat hello.zig
    const std = @import("std");
    pub fn main() void {
        std.io.getStdOut().writeAll("Hello, World!\n") catch {};
    }
    $ zig build-exe hello.zig --strip -OReleaseSmall
    $ zig build-exe hello.zig --strip -OReleaseSmall -target i386-windows
    $ ldd hello
     not a dynamic executable
    $ strip hello
    $ ./hello
    Hello, World!
    $ wine hello.exe
    Hello, World!
    $ ls -hl hello hello.exe
    -rwxr-xr-x 1 andy users 1.8K Mar 10 21:15 hello
    -rwxr-xr-x 1 andy users 2.5K Mar 10 21:15 hello.exe

No min-sized-zig github repository needed.

I work in embedded these days, commented on this topic on Reddit a few days ago: https://www.reddit.com/r/rust/comments/m0irjk/opinions_on_ru...

Not parent, but from my perspective implicit RAII might fall into this category, though I suppose that's specific to deallocation. Zig also requires you to pass an allocator to any function that requires dynamic allocation, which, while it could be considered a bit extreme, is certainly less "opaque".

Would very much like to read your thoughts on V[0]

[0] https://vlang.io/

Honestly, I was put off by the early marketing hullabaloo, so I haven't looked closely. What's most problematic in my mind is that from what I've seen, I don't feel that the V lead is as forthcoming about issues and limitations as other project leads are. I'd much rather talk to a project lead that will give it to me straight if their stuff doesn't work.

From a technical perspective, my understanding is that V is similar to Nim, in the sense that it compiles to C source code. That's fine and all, but IMHO zig has a huge leg up in this area because it does first-class cross compilation of C source code to binary form. To my knowledge, no other tool does this (other than maybe cosmopolitan, if you ignore everything outside posix...)

If I were to use V, I'd probably end up using `zig cc` as the `cc` for it

> zig has a huge leg up in this area because it does first class cross compilation of C source code to binary form.

What's the advantage of this? Compilation speed?

Doesn't that then mean that zig can't gain compiler improvements the way using an external compiler can? For example, if C is the intermediate language for $HIGH-LEVEL-LANGUAGE, then $HLL benefits every time that the specific C compiler being used is upgraded.

The advantage is cross compilation. GCC and friends are notoriously difficult to set up for cross compilation (so much so that some projects don't even bother trying, and either ship only source or just compile from different machines running the target platforms, and even this can be a bit of an ordeal)

Zig uses LLVM under the hood, so it does benefit from LLVM upgrades. Also, there are devils in the details. For example, the copy elision thing is something that required deliberate implementation; it doesn't just come for free if you're naively emitting C.

Thanks, that's a good answer.

> For example, the copy elision thing is something that required deliberate implementation; it doesn't just come for free if you're naively emitting C.

I don't know what copy elision is. Can you explain that too?

You can read more about it here: https://github.com/ziglang/zig/issues/2765

How does V handle reference cycles?

> Most objects (~90-100%) are freed by V's autofree engine: the compiler inserts necessary free calls automatically during compilation. Remaining small percentage of objects is freed via reference counting.

> The developer doesn't need to change anything in their code. "It just works", like in Python, Go, or Java, except there's no heavy GC tracing everything or expensive RC for each object.

So how does V handle reference cycles? A partial solution based on static-analysis (autofree) isn't going to catch all reference cycles, by definition, and automatic reference-counting won't catch reference-cycles either.

I have to agree with lhorie's comment: I get the impression I'm only getting half the story.

The Nim folks also often seem to skim over the question of reference cycles.

If the Nim folks seem to skim over this, it is probably because of the large diversity of options & concerns. You can --gc:none or --gc:arc to blow off the reference cycle problem, but the default gc handles it fine (and --gc:orc and --gc:markAndSweep and --gc:boehm and ...). Here is a nice, recent blog post about ORC [1].

So, in this case the skimming is just an "abbreviated story" which seems reasonable when the full story is long (the Nim project has been around since before 2008...).

For V, well it did have a huge initial hype/propaganda cycle and is also quite young, and I believe the V author made a bunch of extreme claims before publicly releasing the code for scrutiny/corroboration. So, that may be much more suspicious.

[1] https://nim-lang.org/blog/2020/12/08/introducing-orc.html

You're right, ORC does look interesting; I was too dismissive. Their conventional GCs are way behind the state of the art though (e.g. in HotSpot), but I don't really blame them for that: implementing a cutting-edge GC is a mammoth task and extremely technically demanding.

From your linked article:

> it turns out there are lots of ideas that the GC research overlooked. Exciting times for Nim!

I wish them luck, this is a field where great progress has been made over the last decade, hopefully more to come. If they really can beat the JVM head-on, that would be very impressive, especially if they do so while using C as an intermediate language.

Something often overlooked in comparisons is "The many dimensions of beat". In some ways ORC probably already beats the best JVM while in others it may never. The best benchmark is, as always, one's own code. :-) I've found Nim quite efficient, though.

The one time when I had trouble with Nim's GC perf was years ago, implementing Wolf Garbe's Symmetric Delete spell checking algorithm [0]. Even then, it worked with nim c --gc:boehm but blew up with others. So, there was nothing stopping me from doing it in Nim. This was my initial "in volatile RAM" solution before I went to a fully persistent on-disk memory mapped solution [1]. SymDel is an algorithm particularly demanding of prog.lang run-time basics, as explained in that link. I've come to think of it as a good stress test. { Not the only one, of course :-) } I wonder how well Java's GCs would fare.

[0] https://github.com/wolfgarbe/symspell

[1] https://github.com/c-blake/suggest

If you can spare 60-90 minutes, I hosted [0] a conversation with the creator of Zig, as well as Odin, plus a former co-worker who's a compiler engineer.

They're all vibing in the same space, trying to fill the niche we believe is missing.

[0] https://media.handmade-seattle.com/the-race-to-replace-c-and...

That's a great window on programming languages that exist in the systems programming space but that are not Rust, C or C++.

Thank you for doing this!

I've listened to this and a few other episodes with Andrew and Ginger Bill, and they're very interesting, I highly recommend them.

I've written a decent amount of Rust and Go. The reason Zig is on my watch list is because I see it as a potentially better (ie faster) Go, or C+=1. Go handles most of my needs, but the way I write it tends to involve a decent amount of manual resource management (mostly mutexes), so I don't see the manual memory management in Zig as a big downside.

Rust is a great language, and I love it, but it's hard for me. Writing in Rust reminds me a lot of test-driven development. It gives me such a great feeling of safety and control over my process, but at the end of the day it can be very tedious. The real killer is when I'm trying to prototype. If I don't already know what my interfaces are going to look like, Rust really slows me down. Compile times don't help here either.

If I were implementing a well-known protocol and had a general idea of how to architect it, or just really needed it to be rock solid, I would strongly consider Rust first. I've been working on a lot of protocol design the last couple years so it's been more prototype-heavy.

Note that most of my Rust experience involved pre-async/await asynchronous networking code. I'm sure it would be a better experience for me now. I should also note that programming in Rust yields some very magical moments, such as parallelizing loops by changing a single line. It's a special language, and not going anywhere. I hope to find reasons to use it again in the future.

EDIT: Oh, another place Zig may have a big advantage over Go is binary sizes, particularly for things like WebAssembly.

> The real killer is when I'm trying to prototype. If I don't already know what my interfaces are going to look like, Rust really slows me down.

This requires practice, but given the choice, competitive programming teams picking Rust do quite well in competitions (ICFP has been won by teams using Rust 3 years in a row, and in each of the last 3 years, two of the top 3 teams used Rust): https://www.reddit.com/r/rust/comments/ctzmo2/icfp_2019_prog...

There are teaching materials for competitive programming with Rust, you might want to check those. I think they are a good way to learn how to "prototype" in Rust.

It's different enough from other languages that this is a skill that must be actively learned :/

I have no doubt it can be learned, and I've tried to get over the hump multiple times, but still don't feel as comfortable with Rust as I do with every other language I've worked in, even those I've only spent a few hours in.

It's all about tradeoffs, and at a certain point you have to question whether the academic assurances offered by Rust are worth the complexity, given the problem at hand. As I said, it's obviously worth it for some problems, but I don't think it is for all problems. Which is great! I think the world would be boring if there was one language to rule them all.

feel you, just pointing out that trying competitive programming and their resources might give you a different point of view to attack the issue

but like you mentioned, it's always nicer if there is no "issue" to be learned in the first place

Zig, like C, C++, Ada, and Rust -- but very much unlike Go -- is a low-level programming language, i.e. one that gives you virtually full control over implementation details such as function dispatch and memory management. I would say that Zig is the first (more-or-less) known low-level language that is revolutionary in its design. Rust adopts the C++ (and also, arguably, Ada) philosophy of a low-level language that attempts to appear high-level, and ends up being very complex. I would say that C++, Ada, and Rust are all easily among the top five most complex programming languages in software history. On the other hand, you have C, which is "simple" in the sense that it has few features, but is extremely unsafe.

Zig offers a completely new beast, and I'd say a vision for how low-level programming can be done. It is a very simple language -- a C/C++/Rust programmer can probably fully learn it in a day or two -- and yet it is about as expressive as C++/Rust, and much safer than C++. It does that with a single general partial evaluation feature called "comptime" that replaces, rather than augments, generics, traits, macros, while maintaining their capabilities but being arguably simpler than all of them.

Like Rust, Zig places a very high emphasis on correctness, but its approach is different. While safe Rust eliminates all undefined behaviour, safe Zig eliminates many/most, leaving others up to detection via simple testing. What Zig lacks in sound guarantees, it makes up for in being easy to understand, analyse and test.

Zig also has an exceptionally good tooling story around incremental compilation and cross compilation.

In short, I would say Zig offers a completely novel approach to low-level programming that would appeal to those who value a simple language. While Zig and Rust do target a similar niche, I believe they'd attract programmers with very different aesthetic preferences.

For ultra-low-latency programming, you generally want to avoid dynamic memory allocation on the hot path. Instead it's common to use a slab allocator, from memory on the stack or that was dynamically allocated on startup, to continuously reuse the same memory. E.g. when a web request comes in, allocate all temporary per-request state from a single bump allocator, then after the request has finished just "free" everything by setting the "allocateFromHereNext" pointer back to the start of the allocator's buffer. This is not only the fastest means of allocation/deallocation (just bump a pointer), but also means that new requests can reuse the same memory addresses (the allocator's internal buffer), achieving better cache locality as the memory's already in cache.

Zig is ideal for this as all stdlib functions that allocate take an allocator as a parameter (so there's no hidden allocation), and the stdlib provides a bunch of allocators. This is even better than C++, where some functions like std::stable_sort unavoidably allocate memory. Rust doesn't have much support for custom allocators in the stdlib although it's being worked on, but it's more difficult to implement in Rust as the compiler must be able to ensure that the allocator (or at least its buffer) outlives all the things being allocated from it.

I'd be interested in "Why Zig over Rust?".

Having dabbled with Rust, it seems to be the perfect solution/replacement for C/C++. I'm not clear why I should bother with an alternative.

I have spent some time with Rust and written some non-trivial Zig code (a simpler assembler). My simple take-away is that Rust is simply too complicated.

I spend some time on other technologies on occasion. I could get into Zig fairly quickly and do some useful stuff with it.

With Rust it felt like I had to really decide on some significant time investment to be able to get anything done. It really reminds me a lot of Haskell. Those kinds of pure, elegant languages which are awesome once you grok them, but which is never really going to be used apart from in a small niche because they are too hard to learn from an average developer with limited time on his hands.

Rust and Haskell is more like the languages I looked for when I was younger. When I had these utopian visions of THE best programming language. I have long lost any belief in that. I do think strong type systems are helpful, but I also think they tend to get overhyped and overrated.

I am with Rob Pike, one of the Go designers on this. Writing correct code is a lot about understanding and being able to reason about that code. A simpler language makes it easier to reason about code and understand it. That makes it more likely that you make the code correct or is able to maintain and fix it.

I do believe this complexity barrier begins at different places for different people. But for me I think Rust is too complex. Knowing myself I would get back to 3-4 weeks old code and wonder what the hell I wrote.

If you get into that situation you are likely to make mistakes. This is what I believe developers often forget when chasing down the BEST tool. That ultimately it is your brain that is supposed to solve problems not the tool. If a tool solves some problems but reduce your brains ability to do its job, then the tool isn't really an aid.

To be honest I am not actually certain if Zig has hit the sweet spot either. Go is quite good. You can pick up quite old Go code and still read it with relative ease.

Zig is definitely not as easy as Go to read. I feel like I have to wait until a 1.0 release before passing judgment. Some of the issues I felt I had with Zig is down to lack of documentation and rough edges in the standard library.

I really enjoyed your comment. Probably because I've been thinking the same for a long time now. I've had to learn and write a lot of C++ in the past year and the only way to stay sane is to use as small subset of the beast as possible. In many ways Rust is a better C++ - no undefined behaviour, no lvalues/rvalues/xvalues/..., complicated move semantics, tooling, etc. Rust feels like a nice subset of C++ with additional safety features on top.

However, once people reach for C, C++, Rust or Zig - performance is on the line. To push the maximum out of the program we need to do as much compile-time programming as possible. C++ template meta-programming language is Turing-complete but extremely hard to learn and effectively program with. Rust compile-time programming is evolving - two sorts of macros and a Zig-comptime-like `const fn` feature. What I like about Zig's approach is summarized well by pron[0].

[0]: https://news.ycombinator.com/item?id=22112382

> it seems to be the perfect solution/replacement for C/C++

Many languages tried for 50 years to replace C without success.

While Rust brings many improvements for low-level development on classical architecture, simplifying hard problems (especially memory safety), there is still a HUGE number of platforms (embedded ones mainly?) where C and even Assembly are still relevant.

Believing that any language will bury C/C++ is ignoring what C/C++ are.

It looks like Zig requires manual memory management, right? That seems like a pretty big downside compared to GC or the Rust approach.

True. That is however countered by a vastly simpler language. Language support for deferred cleanups[0] helps a bit. Also the GeneralPurposeAllocator[1] contains memory safety debugging features that make analyzing such bugs a bit nicer.

IMHO it's a significant net gain over something like Rust because of the simplicity for most of the problems out there. I got 99 problems but needing absolute memory safety ain't one.

[0] https://ziglang.org/documentation/master/#defer

[1] https://ziglang.org/learn/samples/#memory-leak-detection

That's because the Zig approach is "no hidden control flow".

Everything in Zig is explicit, this is not a downside, this is a feature of the language.

Choose the tool that best fits your needs.

That's fair but that does mean that I wouldn't consider it unless I really needed that kind of control. That's a niche few devs need to worry about.

Well the choices so far seems to be manual memory allocation, RAII or GC. The complexity of C++/Rust indicates that RAII is certainly not a panacea either. GC offers distinct trade-offs as well.

Zig takes a stab at the problem by trying to make manual memory management more ergonomic. I've been pleasantly surprised at how easy I found it to do tasks in Zig I would otherwise have used a scripting language like Python for.

If go's fast GC is taking a stab at C's territory, then I think Zig is taking a stab at the managed runtime's territory. Ultimately the jury is still out.

One advantage of that kind of control is that your code does not depend of the implementation of any abstraction (except the operating system if any?) since everything is written "in stone".

I also think it makes a great learning tool, mainly to understand:

  - how a GC or memory management is/should be implemented
  - what pain points they solve (or: why they are important)
When you don't have to think about the abstractions but only your code, it prevents the student from being confused about what happens when.

But I would not use it in production, just like you said, because in production, I want to maintain as little code as possible, and delegate the complexity to other better tools/people.


Like what?


But that's unrelated to "social justice", isn't it? It is idiotic nonetheless.

> Rust, it seems to be the perfect solution/replacement for C/C++

If you are the sort of person who writes "C/C++" you'll have some trouble understanding the purpose of Zig... maybe start by grokking why the expression "C/C++" is nonsensical?

Argh I'm so tired of this "technicality". Yes, technically C and C++ are different languages, and C is not a subset of C++ (which is a pretty big problem btw). But C++ has been derived from C, then forked its C subset into an incompatible dialect, while not fixing any of C's problems (which would've been an actually good reason to not call it "C/C++"). I think it's fair to call C++ "C/C++", because it will always be a confusing mishmash of C and the features that C++ added on top.

Sorry about the rant.

The problem is that we have no idea what you're talking about when you say C/C++. For you it seems to be C++, for someone else it might be both or it might be "one or the other but I don't know which." For yet someone else it might be really just C.

If you were to think that Rust is a good replacement for C++, why not just say that Rust is a good replacement for C++? That makes it clear what you mean, and thus the obvious follow-up: a lot of people would like a replacement for C, which is arguably not what Rust is.

And this is very on-topic for a discussion where someone asked "why Zig over Rust?" Zig is arguably closer in spirit to C than Rust is.

When you say Rust is a good replacement for C/C++, a lot of people read it as if you were treating C and C++ as one language and that Rust is a good replacement for both. It should not be hard to see why this is very controversial, given that they are very different languages and people tend to like one and dislike the other. It's going to be very hard to please both camps.

I agree, and my reply was more in reaction to the snark in the reply above. Even a lot of C++ programmers don't know that C is actually quite a different language than the "subset of C++ that kinda looks like C" and are surprised when they see "proper" C99 code for the first time. If even "experts" don't quite know the differences between "actual C" and the "C-like subset of C++", it's a bit much to ask the same of people who only have casual knowledge of "C/C++".

You are right, my message did come out as snarky... sorry about that, it was not my intention. I agree with you, the Zig language appeals precisely to those people who enjoy proper, modern C, and want a better language. It does not appeal at all to people who want "a better C++". Thus I was suggesting to understand the distinction between C and C++ as the main point.

Not everyone agrees with the line "Rust is a C++ replacement, not a C replacement." I work with a bunch of C folks who never considered C++, but love Rust now.

It's sort of the same thing as people who say "Go is a C replacement." That is a thing some people say, but to me, it never really made sense.

The key is, what a language means to the person who uses it can differ between people. And what benefits they see out of another language can differ between people.

Look bro I like Haskell so Haskell is a perfect replacement for Scheme/Java.

Zig is much easier to learn and simpler and has much of the safety when compared to the Rust-C spectrum. It also seems to be fun to write for lots of people. The metaprogramming system is also first rate and easy to understand.

It's pretty dreamy for us who wanted Rust to become "better C" but instead it became "better C++". Granted, that's what it was trying to become from the get-go, I just got confused and hopeful.

Rust is to C++ what Zig hopes to be to C.

I don't mean "hope" disparagingly. We're just a few releases away to reach that point, IMO.

Go is what Thompson & Ritchie might have developed in an alternate universe had they used Limbo and Plan 9 before designing the C language.

well so far Zig looks to be one of the very few lang that got the async/sync two color functions sorted out.

It is also interestingly solved in Unison language (using "algebraic effects"), although it's just an early alpha version.

See https://jaredforsyth.com/posts/whats-cool-about-unison/

As a rough high level overview:

Rust is what you might imagine if starting over from scratch with C++, with a big emphasis on memory and type safety.

Zig is more like starting over from scratch with C, with a moderate emphasis on safety (no garbage collection or borrow checker, but undefined behavior handled more sanely than C)

I had the impression Rust was more compatible with C than C++.

Well, every language is more compatible with C than with C++, because C is a subset of C++†. Even C++ is more compatible with C than with C++ compiled with a different compiler :)

† I know it's not a subset, but near enough for our purposes.

Another interesting parser combinator written in Zig is Mecha.


(author here) Mecha looks really nice!

One difference that makes Mecha's look really nice/simple is that they are built at compile time, whereas the ones I outline in this blog post are built at runtime (with the intention of later building parsers at runtime _based on other parsers' output_).

Or, at least that's what I gleaned from a quick skim of Mecha. Very excited to play around with it soon!

You got it right! I think it's nice to be able to contrast the two implementations :)

Zig is more capable than I thought.

It looks like there are definite plans to make it moreso, and more interesting.

The "more ergonomic C" crowd is missing some important nuance.

Yep. It can be "C but good" or it can be "a language with all these fancy features". It can't be both at the same time.

Zig is pretty close to being both at once, simply because they've found clever ways of combining existing features to avoid needing entirely separate features to accomplish the same goals.


They avoid the macro system by leaning hard on compile-time evaluation, leveraging dead code elimination to replace preprocessor/macro-driven conditional compilation.

They get generics in a similar way, by letting compile-time functions construct types.

Varargs are implemented with anonymous structs, and modules are also secretly just structs, which makes interop with C much simpler.

It's actually deeply impressive how far they've been able to go with a language that actually has quite a small number of separate constructs, and it doesn't feel hobbled by doing so.

This is all about creating orthogonal features. In any system where you are able to create a small number of features which combine in very flexible ways you can achieve a minimalist language together with immense expressiveness.

Zig is in that category. It takes some powerful ideas and combine them in ways that gives Zig a lot of power despite a small feature set.

What other languages are in that category? I think Lua is one, for a simple language I've seen insane magic built with it when it comes to metaprogramming.

The features are coming out more slowly, and it feels like things are asymptotically converging to what will be the language at 1.0.

I wish someone would teach low level programming from scratch with Zig

Can you clarify what "low level" means to you?

Code that manages and manipulates memory, network stuff, GUI. Basically anything more intensive than client-side JavaScript


While having nice libraries is always great, I'd also like to plug the simple zig std library functions.

I was very pleasantly surprised by how easy was to start parsing simple stuff using std.mem.tokenize() and std.mem.split(). Somehow watching Andrew programming some Advent of Code using these very simple functions made it 'click' for me how parsers are just ordinary programs.

The only parser combinator experience I have is with FParsec in F# and Haskell's Parsec. I am not sure if a non-(purely-)functional programming language is suitable for parser combinators. Yes, it works, but why?

Performance. I'm looking to try implementing a regexp-like engine via a runtime parser generator built using parser combinators and _hoping_ to get performance near what optimized regex engines get.

Seems like an interesting way to get nice, performant template parsing without having the performance overhead that higher level languages typically bring.

This is all speculative, though - I haven't actually done it yet, so maybe I'll find it's incredibly slow and a bad idea :)

I'm not sure if this is done in practice, but in theory, if you compile a regex to a DFA, you can then minimize the DFA. I don't know if you can do anything similar with parser combinators, but if not, that could be a potential performance gap to watch out for.

I imagine this works for regex engines like RE2 that don't support backtracking, but would not be possible if supporting more advanced regex features like backtracking?

I definitely have more to learn about automata and FSMs :)

Hi again. :-)

In practice, production grade general purpose regex engines are never implemented with formal DFAs in all cases. They take too long to build and use way too much memory, especially when Unicode is involved. For example:

    $ regex-cli debug dfa dense '(?-u)\w{10}' -qm
                parse time:  45.418µs
            translate time:  14.626µs
          compile nfa time:  373.919µs
    compile dense dfa time:  564.7µs
          dense dfa memory:  5436
     dense alphabet length:  23
              dense stride:  32

    $ regex-cli debug dfa dense '\w{10}' -qm
                parse time:  41.344µs
            translate time:  27.072µs
          compile nfa time:  6.360692ms
    compile dense dfa time:  1.038092145s
          dense dfa memory:  2998332
     dense alphabet length:  115
              dense stride:  128
And this is including oodles of tricks for shrinking the DFA's size. (Note though that the above examples minimize the DFA, which adds significant time. Without minimization, the latter is about an order of magnitude faster.)

So even this simple regex is 3MB in size.

Finally, DFAs can grow exponentially in the size of the regex, which means you can't really use them with untrusted regexes.

In practice, regex engines like RE2 use a hybrid NFA/DFA, or "lazy DFA." The DFA is computed at search time and cached. If too much cache is used, then the regex engine falls back to an NFA simulation.

Beating the performance of a DFA without going into more exotic things like vectorized algorithms or more specialized regex engines (bit-parallel NFAs) is pretty tricky.

And also, do not repeat the same mistake that the OP seems to make: conflating what DFA and NFA mean. A lot of people like to call things like PCRE an "NFA" implementation, but it's not, because it's meaningfully more powerful than what an NFA can do. General purpose regex engines tend to be split between ones that are based on finite automata and ones that are based on backtracking.

Thanks for the super detailed response again, Andrew! I have a ton to learn from you about how production grade regex engines work honestly :) Do you have any tips for learning as much as you have aside from mulling over papers and existing implementations?

Also, by "do not repeat the same mistake that the OP seems to make: conflating what DFA and NFA mean." do you mean in the article? I wrote that mostly with the intent to say _"this ain't how things are normally done, DFA/NFA exist and are used"_ but admittedly don't have more than a surface-level understanding of them. If you have tips on how to clarify that wording so I don't confuse others, at the very least, I'd much appreciate it! :)

Yes, I mean in the article. Perhaps I was too brief, but the TL;DR is to use "finite automata based regex engines" and "backtracking based regex engines."

Basically, there's a long tradition of calling the latter "NFA engines" and the former "DFA engines." But doing so is inaccurate and confusing. Especially since "DFA engines" also include things called "NFA engines" that are different from the former "NFA engines" as used when describing backtracking engines.

If that doesn't make sense, then just remember this: DFAs and NFAs have equivalent expressive power. So if you're describing a regex engine that gives you things like backreferences, recursion and all sorts of other bells and whistles, then it can't be an NFA. It is almost certainly implemented with backtracking.

> Do you have any tips for learning as much as you have aside from mulling over papers and existing implementations?

I started with Russ Cox's article series and read the RE2 source code. Then I built my own.

I'd also check out anything related to Hyperscan. It's a deep rabbit hole to tumble down though.

Understood, this helped me a lot, thank you. I definitely had confusion around DFA/NFA being a _type of regex engine implementation_ and see now why that is incorrect.

I updated the article[0] so as to hopefully not confuse others, really appreciate you taking the time to respond to me again, and sorry for not understanding your earlier comment :)

[0] https://github.com/hexops/devlog/commit/0159b510556c2943cbca...

No worries. Feel free to email me if you want to geek out about regexes. :-) My email is on my web site, linked in my profile.

Python has pyparsing: https://pypi.org/project/pyparsing/

Maybe it's because non-FP languages typically don't support operator overloading, so you'd end up with verbose grammars like:

    new Literal("true").or(new Literal("false"))
and then you'll probably use a parser generator like ANTLR.

Python is no Scala when it comes to operator overloading, but it has enough to make parser combinators work cleanly. funcparserlib [1] and Parsita [2] are two such libraries.

For example, this does what you probably expect in Parsita:

    real = reg(r'[+-]?\d+\.\d+(e[+-]?\d+)?') | 'nan' | 'inf' > float
(Full disclosure: I wrote Parsita.)

[1]: https://github.com/vlasovskikh/funcparserlib

[2]: https://github.com/drhagen/parsita

So... why are they awesome, again? Is this some kind of functional approach to parsing? (I know that functions compose very well.)

I was already slightly familiar with parser combinators but still really enjoyed this article to see how one might implement them in Zig, which I've looked at but not written anything in myself.

For another introduction to parser combinators, I can recommend the Parser Combinators From Scratch series on the Low-Level JavaScript YouTube channel. In general he has a lot of really nice videos that are great introductions to low-level programming concepts, especially for people who mainly know JavaScript.

[0] https://www.youtube.com/playlist?list=PLP29wDx6QmW5yfO1LAgO8...

Checked Zig. Doesn't have a memory manager. It's "free all at program termination" or manually manage memory like C.

It is much better structured than C. Everything in the standard library that allocated memory takes an allocator as first argument.

And you got cleanup with defer statements. Together this makes the memory management story very different from C.

It does have some nice improvements over C though, such as defer-style deallocation, which can help with many potential bugs.

One of the best things about parser combinators is that I was writing them before knowing what they were.

My first programming language being C[1], all you had was functions and composition, it's the natural pattern to write parsers.

[1] - Well, in truth it was GML from Game Maker 6, but I quickly went to learn C :p

Would Zig be considered suitable for application programming? Or for programming a web API over HTTP? Or is it mostly suited for low-level programming?

Zig is, very roughly speaking, competing with the C language. We might say that Zig is to C what Rust is to C++, again, very very roughly.

There's at least one Zig HTTP server framework out there [0] but it's not really what the language is intended for.

[0] https://github.com/frmdstryr/zhp

OT: If anyone is looking for good parser-combinator lib in rust, here's one lesser-known (not mine) https://github.com/J-F-Liu/pom

Is Zig the new cool kid on the block?

It is! Almost as cool as Julia ;-) There are some pretty cool ideas in there, which I think are worth getting familiar with even if you never end up using Zig at all. I suspect some of the ideas will get copied by other languages.

the coolest

I haven't dug into this yet, but the table of contents appears to be broken.

Author here, looks like I messed up one of the links by accident. Should be fixed now :)

Meta: the final word in the target page's title ("awesome") is missing from the submission title. It becomes a little hard to

We've made the title awesome again.

I'm confused about "callconv(.Inline)" on the function pointer. Isn't a function pointer not inlinable by its very nature?

Whenever you make a submission, always check the title. The automatic title editing from HN often makes sense, but not always.

At first I didn't understand, but now I think I get

read it

ps: people, it wasn't an order, just a metacomment about pruned suffixes

Unfortunately, the library story is quite limited, so for me this doesn't really justify spending more than a day looking into. Currently using Nim, which has better library support. Maybe it's possible to cross-compile Nim/Zig libraries? Also, it's quite difficult to find pure C libs, seems everything is C++ at some level.

Guess it depends on what you want to do. I imagined I'd might use it for small Unix tools, simple 2D games, micro-controller stuff, maybe simple compilers.

But I get your point. Would be nice with the kind of selection you find in the Go standard library. But then again they don't have Google resources.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact