Hobby x86 kernel written with Zig (github.com)
355 points by netgusto 4 months ago | 223 comments



Hi, author here! I've just finished writing the pre-emptive multitasking [0] (only round robin though, nothing fancy).

I'm currently writing an ATA driver [1]; the idea is to implement ext2.

I used to do this in Rust but I switched to zig for maintainability and readability over Rust. It seems that with `comptime` I'm able to make a lot of things optimal.

Overall I have to say kernel programming is _hard_ but very rewarding when it finally works, it's really demystifying computers for me!

[0] https://wiki.osdev.org/Brendan%27s_Multi-tasking_Tutorial

[1] https://wiki.osdev.org/IDE


> I used to do this in Rust but I switched to zig for maintainability and readability over Rust.

Can you expand on this? I'm asking out of curiosity because I want to learn a "system" programming language (for whatever definition there is to this term). So far I've briefly tried Rust and Nim and found the former more difficult to read. I know nothing about Zig; how would you place it relative to these two?


In general you'll find that zig is easier to read than Rust (see the first version of this project in Rust [0]) because it's a simpler language. For kernel programming this is even more so the case:

* zig has native support for arbitrary sized integers. In Rust I used to do bitshifts; now I just have a packed struct of u3/u5/u7 whatever (see `src/pci/pci.zig`). Of course Rust has a bitflags crate, but I didn't find it handy; this is a case where native support vs library support means a world of difference (see the sketch after this list).

* zig has native support for freestanding targets. In Rust I used to have to build with Xargo to cross-compile to a custom target, I was also forced into `#![no_std]`, and some features I was using forced me to use nightly Rust. In zig I have a simple `build.zig` and a simple `linker.ld` script; it just works.

* zig has nicer pointer handling. A lot of kernel programming is stuff that Rust considers unsafe anyway. It's not uncommon to have to write lines like `unsafe { *(ptr as *const u8) }` to deref a pointer in Rust, which is a pain because this kind of thing happens all of the time. Also you have to play around with mut a lot, like this: `unsafe { &mut *(address as *mut _) }`. It just felt wrong a lot of the time, whereas in zig you have either `const` or `var` and that's the end of it.

* zig is really fun to write! This is something that comes up often in the community; after years of C it's just very refreshing.
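
To make the bit-twiddling point concrete, here's a sketch of the shift-and-mask style this replaces (made-up 8-bit register layout for illustration, not the actual PCI one):

    // Hypothetical 8-bit register: low 3 bits = function, high 5 bits = device.
    const FUNC_MASK: u8 = 0b0000_0111;

    fn func(reg: u8) -> u8 {
        reg & FUNC_MASK
    }

    fn device(reg: u8) -> u8 {
        reg >> 3
    }

    fn with_func(reg: u8, f: u8) -> u8 {
        (reg & !FUNC_MASK) | (f & FUNC_MASK)
    }

In zig the same thing is just a `packed struct` with a `u3` field and a `u5` field, and plain field access.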

Some things zig is missing:

* Package manager, this is coming soon. [1]

* Missing documentation for inline assembly (I think this part is going to get overhauled, as Andrew Kelley is writing an assembler in zig atm [2]).

I don't know Nim, but I believe it has a garbage collector so it could be tricky to use for kernel programming.

[0] https://github.com/jzck/kernel-rs

[1] https://github.com/ziglang/zig/issues/943

[2] https://www.youtube.com/watch?v=iWRrkuFCYXQ


It's true that for bare metal programming Rust is quite far from 1.0. You can't even do inline assembly in stable rust...

Arbitrary sized integers do sound very convenient, although I guess the counterargument would be that they simply can't map to machine types, so you'll have to have magic behind the scenes, which may not make complete sense in low-level languages. Still, C has bitfields in structs, which are sort of the same thing, so why not.

Pointer handling however I'm not sold on. I'd argue that the very cumbersome pointer handling in Rust is a feature, not a bug. It's explicitly discouraged, and for good reasons. Map your pointer into a clean and safe reference, slice or wrapper object ASAP and leave raw pointers to the messy details of C FFIs and ultra low-level code. If you end up having to mess with raw pointers everywhere in your codebase you're doing it wrong, as far as Rust is concerned.

I think this is reasonable even for kernel code. After all typically you wouldn't dereference register pointers directly in C code because of volatility issues and making it obvious that you're accessing hardware and not RAM (readl/writel or similar), so having a safe wrapper adds no overhead in my experience.
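
For instance, a minimal sketch of such a wrapper (a hypothetical register type, not from any particular crate), with the raw pointer and the volatility concerns confined to one place:

    use core::ptr::{read_volatile, write_volatile};

    pub struct Reg32(*mut u32);

    impl Reg32 {
        /// Safety: `addr` must be a mapped device register, valid for the
        /// lifetime of the returned value.
        pub unsafe fn new(addr: usize) -> Reg32 {
            Reg32(addr as *mut u32)
        }

        pub fn read(&self) -> u32 {
            // Safety: guaranteed by the contract of `new`.
            unsafe { read_volatile(self.0) }
        }

        pub fn write(&mut self, val: u32) {
            // Safety: same contract as `read`.
            unsafe { write_volatile(self.0, val) }
        }
    }

The rest of the driver then works with `Reg32` and never touches a raw pointer.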


OK, maybe I was using pointers wrong in Rust. And I must confess that I never really understood lifetimes either. However:

* I don't think that low level code needs to be "messy", and

* I don't think that adding an abstraction on top solves anything complexity wise.

For memory paging I used to have a whole library in Rust that handled the hierarchy of page tables with very abstract page-table classes and interfaces [0]. Now I just have 80 lines of zig that handle everything [1]. I much prefer the latter.

> I think this is reasonable even for kernel code.

It's not, for some of the reasons that Linus doesn't want C++ code in the kernel [2].

[0] https://github.com/gz/rust-x86/blob/28d5839933973e0b639ef354...

[1] https://git.sr.ht/~jzck/kernel/tree/master/src/arch/x86/pagi...

[2] https://yarchive.net/comp/linux/c++.html


The Rust code you wrote seems to be extremely heavy on boilerplate. A couple of macros could have drastically reduced the line count.

I don't think what Linus wrote in 2004 on C++ has much relevance for 2020 Rust, especially for a new kernel that doesn't have a ton of contributors. Most of his complaints seem to be about how C++ abstractions can make it hard to review unfamiliar code and how 2004 C++ code bases often had extremely messy code. Rust IMO doesn't suffer from those problems nearly as much as 2004 C++ did.

Also, Google is using Rust for part of their new Fuchsia kernel so at least they think Rust has something good to offer kernel dev.


Small nitpick: Google isn't using Rust for the Fuchsia kernel. The kernel, Zircon, is written entirely in C and is based on Little Kernel. Google is using Rust for some userspace drivers (Fuchsia being a micro-kernel).


> The kernel, Zircon, is written entirely in C

No, it's written in C++[1]. They're very different languages.

That said, Google seems to think C++ has something to offer to all kinds of development. It seems to work for them, but they are heavily a C++ shop. I wouldn't read too much into their use of C++ anywhere; it'd be more surprising and interesting to see them use anything else.

[1]: https://fuchsia.googlesource.com/fuchsia/+/master/zircon/ker...


Oh that's interesting. Zircon is based on LK[0] which is in C, which is why I thought it was still written in C. I wonder how much is left of the original LK code then.

[0]: https://fuchsia.dev/fuchsia-src/concepts/kernel/zx_and_lk


Interesting, I didn't know about LK. Thanks for sharing :-).


You might find it helpful to look at Redox OS and Tock to see how they're handling kernel development in Rust. May or may not be easier than what you described.

https://www.redox-os.org/

https://www.tockos.org/


The problem with C-style bitfield values/arbitrary-sized integers is that they don't have addresses, so you can't take references to them. This means that you need special cases everywhere, and it increases the complexity of the language a lot. In Rust, integers often have methods that take self by reference, which demands an actual address. For these reasons, in Rust I think it's better that arbitrarily-sized integers aren't first class. They're never really first class anyway.


In Zig they do; here is an example: https://clbin.com/HlJ0X

Output:

    Test [1/1] test "bit field access"...
    *align(:3:1) u3
    All 1 tests passed.

Zig's pointer type has optional metadata to describe alignment, sub-byte offset, and SIMD vector element index. This generally makes things just work, and you'll get a compile error if you expected an unadorned aligned pointer. If I were to swap:

    - fn deref(ptr: var) u3 {
    + fn deref(ptr: *u3) u3 {
Then I'd get:

    test.zig:13:23: error: expected type '*u3', found '*align(:3:1) u3'
        assert(deref(&data.b) == 2);
                          ^


How do you deal with atomicity though? A big deal in Rust is that you can only have one mutable reference to an object at any given time, which prevents all sorts of aliasing issues, but here if you have two mutable fat pointers, one to a and one to b, and you attempt to write them without synchronization, you're going to have a big problem, won't you?

This is the main risk with bitfields in C: it makes reads and writes that appear to be independent actually access the same memory cell behind the scenes. I'm not convinced by the idea of adding even more magic on top of it, to be honest. It is true that packing and unpacking bitfields is annoying and requires some boilerplate, but in the end that is how the hardware does it anyway.
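
Concretely, with a made-up two-field byte (Rust's `&mut` rules out the data race here, but the read-modify-write shape is the point):

    // One byte holding two logical fields: `a` in the top 3 bits, `b` in
    // the low 5. Each setter is a read-modify-write of the whole cell, so
    // two writers that look independent still touch the same memory.
    fn set_a(cell: &mut u8, a: u8) {
        *cell = (*cell & 0b0001_1111) | (a << 5);
    }

    fn set_b(cell: &mut u8, b: u8) {
        *cell = (*cell & 0b1110_0000) | (b & 0b0001_1111);
    }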


The builtin functions for atomic operations provided by the language do not allow unaligned pointers, so it would be a compile error.


I don't think bitfields are even that portable in C? You end up with a weird mix of bit-endianness and byte-endianness issues. The best approach AFAICT, if you need to model a binary format with arbitrary-sized integers, is to keep the binary structure opaque and have a function to "unpack" it to a record of machine-native types. Modern compilers can easily optimize the resulting code into shifts+bit operations, and it is a reasonably foolproof approach.
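E.g., a sketch of that unpack approach for a made-up one-byte layout:

    /// Made-up wire format: one byte = 3-bit version + 5-bit flags.
    struct Header {
        version: u8, // machine-native, holds a 3-bit value
        flags: u8,   // machine-native, holds a 5-bit value
    }

    fn unpack(byte: u8) -> Header {
        Header {
            version: byte >> 5,
            flags: byte & 0b0001_1111,
        }
    }

The compiler turns this into the same shifts and masks a bitfield would, but the bit layout is pinned down by the code rather than by the ABI.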


You can certainly use Nim for kernels. I created a proof of concept long ago, and things are only improving, with ARC fast becoming a great alternative to Nim's GC.

https://github.com/dom96/nimkernel


> I don't know Nim, but I believe it has a garbage collector so it could be tricky to use for kernel programming.

Have a look at Project Oberon, Singularity, Interlisp-D, Smalltalk, Modula-3 Topaz, Blue Bottle (AOS), CosmosOS, lilith (a Crystal variant), given that their source is available.


Thank you.

> I don't know Nim, but I believe it has a garbage collector so it could be tricky to use for kernel programming.

You're right. Still good for libraries though (or apps, but that may be outside of "system").


You're right, it is confusing, but it is optional: some toy kernels already work in Nim, and with the latest work on memory management you should be able to use most of the language for kernel development! It's not the perfect language for that yet, but I hope we'll see more Nim OS examples.


Are you saying that the GC is optional? If you don't use it how do you allocate/free memory?


You call malloc/free, or, if working with GPUs, cudaMalloc/cudaFree. You can write your own memory pool or object pools, you can use destructors, and even implement your own reference counting scheme.

This is what I use for my own multithreading runtime in Nim and the memory subsystem makes it faster and more robust than any runtime (including OpenMP and Intel TBB) that I've been benchmarking against, see memory subsystem details here: https://github.com/mratsim/weave/tree/master/weave/memory

Example of atomic refcounting in this PR here: https://github.com/mratsim/weave/blob/025387510/weave/dataty...

Also, one important thing: Nim's current GC is based on TLSF (http://www.gii.upv.es/tlsf/), a memory reclamation scheme for real-time systems; it provides provably bounded O(1) allocations. You can tune the Nim GC with max pauses for latency-critical applications.


Does the standard library use malloc/free or does it depend on the GC? This is the part that's puzzling to me: if the stdlib depends on the GC, then it's harder to say that the GC is optional. Technically optional, but not super practical.


The majority of stdlib modules do not depend on the GC.

Also, the new ARC memory manager replaces the GC and can run in a kernel.


No, that's not true.

As soon as you use sequences or strings or async you depend on the GC.

You can however compile with gc:destructors or gc:arc so that those are managed by RAII.


I used various modules with gc:none

I meant that the new ARC GC, which will replace the current one, can be used for a kernel.

It's still a GC, technically, but, quoting Araq on ARC:

Nim is getting the "one GC to rule them all". However calling it a GC doesn't do it justice, it's plain old reference counting with optimizations thanks to move semantics.


Overall I think for small OSes you can easily write a micro libc core in C or even in Nim, where you define malloc/free etc. and just use them directly, as in C and zig.

Otherwise you should be able to use something like destructors eventually.


Didn't test this but some clues: https://nim-lang.org/docs/gc.html

Then I guess you would use new/dealloc



Nim's GC is optional


How can it be optional if there is lots of code that assumes you are using GC? For example, as far as I can tell the stdlib doesn't do its own allocations. Does this mean you can't use the stdlib with GC disabled? Or am I missing something here?


A friend of mine wrote a DSL for audio using Nim with GC off.

https://github.com/vitreo12/omni

I don't know enough to comment, but it may be useful to look at things in the wild. This project also heavily relies on calling into C to interface with environments and the SuperCollider scsynth.


Overall, if you write an OS you need to write your own malloc/free, and then you should be able (I think) to use many of the GCs anyway, and more of the stdlib.

But you can also think of Nim as a macro-able, higher-level C and write it like that (though you probably still need this minimal allocation support).


Your point about unsafe pointer handling in Rust is specifically what dissuaded us from using it in an upcoming project. It really feels bad prepending `unsafe` to all of the code that you actually care about being safe.


`unsafe` before a block simply means "the following code has been manually checked for memory safety, because the compiler is unable to automatically do so". Before a function it means "this can only be called from an unsafe block, because the compiler cannot enforce the preconditions it requires to ensure memory safety". What alternative term would make you feel less bad?
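
A sketch of both forms (hypothetical functions, standing in for the usual slice APIs):

    /// The compiler can't check this, so the *caller* takes on an
    /// obligation. Safety: `i` must be less than `data.len()`.
    unsafe fn at_unchecked(data: &[u8], i: usize) -> u8 {
        *data.get_unchecked(i)
    }

    fn at(data: &[u8], i: usize) -> Option<u8> {
        if i < data.len() {
            // Safety: bounds checked on the line above.
            Some(unsafe { at_unchecked(data, i) })
        } else {
            None
        }
    }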


> * zig has native support for arbitrary sized integers

That's really cool.


>"zig has native support for arbitrary sized integers."

I apologize if this is a naive question. Might you or someone else elaborate on where arbitrary sized integers are used or necessary in kernel programming? Don't they all ultimately need to get padded out for the CPU registers to work with them anyway?


Packed binary structures. Common in device register space (the author alludes to PCI driver(s)).


I looked over zig when it was first announced and really liked the idea, it was just too fresh for me to work with.

But I've always had a mental note to go back once it got a bit more stable because I really do like the ideas behind it.

It sounds to me like it's definitely getting there and that's awesome.


Can Zig's enums and tagged unions be used to write algebraic data types?


Yes it can.


Very nice!


On the other hand, safe Rust code will be memory safe...


It's Zig's goal to make it easy to write safe code, it's just that its approach to safety is very different from Rust's.


not if you have to keep using unsafe blocks.


As an ancestor in this thread pointed out: it's hard/impossible to go without unsafe in a kernel.

Proof (and they have a policy to only use unsafe when not otherwise possible):

https://github.com/redox-os/kernel/search?q=unsafe&unscoped_...


yes, that's the point i'm making.

saying "if you write your kernel in rust, it will be memory safe" is a little goofy, because writing kernels in rust requires you to drop into unsafe blocks way too often.


Not as often as you would think; Philipp Oppermann has shown this: https://os.phil-opp.com/

Unsafe is definitely used, but it does not mean what I think is being implied, which is that it makes the software unsafe.


As proven by the Oberon and Xerox PARC workstations, the amount of unsafe code is quite minimal.

Source code is available.


The amount of unsafe blocks is pretty small compared to the rest of the code. Which means those parts can be reviewed more extensively. If anything your argument is pretty goofy.


What I find is that, while Rust unquestionably improves the security of software, the argument that you can minimize unsafe code and thoroughly inspect it doesn't tell the full story. In my experience people end up writing general-purpose unsafe blocks, and while those are controlled by the "safe" outside, the sequence of actions the "safe" code tells the "unsafe" code to execute can lead to subtle bugs.


yep, 100%

I personally think the Rust community has missed the biggest advantage of unsafe, and that's tooling. Imagine an IDE that could highlight and warn when touching state that is accessed in unsafe blocks, or when state being accessed in unsafe blocks isn't being properly tested.

Or tools that kick off notification for a required code review because code that touches state used by an unsafe block got changed.

To me, these sorts of things will have a far greater effect on security and stability than just the existence of the unsafe block in code.


But isn’t the most important step having a compiler-enforced keyword for code that seems to be doing unsafe stuff?

The rest are just plugins and tools that look for said blocks and “do stuff”. I don’t see why these tools should be part of the language.


I never said they should be part of the language; I said the rust community over-emphasizes the "safe" part of having the unsafe keyword and neglects the part that's actually useful in keeping things safe.

Another poster responded to me stating they are starting to work on those tools, so it looks like the rust community is starting to come around.

It's just that I've seen a lot of people tout the safety of rust as if the unsafe keyword was a panacea that automatically prevented problems. That attitude is really what I was responding to.


Ah, gotcha. Yes, improved tooling in this area would add a lot of value to the feature.


Those tools exist or are being worked on! Servo (last I checked) uses a “unsafe code was added or modified in this PR, please review extra carefully” bot, and Miri is on its way to becoming a fantastic tool.


That's good to know.

The Rust community can be a bit... fanatical, let's say. I've seen so many people argue that the unsafe keyword by itself makes rust so much safer than alternatives, when really safe code can cause unsafe code to explode. So you end up writing modules to protect the unsafe code, which is what you do in any other language as well.

Which means the unsafe keyword is valuable, just not nearly as valuable as I've seen a lot of people claim.

Now the unsafe keyword combined with the sort of tooling I've mentioned? That to me is a killer combination. Unsafe isn't a silver bullet, it still requires work, but it enables tools to make that work immensely easier to deal with.


> The amount of unsafe blocks is pretty small compared to the rest of the code. Which means those parts can be reviewed more extensively. If anything your argument is pretty goofy.

I've talked about this at length before, but just because you have 'less' `unsafe` code does not mean your code is any better off. Very few operations are actually `unsafe`, and you're still not allowed to break any of the (poorly documented or undocumented) constraints that Rust requires even in `unsafe` code.

Point being, unless you can make "safe" abstractions around your `unsafe` code that cannot be broken (which is largely what you do in any language anyway, when you can...), the distinction between "safe" and "unsafe" is thin, because the parts of the code you declare `unsafe` are not the parts that are actually going to have the bugs. And because the constraints you are required to fulfill in `unsafe` code are unclear, there's little way to guarantee your code isn't broken in some subtle way the optimizer might jump on (either now, or in a later release).

And to be clear, I like Rust, but `unsafe` is poorly thought-out IMO.


If you aren't making a safe abstraction around your unsafe code, you're supposed to mark the surrounding function "unsafe" as well, and document its expected constraints in a 'Safety' comment block. In fact, you're supposed to do the same for any piece of code that cannot provide safety guarantees about its use of 'unsafe' features. 'Plain' unsafe{ } blocks should only be used as part of building a safe abstraction.


> If you aren't making a safe abstraction around your unsafe code, you're supposed to mark the surrounding function "unsafe" as well, and document its expected constraints in a 'Safety' comment block.

Sure, but now we're back to the issue that it's unclear what constraints `unsafe` code actually has to hold, meaning ensuring your abstraction is safe is just about impossible to do with certainty (And OS kernel code is definitely going to hit on a lot of those corner cases). You may think you have a safe interface, only to find out later you're relying on internal details that aren't guaranteed to stay the same between compiler versions or during optimizations.

With that said, while I agree with what you're proposing about marking surrounding code "unsafe", it leads to lots of strange cases and most people get it wrong or will even disagree with this proposal completely. For example, it can lead to cases where you mark a function `unsafe` even though it contains nothing but "safe" code. And at that point, it's up to you to determine if something is actually "safe" or "unsafe", meaning the markings of "safe" and "unsafe" just become arbitrary choices based on what you think "unsafe" means, rather than marking something that actually does one of the "unsafe" operations.

One of the best examples is pointer arithmetic. It's explicitly a safe operation, but it's also the first things most people would identify as being "unsafe", even though it is dereferencing that is the "unsafe" operation. Ex. You could easily write slice::from_raw_parts without using any `unsafe` code at all, it just puts a pointer and length together into a structure (it doesn't even need to do arithmetic!). It's only marked `unsafe` because it can break other pieces of "safe" code that it doesn't use, but it itself is 100% safe. You could just as easily argue the "other code" should be the `unsafe` code, since it's what will actually break if you use it incorrectly.
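
To illustrate with a hypothetical type (not the real std implementation):

    // Pairing a pointer with a length is itself completely safe code...
    struct RawSlice {
        ptr: *const u8,
        len: usize,
    }

    fn from_raw_parts(ptr: *const u8, len: usize) -> RawSlice {
        RawSlice { ptr, len }
    }

    impl RawSlice {
        // ...the `unsafe` only shows up where the claimed invariant is
        // relied upon. Safety: `ptr..ptr+len` must be one live, readable
        // allocation, and `i` must be less than `len`.
        unsafe fn get(&self, i: usize) -> u8 {
            *self.ptr.add(i)
        }
    }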

Perhaps the biggest annoyance I have is that the official Rust documentation pretty much goes against the idea you presented, saying

> People are fallible, and mistakes will happen, but by requiring these four unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs.

Which is just incorrect - memory safety issues are likely to be due to the "safe" code surrounding your unsafe code. When you're writing C code, the bug isn't that you dereferenced NULL or an OOB pointer; the bug is the code that gave you that pointer in the first place, and in Rust that code is likely to all be "safe". Point being, most of the advice on keeping your unsafe blocks small just leads to people making silly APIs that can be broken via the safe API they wrap (even in weird ways, like calling a method containing only safe code), and unfortunately there are lots of subtle ways you can unintentionally break your Rust code that most people aren't going to have any idea about.


> Sure, but now we're back to the issue that it's unclear what constraints `unsafe` code actually has to hold

The Rust Nomicon actually documents these constraints for each 'unsafe' operation. Code that can ensure that these constraints hold can be regarded as 'safe' and rely on an unsafe{ } block. Code that can't, should be marked unsafe and document its own constraints in turn.

> You may think you have a safe interface, only to find out later you're relying on internal details

If you're relying on internal details, you're most likely doing it wrong, even by the standards of 'unsafe'. There are ways to nail down these details where required, at which point they're not even "internal" details anymore, but this has to be opted-into explicitly, for sensible reasons.

> It's only marked `unsafe` because it can break other pieces of "safe" code that it doesn't use, but it itself is 100% safe. You could just as easily argue the "other code" should be the `unsafe` code, since it's what will actually break if you use it incorrectly.

No, "other code" should not be marked unsafe because this would mean that relying on slices is inherently unsafe. Which is silly; the whole point of a "slice" type, contrasted with its "raw parts", is in the guarantees it provides. This is why slice::from_raw_parts is unsafe: not because if what it does, but what it states about the result.

> Which is just incorrect - memory safety issues are likely to be due to the "safe" code surrounding your unsafe code

This is a matter of how you assign "blame" for a memory safety error. It's definitely true, however, that the memory-unsafe operations play a key role, and I assume that's the point these docs are making. I agree that people shouldn't be creating faulty abstractions around "unsafe" blocks, but the reason we can even identify this as an issue is because we have "unsafe" blocks in the first place!


> The Rust Nomicon actually documents these constraints for each 'unsafe' operation. Code that can ensure that these constraints hold can be regarded as 'safe' and rely on an unsafe{ } block. Code that can't, should be marked unsafe and document its own constraints in turn.

> If you're relying on internal details, you're most likely doing it wrong, even by the standards of 'unsafe'. There are ways to nail down these details where required, at which point they're not even "internal" details anymore, but this has to be opted-into explicitly, for sensible reasons.

It's not that simple. What I'm getting at is that 'unsafe' code is still required to meet all the constraints that 'safe' code has to meet, even though those constraints are not fully documented (because in general, in 'safe' code there are lots of things that are impossible to do but not explicitly UB), and they have changed them in significant ways in the past. You can see an incomplete list of them here [0]. There has been work to nail these details down for years, with the most recent being this one [1], but in general I would argue there hasn't been much progress since I first looked into it years ago now. Fun fact, the first Rust issue I remember reading that touched on this (and evolved into basically everything going on today) is this [4] one, which was made exactly 4 years ago today (and I mean exactly!).

The 'Safety' sections for 'unsafe' operations are actually just extra requirements for those particular APIs that you have to uphold in addition to whatever safe-Rust constraints apply. And I would argue in general they're not actually exhaustive, and they're mostly just a "best guess" - it's not a breaking change to introduce or realize there are more constraints. E.g., a little more than a year ago they added the info that a slice can't be larger than `isize::MAX` [2], and a few months ago there was this [3] one, where they added that the pointer and length must come from a single allocation.

My point is that when you're doing something like writing an OS kernel, you end up running into these types of issues and corner cases because, e.g., they determine what your allocator is allowed to produce, or whether some particular pointer can validly be put into a slice.

> No, "other code" should not be marked unsafe because this would mean that relying on slices is inherently unsafe. Which is silly; the whole point of a "slice" type, contrasted with its "raw parts", is in the guarantees it provides. This is why slice::from_raw_parts is unsafe: not because if what it does, but what it states about the result.

Sure, my point is that this is at best unclear, and I would argue wildly misunderstood. We've basically made two definitions of 'unsafe' - one for code that has to perform one of the five 'unsafe' operations (which is a clearly defined 'yes' or 'no', and also the one described by the Rust docs), and one for code that can potentially break some invariant that will cause UB somewhere else in the program (even if the code in question is completely safe and can't on its own cause any problems).

The problem is that Rust only guarantees the type of safety that it does if you do the second, yet lots of people (I would almost argue most...) assume the first is sufficient and that 'unsafe' by itself ensures all your memory bugs are in `unsafe` code. The Rust docs even describe safe vs. unsafe as two completely different languages (which they are), which brings into question why you would write `unsafe` code when you don't even need any of the `unsafe` operations. You arguably want a separate keyword for this - one that marks a function `unsafe`, but doesn't actually allow you to perform any of the `unsafe` operations in it.

[0] https://doc.rust-lang.org/nomicon/what-unsafe-does.html

[1] https://github.com/rust-lang/unsafe-code-guidelines

[2] https://github.com/rust-lang/rust/commit/1975b8d21b21d5ede54...

[3] https://github.com/rust-lang/rust/commit/1a254e4f434e0cbb9ff...

[4] https://github.com/rust-lang/rfcs/issues/1447


> - it's not a breaking change to introduce or realize there are more constraints.

These breaking changes are allowed precisely because they're meant to address soundness concerns. Even if at any given time they're just a "best guess" of what's actually needed, that's still wildly better than what e.g. C/C++ do, which is just to not guess at all. Even wrt. formal memory models, perhaps the most complex issue among the ones you mention here, C/C++ only got an attempt at a workable memory model with the 201x standards (C11, etc.). It's not unreasonable to expect that Rust may ultimately do better than that.


I highly disagree with that assessment. In regard to C you're comparing apples to oranges IMO, because C is very lax compared to what Rust enforces now and could enforce in the future, just by virtue of having so many fewer features. With that, if you commit to being `gcc` (and `clang`) specific, you get a very high number of guarantees and a lot of flexibility, even more depending on what extra feature flags you pass.

> It's not unreasonable to expect that Rust may ultimately do better than that.

But, when? Rust is almost 10 years old, and questions about this stuff have been posed for years now - I was having these same conversations over two years ago. Last year's roadmap included working on the 'unsafe guidelines' I posted, and as an outsider looking in it's unclear to me how much progress has actually been made. Don't get me wrong - they've made a fair amount of progress, but there's still a lot to be done.

My big concern (which I don't consider to be unfounded) is that because there aren't that many big users of Rust actually developing really low-level stuff, the work on defining it isn't getting done. But to me it feels like a self-fulfilling prophecy - a project like the Linux kernel isn't going to end up using Rust if there are big issues with connecting what they're doing now to Rust idioms (ignoring the LLVM thing...), even though when Rust was first maturing it was being billed as the replacement for C (and still is).


Well, it took 40 years for C to have any kind of memory model, so Rust has still some time available.


That's a bit hand-wavy - the Linux kernel has been doing atomics in C for, I believe, over 20 years now, well before C11 came out. And even after it came out they don't use C11 atomics. `gcc` gives them enough guarantees on its own to implement correct atomics and define their own memory model, in some ways better than the C11 one (and in some ways worse, but mostly just from an API standpoint).

With that said, my concerns are more with the state of `unsafe` overall; the memory model is only one part of that (though it somewhat spurred conversations about the other issues).


Linux kernel is just one kernel among many.

I bet that the pre-C11 memory semantics on Linux kernel aren't the same as e.g. on HP-UX kernel, across all supported hardware platforms.

It is also ironic that Java and .NET did it before C and C++, with their models being adopted as a starting point.


> Linux kernel is just one kernel among many.

> I bet that the pre-C11 memory semantics on Linux kernel aren't the same as e.g. on HP-UX kernel, across all supported hardware platforms.

Yeah, but if they both work, why does it matter? There exists more than one valid possible memory model. The point I was making is that with C, it has been possible to define valid semantics well before C11 because compilers and the language give enough guarantees already.

To that point, while I focused on atomics (Since atomics and threading are largely what was added in C11), the bulk of the actual memory model that made that possible was defined well before then. Strict-aliasing was a thing in C89 (Though I'm not sure compilers enforced it at that point) and is probably the only real notable "gotcha", and `gcc` and `clang` let you turn it off outright (And if you're not doing weird kernel stuff, it generally is easy to obey).

`gcc` and `clang` also have lots of documentation on the extra guarantees they give, such as very well documented inline assembly, type-punning via `union`, lots of attributes for giving the compiler extra information, etc.

Compared to what Rust offers right now, there is no comparison. With Rust, the aliasing model (which `unsafe` code is not allowed to break) is still unknown, and a lot of the nitty-gritty details like the stuff I listed for `from_raw_parts` are simply leaky implementation details they're inheriting unintentionally from LLVM (and effectively from C - C is where restrictions like "not allowed to go more than one past the end of an allocated block" come from, along with a host of other things).


While I do agree that Rust still needs to improve in this area, what happens when you port pre-C11 code from gcc on Linux (x86) to aC on HP-UX (Itanium) and xlc on AIX (PowerPC), to keep up with my example?


Well, I would clarify - In reference to the memory models I was talking about, it only applies to code written as part of those kernels, userspace code does not "inherit" that memory model. And I don't think portability of kernel code is much of a concern, the memory model is only one of a large variety of problems.

That said, for userspace, pthreads already gives enough guarantees on ordering via mutexes that there aren't really any problems unless you're trying to do atomics in userspace, which before C11 would have been hard to do portably. And the rest of the things I was talking about, like the aliasing model, are defined as part of the C standard (C89 and C99), so they're always the same regardless of the compiler (ignoring possible bugs).


It doesn't mean your code is better off, but it makes auditing your code a lot easier. Rust gives you lots of good tools so that you can fence in your unsafe and restrict it to a module (source code file), such that if you audit the module containing the unsafe, you should be good to go.


The value is in knowing where the unsafe operations are.

It could be argued that's not a huge value gain since you're so far down the stack, but that's the value of the unsafe keyword.


The million-dollar question is what % of your code has to be unsafe in a kernel. At some %, it ends up not being worth the trouble. But if the % can be kept relatively low (say 5% or less) then you get a lot of value out of minimizing the surface area of the dangerous code.


I was expecting way more “unsafe” uses than that!

I see two advantages to explicitly marking code as unsafe:

(1) You are very clearly marking the areas of code that are most suspect

(2) You can still have safe code outside of those “unsafe” blocks


Unsafe blocks don't mean the code is unsafe: just that the language's safety scheme can't prove anything about that particular block. You prove each such block safe externally, with Rust's safety handling everything else (i.e. the majority of the code).

You can also go further like I did here:

https://news.ycombinator.com/item?id=21840431

You're still worrying about a smaller portion of your code.


Same for many other languages... as far as memory safety goes because there are performance tradeoffs in a kernel.


>I'm asking out of curiosity because I want to learn a "system" programming language

If you mean that you don't know any right now, just learn C. This isn't web programming, where everyone is always hopping to the latest fads. C is the lingua franca, the sine qua non of systems programming. It will be a long time before that changes. The C machine model is what everyone is working with anyway. C is small, despite some devious and fun corners of it, and mostly just exposes you to that model and way of interacting with the machine.

You can’t even appreciate the new systems languages unless you have a firm grasp of C and systems programming. They are all a response to C, addenda and proposed evolutions to it.


Thank you. I sure tried C too (also C++ for gui programming), though I wouldn't say I "know" it (which to me would imply at the very least one significant real-world experience with it), I do understand why some projects try to modernize "system" programming. I just want to evaluate alternatives, but I may very well go for C in the end...


C is portable assembly, I’d recommend learning it together with the assembly for the platform you’re learning on, so you can actually understand the stack, heap, calling conventions, etc.


C is portable assembly until it isn't:

- no access to the carry/overflow flags in registers, making it a chore to write bigint libraries (see the sketch at the end of this comment)

- no way to guarantee emitting add-with-carry or sub-with-borrow, even though those are simple instructions and some architectures (e.g. the 6502) don't even provide a plain add/sub

- need to drop down to assembly to implement resumable function/fibers to avoid the syscall costs of ucontext/setjmp/longjmp

- no access to hardware counters like RDTSC

- no way to get the CPU frequency in a portable and reliable way

- no portable thread-ID function, and those that are available (pthread_self, and Windows') rely on expensive syscalls.

- no way to do CPU feature detection (SSE, AVX, ARM Neon, ...)

- no way to control emission of common intrinsics like popcount, bit-scan-reverse, lowest bit isolation in a portable way.

The portable assembly narrative breaks down rapidly when you actually need specific code to be emitted.
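
Take the first bullet (the sketch mentioned above; written in Rust here, but C has the same shape and the same problem): a bignum limb add has to reconstruct the carry that the hardware already computed, and you're left hoping the optimizer folds it back into an add-with-carry chain:

    /// Add two equal-length little-endian bignums limb by limb,
    /// returning the final carry.
    fn add_limbs(a: &[u64], b: &[u64], out: &mut [u64]) -> u64 {
        let mut carry = 0u64;
        for i in 0..a.len() {
            let (s1, c1) = a[i].overflowing_add(b[i]);
            let (s2, c2) = s1.overflowing_add(carry);
            out[i] = s2;
            carry = (c1 as u64) + (c2 as u64);
        }
        carry
    }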


> C is portable assembly until it isn't:

> - no access to carry/overflow flag in registers making it a chore to write bigint libraries

C is portable assembly: it provides access to a minimum common set of features, which is what allows the code to be portable.

Good luck trying to use the carry flag on RISC-V...


The following is a genuine question (not a flippant remark):

Is there any way to have those concepts be portable across ISAs/architectures? I don’t think those types of ops are exposed in LLVM-IR or GCC’s various intermediate levels (but I could be wrong). I’d love to be able to investigate a language that enables that level of hardware access in a portable manner.


> need to drop down to assembly to implement resumable function/fibers to avoid the syscall costs of ucontext/setjmp/longjmp

Which syscalls would be involved in setjmp/longjmp?



For inexplicable reasons, ucontext/setjmp/longjmp include signal masks, which, for lack of a vDSO implementation, must be obtained by invoking syscalls.

(FreeBSD provides the non-portable _setjmp/_longjmp, which do not preserve signal mask state and avoid the syscall. There are also the POSIX sigsetjmp/siglongjmp, which with savesigs=0 may bypass the syscalls as well. Both FreeBSD and Linux provide the POSIX routines.)


Precisely my point - learning both together, as they complement each other, instead of C abstracting the underlying computer away.


I'm curious too. I'm quite familiar with Rust and have never written any Zig in my life, so I went digging through the source, and I find the syntax remarkably similar for the most part. The only thing that stood out is that apparently you can drop the braces for single-line `if` bodies like in C, whereas Rust always makes them mandatory, but I'm firmly on Rust's side on this one.

The part where Rust can get really messy is when you involve generic programming and traits, especially when lifetimes are involved, but I couldn't find any obviously generic code in this codebase.

EDIT: after digging a bit deeper into Zig, it looks like it uses C++-style duck typing for metaprogramming instead of a trait-based approach like Rust's? It definitely removes some overhead to writing generic code, but I'm not sure about readability and maintainability... But I'm going to stop here because I'm about to go full fanboy for Rust.


I wouldn't call Zig's comptime "C++-style." Unlike Rust, there's very little in Zig that is borrowed from C++. Zig's error reporting and comptime make it easy to write arbitrary compile-time checks, so Zig uses a single construct and keyword, comptime, to replace all special instances of partial evaluation: generics, concepts/traits, value templates, macros and constexprs.

The main difference between Zig and Rust is a huge disparity in language complexity. Zig is a language that can be fully learned in a day or two. Rust has the same philosophy of "zero-cost abstraction" as C++, i.e. spending a lot of complexity budget to make a low-abstraction language appear as if it has high abstraction. Zig, like C, does not try to give the illusion of abstraction.

There is also the difference in their approach to safety, but that's a complicated subject that ultimately boils down to an empirical question -- which approach is safer? -- which we don't have the requisite data to answer.


> There is also the difference in their approach to safety, but that's a complicated subject that ultimately boils down to an empirical question -- which approach is safer? -- which we don't have the requisite data to answer.

We do have the requisite data to answer whether preventing use-after-free is better than not preventing it.

You can argue (not successfully, in my opinion) that it's not worth the loss in productivity to prevent UAF and other memory safety issues, but it's impossible to argue that not trying to prevent UAF is somehow safer.


> We do have the requisite data to answer whether preventing use-after-free is better than not preventing it.

I expect Zig will prevent use-after-free. It will be sound for safe code and unsound for unsafe code (by turning this on only in debug mode for testing).

> but it's impossible to argue that not trying to prevent UAF is somehow safer.

First, see above. Second, it is not only possible but even reasonable to argue that not trying to completely eliminate a certain error is very much safer. The reason is that soundness has a non-trivial cost which can very often be traded for an unsound reduction in a larger class of bugs. As an example, instead of soundly eliminating bugs of kind A, reducing bugs of kinds A, B and C -- for a similar cost -- may well be safer.

There has been little evidence to settle whether sound elimination of bugs results in more correctness than unsound reduction of bugs or vice-versa, and it's a subject of debate in software correctness research.


It's not interesting to talk about a system that might or might not exist in the future. (There is a word for that—"vaporware".) The point is that Zig doesn't even try to prevent UAF now, so you can't say that it's safer than languages that do prevent the problem.

> As an example, instead of soundly eliminating bugs of kind A, reducing bugs of kinds A, B and C -- for a similar cost -- may well be safer.

Hasn't this essentially been what C++ has been trying for memory safety for decades, without success? The C++ approach has been "smart pointers are good enough, and they prevent several other problems too", and the experience of web browsers (among others) has pretty much definitively shown: no, they really aren't. For memory safety, I would not bet on this approach.


Just for clarity for anyone reading, the Zig author does not claim that Zig is safe and has in fact said that it is unsafe. That could change in the future, but there's no denial about what it is today.


There is a difference between safe code, which is the goal, and a safe language (that's a statement the language makes on sound safety guarantees). Using a safe language is definitely one way to write safe code, but it is not necessarily always the best way, and it's certainly not the only way. Zig is not meant to ever be a safe language, but it is very much intended to be a language that helps write safe code. That is what I meant when I said that the two languages have a very different approach to safety.


> The point is that Zig doesn't even try to prevent UAF now

I wouldn't say Zig exists at all right now, but just as it strives to one day be production-ready, it strives to prevent use-after-free. Safety is a stated goal for the language.

> Hasn't this essentially been what C++ has been trying for memory safety for decades, without success?

No. I'm talking about a mechanism that can detect various errors at runtime, and is turned on or off for various pieces of code and/or for all code at various stages of development. Rust, BTW, doesn't entirely guarantee memory-safety, either, when any unproven unsafe code is used, and even when it isn't (e.g., have you proven LLVM's correctness?). We always make some compromises on soundness; the question is where the sweet-spots are.

Software correctness is one area where there are no easy answers and very few obvious ones.


> No. I'm talking about a mechanism that can detect various errors at runtime, and is turned on or off for various pieces of code and/or for all code at various stages of development

What you are describing exists: ASan. We have a pretty good answer to the question "is ASan sufficient to prevent memory safety problems in practice": "no, not really".

> Rust, BTW, doesn't entirely guarantee memory-safety, either, when any unproven unsafe code is used, and even when it isn't (e.g., have you proven LLVM's correctness?). We always make some compromises on soundness; the question is where the sweet-spots are.

Empirically, Rust's approach has resulted in far fewer memory safety problems than previous approaches like smart pointers and ASan, with only garbage collectors (and restrictive languages with no allocation at all) having similar success in practice. Notice that the working approaches have something important in common: a strong system that, given certain assumptions, guarantees the lack of memory safety problems. Even though those assumptions are never quite satisfied in practice, empirically having those theoretical guarantees seems important. It separates systems that drastically reduce safety problems, such as Rust and GC languages, from those that do so less well, such as ASan and smart pointers. This is why I'm so skeptical of just piling on more mitigations: they're helpful, but we've been piling on mitigations for decades and UAF (for instance) is still as big a problem as ever.


> We have a pretty good answer to the question "is ASan sufficient to prevent memory safety problems in practice"

That is not the question we're interested in answering, and elimination of all memory errors is no one's ultimate goal, certainly not at any cost. By definition, unsound techniques will let some errors through. The question is which approach leads to an overall safer program for a given effort, and soundness (of properties of interest) always comes at a cost.

(also, Zig catches various overflow errors better than ASan)

> Empirically, Rust's approach has resulted in far fewer memory safety problems than previous approaches like smart pointers

I don't doubt that, and if minimization of memory errors was programmers' primary concern (even in the scope of program correctness or even just security), there would be little doubt that Rust's approach is better.

As someone who currently mostly programs in C++, lack of memory safety barely makes my top three concerns. My #1 problem with C++ is that the language is far too complex for (my) comfort, where by "complex" I mean requires too much effort to read and write. That, and build times, have a bigger impact on the correctness of the programs I write than the lack of sound memory safety. Would I be happier if, for a similar cost, I could eliminate all memory safety errors? Sure, which is why, if C++ and Rust were the only low-level languages in existence, I'd rather people used Rust. But I would be happier still if I could solve the first two, and also get some better safety as a cherry-on-top. Memory safety is similarly not the main, and certainly not the only, reason I use languages that have a (tracing) GC when I use them.


P.S.

> Notice that the working approaches have something important in common: a strong system that, given certain assumptions, guarantees the lack of memory safety problems.

That's a very good point and I'm not arguing against it. It's just that even if it's true -- and I'm more than willing to concede that it is -- it still doesn't answer the question, which is: what is the best approach to achieving a required level of correctness?

The "soundness" approach says, let's guarantee, with some caveats, certain technical correctness properties that we can guarantee at some reasonable cost. The problem is that that cost is still not zero, and my hypothesis is that it's not negligible. My personal perspective is that Rust might be sacrificing too much for that, but that's not even what has disappointed me with Rust the most. I think -- and I could be wrong -- that Rust sacrifices more than it has to just to achieve that soundness, by also paying for "zero-cost abstractions," which, for my taste, is repeating C++'s biggest mistake, namely sacrificing complexity for the appearance of high-level abstraction that may look convincing when you read the finished code (perhaps more convincing in Rust than in C++), but falls apart when you try to change it. Once you try to change the code you're faced with the reality that low-level languages have low abstraction; i.e. they expose their technical details, whether it's through code -- as in C and Zig -- or through the type system, as in Rust. Zig says, since the abstraction in low-level languages is low anyway (i.e. we cannot really hide technical details) there is little reason to pay in complexity for so-called zero-cost abstractions.

Language simplicity goes a long way, even as far as sound formal verification is concerned. For example, there are existing sound static analysis tools that can guarantee no UB for C -- but not the complete C++, AFAIK -- with relatively little effort. It's not yet clear to me whether Zig, with its comptime, is simple enough for that, though.

It is my great interest in software correctness, together with my personal aesthetic preferences, that has made me dislike language complexity so much and made me a believer in "when in doubt -- leave it out."


I would like to chime in by noting that Rust's mature form is borne of a very specific scenario, namely the web browser: software so complex that the "zero-cost" element is more like "actually possible to optimize" in practice. And a great deal of that complexity is accidental in some form, a result of accreted layers.

And in that respect, it's not really the kind of software anyone needs to aspire to; aspiring to write programs simple enough that Zig will do the job is much more palatable.


I don't take issue with the "zero-cost" part -- Zig and C, like every low-level language, have that -- but with the non-abstraction-"abstraction" part, which is rather unique to C++ and Rust. Rust has become a modern take on C++, and I'm not sure it had to be that for the sake of safety; I think it became that because of what you said: it was designed to replace C++ in a certain application with certain requirements. It's probably an improvement over C++, but, having never been a big fan of C++, it's not what I want from a modern systems programming language. It seems to me that Rust tries to answer the question "how can we make C++ better?" while Zig tries to answer the question "how can we make systems programming better?"

Of course, Zig has an unfair advantage here in that it is not production-ready yet, and so it's not really "out there," and doesn't have to carry the burden of any real software (there's very little software that Rust carries, but it's still much more than Zig). I admit that when Rust was at that state I had the same hopes for Rust as I do now for Zig, so Zig might yet disappoint.


> For example, there are existing sound static analysis tools that can guarantee no UB for C -- but not the complete C++, AFAIK -- with relatively little effort.

The only static analysis tools that can guarantee anything about C are doomed to have plenty of false positives. They are much less used in practice than tools like ASan & co., which don't guarantee anything but have way fewer false positives, if any.

They have their use-case, but calling them “little effort” is disingenuous.


> The only static analysis tools that can guarantee anything about C are doomed to have plenty of false positives.

That's where the effort comes in. Those false positives are removed by adding annotations or changing code.

> They have their use-case, but calling them “little effort” is disingenuous.

I said they require relatively little effort, because it's definitely less effort than a rewrite in Rust. This isn't hypothetical. >1MLOC programs in the industry today are checked in this way. If you have an existing large C program and decide that you need your program to have no UB, those sound static analysis tools are the most cost-effective way of doing that today.


> That's where the effort comes in. Those false positives are removed by adding annotations or changing code.

With the former you lose all guarantees and fall back to a safe/unsafe duality; with the latter you need to rethink how your code works to comply with the analyzer's mindset. In the end it's pretty much like Rust, but in an ad-hoc way, and much less ergonomic.

> I said they require relatively little effort, because it's definitely less effort than a rewrite in Rust.

This is a ridiculous way to save your argument. You are well aware that this is not what “relatively easy” means.

It is comparatively easier to deploy such tools than to rewrite a whole project in Rust, but the payoff is also lower in the long run (Rust offers more than zero UB), so we might get to a point (once tooling [1] and the hiring pool make it sustainable) where the latter option makes more sense for most people (unless C is mandatory, for portability reasons for instance).

[1]: I'm especially thinking about C2Rust here https://immunant.com/blog/2020/01/quake3/


> With the former you lose all guarantees and fall back to a safe/unsafe duality

This is simply not true. The annotations are checked. It's exactly like adding type annotations when inference fails.

> with the latter you need to rethink how your code works to comply yo the analyzer's mindset

No. I'm talking about adding something like a bounds check in a function entry.

> in the end it's pretty much like Rust, but in an ad-hoc way, much less ergonomic.

Except that it is cheaper than a rewrite in Rust, which is one of the several reasons why this is currently the preferred approach in industry segments that require certain correctness guarantees. I don't know if you know this, but Rust isn't exactly making big headway in the safety/security-critical software world, especially, though not only, in embedded (for a multitude of reasons). Those sound static analysis tools, on the other hand, are showing nice growth.

Also, even if Rust were more ergonomic than this, there are alternatives that I think will be more ergonomic than Rust. I.e. it's not enough to be better than C++; if you want to get the people currently using C/C++ you need to be better than C/C++ in a way that justifies the transition cost and better than the other alternatives.

> You are well aware that this is not what “relatively easy” means.

Relatively easy means easier than all or most other available options. Anyway, that's what I meant.

> but the payoff is also lower in the long run (Rust offers more than zero-UB)

Nobody knows about the long term payoff Rust gives you because few people have had sufficient long term experience with it. It could be large, small, nil, or negative. And there are other languages as well. Zig, when it's available, might well have a bigger payoff than Rust (that's my current guess), and in any event, few people in the low-level programming space who aren't currently using C++ are even thinking about, let alone considering, Rust. Not that anyone is thinking about Zig, but at least Zig is aiming at C shops as well.

Again, as someone who has been using formal methods for some years now, I can tell you that nothing is obvious in software correctness, and no one knows what the best way to achieve it is (although we do know some best practices, and we have some answers to more specific questions).

> so we might get to a point (when tooling[1] and hiring pool have reached a point where it becomes sustainable) where the latter option makes more sense for most people

You are assuming that Rust is the preferable choice. I no longer think it will be. BTW, here's "C2Zig": https://youtu.be/wM8vz_UPTE0


> Except that it is cheaper than a rewrite in Rust, which is one of the several reasons why this is currently the preferred approach in industry segments that require certain correctness guarantees. I don't know if you know this, but Rust isn't exactly making big headway in the safety/security-critical software world, especially, though not only, in embedded (for a multitude of reasons). Those sound static analysis tools, on the other hand, are showing nice growth.

Rust is way too new for that, obviously. And as I said, because the tooling and hiring pool are not here yet, it would be a critical mistake to attempt such a move at the moment.

For new projects however, Rust is a really interesting bet. (I was until recently working on a new medical robot whose software was mostly Rust, and the speed at which we got it working was really exciting!)

> in any event, few people in the low-level programming space who aren't currently using C++ are even thinking about, let alone considering, Rust.

That's not my experience. Not many are considering it, for a lot of reasons (too new, not enough people mastering it, resistance to change, etc.). But “thinking about” is another story ;).

> You are assuming that Rust is the preferable choice

Zig isn't really a choice at this point. It might become one in a few years, but there's still a long way to go.


> I think -- and I could be wrong -- that Rust sacrifices more than it has to just to achieve that soundness, by also paying for "zero-cost abstractions," which, for my taste, is repeating C++'s biggest mistake, namely sacrificing simplicity for the appearance of high-level abstraction that may look convincing when you read the finished code (perhaps more convincing in Rust than in C++), but falls apart when you try to change it.

The argument here seems to be that there can be no real abstraction in low-level languages, so there's no point providing language features for abstraction. The premise seems clearly false to me, because even C has plenty of abstraction. Functions are abstractions. Private symbols are abstractions. Even local variables are abstractions (over the stack vs. registers).

People often argue that Rust is too complicated for its goal of memory safety. It's easy to say that, but it's a lot harder to list specific features that Rust has that shouldn't be there. In fact, as far as I'm concerned Rust is an exercise in minimal language design, as the development of Rust from 0.6-1.0 makes clear (features were being thrown out left and right). Most of the features that look like they're there solely to support "zero-cost abstractions"—traits, for example—are really needed to achieve memory safety too. For instance, Deref is central to the concept of smart pointers, and, without smart pointers, users would have to manually write Arc/Rc/Box in unsafe code every time they wanted to heap-allocate something.

> Language simplicity goes a long way, even as far as sound formal verification is concerned. For example, there are existing sound static analysis tools that can guarantee no UB for C -- but not the complete C++, AFAIK -- with relatively little effort. It's not yet clear to me whether Zig, with its comptime, is simple enough for that, though.

The most important static analyzers used in industry today are Clang's sanitizers, which work on both C and C++. The most important such sanitizers actually work at the LLVM level, which means they work on Rust as well [1]! The days of having to write a compiler frontend for static analysis are long gone. We have excellent shared compiler infrastructure that makes it easy to write instrumentation that targets many low-level languages at once. (Even in the world of C, this is necessary. Plain old C99 is an increasingly marginal language, because the really important code, such as the Windows and Linux kernels, is written in compiler-specific dialects of C, which means that a static analysis tool that isn't integrated with some popular compiler infrastructure will have limited usefulness anyway.)

> It is my great interest in software correctness, together with my personal aesthetic preferences, that has made me dislike language complexity so much and made me a believer in "when in doubt -- leave it out."

Again: easy to say, harder to specify specific Rust features you think should be removed.

[1]: https://github.com/japaric/rust-san


> The argument here seems to be that there is can be no real abstraction in low-level languages, so there's no point providing language features for abstraction.

My argument is that low-level languages allow for low abstraction, i.e. there's little that they can abstract over, where by abstraction I mean hide internal implementation details in a way that when they change the consumer of the construct, or "abstraction", does not need to change; if it does, then the construct is not an abstraction. With "zero-cost abstraction," C++/Rust offer constructs that syntactically appear as if they were abstractions (e.g. static vs dynamic dispatch; subroutine vs. coroutine call), but in reality aren't. I am not aware of any other language (unless Ada has changed considerably since I last used it in the early '00s) that values this idea to such a great extent.

The things you mentioned are abstractions only in the sense that the user doesn't need to know how the compiler implements them; in that respect, every language construct, including `if` (e.g. in Java, not every if is compiled into a branch) is an abstraction. I speak of the language's ability to allow users to abstract, and in that regard all low-level languages provide for poor abstraction. Without a JIT, the caller needs to know the calling convention; without a tracing GC the caller needs to know how the memory pointed to by a returned value is to be deallocated. The question is how much you try to make sure that all this knowledge is implicit in the syntax.

> People often argue that Rust is too complicated for its goal of memory safety.

I didn't know people often say that. I said it, and I'm not at all sure that's the case. I think that Rust pays far too heavy a price in complexity. It's too heavy for my taste whether or not it's all necessary for sound memory safety, but if it isn't, all the more the shame.

> Most of the features that look like they're there solely to support "zero-cost abstractions"—traits, for example—are really needed to achieve memory safety too.

OK, so I'll take your word for it and not say that again.

> The most important static analyzers used in industry today are Clang's sanitizers

I'm talking about sound static analysis tools, like Trust-in-Soft, that can guarantee no UB in C code. I think that particular tool might support some subset of C++, but not all of it. The sanitizers you mention rely on concrete interpretation (aka "dynamic") and are, therefore, usually unsound. Sound static analysis requires abstract interpretation, of which type checking and type inference are special cases. Just as you can't make all of Rust's guarantees by running Rust's type-checker on LLVM IR, so too you cannot run today's most powerful sound static analysis tools -- that are already strong enough to absolutely guarantee no UB in C at little cost -- on LLVM IR; they require a higher-level language. Don't know about tomorrow's tools.

> Again: easy to say, harder to specify specific Rust features you think should be removed.

I accept your claim. In general, I don't like to isolate language features; it's the gestalt that matters, and it's possible that once Rust committed to sound memory safety everything else followed. But let me just ask: are macros absolutely essential?


> But let me just ask: are macros absolutely essential?

Yes, for two main reasons: (1) type-safe printf; (2) #[derive]. Nothing is "absolutely essential" in any Turing-complete language, of course, but printf comes pretty close. The only real alternative would have been to use the builder pattern for formatting like C++ does (i.e. the << operator), but nobody seriously proposed that as the aesthetics are really bad and setting formatting flags like precision is problematic. And without custom #[derive], you couldn't serialize types, which is pretty important in a modern language.

Note that these are both cases (maybe the primary cases) in which OCaml has built-in ad-hoc solutions that feel very un-"systemsy". In OCaml, format strings are built in to the language, as are functions like PartialEq and Debug, the latter of which are implemented via magic built-in functions that call private internal reflection APIs. There was a desire to do better in Rust, as it was felt that these are the ugliest parts of OCaml, and so macros were part of Rust from the very early days.


OK. I mean, personally I would swallow a lot of ugly special cases before adding macros, but it's certainly in line with other people's aesthetic preferences and Rust's "C++ spirit" of a low-level language with high-level language features. I can see the logic in saying that since we need lots of fancy mechanisms for sound safety anyway, what's one more gonna hurt?

BTW, Zig manages to do both things without macros and without any special cases in the compiler. TBF, Zig's approach was unfamiliar to me until I saw it in Zig, and it is Zig's "brilliant idea" (even if it had originated elsewhere), so it's OK if Rust simply didn't consider it; Rust certainly has its own brilliant idea.
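
To make that concrete, here's a rough sketch of how comptime covers both cases (using approximately Zig 0.6 syntax; `structEql` and `Point` are invented for the example, and std.meta helper names have moved around between releases). Format strings are ordinary compile-time values, and "derive"-style code is an ordinary function reflecting over a type:

    const std = @import("std");

    // "derive"-like structural equality: generated at compile time
    // by iterating over the struct's fields with `inline for`
    fn structEql(comptime T: type, a: T, b: T) bool {
        inline for (std.meta.fields(T)) |field| {
            if (@field(a, field.name) != @field(b, field.name)) return false;
        }
        return true;
    }

    pub fn main() void {
        const Point = struct { x: i32, y: i32 };
        const p = Point{ .x = 1, .y = 2 };
        // the format string is a comptime value checked by the compiler;
        // no macro involved
        std.debug.warn("eql: {}\n", .{structEql(Point, p, p)});
    }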


> As someone who currently mostly programs in C++, lack of memory safety barely makes my top three concerns.

Isn't having a UB-free JVM a noteworthy goal though? Especially if it gets used in life-critical systems such as avionics or autonomous cars.


UB-freeness is not a goal in-and-of-itself. It's shorthand for a certain kind of technical (i.e. non-functional) correctness, which, in turn, is related in some ways to functional correctness, and it's improving functional correctness (and I include security here) that's the goal. Is the most effective way to achieve that to work on completely eliminating undefined behavior? I'm not at all sure.


> Rust, BTW, doesn't entirely guarantee memory-safety, either

> We always make some compromises on soundness; the question is where the sweet-spots are.

Excellent point. There are complex tradeoffs and the "rust is safe" slogan is just a slogan.


It's not a slogan; there's a published type safety proof of the core of Rust. You may not like the assumptions that underlie that proof, but it's unarguable that "safety" has a very concrete meaning in the context of the language.


> reducing bugs of kinds A, B and C -- for a similar cost -- may well be safer.

What kind of bugs do you have in mind that Zig would prevent and Rust doesn't? I can think of several that Rust prevents and Zig doesn't, thanks to its affine type system, but I've yet to see a comparable safety benefit in Zig.


Preventing bugs with soundness is not necessarily the best, and certainly not the only, way to reduce bugs. You can reduce bugs by making code easier to read (code review has been empirically shown, time and again, to be the most cost-effective bug-reduction technique), and by making code faster to write and to compile, thus leaving more time for tests and other verification techniques.

If sound elimination of bugs were the best way to write programs, we'd all be writing in Isabelle, Coq or Idris, except even those of us who do -- in fact, especially those of us who do -- know that it's not the best way to write programs.


> but it's impossible to argue that not trying to prevent UAF is somehow safer.

It might be possible. Someone might prove that preventing UAF necessarily increases complexity somewhere else. Like one of those laws of thermodynamics, but for software.


comptime does not replace macros though...there is no way to manipulate AST in zig, nor will there ever be, according to the author.


You can't manipulate ASTs (ie, Zig code), but nothing stops you from parsing string literals however you want.

For example, I wrote a PEG-like parser combinator library in Zig. Using it currently [looks like this](https://github.com/CurtisFenner/zsmol/blob/87de4c77dd8543011...). However, as a library, I ^could provide a function that looks like

    pub const UnionDefinition = comb.fromLiteral(
        \\ _: KeyUnion
        \\ union_name: TypeIden
        \\ generics: Generics?
        \\ implements: Implements?
        \\ _: PuncCurlyOpen
        \\ fields: Field*
        \\ members: FunctionDef*
        \\ _: PuncCurlyClose 
    );
etc. But I find reading the code as it is good enough for now, so I didn't want to spend time implementing such a library.

[^]: Being able to create brand new types at `comptime` isn't [yet implemented](https://github.com/ziglang/zig/issues/383), so this can't quite be done yet, though you could fake it with `get`/`set` methods instead of real fields


Well, it is intentionally weaker than macros (and I agree with Zig's designer that that's a very good thing, though it is a matter of taste), but it does replace many of the cases where in Rust you'd have to use macros (or the preprocessor in C/C++). So it replaces macros everywhere where it deems their usage reasonable.


It is a legit design choice, but it does detract from your comment about language complexity. Not having AST macros inherently adds complexity to a language by requiring features to be built into the compiler rather than be implemented as libraries.


Those are different kinds of complexity. You're talking about the effort required by the implementor of the compiler. I'm talking about the effort required by the programmer using the language.


Then you aren't talking about complexity (an objective quality), you are talking about difficulty to read (a subjective quality relative to the reader). There is no doubt that macros can make a given piece of code harder to read, if the reader is unfamiliar with the macro being used. Complexity describes how intertwined different pieces of something are internally, which has nothing to do with a given vantage point.


No, that's just how Rich Hickey describes complexity; it's hardly a universal definition. For example, in computer science, the complexity of a task is often a measure of the effort, in time or memory, required to perform it.


English is certainly not free of ambiguity, but in the past i've seen you heavily emphasize precision in word choice, so it's surprising to see you de-emphasize it here. Nobody is a final arbiter of definitions, but the distinction i'm making is not a trivial one, and Hickey isn't the only one to have made it. Even thinking about it colloquially, how often do you follow the word "complex" with an infinitive verb describing an action? A rube goldberg machine is complex...and hard to build!


I'm not deemphasizing it, I'm just saying that we're talking about different meanings of "complexity" here and there is no well-accepted definition. My "complexity" refers to the effort required by the programmer when understanding programs written in the language.


Fair enough, ron. I won't belabor it further. I'll just leave this: Long ago, after coming across a very useful distinction between the words "practical" and "pragmatic," i intentionally changed my usage of those words as a result. Not because a charismatic person told me to, but because it was useful. If a distinction is useful to make, start making it, my man!


Which is the right call really. It is valuable to have the code you're looking at actually be what it appears to be.


I don't agree.

For many domains, being able to implement a domain-specific language with a set of expressive rules for the domain is an incredible productivity boost and also prevents many mistakes because you cannot represent them.

Not being able to manipulate the AST means that you are restricted in the embedded DSLs that you can provide. And embedded DSLs encompass code generation for:

- state machines

- parsing grammars (PEGs for example)

- shader languages

- numerical computing (i.e. having more math like syntax to manipulate indices)

- deep learning (neural network DSL)

- HTML templates / generators

- monad composition

There is a reason most people are not building in assembly anymore: there is a right level of abstraction for every domain. A language that provides building blocks for domain experts to build up to the right level of abstraction is very valuable.


The question is, as always, at what cost?

Low-level languages (aka "systems programming" languages) already suffer from various constraints that increase their accidental complexity. Is it really necessary to complicate those particular languages further to support embedded DSLs?

I don't think there's a universal right or wrong answer here, but there is certainly a big question.


Ironically, the lack of macros leads to an explosion of ad-hoc, extra-language DSLs. Look at rules engines, for example. In Drools (java), you have to write rules with a special language, DRL. Meanwhile in Clara (clojure) you write your rules in clojure. Macros simplify languages, they don't complicate them.


It's always a trade-off and placing the cursor correctly is tricky. Things like macros, operator overloading, metaprogramming, virtual calls, exceptions or even function pointers effectively "obfuscate" code by having non-obvious side effects if you don't have the full context. On the other hand if you push the idea too far in the other direction you end up with basically assembly, where you have an ultra-explicit sequence of instructions to execute.

It's very easy to come up with examples of terrible abuse of these features that lead to bad code (like for instance if somebody was insane enough to overload the binary shift operator << to, I don't know, write to an output or something) but it also gives a lot of power to write concise and expressive code.


It uses "comptime" which, loosely, is a compile time version of the language. I would say it's a huge improvement over C in that quite frankly string interpolation for a preprocessor is terrible. I can't compare to rust since my brain can't parse rust syntax; too many bells and whistles.


Traits are a part of the stdlib (under std.meta) as opposed to the language. This is unlikely to change.
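
A sketch of what a library-level "trait" looks like (assuming the usual @typeInfo pattern; exact std.meta helper names vary between releases, so this uses the builtins directly):

    // a "trait" is just a comptime check run against a type parameter
    fn assertInteger(comptime T: type) void {
        switch (@typeInfo(T)) {
            .Int => {},
            else => @compileError(@typeName(T) ++ " is not an integer type"),
        }
    }

    fn sum(comptime T: type, values: []const T) T {
        comptime assertInteger(T); // fails compilation for e.g. f32
        var total: T = 0;
        for (values) |v| total += v;
        return total;
    }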


If you want more syntax weirdness: tab characters are illegal.


I don't mind opinionated coding style (my Rust integration scripts all enforce that stock "rust clippy" doesn't return any error) and I do think that using tabs for indentation simply doesn't work in practice, regardless of how great they are in theory, because hardly anybody uses them correctly (including the vast majority of code editors by default). It might be strange to bake it straight into the compiler, but I don't mind it.

That being said it does make it even weirder to allow braceless ifs IMO, but that's bikeshedding.


> including the vast majority of code editors by default

Which ones exactly? Not really a problem I've encountered often except when someone tries to mix both spaces and tabs, and in general editors are built with the existence of this very common character in mind. People tend to hit the tab key to indent anyway, and one tab char meaning one level of indent is perfectly intuitive and allows users to individually configure how large they want their indents to be.


the problem i encounter is when you try to break a long line into multiple lines. if you want to use tabs and align the continuation, you should be mixing spaces and tabs.

for example ('-' is tab, '.' is space):

    --function_with_lots_of_arguments(arg1, arg2, arg3,
    --................................arg4, arg5, arg6);
it can be done, but a lot of editors get it wrong and it requires paying attention to the whitespace.

of course, another style would be to just indent the continuation twice, without aligning it. (i personally prefer to align continuations.)


Some food for thought: https://youtu.be/ZsHMHukIlJY?t=633

For example, this is the best way to define functions with argument lists long enough not to fit on a single line:

  fn doThing(
      argument1,
      argument2,
      argument3,
  ) {
      <code>
  }
Visually clear, doesn't have any spaces/tabs issues, produces minimal diffs when adding/removing/renaming arguments.


Very interesting talk! Thanks for the reference.

More food for thought:

The style in your example is more consistent with the way nearly every programmer formats code that has more than one statement. For example:

  if( shouldDoThings ) {
      doOneThing();
      doAnotherThing();
      doLastThing();
  }
Why do we want to format statements that way, but function calls and expressions a different way? No one would write this:

  if (shouldDoThings) {doOneThing();
                       doAnotherThing();
                       doLastThing();}
I have a theory about that but will have to save it for another day.

Somewhat related, the Rust coding style guidelines used to follow a heavily column-aligned style, but changed to indentation-only a year or two ago. I posted an example from the Servo source code with the old and new formats here:

https://news.ycombinator.com/item?id=18962177


Many editors (including the venerable Vim and Emacs) indent and align with tabs by default, which means that if you want your code to look right you standardize tab width which in turn removes one of the only (meager) advantages tabs have over spaces: configuring the indentation width to match your personal taste.

>Not really a problem I've encountered often except when someone tries to mix both spaces and tabs

Mixing spaces and tab for indentation and alignment is how it should be done if you want things to remain aligned when you change the tab width.


So then it's the editor that's at fault. Most editors let you do a "run formatter on save", and zig has (had? at least it did last I played with it) a formatter that can do correct "indent with tabs and align with spaces" with only a minor tweak.


I'm not aware of any editor or auto-formatter that handle the case of https://news.ycombinator.com/item?id=21969323 correctly.


Visual studio does it for C#, at the very least. Other formatters choose other options, and it's a matter of preference in any case (I personally think it's a code smell to have that many arguments or that long of names, and I also don't limit myself to 80 columns, cause I've got a widescreen monitor)

Either way, it's a trivial algorithm to implement. Calculate len("function_with_lots_of_arguments("), add that many spaces after your indentation tabs on the next line, continue arguments. The fact that editors/auto-formatters don't do that speaks far more to their lack of desire for that code style than it does to the difficulty of doing it.
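
Something like this, naively (a sketch in roughly the Zig syntax of the era; as the reply below points out, nested calls need a smarter rule than "one past the last open paren"):

    // compute how many alignment spaces to put after the indentation
    // tabs on a continuation line
    fn continuationPad(line: []const u8) usize {
        var pad: usize = 0;
        for (line) |ch, i| {
            if (ch == '(') pad = i + 1; // align one column past the '('
        }
        return pad;
    }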

The simple fact remains that using tabs provides many accessibility benefits that spaces are just unable to provide, like being able to drop tab size when you boost font size (legibility for poor vision) so your code doesn't indent off the side of the screen, or far better support for indentation for proportional fonts (another thing that people swear by for increasing legibility).


Just to be clear, I prefer tabs over spaces myself, so I'm not arguing against tabs. What I'm arguing against is mixing tabs with spaces, with the expectation that every contributor to the codebase uses sufficiently smart editors to not break the formatting.

With that out of the way...

>Either way, it's a trivial algorithm to implement. Calculate len("function_with_lots_of_arguments(")

That breaks for `foo(bar(a,\nb,\nc))`

And if you fix it to be "len till the last unmatched (" then it breaks for `foo(bar(a,\nb),\nc)` because the two lines have to indent to different levels.

Just saying it's not as trivial as it seems.

>The fact that editors/auto-formatters don't do that speaks far more to their lack of desire for that code style then it does to the difficulty of doing it.

And is a reason to not use such a style in the first place, as I said.


> Just to be clear, I prefer tabs over spaces myself, so I'm not arguing against tabs. What I'm arguing against is mixing tabs with spaces, with the expectation that every contributor to the codebase uses sufficiently smart editors to not break the formatting.

That's still just an argument for "everyone use a specific formatter with these settings". Go's formatting is simple, built into the main implementation, and has no knobs, as a result, everyone uses it, and all code is formatted with tabs and spaces. It can work, esp. if a language adopts it early (like Zig could have).


>Mixing spaces and tab for indentation and alignment is how it should be done if you want things to remain aligned when you change the tab width.

How it should be done is to never write code that needs alignment if you're using tabs for indentation.


> I do think that using tabs for indentation simply doesn't work in practice

It has worked just fine for decades for projects with millions of lines of code.

It might not work for babby's first patch and how do I configure an IDE??


Depends on your definition of "fine". The Linux kernel works fine indenting with tabs, but it mandates 8-space tabs and a lot of code ends up looking borderline unreadable if you use anything else, meaning that the only thing they get by using tabs is slightly smaller file sizes. Besides, if you want to enforce the 80-column limit you have to standardize a tab width anyway.

It's simply not worth it IMO. The pros are simply too small and inconsequential to justify not using spaces on new projects. Although now that automatic code formatters are becoming ubiquitous it really doesn't matter what you use in your editor, I suppose.


It's linted out iirc. Is it really strange to mandate that code look explicit and have no ambiguous whitespace?


Honestly, I like tabs for indent in C, but still support this language choice by Zig. It's hard to do tabs wrong if you don't allow tabs at all. Python3 went part of the way to disallowing tabs completely, but instead only disallowed mixed tabs and spaces. (And in Python, it's a bigger problem as the indentation level is significant to the syntax.)


Even weirder than that, it intentionally fails on \r\n newline, so windows text files straight up don't work by default.


Good! Even Windows Notepad, the extreme example of a "not a code editor" editor, supports \n newlines now: https://devblogs.microsoft.com/commandline/extended-eol-in-n...


Why is it good to force all your users to figure out how to set their text editors to do something different and unnecessary just because you refuse to do what essentially every other non toy language has always been able to do? How is a decision like that not a giant red flag of user hostility and bad judgement?


Is your metaphor about Windows since XP or about Zig?


This isn't a metaphor. Zig has gone to more effort to make itself break for windows users than it would have taken to make it work and not cause problems. Zig says it wants to replace C and it puts out builds that purposely break by default.


Ugh, I was planning to learn zig at one point but things like this mean I'm going to pass. This doesn't even seem to be in the documentation that I could find.

Throw in their plans to add a package manager and its own build system and it seems to be getting further and further from the C replacement I was interested in.


I'm assuming it's only illegal as whitespace? In other words, they're allowed in string literals, right?


Correct


Because the language is attempting to have a standardized formatter. I'm all for it, personally.


Really? I might have to take a look at this language then. Making tabs illegal whitespace might very well be a proxy for other good design decisions.


Zig is lower level than either (it's more like C than C++: expect to manually call 'free' on allocated values). It has a lot of nice improvements over C however, like proper arrays, no null, a module system, and proper compile time evaluation (no need for the hacky preprocessor).
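
A minimal sketch of what that day-to-day memory management looks like (allocator names have shifted between Zig releases, so treat `std.heap.page_allocator` as approximate):

    const std = @import("std");

    pub fn main() !void {
        const allocator = std.heap.page_allocator;

        // allocation is explicit and can fail; freeing is manual
        const buf = try allocator.alloc(u8, 64);
        defer allocator.free(buf);

        // buf is a proper slice: it carries its own length
        std.debug.warn("len = {}\n", .{buf.len});
    }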


I found the module system for C interop quite innovative and was surprised to see that Rust didn't even have such a thing. If you compare bindgen to this approach, bindgen just seems like double work and harder-to-maintain autogenerated bindings.

I have seen both Zig and Swift use the module system very well despite several devs strangely saying it's somewhat complicated, but when used for low-level development this makes Zig an interesting language to use for language bindings.
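
For readers who haven't seen it, the mechanism is roughly this (a sketch; any C header works the same way, and you'd link libc when building):

    // the header is translated at compile time; there is no separately
    // generated bindings file to keep in sync
    const c = @cImport({
        @cInclude("math.h");
    });

    pub fn main() void {
        const x = c.cos(0.0);
        _ = x;
    }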


I wrote a library for ffi (nifs) in elixir using zig, and a long term goal is for it to be easier to use zig as an intermediary for c library ffi than it is to use c.

Already importing blas into elixir with it is crazy easy.


That sounds interesting. Do you have a link to a hex package or repo?



May I ask why people invent new languages rather than trying to extend or improve existing ones, like C?


You could argue that it's what these languages do, the C heritage is very strong in every one of them.

The problem is: do you want to maintain full backcompat or are you willing to break things to do it cleanly? If you do the former you end up with something like C++ which retains almost complete compatibility with C (but then you have a lot of baggage to carry around, which may get in the way at times), or you're willing to break things, and then why not take the opportunity to improve the syntax and get rid of the cruft?

C is ancient now; its type system and its many quirks are quite far from the state of the art of language design. A modern language wouldn't have to mess with the nonsense that are, for instance, C arrays (including C++, which can't outright remove them but does everything it can to render them obsolete with std::vector and std::array).

Another big problem with C interop for newer languages is that while C itself is relatively small and easy, the C preprocessor isn't. That's usually where the friction is, because if you want to maintain compatibility with C macros you have no choice but to implement the language syntax wholesale.


C was already ancient when it came to be.

In retrospect, the aversion of Go's designers to current practices in other programming languages is quite similar to the aversion they showed back when designing C, versus the other systems programming languages that had been in development since 1961, like ESPOL, NEWP, PL/I, PL/S, BLISS.


C had a standard published in 2018. It's "ancient" in that it's not new, but not ancient in that it's abandoned or unused.


> If you do the former you end up with something like C++

That's a pretty awesome place to be, as C++ is one of the most successful programming languages in history.

Perhaps there are lessons there.

> or you're willing to break things and then why not take the opportunity to improve the syntax and get rid of the cruft?

Breaking things just for the hell of it is not much of a tech argument.

C is already pretty light, and already gave origin to successful programming languages such as C++ and Objective C. Unless you find a compelling reason to break backward compatibility in a very specific way, I don't see how that argument makes any sense.

> An other big problem with C interop for newer languages is that while C itself is relatively small and easy the C preprocessor isn't.

Arguably, the preprocessor is orthogonal to the programming language itself. I fail to see how that's relevant.


C++ is hugely successful, that's true, but is there a need for another C++-style language? C++ already feels like 10 languages under a trenchcoat anyway; whatever your style, you'll probably find a subset of it you'll like. I think it showed how powerful "C-with-classes-and-the-kitchen-sink" can be, and also the limits of the concept.

C is light but it does have some things worth breaking IMO. Type inference is something I dearly miss when I write C these days (and I do that a lot). C didn't have any generic programming for a long time (if you don't count macro soup, that is); now it has some very limited support, but it still looks like banging rocks together compared to more modern languages.

C's unsafety is legendary, and segfaults a common problem even for experienced programmers. Rust's lifetimes makes them impossible by design for safe code.

You may not like that of course, but those are all good reasons for experimenting with other paradigms.

>Arguably, the preprocessor is orthogonal to the programming language itself. I fail to see how that's relevant.

Arguably it is, practically it very much isn't.


Multiple attempts have been made to fix C's security issues, but the community at large tends to refuse to adopt them, so the only way forward is to create other languages for the same domain.

Note that Objective-C and C++ are extensions to the C language, and both started as pre-processors that would generate C code.

Also, C wasn't the first in its domain; it just got lucky that UNIX got widespread adoption and then found its way outside UNIX, just like JavaScript eventually found a way outside the browser.


> May I ask why people invent new languages rather than trying to extend or improve existing ones, like C?

Because extending or improving a language nearly always means breaking compatibility. Nearly every language is "stuck in a local optimum". To get out of it, you have to kill assumptions of the language that lead to this local optimum.

Also, it takes a lot more work to convince other people to adopt my changes than to write an implementation of a language. Add to this the fact that being a good programmer and being a good politician are rather independent skills.

Finally, there do exist issues that cannot be fixed, for example that the official ISO standard of the C language is not freely available (I am aware that there exist drafts in the internet).


A great feature of Rust not being a superset of an unsafe language is that the only option for you is to write safe code (unless you explicitly opt for `unsafe`).

Compare this to merely extending a language e.g adding smart pointers - you could always (mistakenly) implicitly fall back onto bad coding practices.


Existing languages can't be meaningfully improved without breaking compatibility with all existing software. Python 3 is a better language in every way but it took many years before it saw significant adoption.


Here is a talk I gave that directly addresses this question:

https://www.youtube.com/watch?v=Gv2I7qTux7g

(the title & abstract in the youtube description is the one I gave for the RFP; I ended up going in a slightly different direction than it once I actually made the talk)


Many of the features of new languages are their restrictions. How would you restrict C in a new version to e.g. remove null pointers in favor of optional types?
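
For what it's worth, this is exactly the sort of restriction Zig makes (a small sketch): plain pointers can never be null, and nullability must be spelled as an optional type the compiler forces you to unwrap.

    // *const i32 cannot be null; ?*const i32 can, and must be checked
    fn deref(maybe: ?*const i32) i32 {
        if (maybe) |ptr| {
            return ptr.*; // only reachable when the pointer is non-null
        }
        return 0;
    }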


Yeah, why did people not fix ALGOL instead of writing new ones like C?


C can't break backwards compatibility and remain C. So... how would you make language changes without resulting in the entire C++ mess new languages are trying to get away from?


I find Zig's approach to generics to be very cool. Types are first-class citizens, so you basically get generics out of the box. Combined with Zig's `comptime`, this makes for writing very cool code. It solves a lot of things you'd need macros for in other languages.
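
A sketch of the standard pattern (this is how generic containers are typically written in Zig): a function that takes a type and returns a brand-new type, evaluated at compile time.

    fn Pair(comptime T: type) type {
        return struct {
            first: T,
            second: T,
        };
    }

    // "instantiation" is just a comptime function call
    const IntPair = Pair(i32);
    const p = IntPair{ .first = 1, .second = 2 };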


This exactly: the language doesn't know about generics, but having first-class types + comptime allows you to do generics. Writing C feels like a chore now.


The Road to Zig explains the essence of the language:

https://youtu.be/Gv2I7qTux7g

C but with the problems fixed.


It doesn't fix use-after-free or double free().


There is some aphorism about perfection, the enemy of good, and some way in which they relate. It is relevant to understanding "why Zig?"


Object Pascal and Modula-2 dialects already offer better memory "safety" than Zig, so "Why Zig" indeed.

I am all for any language that helps reduce C's usage, including Go, even when I dislike its design decisions.

Now Zig, if it doesn't fix what amounts to about 60% of the yearly expenses of fixing security exploits, according to Google and Microsoft security reports, then one needs to analyse how much it is actually worth.


Bouncing off the recently featured HN thread "Hello World": https://drewdevault.com/2020/01/04/Slow.html (and the HN thread: https://news.ycombinator.com/item?id=21954886)

According to this resource, Zig produces code that is very close to the hand-written assembly for the simple use case of outputting "hello world" on stdout.

This is to be taken with a grain of salt though, as of course, caring about the assembly output is a spectacular case of premature optimization. I guess it tells a bit about the goal of Zig as a low-level programming language though.


For a language that _competes_ with C, caring about how semantics map to hardware instructions is absolutely within the domain of concern. It may not be a top priority, but if you _ignore_ it too long, you'll make uninformed high level decisions that prevent entire classes of important low level optimizations without extensive re-thinking. So just do the thinking upfront and save yourself and your community the hassle.


> caring about the assembly output is a spectacular case of premature optimization

I don't agree! at least for operating systems.

Consider that some operating system code may be run a million times per second, on a million different machines (e.g: block system IO on linux). We very much want our assembly to be pristine in this case.

I also like the idea of "optimality" brought forward by zig. In the post you link, there's an ideal hello_world in asm, can we have a higher level language that doesn't sacrifice this ideal achieved by assembly?

Consider that some network cards are now capable of 400 Gbps, and modern OSes are not capable of handling these line rates. I strongly believe the bottleneck should be in the hardware; if the software can't max out your hardware then your software has failed.

I like that zig focuses on optimality.


> Consider that some operating system code may be run a million times per second, on a million different machines (e.g: block system IO on linux). We very much want our assembly to be pristine in this case.

I would believe that most of a processor's time is dedicated to running userspace code, not OS code (with regard to the work that must be done). At the end of the day, the OS is "only" a scheduler to share hardware resources between unrelated tasks.

> I like that zig focuses on optimality.

Optimality isn't required in many real-world businesses. Optimality is often a tradeoff: with optimality you lose flexibility. An optimal program with a SISD instruction set isn't optimal anymore when SIMD instructions are introduced. Your "optimal" asm program is still optimal on x86, but also totally obsolete because it cannot use the latest NEON instructions from ARM, or 64-bit instructions.


> I would believe that most of the time of a processor is dedicated to run userspace code, not OS code (regarding to the work that must be done). At the end of the day, the OS is "only" a scheduler to share hardware resources between unrelated tasks.

Yeah, that's true if the OS is well-written and has had continual and extensive performance work done. Passing 200 Gbit of network traffic in software with firewalling is non-trivial and the number of cycles you get per packet is pretty modest. These things do matter.


> Passing 200 Gbit of network traffic in software with firewalling is non-trivial and the number of cycles you get per packet is pretty modest

If that's the only function of the program, are we still talking about an "Operating System" ?


> ... caring about the assembly output is a spectacular case of premature optimization.

This is a hobgoblin. Assembly output of a compiler is the very baseline of performance. It requires no effort on behalf of the programmer (aside from learning a language).

Now, if you're skimming the assembly after every compilation and, say, fuzzing your implementation to coax the compiler to emit the best possible assembly... that's probably premature. If you're writing inline assembly before doing a higher level implementation, that's probably premature.

But choosing a language on the basis of its performance:effort ratio is downright pragmatic.


Assembly output is important when you are trying to understand an exploit, or to make sure none can be produced.

Can be important for kernel stuff.


and somehow you forgot about D


Author, if you decide to build more of this kernel, any thoughts on providing live screencasts of the implementation like Andrew Kelley (Zig) and Andreas Kling (SerenityOS) on YouTube and/or Twitch? I never realized how effective it is for me to watch others go through the mental process of coding/debugging.


Hi, that's a great idea, it would be a good exercise for me especially. I always enjoy watching Andy's live zig coding on youtube.


Great work! I'm just behind you working on my own alternate language x86 kernel in Ada: https://github.com/ajxs/cxos

Admittedly I don't know much about Zig, but it's good to see people investigating languages other than C. I'm still not convinced of Rust's merits in this area, we'll see how this develops with time.


Zig is so much better than Rust for these sorts of tasks. Readable and maintainable. The Rust astroturf-brigade tries hard to make it fit any situation. I'm glad you have provided a substantial concrete counterexample.


Looks awesome! Planning to do a similar thing in Jai once it comes out


I don't follow JAI but maybe Jonathan Blow should try zig, it's a great fit for game development and I remember him mentioning that he wants a language that is fun to write in.


zig is following a pretty different path from jai afaik. zig prefers everything explicit and is very verbose, whereas jai has many implicit things and also has macros or something similar. Anyways, more will be seen when it is released.


Maybe you can enlighten me, but from my view it seems the biggest difference is that zig exists and jai doesn't (yet?), at least for practical purposes.


Blow is pretty familiar with Zig, there has been some collaboration between the two creators.


There's still zero evidence that the compiler or language ever have been or ever will be called "Jai," aside from the fact that it is the tenuous file extension being used in development streams.


It's the best short name we can use right now so it's the one we should use IMO. If it gets a different name, we'll start calling it differently, I don't mind.


This looks really cool, well done. Do you mind sharing the resources you’ve used so far? I see many, many tutorials on OSDev...


OSDev can be hit or miss. All of the bootstrapping was done following this [0] tutorial for Rust, which translates easily to zig. For the more advanced parts you can find useful ones in [1] or use the OSDev search bar. For even more advanced topics though you'll find that there are no tutorials and only a few open source implementations to take inspiration from!

ps: I've updated my readme with a few references

[0] https://os.phil-opp.com/

[1] https://wiki.osdev.org/Tutorials


I'd rather you expose a FUSE interface first than expose ext2.... Just a totally random suggestion.


Love the fact that people are developing kernels in new languages!


Are you planning to write up a Tutorial about this?


I'm more on the reading end of tutorials right now. If I come up with something original that doesn't have an osdev page I'll contribute to the wiki.


Hi Jack, just wondering why not use Dlang, since it's more stable than Zig and you can use the "better C" subset of D if you wish to stay simple?


Great work!


I tried to do this with Zig about 13 months ago. It was not where it needed to be at that time; the biggest impediments were its rudimentary handling of C pointers to one vs pointers to many (which has long since been fixed), and its meta programming issues (lack of a macro language or pre-processor) that made OS development tedious. I have not revisited it as much as I would have liked simply because I chose to step back a bit on implementation and focus on theory.
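
For anyone unfamiliar, the pointer distinction referred to is roughly this (a sketch):

    fn demo(single: *u8, many: [*]u8, slice: []u8) void {
        single.* = 1; // *T: exactly one item, no indexing
        many[0] = 2;  // [*]T: many items, length unknown (C-like)
        slice[0] = 3; // []T: pointer plus length, bounds-checked
    }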

I'm pulling for Andrew. He busts his rear-end, livestreams, and is generally a good dude. Zig has a TON of potential.


I've been waiting for something that looks like a firm step forward in the domain of games programming and has ideals which align with the domain. I'm extremely excited for zig and have been messing around with getting a smallish simulation running with SDL on Windows.

At work in C++ I've switched from working on fairly isolated types, where my changes had fast recompile times, to lower-level changes which cause a good chunk of the engine to recompile. 10-minute compile times, with a lot of tech trying to get that time down as much as possible, are a huge killer to productivity, and I can feel myself getting much less done than I was before.

Zig tossed away a lot of the constructs that make C++ slower to compile. I haven't had a chance to see its timings on large projects, but stuff like Jai compiling a 90k-LoC full commercial game project live on a laptop in 1.4 seconds (which caused Jonathan Blow to say "what? That's weirdly slower than it should be...") gives me hope that Zig is similar.


Here is a microkernel written in Zig: https://github.com/AndreaOrru/zen


I really wish Zig picks up and goes mainstream. I also wish it gets a few (not too many) higher-level features. Something along the lines of simple OOP.

I would even start using it now for some smaller projects that are not vital, but I was stalled on Zig not being able to compile some C (it claims to do that, and it does, but not in my cases). Sure, I could do an import, but I'd prefer the Zig feature to work.


Check out zig's "Safety" project https://github.com/ziglang/zig/projects/3#card-27896159 to get a view of what sort of safety is being planned.

I'd like to state that after significant time and effort with Rust, I also think it's too complex for what it's protecting us from. Zero-cost indeed does not refer to the cognitive price. It's better than C++ though, in every way except popularity.


Very interesting. Keep up the good work!


It would be nice if you put up a design document.

I'd rather read that first to get my bearings, as opposed to opening up a folder full of gibberish code.

Usually a well-written design document explains the high-level constructs of the project. Then some mid-level documents tie the design document in with the implementation code.

This makes the project more intelligible, and allows a layman to jump in and follow along.


I'm also trying to do something very similar. I can use it as a reference. Thanks for sharing!


The hello world examples for master and 0.5.0 seem quite different - any trouble keeping up with the changes?


migration from 0.4 to 0.5 was ok. AFAICT the big change for 0.6 syntax-wise is the drop of varargs in favour of tuples. I'll take some time to do the switch when 0.6 hits, but I'm confident it won't take long because of how concise the language is.


0.6.0 is scheduled for April 13, 2020. Let's make sure your project builds successfully and correctly with master branch 1-3 weeks before that, and I'll prioritize any bugs before the release that affect you.

(This goes for all zig community members with active projects)



