There's solid reason for the translation here; the Zig core team is aiming to eliminate duplicated code and C functions, and avoid the need to track libc from multiple sources. In the future, LLMs could serve as part of this, but they are currently quite terrible at Zig (as far as I understand it, it's not a lack of Zig code samples, it's an imbalance of OLD Zig to NEW Zig, as Zig changes quite frequently).
You would need to consider whether it is even worth translating your C code. If the paradigm is identical and the entire purpose would be "haha it is now one language," surely you could just compile and link the C code with libzigc... In my opinion, it's not worth translating code if the benefit of "hey look one language" requires the cost of "let's pray the LLM didn't hallucinate or make a mistake while translating the code."
In the theoretical world where a subset of English could be formalized, proven, and compiled, the complexity of the language would reduce my willingness to use it. I find that the draw of AI comes from its "simplicity," and removing that (in favor of correct programs) would be pointless - because such a compiler would surely take forever to compile "English" code, and would not be too different from current high-level languages, imo.
It is my opinion that even if Zig were nothing more than a syntactical tweak of C, it would be preferable over C. C has a lot of legacy cruft that can't go away, and decades of software built with poor practices and habits. The status-quo in Zig is evolving to help mitigate these issues. One obvious example that sets Zig apart from C is error handling built into the language itself.
Can you be more precise? Memory safety does not only affect C programs and - despite people repeating this over and over - I do not believe it is true that it is actually harder to build safe programs in C compared to many other languages. Ada and Rust have certainly some advantage, but I think also this is exaggerated.
Memory corruption caused by lack of bounds checking, or vocabulary types like the ones provided by SDS and glib.
Microsoft had to come up with SAL during Windows XP SP2, Apple with -fbounds-safety alongside Safe C dialect for iBoot firmware, Oracle with ADI on Solaris/SPARC, Apple's ARM PAC extension, ARM and Microsoft's collaboration on Pluton and CHERI Morello, Apple, Microsoft, Google and Samsung's collaboration on ARM's MTE.
Lots of money being spent on R&D, for something WG14 considers exaggerated.
Those extensions mostly exist to enhance the safety of legacy code, though, and are not necessary when writing new code that is safe. But it is only the latter that is relevant when comparing to new alternative languages.
Where can we find examples of such newly written C code that is memory safe, given that language features that would make it possible are only now under discussion for future standards, after governments and industry pressure?
Basically any code that does not use pointer arithmetic or raw pointer dereferences and instead puts string handling and buffer management behind abstractions.
The new features we will put in are not there to enable safe programming, but to make it more convenient and to make safety demonstrable.
And I wish there was actually some real industry interest in pushing this forward. Industry seems more interested in claiming this is an unfixable problem and that we all have to switch to Rust, which gives them another decade of punting on the problems with existing code.
I could help with searching for some public examples if you really do not know any code that uses safe string abstractions in C. But my original aim was to understand what specific "legacy cruft" in C is seen as problematic, and why its presence requires an entirely new language to fix. So far, I have not gotten a good answer to this. I certainly do know some legacy cruft I want to see go, but its presence does not prevent me from writing bounds-safe code.
WG14 is a very small number of volunteers. It would help if the industry would actually invest development resources for safety on the compiler / language side and in cleaning up legacy code.
Exactly, there are very few pearls of best practices in C, and those that exist are probably in high-integrity computing, with the relevant certification costs.
When all major OS vendors, some of whom are also compiler vendors, see more return on investment in contributing their money to alternative language foundations or open-source projects than in sending their employees to either WG14 or WG21, it is kind of clear ISO isn't going the way they would like.
I would not call this an exaggeration, rather not listening.
Additionally, it would not surprise me if one of Zig, Odin, Rust eventually started popping up on console DevKits, or Khronos standards as well.
I don't know. WG21 is very big. WG14 is very small, so it is different. But in neither case is ISO itself steering in some direction. Whoever shows up can influence the standard. Some of my proposals towards safety were opposed by compiler vendors because they do not want to put too much pressure on their customers to upgrade their legacy code. But of course, rewriting the code in another language would be much more effort than maintaining it... So I think the true answer is that nobody wants to invest in the maintenance of legacy code. But this will increasingly also be a problem for other languages once they are not young anymore.
As far as I know, Zig has a bunch of things in the works for a better development experience. Almost every day there's something being worked on - like https://github.com/ziglang/zig/pull/24124 just now. I know that Zig had some plans in the past to also work on hot code swapping. At this rate of development, I wouldn't be surprised if hot code swapping was functional within a year on x86_64.
The biggest pain point I personally have with Zig right now is the speed of `comptime` - The compiler has a lot of work to do here, and running a brainF** DSL at compile-time is pretty slow (speaking from experience - it was a really funny experiment). Will we have improvements to this section of the compiler any time soon?
Overall I'm really hyped for these new backends that Zig is introducing. Can't wait to make my own URCL (https://github.com/ModPunchtree/URCL) backend for Zig. ;)
For comptime perf improvements, I know what needs to be done - I even started working on a branch a long time ago. Unfortunately, it is going to require reworking a lot of the semantic analysis code. Something that absolutely can, should, and will be done, but is competing with other priorities.
I am not sure, but why can't C, Rust and Zig, along with others (like Ada, Odin, etc.) and of course C++ (how did I forget it?), just coexist?
Not sure why, but I was definitely getting some Game of Thrones vibes from your comment. I would love to see some competition, but I don't know - just code in whatever systems programming language is productive for you, I guess.
But I don't know low-level languages, so please take my words as just my 2 cents.
I am just watching the Game of Thrones series right now, so this comment sounds funnier than it should to me :D.
The fight for the Iron Throne, lots of self-proclaimed kings trying to take it... C is like King Joffrey, Rust is maybe Robb Stark?! And Zig... probably princess Daenerys with her dragons.
The industry has the resources to sustain maybe two and a half proper IDEs with debuggers, profilers, etc. Much as we might wish otherwise, language popularity matters. The likes of LSP mitigate this to a certain extent, but at the moment they only go so far.
That's where extensible IDEs like VSCode (and with it the Language Server Protocol and Debug Adapter Protocol) come in.
It's not perfect yet, but I can do C/C++/ObjC, Zig, Odin, C3, Nim, Rust, JS/TS, Python, etc... development and debugging all in the same IDE, and even within the same project.
For Virgil I went through three different compile-time interpreters. The first walked a tree-like IR that predated SSA. Then, after SSA, I designed a linked-list-like representation specifically for interpretation speed. After dozens of little discrepancies between this custom interpreter and compiler output, I finally got rid of it and wrote an interpreter that works directly on the SSA intermediate representation. In the worst case, the SSA interpreter is only 2X slower than the custom interpreter. In the best case, it's faster, and saves a translation step. I feel it is worth it given the maintenance burden and bugs of a separate interpreter.
It's a funny question because, as far as I'm aware, Zig Software Foundation is the only organization among its peers that spends the bulk of its revenue directly paying contributors for their time - something I'm quite proud of.
Hot code swapping will be huge for gamedev. The idea that Zig will basically support it by default with a compiler flag is wild. Try doing that, clang.
Totally agree with that - although even right now zig is excellent for gamedev, considering it's performant, uses LLVM (in release modes), can compile REALLY FAST (in debug mode), it has near-seamless C integration, and the language itself is really pleasant to use (my opinion).
Yes, at a huge cost. That only works on Microsoft platforms.
MSVC++ is a nice compiler, sure, but it's not GCC or Clang. It's very easy to have a great feature set when you purposefully cut down your features to the bare minimum. It's like a high-end restaurant. The menu is concise and small and high quality, but what if I'm allergic to shellfish?
GCC and Clang have completely different goals, and they're much more ambitious. The upside of that is that they work on a lot of different platforms. The downside is that the quality of features may be lower, or some features may be missing.
I ended up switching from Zig to C# for a tiny game project because C# already supports cross-platform hot reload by default. (It’s just `dotnet watch`.) Coupled with cross-compilation, AOT compilation and pretty good C interop, C# has been great so far.
That is basically what they do when using Lua, Python, C#, or Java, but with fewer parentheses, which apparently are too scary for some folks moving from print(x) to (print x).
There was a famous game with Lisp scripting, Abuse, and Naughty Dog used to have Game Oriented Assembly Lisp.
I had exactly the same title in mind; I remember my very young self being in shock when I learned that it was Lisp. If you didn't look under the hood you'd never be able to tell - it just worked.
Is comptime slowness really an issue? I'm building a JSON-RPC library and heavily relying on comptime to be able to dispatch a JSON request to an arbitrary function. Due to strict static typing, there's no way to dynamically dispatch to a function with arbitrary parameters at runtime. The only way I found was figuring out the function type mapping at compile time using comptime. I'm sure it will blow up the code size with an additional copy of the comptime'd code for each arbitrary function.
Yes, last time I checked, Zig's comptime was 20x slower than interpreted Python. Parsing a non-trivial JSON file at comptime is excruciatingly slow and can take minutes.
I would argue that it's not meaningful to do so for larger files with comptime, as there doesn't seem to be a need to parse JSON the way the target platform would (comptime emulates it) - I expect it to be target-independent. You're also not supposed to do I/O using comptime, and @embedFile kind of falls under that. I suppose it would be better to write a build.zig for this particular use case, which I think would then also be able to run fast native code?
Is it easy to build out a custom backend? I haven't looked at it yet but I'd like to try some experiments with that -- to be specific, I think that I can build out a backend that will consume AIR and produce a memory safety report. (it would identify if you're using undefined values, stack pointer escape, use after free, double free, alias xor mut)
URCL is sending me down a rabbithole. Haven't looked super deeply yet, but the most hilarious timeline would be that an IR built for Minecraft becomes a viable compilation target for languages.
To be honest, your snippet isn't really C anymore once it uses a compiler builtin. I'm also annoyed by things like `foo(int N, const char x[N])`, whose compilation varies wildly between compilers (most ignore the bound; GCC will actually try to check the invariants if they are known at compile time).
> I haven't really understood the advantage compared to simply running a program at build time.
Since both comptime and runtime code can be mixed, this gives you a lot of safety and control. Comptime in Zig emulates the target architecture, which makes things like cross-compilation simply work. With a program that generates code, you have to run that generator on the system that's doing the compiling, and the generator itself has to be aware of the target it's generating code for.
It also works with memcpy from the library: https://godbolt.org/z/Mc6M9dK4M I just didn't feel like burdening godbolt with an include.
I do not understand your criticism of [N]. This gives the compiler more information and catches errors. This is a good thing! Who could be annoyed by this: https://godbolt.org/z/EeadKhrE8 (of course, nowadays you could also define a decent span type in C)
The cross-compilation argument has some merit, but not enough to warrant the additional complexity IMHO. Compile-time computation will also have annoying limitations and makes programs more difficult to understand. I feel sorry for everybody who needs to maintain complex compile time code generation. Zig certainly does it better than C++ but still..
> I do not understand your criticism of [N]. This gives compiler more information and catches errors. This is a good thing!
It only does the sane thing in GCC; in other compilers it does nothing, and since it's very under-specified it's rarely used in C projects. It's a shame Dennis Ritchie's fat pointers / slices proposal was not accepted.
> warrant the additional complexity IMHO
In Zig's case, comptime reduces complexity, because it is simply Zig. It's used to implement generics; you can call Zig code at compile time, and create and return types.
Then the right thing would be to complain about those other compilers. I agree that Dennis' fat pointer proposal was good.
Also, in Zig it does not reduce complexity but adds to it, by creating a distinction between compile time and run time. It is only lower complexity compared to other implementations of generics, which are even worse.
C also creates a distinction between compile-time and run-time, which is more arcane and complicated than that of Zig's, and your code uses it, too: macros (and other pre-processor programming). And there are other distinctions that are more subtle, such as whether the source of a target function is available to the caller's compilation unit or not, static or not etc..
C only seems cleaner and simpler if you already know it well.
My point is not about whether compile-time programming is simpler in C or in Zig, but that is in most cases the wrong solution. My example is also not about compile time programming (and does not use macro: https://godbolt.org/z/Mc6M9dK4M), but about letting the optimizer do its job. The end result is then leaner than attempting to write a complicated compile time solution - I would argue.
I don't know, to me it seems the blog tries to make the case that comptime is useful for low-level optimization: "Is this not amazing? We just used comptime to make a function which compares a string against "Hello!\n", and the assembly will run much faster than the naive comparison function. It's unfortunately still not perfect." But it turns out that a C compiler will give you the "perfect" code directly while the comptime Zig version is fairly complicated. You can argue that this was just a bad example and that there are other examples where comptime makes more sense. The thing is, about two decades ago I was similarly excited about expression-template libraries for very similar reasons. So I can fully understand how the idea of "seamlessly weaves comptime and runtime together" can appear cool. I just realized at some point that it isn't actually all that useful.
> But it turns out that a C compiler will give you the "perfect" code directly while the comptime Zig version is fairly complicated.
In this case both would (or could) give the "perfect" code without any explicit comptime programming.
> I just realized at some point that it isn't actually all that useful.
Except, again, C code often uses macros, which is a more cumbersome mechanism than comptime (and possibly less powerful; see, e.g. how Zig implements printf).
I agree that comptime isn't necessarily very useful for micro-optimisation, but that's not what it's for. Being able to shift computations in time is useful for more "algorithmic" macro optimisations, e.g. parsing things at compile time or generating de/serialization code.
Of course, a compiler could possibly also optimize the Zig code perfectly. The point is that the blogger did not understand it and instead created an overly complex solution which is not actually needed. Most C code I write or review does not use a lot of macros, and where they are used it seems perfectly fine to me.
To each their own, I guess. I still find C to be so much cleaner than all the languages that attempt to replace it, I can not possibly see any of them as a future language for me. And it turns out that it is possible to fix issues in C if one is patient enough. Nowadays I would write this with a span type: https://godbolt.org/z/nvqf6eoK7 which is safe and gives good code.
Something like this: https://godbolt.org/z/er9n6ToGP It encapsulates a pointer to an array and a length. It is not perfect because of some language limitations (which I hope we can remove), but also not too bad. One limitation is that you need to pass it a typedef name instead of any type, i.e. you may need a typedef first. But this is not terrible.
Thanks, this is great! I've been having a look at your noplate repo, I really like what you're doing there (though I need a minute trying to figure out the more arcane macros!)
In this case, the generic span type is just
#define span(T) struct CONCAT(span_, T) { ssize_t N; T* data; }
And the array2span macro would just create such an object from an array by storing the length of the array and the address of the first element.
#define array2span(T, x) ({ auto __y = &(x); (span(T)){ array_lengthof(__y), &(__y)[0] }; })
You can change the `target` in those two linked godbolt examples for Rust and Zig to an older CPU. I'm sorry I didn't think about the limitations of the JS target for that example. As for your link, It's a good example of what clang can do for C++ - although I think that the generated assembly may be sub-par, even if you factor in zig compiling for a specific CPU here. I would be very interested to see a C++ port of https://github.com/RetroDev256/comptime_suffix_automaton though. It is a use of comptime that can't be cleanly guessed by a C++ compiler.
I just skimmed your code but I think C++ can probably constexpr its way through. I understand that's a little unfair though because C++ is one of the only other languages with a serious focus on compile-time evaluation.
I don't love the noise of Zig, but I love the ability to clearly express my intent and the detail of my code in Zig. As for arithmetic, I agree that it is a bit too verbose at the moment. Hopefully some variant of https://github.com/ziglang/zig/issues/3806 will fix this.
I fully agree with your TL;DR there, but would emphasize that gaining the same optimizations is easier in Zig due to how builtins and unreachable are built into the language, rather than needing gcc and llvm intrinsics like __builtin_unreachable() - https://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Other-Builtins....
It's my dream that LLVM will improve to the point that we don't need further annotation to enable positive optimization transformations. At that point though, is there really a purpose to using a low level language?
> LLVM will improve to the point that we don't need further annotation to enable positive optimization transformations
That is quite a long way to go, since the following formal specs/models are missing to make LLVM + user config possible:
- hardware semantics, specifically around timing behavior and (if used) weak memory
- memory synchronization semantics for weak memory systems, with ideas from "Relaxed Memory Concurrency Re-executed" and its suggested model looking promising
- SIMD with specifically floating point NaN propagation
- pointer semantics, specifically in object code (initialization), serialization and deserialization, construction, optimizations on pointers with arithmetic, tagging
- constant time code semantics, for example how to ensure data stays in L1, L2 cache and operations have constant time
- ABI semantics, since specifications are not formal
LLVM is also still struggling with full restrict support due to architecture decisions and C++ (worked on for more than 5 years now).
> At that point though, is there really a purpose to using a low level language?
Languages simplify/encode formal semantics of the (software) system (and system interaction), so the question is if the standalone language with tooling is better than state of art and for what use cases.
On the tooling part with incremental compilation I definitely would say yes, because it provides a lot of vertical integration to simplify development.
The other long-term/research question is if and what code synthesis and formal method interaction for verification, debugging etc would look like for (what class of) hardware+software systems in the future.
For constant-time code, it doesn't matter too much if data spills out of a cache; constant-time issues arise from compilers introducing early exits, which leaves crypto open to timing attacks.
Yeah indeed. Having access to all those 'low-level tweaks' without having to deal with non-standard language extensions which are different in each C compiler (if supported at all) is definitely a good reason to use Zig.
One thing I was wondering, since most of Zig's builtins seem to map directly to LLVM features, if and how this will affect the future 'LLVM divorce'.
Good question! The TL;DR as I understand it is that it won't matter too much. For example, the self-hosted x86_64 backend (which is coincidentally becoming default for debugging on linux right now - https://github.com/ziglang/zig/pull/24072) has full support for most (all?) builtins. I don't think that we need to worry about that.
It's an interesting question about how Zig will handle additional builtins and data representations. The current way I understand it is that there's an additional opt-in translation layer that converts unsupported/complicated IR to IR which the backend can handle. This is referred to as the compiler's "Legalize" stage. It should help to reduce this issue, and perhaps even make backends like https://github.com/xoreaxeaxeax/movfuscator possible :)