An intro to Zig's integer casting for C programmers (lagerdata.com)
129 points by jorangreef on May 11, 2021 | 131 comments



> If runtime safety is turned off, you get undefined behavior

You already know someone will teach their students to always have it off because it's "slower" or something.

Make something idiot-proof and the world invents a better idiot.

Another language that does safety like this incredibly well is Ada (Ada/SPARK), and I'm unsure why people aren't more hyped about it. So many people hype Rust or Zig or whatever new language, and yet there's an incredibly safe and expressive language that's used in high-reliability applications like missile guidance systems, that nobody seems to talk about.


Not to sound too aggressive, but I'm not sure why Ada programmers keep being surprised at the hype.

Safety (of all kinds) is not the only selling point of Rust and Zig, and the fact that Ada also has safety measures doesn't mean that it's instantly an option for me.

For example, I keep seeing people hype about Zig/Rust, for all kinds of reasons; and the way that they present their arguments is very compelling. While I've heard of Ada before, AFAICT the only context in which it is brought up is to downplay hype over Rust/Zig; never as a standalone.

Sounds like a PR problem, mostly.

EDIT: Not to say that there aren't other problems, just saying that this point alone isn't some magic bullet.

I don't like Rust because it gives me memory safety, tons of languages do that. I love Rust because the tradeoffs it gives me are worth the switch, and the overall tooling and ecosystem are very well made.


> Sounds like a PR problem, mostly.

It's the case that the Zig creator did put a high priority on soft "PR" for the language; the first hire was a developer advocate/community manager. I think this is a great case of "lessons learned" from the history of programming languages.


> I don't like Rust because it gives me memory safety, tons of languages do that. I love Rust because the tradeoffs it gives me are worth the switch, and the overall tooling and ecosystem are very well made.

Same applies to the tooling of Ada/SPARK, but as you have said, it is pretty much a PR issue.


I'm not saying it's justified, but I think the syntax is also a big impediment (just like for my daily driver, OCaml). Ada looks pretty verbose and annoying to write code in, at least if one wants to use it for side projects, small utilities, etc. rather than spaceship firmware; and that's probably how a lot of languages become popular, you need to tinker with them on small things.

OCaml has seen the alternative "Reason" syntax (JS-like) grow in popularity. Maybe a C-like (or better, rust-like) syntax for Ada would help the language's adoption. People already using Ada would complain that the current syntax is fine but they're not the target demographic :-)


My opinion: I really dislike Rust's syntax because to me it is similar to Perl's, which I also do not like that much, as I do not know what is happening just by looking at it (I still like to use Perl here and there, but these days I lean towards Tcl more). It seems hidden from me, even compared to something like C, where I have to think about conversions and whatnot.

I can easily read and understand someone's Ada, C, or OCaml code and actually know what is going on, but I cannot read and understand even my own Perl code (after a week or two), or other people's Rust code. I would rather prefer reference implementations to be in C than in Rust, too. If I read the C version, I can easily port it to any language vs. if it were in Rust.


Thank you for the answer :-). I'm still surprised because to me, Rust looks a _lot_ like OCaml with curly braces. The surface lexical conventions are mixed with C++ (like `::` for namespacing) but code still looks more like OCaml than C, with `let` bindings, sum types, expressions, `match` being omnipresent. Even the type annotations look more ML-ish than the type-first C convention. I fail to see the relation to perl (there's barely any `$` in rust code! and most sigils are still the & related to references that also abound in C).


I compared it to Perl because of its heavy use of symbols in general. In Rust, you can have 6 symbols next to each other and that is valid code. See below.

OCaml does not look like Rust to me (OCaml seems more consistent, and Rust seems like a mixed bag of everything to me). It does not hide as much behind symbols as Rust does, for example, and learning it was really easy. I tried to read Rust, I really did, but everything is so hidden from me. I did not mean to say that any of the mentioned languages are anywhere close to C though. In any case, if I had to look at reference implementations, C would be the best (to me). It is really easy to know what is going on and why, and thus it is easy to implement it in any language I know. Probably because it is not a language with many high-level abstractions or language constructs (syntactic sugar) that hide what is going on.

Example snippets as to why I dislike Rust:

  let mut parents_array = ArrayVec::<[&[u8; BLOCK_LEN]; MAX_SIMD_DEGREE_OR_2]>::new()

  let input_ptrs: &[*const u8; DEGREE] = &*(inputs.as_ptr() as *const [*const u8; DEGREE]);
And this is nothing, there are much worse ones, and I would have to give you files for that, but pretty much any famous Rust project is difficult to read for me. I really do not understand most Rust code, I wonder if the problem is with me.


> Ada looks pretty verbose and annoying to write code in,

I still don't understand this: you spend 80% of the time thinking about code, 19% reading, and 1% typing.


I don't have this experience. I like to prototype by writing code (or type signatures), and then refactor a lot from the initial solution. That's easier to do if the syntax is reasonably concise (the other half is to have good types which Ada definitely provides).


80% thinking when you're stuck and don't know who to ask. I've made the same mistake. Have you watched some streamers? They are prototyping by typing, I would say, 50% of the time.


I never used Ada, but read a book on it.

I don't even remember much, just how much I loved the look of the example programs. That was during community college when I was new to programming. It wasn't part of a class, but the guy who implemented the curriculum had jammed the library with jewels, and the entire collection was full of classics, especially since the actual curriculum was just a basic associate's in "business data processing."


From what I can tell, safety isn't a selling factor of Zig. From a "safety" perspective Zig seems like a step backwards compared to the latest generation of languages, and Rust in particular. Zig's ergonomics seem decent, but its memory safety tack appears to basically be to include valgrind-like tooling in debug builds, with good PR.


Not a Zig expert, but safety is a factor for Zig, it just treats it as less of an absolute than Rust. I think the thing to keep in mind is that something can be a priority without being an absolute priority. I'd make a comparison to OpenBSD vs Linux. Both have security as a priority, OpenBSD just has a more absolute focus on it.

For example, a couple of features come together really nicely to make memory safety easier to test in Zig:

* You need a reference to an Allocator to be able to allocate memory, so as a general rule, the caller can control which allocator is used.

* Unit testing is integrated well into the language.

* Therefore, you can create an allocator for each unit test, and fail the test at the end if any memory was leaked.

* This process can also happen at the application level with the General Purpose Allocator, which can let you print an error when the program exits if anything was leaked.
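
A minimal sketch of that pattern (assuming the std.testing.allocator and std.ArrayList APIs roughly as they looked in 2021):

    const std = @import("std");

    test "allocations are freed" {
        // std.testing.allocator fails the test at the end if anything
        // allocated through it was leaked.
        var list = std.ArrayList(u32).init(std.testing.allocator);
        defer list.deinit(); // drop this line and the test reports a leak
        try list.append(42);
    }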

The above doesn't solve every memory safety problem (and there are other features like native bounds-checked slices that solve other kinds of issues), but it provides an extra layer that can probably get us quite far into the "quite safe" camp.


This. AFAICT memory leaks are not practical to test for in Rust (note this is not the same as detectable), but basically come for free in Zig tests.


Except I could already get that level of security with Pascal dialects like Turbo Pascal or Modula-2, which is why I really don't see much value in Zig, other than being more appealing to younger generations.


Putting valgrind into the stdlib is really clever, and I also like memory safety being the carrot to get you to write tests. I worry that having a "safe" system like Rust sometimes causes very smart (TM) developers, especially less experienced ones, to be complacent and write fewer tests.


Writing fewer tests is somewhat justified when you can encode invariants in the type system. It depends on the level of reliability you require, of course. But my Rust code without tests has been comparably reliable to my code in other languages with tests.


No tests? How do you refactor while ensuring your business logic invariants?



Back when Ada was as young as Rust and Zig, we only had magazines like BYTE and Dr Dobbs, and BBSs for driving hype.

There were no free-beer compilers, and no 8/16-bit home computer was capable of hosting an Ada compiler; BASIC, Pascal and Modula-2 were the best they could manage.


Sorry to hear that


Ada / spark may be a great thing. But how do people know?

Is there a good compiler that one can install for free?

Are there a few good and comprehensive books / references / guides, available online for free?

If you want something to become popular, give it away. Ideally, push it. At the very least, have a limited free version. Or turn a blind eye to small-time piracy, as many vendors did for a long time. Or make it dead easy to buy it for $10.

(None of the above works for you? Sorry, you have just turned down 99% of kids who might like to tinker with your tech, love it, and then promote it wherever they go. Your tech may be hot and desired in the narrow circle of pros, but it's not going to become popular and win the world.)


Not sure why you think Ada/SPARK isn't free. There's GNAT[0], an open source Ada compiler that's part of the GNU toolchain, a fairly good book teaching Ada and SPARK[1] by AdaCore, and plenty of tutorials.

Accessibility is really not the problem. The problem is, Ada is an old language that has many similar problems to C: lack of a good package management story, and a design by committee making language evolution glacially slow (though that also doubles as a feature). It also lacks a vibrant ecosystem like we can find in other popular languages.

Furthermore, SPARK in particular takes the concept of safety much further than Rust or Zig do right now, as it allows proving the correctness of a program according to a formal specification. Rust/Zig only care about proving the absence of UB. This makes SPARK much more complex, and thus it has a much higher barrier to entry.

[0]: https://www.gnu.org/software/gnat/

[1]: https://learn.adacore.com/index.html


Your first link pretty much says it all: barebones website 5 years out of date, no docs, no resources, no community links, no indication that this project is even alive. To top it off the project is named after one of the most annoying creatures on the face of the Earth. This is a marketing issue.

The problem for older languages is that the bar for what's expected of a language has risen substantially. A long time ago, languages weren't even expected to have implementations. Today, not only are implementations expected for all major platforms, languages are also expected to run on phones and the browser, include package managers and library repositories, have a language server implementation and editing modes for all major IDEs, be open source with active development, provide extensive documentation on the scale of a book, and also promote a vibrant and active community. Oh, and to top it off, users don't want to pay for any of that. It's just expected, which is why most languages these days only come out of large tech companies that can afford to fund all of the above with no expectation of profit.


> Your first link pretty much says it all: barebones website 5 years out of date, no docs, no resources, no community links, no indication that this project is even alive.

A better starting point might be http://www.GetAdaNow.com/


Judging by the state of affairs there, one might as well go with ATS. http://www.ats-lang.org/


There's an Ada compiler as part of gcc.

The most prominent project that I know of written in Ada is GHDL.


There was at least always the recommendation to only use the version of the Ada compiler released by the FSF as the regular GCC one is/was more encumbered.


Are they not the same thing?

https://en.wikipedia.org/wiki/GNAT


> Make something idiot-proof and the world invents a better idiot.

Sure, but if you only ever give people safety scissors then you're severely limiting the kinds of things that they can build. You have to let people who know what they are doing be able to do what they need to do, so you have to be able to turn the runtime safety off. Zig can do this at the scope level.
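
For reference, a minimal sketch of the scope-level opt-out, using Zig's @setRuntimeSafety builtin (sumUnchecked is just an invented example):

    fn sumUnchecked(xs: []const u32) u32 {
        @setRuntimeSafety(false); // overflow/bounds checks are off in this scope only
        var sum: u32 = 0;
        for (xs) |x| {
            sum += x;
        }
        return sum;
    }

The rest of the program keeps its checks; only this block opts out.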


In my mind this isn't about safety scissors at all. I hate languages that in any way limit the kind of code I want to write, which is why I write C++ for the most part, but that's another rant for another day.

If you don't want full control like in C++, a language that has one specific style, one specific way to do things, and so on, would be nice. Go(lang) is quite close to this, in my experience. You can't do thread safety wrong, because there's only really one way, and so on.

I would love a language with

1. Explicit nullable types (not like Java's "anything can be null" design), so everything is a non-nullable reference type unless it's "optional<T>" for example. I strongly support the idea of a nullable type only being used where it being null carries information.

2. Strong typing - like, tungsten strong. Implicit conversions need to be compiler errors, no implicit constructors, etc. with simple explicit casting

3. One style - and it better not be Java-like forced-OOP.

4. Full and clear memory control, RAII support, no gc. I need my RAM for other stuff.

I'm not sure if there's a language like that out there. I don't care for library support, I just want a fun language.


I don't understand why there aren't any solutions like this. Why aren't there any languages that have a good garbage collector but also let you turn it off and work with memory manually? Maybe there is one and I don't know of it? Maybe garbage collected languages and manual memory languages require different designs? I don't know.


Nim is built like this. The gc is opt-in by type and manual memory management is simple. All other types are stack allocated by default.

The new gc is pretty much compiler-assisted, scope-managed smart pointers as well. Also, when using the gc you can build your own types with custom create/free/copy/move operators to do what you want without worrying about a stop-the-world gc.

Using the arc gc pretty much compiles down to what you'd do manually. For cycles you can use the orc gc, again it's all opt in.

I find this a great balance of productivity and performance, where it's easy to have high control when you want it, and still get good performance when you're not bothered.


+1 for Nim and being usable without a GC or for manual memory management like dealing with C apis. Though to be fair, quite a few types in the stdlib are heap based. But you can make your own static heap pool.

With the new ARC & move semantics, Nim has hit a sweet spot, IMHO. It's like the language struggled to find a fit for a long time. But ARC allows very low overhead memory safety, perfect for mcu's and wasm in particular.

Though perhaps Swift could be a contender in this arena as well with its ARC, but its development is so Apple-centric.

What's interesting to me is that with move semantics and smarter compiler analysis, the overhead of an ARC-based GC approaches that of Rust's compile-time lifetime memory management. For any non-trivial program, Rust seems to use a lot of Rc's or copies to get around lifetime analysis issues. So if the compiler can automatically figure out lifetimes in code, it can eliminate many ARC operations.

I hope more languages adopt more flexible ARC-based GCs or improve on Rust's ergonomics.


Well, GC languages can't really allow arbitrary pointer arithmetic (because it would make GC useless/unsafe), but languages that can do explicit stack allocation are numerous, like C#, Go, and I'm sure D as well. Nim also has an optional GC, so there are languages all around the spectrum.


For Erlang, if you drop down to C you completely lose the garbage collector, but the docs for doing this show you how to set up a hook for your memory to be garbage collected just like any other first-class data in the system.

If you're really responsible, it's possible to use the system allocator to inform the system about how much memory your allocations consume so that the memory pressure triggers have a correct accounting of how much memory is being used.


You can use value types and do manual memory management in C#, F#, VB, D, Nim, Swift (RC is GC).

Then on the past languages that failed to gain steam, Mesa/Cedar, Modula-2+, Modula-3, Oberon, Oberon-07, Oberon-2, Active Oberon, Eiffel, Sing#, System C#, Component Pascal, among many others.

The only thing missing is that many of them are badly taught; people use new everywhere and don't bother to fully understand all the language features.


dlang has exactly this: you can mark functions as @nogc, and you can use the GC in one block and malloc/free in the next. You can also limit D to a subset called 'BetterC'. I really enjoy being able to GC my way through a problem until I need some explicit memory structure for putting together an ECS or similar memory-management-heavy patterns.


My favorite example of this is probably Clean, which solves this using uniqueness types. You can have unique values that are managed using compile time memory management, but also are free to use garbage collected values.

It makes for a very performant pure functional language. My only gripe is that the stdlib is abysmal, and hardcoded into the compiler.


I think this will just lead to a split ecosystem with some libraries requiring a GC and others requiring manual memory management.


C# has GC by default, but you can do manual memory management in "unsafe" blocks


You can also do manual memory management in safe code.

- value types

- stackalloc is safe since C# 7

- SafeHandles since C# 2.0

- IDisposable interface

- IDisposable-like Dispose() since C# 8

- Marshal class since C# 1.0

- Span<T> and related types since C# 7


Forgot to mention:

- static lambdas since C# 9

- native function pointers since C# 9


Yes, sure, but have you actually used Ada in practice? It's not exactly ergonomic or straightforward either, it's not magic.

You don't hear about the struggles of using it in practice because the projects you've pointed out are usually confidential.

In my experience MISRA is a much more popular standard for secure programming in the defence industry versus Ada. MISRA is a C standard, and yes, it runs missile systems, Boeing braking systems, you name it.


Or Modula-2, NEWP, BASIC, or really most languages that aren't copy-paste compatible with C.

Ada suffered from its domain, the original price of compilers, and the fact that with the few UNIX vendors that cared to offer a compiler, like Sun, it was an additional acquisition on top of the UNIX SDK.

Everyone knows that only newbies do coding errors in C, so why spend the extra money? /s


> the few UNIX vendors that cared to offer a compiler, like Sun, it was an additional acquisition on top of the UNIX SDK.

The Sun C compiler was an additional cost item, just like everyone else.


Indeed, so if you already had to pay for C, which is a required language on any UNIX (there is no way around it), why spend the extra dollars for Ada just to feel good?

That was my point.


Because it's a better language for a lot of use cases? But I concede that the history of the industry indicates that "don't care" is the norm.


Is anyone actively using Modula-2? Actively as in starting new projects, compiler improvements, tooling, community etc.


The GNU M2 compiler is kept up to date with GCC, it is just kept out of tree.

https://www.nongnu.org/gm2/homepage.html

However I doubt the language gets much use nowadays beyond some legacy code bases, its opportunity is now gone.


> Everyone knows that only newbies do coding errors in C, so why spend the extra money? /s

The Ariane 5 maiden flight disaster illustrated quite clearly that the choice of language has little influence on the actual correctness of a program.

Ada on its own is no better than Pascal in that regard. A formally verified and thoroughly tested MISRA C program can be safer and more correct than a sloppily written Ada program.

So the question is indeed not as rhetorical as one might think - why spend the extra money indeed? Isn't it better spent on verification, testing, tooling and culture, which benefits the development regardless of programming language?


The Ariane 5 error was caused by the remaining 30% of programming errors, the ones that are left when we set aside the 70% of software failures caused by C-typical errors.

So yes, it is quite worthwhile to reduce the amount of money spent on verification, testing, tooling and culture.

The alternative is to just accept that programmers will never learn and force verification at the hardware level, like Google is doing on Android.

"Memory Tagging for the Kernel: Tag-Based KASAN"

https://www.youtube.com/watch?v=f-Rm7JFsJGI

Oracle on Solaris SPARC,

https://docs.oracle.com/cd/E37838_01/html/E61059/gqajs.html

Apple on iOS,

https://developer.apple.com/documentation/security/preparing...

Microsoft on Azure Sphere,

https://www.microsoft.com/security/blog/2020/11/17/meet-the-...


Ironically enough, I recall reading that if the overflow hadn't triggered a hardware exception and instead had been silently ignored, the first stage would have survived; the code in question had just been carried over from Ariane 4 and was no longer important for the operation of the booster.


That’s not an alternative to memory safety. That’s just a basic security measure even if you write everything in a safe language like Rust or Zig (let alone the fact that you have enormous C/C++ legacy codebases that are likely still seeing ongoing development due to switching costs). The reason is you will always have some amount of unsafe code when dealing with the hardware and this is a hardening measure to protect against slipups.


No. The Ariane 5 disaster was caused by a development team deliberately turning off safety checks. Ada is in all regards a 'safer' language than Pascal, as long as you don't disable the safety features.

http://www-users.math.umn.edu/~arnold/disasters/ariane5rep.h...

http://www.adapower.com/index.php?Command=Class&ClassID=FAQ&...


I think I'd like to get into Ada more, but I'm always a bit unsure on how to select the right toolchain (is the Dragonegg/LLVM option viable today?) etc.

I would love to see something like "Rustlings" for Ada, as I found that was a good way to practice not only writing code, but reading it as well.

I was able to self-teach Haskell and Erlang without any major problems, and even managed to ship applications written in them commercially.


Plenty of learning paths at https://learn.adacore.com/


Thanks, I'll check it out!

(Looks like I found a use for the new Rosetta on my M1 Mac.... I wonder when there will be builds for aarch64 for Darwin)


> Make something idiot-proof and the world invents a better idiot.

:D

My argument is that if I need that insane speed and the safety features have to be off, I should have to actively work for that, not the other way around where I have to actively work for safety.

Why? Because we forget to do things.

> Another language that does safety like this incredibly well is Ada (Ada/SPARK)

I have wanted to learn Ada for a long time. Do you have a guide or tutorial on how to get into it?


I once collected all of my posts about Ada which do contain lots of links that can get you started. I will not re-post them here, instead you may go to https://news.ycombinator.com/item?id=23808305. Check out the links under [2].

Especially these ones: https://news.ycombinator.com/item?id=21435869 and https://news.ycombinator.com/item?id=21437498


Thanks!


No problem! You might find this website useful, too, as it is filled with resources: https://www.adaic.org/learn/materials/


I loved reading about Ada back in the day. How is it nowadays wrt network programming? I'm checking out GNAT.Sockets; does it have a snappy select() type thing like Linux's epoll?

Sorry if it's a lazy question, the last time I looked at Ada was decades ago and I'm surprised to find enthusiasts for it on HN.

edit: an advantage for C is that you can get loads of examples just by googling, which sucks for less used languages I guess.


It has been a while, but you may want to look at:

- https://en.wikibooks.org/wiki/Ada_Programming/Libraries/GNAT... (search for "C select()")

- https://www.codelabs.ch/anet/

- https://github.com/samueltardieu/adasockets

- https://github.com/rtyler/ada-playground/blob/master/epollec...

Some old reading: https://brokenco.de/2013/04/07/async-with-ada.html

You may want to check how GNAT.Sockets and Anet are implemented, too. I know that GNAT code is heavily documented.


thanks man, your post is a great timesaver, that wikibooks link itself is packed with info!


Of course, you are welcome!



Most of Rust's checks are at compile time, and Ada employs runtime checks as well.


It can, but you do not have to use them (you can turn off all runtime checks and use formal verification only, something that Rust cannot do, i.e. statically verify correctness the same way you can with SPARK using GNATprove); it can all be done at compile time with Ada/SPARK.

> These runtime checks[1] are costly, both in terms of program size and execution time. It may be appropriate to remove them if we can statically ensure they aren't needed at runtime, in other words if we can prove that the condition tested for can never occur.

> This is where the analysis done by GNATprove comes in. It can be used to demonstrate statically that none of these errors can ever occur at runtime. Specifically, GNATprove logically interprets the meaning of every instruction in the program. Using this interpretation, GNATprove generates a logical formula called a verification condition for each check that would otherwise be required by the Ada (and hence SPARK) language.

Additionally, in Ada/SPARK, you can formally verify tasks (concurrency), too: https://docs.adacore.com/spark2014-docs/html/ug/en/source/co....

Moreover:

> SPARK builds on the strengths of Ada to provide even more guarantees statically rather than dynamically. As summarized in the following table, Ada provides strict syntax and strong typing at compile time plus dynamic checking of run-time errors and program contracts. SPARK allows such checking to be performed statically. In addition, it enforces the use of a safer language subset and detects data flow errors statically.

    Feature                 Ada       SPARK
    Contract programming    dynamic   dynamic / static
    Run-time errors         dynamic   dynamic / static
    Data flow errors        -         static
    Strong typing           static    static
    Safer language subset   -         static
    Strict clear syntax     static    static

Additionally, safe pointers in SPARK: https://blog.adacore.com/using-pointers-in-spark and https://arxiv.org/abs/1710.07047.

More information about Get_Line (i.e. even where you would think you cannot go static): https://blog.adacore.com/formal-verification-of-legacy-code.

[1] overflow check, index check, range check, divide by zero


> You already know someone will teach their students to always have it off because its "slower" or something.

Already happening with Nim.

But I'd expect a system programmer to know better. When to use release and when to use development mode.

Case in point, Rust doesn't catch integer overflows in release builds either. Maybe efficiency is more important than we think?


> there's an incredibly safe and expressive language that's used in high-reliability applications like missile guidance systems, that nobody seems to talk about.

Maybe because they assume that it's only used for that purpose? You talk about rocket science and you expect people to jump on the bandwagon...


> incredibly safe […] high-reliability applications like missile guidance systems

https://en.m.wikipedia.org/wiki/Cluster_(spacecraft)


Yup, people are the issue.


I am curious to know the reasons behind the @as() syntax. What's wrong with "i32 y = (i32) x;"?

I look with interest at new languages, but after so many years dealing with C, my parser crashes when I see the type after the variable name and at the end of function declarations.

    var x : u8 = 5;
What's wrong with "u8 x = 5;"?

And I don't really like type inference very much (or when it's abused or cannot be avoided). What is "b"? Oh, I have to check what it's being cast from...

    var a: u8 = 255;
    var b = 1 + 2 + 3 - (4 + @as(i32, a));
What type is "b"? i32?

Curious fact: the above code generates >40000 lines of assembly in godbolt: https://www.godbolt.org/z/fxfbb9jfn


> What's wrong with "u8 x = 5;"?

I believe it makes the parser marginally more complicated. Let's say you had a non-builtin type, "foo". The var keyword immediately declares what's going on. For "foo x = 5;" the parser must either 1) have contextual information that foo is a declared type or 2) wait until it notices that there are two identifiers side-by-side and then resolve that this means "it must be a variable declaration".

Honestly, C and C++ are very much in the minority for choosing this syntax. Coming from Pascal, I remember finding this syntax annoying 20 years ago, so it's my bias to believe that this is just internalized pain for C devs.

As for the type coercion with the () operator... well, C is infamous for "spiral types". Typecasting complicated things in C, like, say, an array of pointers, can get scary as hell.


A bit more than marginally more complicated, if you consider the cases of parsing partial or incorrect programs.

You want to be able to (somewhat) parse those so that you can syntax-colour, auto-complete, and mark errors in a code editor.

I think those are the major reasons modern languages don’t do such things the C way.


> I believe it makes the parser marginally more complicated

This would mean that the "peace of mind" of the Zig compiler developer has priority over my own "peace of mind". That's not ok.

Is the compiler developer reasoning "I don't care if the syntax is verbose, messy (or whatever), it just makes my job easier"?

> Honestly, C and C++ are very much in the minority for choosing this syntax

Java, C#, Objective-C....


> Curious fact: the above code generates >40000 lines of assembly in godbolt: https://www.godbolt.org/z/fxfbb9jfn

I took a quick glance at the ASM, and it seems that the majority of that is code from the stdlib run prior to main. Compiling with -OReleaseSafe brings that down to 15,000 lines of assembly.

If you get rid of "main" and compile it as an exported function, you get far less code: https://www.godbolt.org/z/jjGP4xsfx

If you add -OReleaseSafe to the "export fn" version I shared, it's around 4 lines of code calling panic. Adding -OReleaseFast, and it's just a ret statement.


> Compiling with -OReleaseSafe brings that down to 15,000 lines of assembly.

I tried it with -OReleaseFast to compare, and it's 500-ish lines. That's amazing.


remove "pub" from your godbolt. Zig/godbolt is identifying that you're trying to build a full program and so it brings in a lot of the boilerplate necessary to launch a program (for example, the panic handler, stdlib stuff to format strings for the panic handler, etc).


On the other hand, leave out “pub” and Zig won’t compile it…


juxtaposition (i.e. syntax that has meaning when two identifiers stand together separated only by whitespace) is generally a problem for parsing. Basically, it introduces all kinds of parsing ambiguities. Compare the following:

    a *b
    a * b
    a[n] x
    b [n]
    a<x> y
    a < x > y
    e(1) f
    e (1)f
Even for C/C++ this is complicated (you need the symbol table when parsing, to know if an identifier is a type or not), but for more advanced languages, it's even worse (e.g. if you support pattern matching / destructuring assignment).

Declaring variables with var lets you also write let or const instead, giving the programmer more options (and making those options obvious as well).

You might not "like" type inference, but it really is superior to no-type-inference - you can always also write the expected type, if you want.


> you can always also write the expected type, if you want.

But you have to read the code of people who didn‘t.


Exactly. Like when you see:

    var x = some_function();
What am I dealing with here? mmmmmm....

Ok... let's find some_function()... I hope my editor has "go to definition", otherwise "grep -r some_function ."...


   var x : u8 = 5;
allows making type optional while keeping the code about the same, e.g.

   var x = 5;
This arrangement has this nice consistency to it, with the type info being more of a hint (to the compiler or to the programmer) than a required component.


> I am curious to know the reasons behind the @as() syntax. What's wrong with "i32 y = (i32) x;"?

Built-ins in Zig are functions beginning with `@`. Your cast operator would be a new syntax construct, something Zig tries to limit as much as possible (keeping the parser simple).

> What type is "b"? i32?

That's easy, i32. Integer literals are of type comptime_int, which may coerce into other integer types if possible without loss of information, and since the only typed expression is the @as, every other integer in the line will coerce to its type.
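
If you want the compiler to confirm that, here is a small sketch (2021-era Zig; the main wrapper and the assert are just for illustration):

    const std = @import("std");

    pub fn main() void {
        var a: u8 = 255;
        var b = 1 + 2 + 3 - (4 + @as(i32, a));
        comptime std.debug.assert(@TypeOf(b) == i32); // fails to compile if the inference changes
        std.debug.print("{}\n", .{b});
    }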


> That's easy.

For code that is not yours, or code that you open a year later, I think it's easier to specify the type. Exactly as when you see something like "var x = some_function();" and you need to know the type.


You can also use `@compileLog(@TypeOf(some_function))` or the same for the result. Type inference mostly makes it easier to change the types involved without having to update every single variable binding which would often be rather error prone in languages with implicit casts.

The other gain here is with duck typing: being less explicit allows you to write more generic code, which will often do what you want as long as the shape fits, without resorting to `void*`, which gives up on type safety.


Best practice for languages with this "var" approach is to use it when the type is obvious or irrelevant and to explicitly declare the type when it is non-obvious and relevant.

If you are reviewing code and can't instantly tell what the type is for a variable, ask the author to explicitly include the type. Problem solved.


Zig uses @something everywhere, maybe they just like Objective-C.


> You won't even get a warning unless -Wextra or -Weverything is turned on.

That's because -Wconversion isn't part of -Wextra for some reason. It is part of -Weverything though, and with that clang does warn:

  warning: implicit conversion changes signedness: 'int' to 'unsigned long' [-Wsign-conversion]
      if (sizeof(int) < -1) abort();
                     ~ ^~


IME the strict conversion rules in Rust and Zig can be quite a bummer for somebody coming from C because they may add a surprising amount of friction in day-to-day coding. Yes, C code is often way too sloppy when it comes to picking the right type (signed vs unsigned vs float), and it conveniently hides the problems if the wrong choice was made.

But sometimes the same value needs to be used in integer and floating-point math, and there is no single correct type for this value.

There are also some tricky choices in bit twiddling operations. Is it really such a problem when bits are shifted out of a value when this is normal behaviour down on the assembly level?

I guess I'll eventually learn to live with strict explicit conversion, and eventually I'll get better at picking the right type from the start, but I think implicit versus explicit conversion is an area where there is no "completely right" solution, because with 100% explicit conversions even coding down on the assembly level is more convenient ;)


Could you give some examples where the same value needs to be used in integer and floating-point math?

I'm not quite sure if I understood you correctly, but in case you propose that integer <-> float conversion should happen implicitly I have to disagree. While implicitly converting from integer to float/double would probably be fine, implicitly converting from float/double to integer sounds like a recipe for headaches: There are just too many options (truncation/ceil/floor/rounding). Even if you decided that some option should be the standard since it makes sense in 90% of all cases (let's say rounding), you now have a difficult-to-find (since it's implicit) footgun ready to cause damage in the remaining 10% of all cases.

Even in that paragraph lies a small surprise waiting to be found (at least for some people): the floating point standard IEEE 754 defines five different rounding modes - two "normal" modes (round to nearest, ties to even, as well as round to nearest, ties away from zero) and the additional directed ones mentioned above (truncation, ceil, floor). Interestingly, the default rounding mode (round to nearest, ties to even) is not the one you probably learnt in school (that would be round to nearest, ties away from zero). In school, you always round up if you end up exactly between two numbers, i.e. round(0.5) = 1, round(1.5) = 2. However, this introduces a small bias that can manifest itself as a real problem, for example if you round many measurements and then calculate the mean. That's why the default floating-point rounding mode rounds ties to the even neighbour, i.e. round(0.5) = 0 and round(1.5) = 2.

Most of the time this is not an issue and you really want the default rounding mode, but I hope this example illustrates why hiding the "implementation detail" of converting floating-point numbers to integers might not be a good idea.

By the way, I just looked up the man page for round(), and to my surprise found that it will always round ties away from zero, independently of the floating-point environment. If you want to round using different rounding modes in C, you apparently have to use nearbyint() and friends after setting up the rounding mode using fesetround().

PS: Of course the rounding modes are all about rounding floating-point values, not necessarily converting them to integers, but I think the point should be clear.
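
To make the explicit alternative concrete, here is a sketch in Zig (the language the article is about), using the 2021-era builtins @floatToInt/@floor/@round (some of these names have changed in later releases); the point is that both the conversion and the rounding choice are spelled out:

    const std = @import("std");

    pub fn main() void {
        var speed: f32 = 2.5;
        // The programmer picks the rounding behaviour; nothing happens implicitly.
        var cell_floor = @floatToInt(i32, @floor(speed)); // 2
        var cell_round = @floatToInt(i32, @round(speed)); // 3 (ties round away from zero)
        std.debug.print("{} {}\n", .{ cell_floor, cell_round });
    }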


> Could you give some examples where the same value needs to be used in integer and floating-point math?

Mainly when working with pixels. In some contexts, pixels are clearly integer values (for instance the width and height of a texture in a 3D API is almost always given as integers, a texture with a width of 12.5 pixels simply doesn't make sense).

Computations on 2D pixel coordinates on the other hand need to have subpixel precision, otherwise you'll get jittering artefacts. This results in code where integer values must be converted to floating point before going into computations, and sometimes the results need to be converted back to integer.

I started to add duplicate functions to my C APIs to reduce the need for explicit conversions when using those APIs from stricter languages like Zig or Rust, for instance:

    void sg_apply_viewport(int x, int y, int width, int height, bool origin_top_left);

    void sg_apply_viewportf(float x, float y, float width, float height, bool origin_top_left);
PS: interestingly, even 3D-APIs don't agree here. For instance in OpenGL, the glViewport() function takes integer values, while in D3D11 and Metal a viewport is defined with floating point values.


I agree that float<>int conversions are evil. I would make an exception for int to float conversion for literals. When switching from C to Rust I was annoyed by:

   if x > 0 
not compiling, because that's an integer zero, not a float zero! This makes the compiler feel very petty.


To me that seems equivalent to complaining that the following fails to compile:

    if 0 == "0"
... which is obviously broken code.


The number 0 happily and unambiguously plays a role in many numeric types and algebraic structures, and it’s kind of nice if the compiler can just figure out the type that it should be from context. Comparing a literal 0 to a double should cast to a double, comparing a literal 0 to an int should cast to an int, etc. I would be happy for “(int) 0 == (double) 0” to raise a type error though.

I guess it’s a matter of interpretation: does “x == 0” mean that we are comparing x with the integer zero, or the “zero value” of the same type as x? Numeric code is difficult for many reasons, but something that can make it much more tractable is having the code as close as possible to the mathematics underlying the algorithms, and this kind of “polymorphic constant” behaviour can help a lot, especially for integer literals which unambiguously embed into essentially any numeric type you could think of.


>Could you give some examples where the same value needs to be used in integer and floating-point math?

When you want to draw a pong game in a finite matrix representation, like a screen in ncurses, you cannot have something like screen[2.1][3.5], so you have to truncate (or round, depending on what you want to do) the float coordinates to index into that matrix.

Of course, you could avoid this using fixed point, but even there you need a type conversion.


D doesn't try to "fix" C integer promotion/casts. It's because they aren't bad defaults. As a bonus you can port code from C with less risk. Soon you'll be able to compile it directly.

I contributed to finding the only discrepancy between D and C integer behaviour (-byte would yield byte instead of int), and it was fixed so that D matches C _conventions_ exactly.


C integer promotions are what remains of old computer systems which couldn't handle anything but a 'word'.

This results in highly inconsistent behavior on modern 64-bit machines:

For example, char/short are automatically promoted to int or unsigned int when an operation is performed on them, but this doesn't happen between int and long.

Simple things like adding two int16_t together to get an int16_t in return are really hard: you need a mask and an extra explicit cast, and you rely on the compiler being smart enough to understand what you are doing and to emit just the instruction you want.

But you don't need to do the same with int32_t, because reasons.

This is a bad default behavior, as it matches neither intuition nor the underlying hardware, it is inconsistent across integer types, and to understand why it works that way you need to understand how computers worked 30 years ago.

D supporting C integer promotion might be a good thing for portability, but that's it.
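
For contrast, a minimal sketch of the same thing in Zig (the article's subject), assuming the 2021-era semantics where there is no promotion, i16 + i16 stays i16, plain + is safety-checked for overflow, and +% wraps:

    fn addSamples(a: i16, b: i16) i16 {
        // No promotion to int: the result type is i16.
        // Plain `a + b` would panic on overflow with runtime safety on;
        // `+%` wraps, like the underlying hardware instruction.
        return a +% b;
    }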


Not my idea, but: expanding all values used in expressions to "native word size" (e.g. int64_t on modern machines), and only treat "bit width" and "signedness" as load/store attributes doesn't seem like such a bad idea.

The problem seems to be that (AFAIK) C stopped at 32-bit integers.


I agree, both choices are fine as long as they are consistent.

One issue with expanding all values to native word size is that this would mainly benefit _very_ old hardware: modern CPUs are as fast directly manipulating any int size as they are manipulating the "native word size" (if anything the native size is worse, because vector instructions usually handle twice as many int32 per cycle as int64, and twice as many int16 as int32), even if not every piece of code can be vectorized.


The hardware matches C; on x86, using 32-bit integers is not slower than using 64-bit integers.


> adding two int16_t together to get a int16_t in return is really hard

Because the first thing people will do is:

    int16_t a, b;
    a = (int16_t) ((b * 157) >> 12);
and then if b * 157 doesn't promote to 32-bit it overflows in the negative and then the code is incorrect.


You're assuming `int` is 32-bits. The fact that C implicitly promotes to `int` makes it extremely hard to use the stdint.h typedefs without accidentally making platform-specific assumptions.

   uint16_t a, b, c;
   a = 50000;
   b = 40000;
   c = a * b;
This simple code may or may not have undefined behavior depending on the target platform. (it's fine for 16-bit int, UB for 32-bit int, and fine again for 64-bit int)

Numeric promotion means that it's almost impossible to write code that is correct on multiple platforms with different sizeof(int).


Well there is the integer promotion issue, and the integers-have-unknown-size issue.


> This is a bad default behavior

I don't think so. A lot of existing C code relies on this, and removing the promotion would mean that code gets copy/pasted into D without the necessary fixes. Who will rewrite all the C and C++ codec code out there?

A lot of C and C++ programmers are actually unaware of the int promotion rules, but their code relies on them involuntarily.


It should be noted that Zig’s int promotion rules are currently broken though.


I sort of felt similarly when first using Zig until it crashed with a helpful error at runtime when I was trying to cast -1 to an unsigned integer :) The debugging time that saved more than outweighed the amount of time it took to do the explicit cast.
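
Something like the following, as a sketch (using the two-argument @intCast form from 2021-era Zig; the builtin's signature has changed since):

    const std = @import("std");

    pub fn main() void {
        var i: i32 = -1;
        // With runtime safety on (Debug/ReleaseSafe builds), this panics
        // instead of silently reinterpreting -1 as 4294967295.
        var u = @intCast(u32, i);
        std.debug.print("{}\n", .{u});
    }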


C programmers have been known to refer to this kind of programming safety as a straitjacket since the Pascal days anyway.

There was more than enough time to learn why it was the right option to start with.


Rust does go overboard though, e.g. indexing requires `usize`. You either use `usize` for all your integers, or you end up with a cast-salad.

   arr[i as usize]
This is actually risky, because even when you only meant to extend, you can also accidentally truncate or change sign. `as` does all of these things without a warning, and mixed with type inference it can easily lead to surprises.

Rust makes it worse by insisting on theoretical support for 16-bit and 128-bit `usize` regardless of the target platform, so you can't use `a[i.into()]` to infallibly convert `u32` to `usize` on 32-bit and 64-bit platforms.

The correct syntax is:

   arr[usize::try_from(i).unwrap()]
which is painfully verbose. In larger expressions it's almost an obfuscation of what the code is actually doing.


The Rust approach of having all indexing being unsigned has been extremely annoying for programming algorithms which need to do interesting indexing patterns like walking backwards through an array while indexing into another: the fact that “i >= 0” can no longer be used as a loop condition is quite exasperating. It means other more complicated indexing or looping approaches need to be used, and when showing code to coworkers (mostly mathematicians), they puzzle over this for a while before asking “why not just use i >= 0”? It’s not just this case - doing index arithmetic in general is vastly complicated by the fact that it’s difficult to detect idx < 0.

I think unsignedness on array indices is one of those places where the “make invalid states unrepresentable” mantra has gone too far: yes it’s nice that theoretically the whole 64-bit index space is addressable from a byte array based at 0, but in reality -1 is just as stupid and invalid an array index as 2^64 - 1 for pretty much every use-case. What we gain is negligible; what we lose is a lot of sensible code for dealing with index arithmetic.


Negative indexing could even make sense in some situations. It's not a far stretch to imagine a pointer pointing to the one-past-last element of a series, and using e.g. -1 to get the most recent element. Python's indexing indeed works that way.


I'm not a Rust programmer, but AFAIK Rust has inline functions, so if your code has lots of indexing calls, could most of the pain go away with a few short helper functions?

    #[inline(always)]
    fn idx(i: u32) -> usize { usize::try_from(i).unwrap() }
or

    #[inline(always)]
    fn get<T>(arr: &[T], i: u32) -> &T { &arr[usize::try_from(i).unwrap()] }
Have to repeat the definition for each numeric type you use, I guess, but hopefully you aren't using _that_ many of them.


As I'm sure you're aware, this gets lost in bike shed land every time it comes up but Rust could implement `Index<iN|uN>` pretty easily. Unlike C you don't need an implicit cast to do the right thing.

Personally, when I have a data structure that uses non-`usize` indexes, I usually wrap my vector/array in a custom type that implements Index for whatever my common index types are.


It's not only "bikeshedding", there are (in my understanding) significant inference issues that happen if we were to enable this, and that would have to be dealt with in a satisfactory way.


Just wait for the updated C++ standard with size_t indexers.


The straight-jacket is the syntax in Ada. I don't feel locked in by the correctness measures, but if you are going to have elaborate declaration blocks I prefer "let foo = bar in ...".

Because everything else is "let" in disguise. ML got the syntax right.


You basically have to accept the relatively small, upfront cost of verbose declarations for these sorts of things in exchange for the lack of headaches + bugs you wind up with later.

Or not; you can also choose not to write in these kinds of languages, and nobody can blame you -- to each their own.


As a C programmer, I have definitely shot myself in the foot in the conversions between types on multiple occasions. This is a really welcome feature.


>You won't even get a warning unless -Wextra or -Weverything is turned on.

I know people are used to languages where compiler warnings are on by default, but if you are building C without -W/-Wextra you are kind of asking for trouble. -Weverything is probably too pedantic for most people but -Wextra is pretty much required. And -Werror is required for projects where discipline can't be assumed or where the build log is bigger than a screenful.


Zig is shaping up to be a very nice language. It has some very cool ideas. I hope we'll see more of it at a larger scale.


It says Zig eliminates "implicit conversions unless they are guaranteed to be safe (for example, assigning a u8 value to a u16 variable cannot fail or lose data)", but then suggests that @as should be used when "casting an int to a larger-size int of the same sign". Why should @as be used if it's safe to implicitly convert?


What I took from the article is that it's useful when combined with type inference. So in Zig you could do:

  var x : u8 = 5;
  var y = @as(u32, x);
OR

  var x : u8 = 5;
  var y : u32 = x;
The cast is needed in the first case to force the inference y: u32, otherwise it would be y: u8


Hmm, this indeed doesn't make much sense. Assigning to a wider type with or without different signedness doesn't require a cast as long as all bits fit into the new type:

https://www.godbolt.org/z/x8a65sPcr


Casting is one of the reasons I gave up on Zig quickly. It just is painful and solves nothing, and it makes the code hard to decipher.


How quickly?

Like, you saw it in the docs when you were mildly interested and decided it wasn't for you? Or you started a project, had at least a "hello world" compiling, but then eventually gave up on it because you were tired of casting? In that case I'm curious what the project was since I haven't run into casting all that often, but it might vary by domain.


I just looked over some code I have which takes a complicated 3rd party binary format, parses it, and extracts information from it. In about 2K LOC including tests there are no casting events except for some int-to-pointer and pointer-to-int at the boundary of passing references to WASM.


It was a while ago, a game engine. It just felt bad; maybe I misused the language, but that is a sign things aren't intuitive.

Some people love typing code like they'd write books, i don't, i like to be concise and to go straight to the point

And it's not just @as(i32), @intToFloat(f32), @floatToInt(i32); then it's between i8, i32, i64, and then bitwise operations, and then slices, and then C strings, etc etc etc. A lot of visual bloat.


I like it because it provides explicitness and readability. There is no guessing or following a flow chart of implicit type conversions to see what's going on.



