Agreed, though I alleviated this in my last Zig project by just declaring a type alias. Strings could use some nice support from the stdlib once it gets fleshed out.
The string part of this is just "[]const u8", which is somewhat similar to "const char*" in C (except that [] is a slice and also holds the string literal's length).
IMHO for a systems programming language it's the right decision to not have a builtin string type (and instead treat strings as bags of bytes); proper string support is better provided by a library (or even better, several specialized string processing libraries).
I would have agreed with this before I saw Rust's built-in str and thus &str.
Obviously Zig isn't going to offer what Rust does here because it involves safety guarantees, but the assumption could be exactly the same: this is a UTF-8 string. Even in systems programming that's a valuable built-in.
Twenty years ago it was unclear what you should pick here, and so "bytes" ended up being a reasonable compromise. But today it isn't unclear: the answer will be UTF-8. Languages like C++ are stuck offering people a 16-bit character type if they want one because they were around in the 1990s when that seemed potentially viable and they're stuck in a compatibility quagmire, not because it is of any actual use today.
Hmm... I may be missing some details, but Rust's str isn't that much different from a type wrapper:
const str = []const u8;
...in Zig isn't it? Both are slices (aka pointer/size pairs). Rust's str seems to have some additional string-related functions, but in Zig those could be provided in the stdlib (along with the 'str' type wrapper), and then on the next level a String-like type which includes memory management.
PS: Ah ok, one important difference is that Rust's str is guaranteed to be valid UTF-8.
Rust's str also checks things like whether you're trying to slice in the middle of a character. Rust also has several related string types like Path, CStr and OsStr to make sure the strings passed to the relevant APIs are well-formed. You can also use &[u8] if you want a slice of bytes or an ASCII string; it has several string methods implemented in the standard library as well.
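To make the boundary check concrete, here is a minimal sketch. The helper names as_text and safe_slice are made up for illustration; the calls they wrap (str::from_utf8, str::get, is_char_boundary, eq_ignore_ascii_case on byte slices) are standard library methods.

```rust
use std::str;

/// Hypothetical helper: view bytes as text only if they are valid UTF-8.
fn as_text(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

/// Hypothetical helper: slice by byte range without panicking.
/// Returns None if the range is out of bounds or would split a character.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "héllo"; // 'é' takes two bytes in UTF-8 (bytes 1..3)
    assert_eq!(s.len(), 6); // len() counts bytes, not characters
    assert!(s.is_char_boundary(1));
    assert!(!s.is_char_boundary(2)); // the middle of 'é'

    assert_eq!(safe_slice(s, 0, 1), Some("h"));
    assert_eq!(safe_slice(s, 0, 2), None); // would cut 'é' in half
    assert_eq!(safe_slice(s, 1, 3), Some("é"));

    // Arbitrary bytes are rejected when you ask for a str.
    assert_eq!(as_text(&[0xff, 0xfe]), None);

    // Plain byte slices still get some ASCII-oriented helpers.
    assert!(b"Hello".eq_ignore_ascii_case(b"hELLO"));
}
```

Note that indexing directly with &s[0..2] would panic instead of returning None; get() is the non-panicking variant.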
I like Rust's approach of having purpose-specific string types that may differ (e.g. if your OS is using UTF-16 encoded strings but the C strings are just arrays of bytes). It guarantees that, when interfacing with the external world, the programmer either checks that the string is in the correct form (or converts it), or just lets the program panic with .unwrap(). It makes the program safe and makes the potentially expensive string conversion calls explicit. Also, I think valid UTF-8 strings are a good default.
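A rough sketch of what "explicit conversion at the boundary" looks like in practice. The function names c_bytes_to_text and file_stem_text are hypothetical; CStr::from_bytes_with_nul, CStr::to_str, Path::file_stem and OsStr::to_str are the real standard library calls.

```rust
use std::ffi::CStr;
use std::path::Path;

/// Hypothetical helper: treat a nul-terminated C buffer as text.
/// Fails if the terminator is missing or misplaced, or the bytes are not UTF-8.
fn c_bytes_to_text(bytes: &[u8]) -> Option<&str> {
    let c = CStr::from_bytes_with_nul(bytes).ok()?;
    c.to_str().ok()
}

/// Hypothetical helper: file stem as text, if the OS string happens to be UTF-8.
fn file_stem_text(path: &Path) -> Option<&str> {
    path.file_stem().and_then(|stem| stem.to_str())
}

fn main() {
    assert_eq!(c_bytes_to_text(b"hi\0"), Some("hi"));
    assert_eq!(c_bytes_to_text(b"hi"), None); // no terminator
    assert_eq!(c_bytes_to_text(b"\xff\0"), None); // not valid UTF-8

    assert_eq!(file_stem_text(Path::new("notes.txt")), Some("notes"));

    // The "just let it panic" route is equally explicit in the source:
    let text = c_bytes_to_text(b"ok\0").unwrap();
    assert_eq!(text, "ok");
}
```

Every place where a foreign string becomes a str shows up as an Option or Result in the code, so the cost and the failure mode are both visible.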
I haven't used Zig yet (I haven't found a good use case for it), but it seems closer to C both in spirit and in implementation while removing some foot guns.
Specialized string types like Path, CStr and OsStr are stdlib types though (as they should be), not built into the language (AFAIK at least). Such high level "API enforcement" types also make sense in Zig. The question is rather whether something like Rust's 'str' should be builtin or also provided in the stdlib.
Oh, that makes sense as an interesting question. I think, similar to API enforcement types, it is a good idea to separate strings from "byte strings" if the "core" language/library uses strings (which Rust does as part of panic!, one can argue that panic! or assert! should not be part of the core though). It is a hard problem, as Python's transition to Python 3 showed.
It is built in. It was almost switched to be exactly that before 1.0, but IIRC it provided no real upsides and several downsides. There was a PR with the discussion but I can’t find it right now.
I'll take your word on it that writing const str = []const u8; would be equivalent; however, I think even that is valuable to build in considering how much people use string literals.
You're correct that Rust has a bunch of actual type rules here to say that a str really is UTF-8 and not some random bytes -- and Zig presumably wouldn't do that, but by explicitly building in a named str type, rather than just shrugging and saying []const u8, there'd at least be a social convention that str is actually, you know, a UTF-8 string, not just some bytes.
And yes, obviously the memory managing high level String with features like concatenation and mutability is way more heavyweight and shouldn't be built-in to the language fundamentals of something like Zig, it's just the cheap "size + pointer" immutable string slice that I believe is worth building in explicitly now that I've seen Rust do it and after decades of experience noticing that software almost invariably has string literals representing human text (thus suitable for UTF-8) that we shouldn't muddle with slices of arbitrary bytes.
Agreed, the guarantee that str only contains valid UTF-8 is nice and justifies a separate type, and the API hint ("this is expected to be a UTF-8 string, not a random bag of bytes") is a good thing too. I don't know what the Zig team's stance is on builtin types vs stdlib wrapper types (assuming this can be expressed in a stdlib type, but I think it can), but either way it probably makes sense to introduce such a wrapper type relatively early (even if it's incomplete) to guide APIs towards using this.
Btw one thing that Zig has over Rust is that string literals are zero-terminated for C-compatibility (so the proper type is actually a pointer to a sentinel-terminated array, "*const [N:0]u8", which coerces to the sentinel-terminated slice "[:0]const u8"). This is in addition to strings being slices (pointer/size pairs), so despite being zero-terminated it doesn't inherit C's problems.
Presumably the string literals are zero-terminated but other slices aren't? In a language that doesn't have safety promises that feels like a bug waiting to happen: somebody's code "works" with all the string literal test inputs but alas, real slices sometimes aren't followed by a zero terminator and it blows up nastily.
And then the same misfire can also hurt in the other direction. Since Rust's str is really a UTF-8 slice, the zeroes are just zeroes, valid UTF-8 characters. You can ask if a Rust str contains(0 as char) because that's a completely reasonable question, but if your code thinks zero might be a "terminator" it can't actually handle arbitrary UTF-8 this way.
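A small sketch of that mismatch on the Rust side. has_nul and to_c_string are names invented for this example; str::contains and CString::new are the actual standard library calls.

```rust
use std::ffi::CString;

/// Hypothetical helper: does the text contain a zero character?
fn has_nul(s: &str) -> bool {
    s.contains('\0')
}

/// Hypothetical helper: convert to a C string, which cannot hold interior zeroes.
fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    let s = "a\0b"; // perfectly valid UTF-8, three bytes long
    assert_eq!(s.len(), 3);
    assert!(has_nul(s));

    // The same text cannot be handed to a "zero means end" API as-is.
    assert!(to_c_string(s).is_none());

    // Text without interior zeroes converts, and gains a terminator.
    let c = to_c_string("ab").unwrap();
    assert_eq!(c.as_bytes_with_nul(), b"ab\0");
}
```

So a str can carry zeroes, while anything that treats zero as a terminator has to reject or truncate them; the conversion is where that decision gets forced.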
Actually I think this makes the call for an actual str built-in even more appropriate because Zig's owners would need to wrestle with this and work out what they intended whereas just saying []const u8 allows you to hide inconsistencies in the libraries and people will get bitten.
Sentinel-terminated arrays and slices have a different type than regular arrays and slices, and the sentinel isn't the terminator for the array or slice, just an additional item past the end. I think it's reasonably type-safe:
What puts me off is that Zig does modules the way JavaScript did before it had modules, that I already have Objective-C for @ everywhere, and that it is just like C with regard to use-after-free.
I know it's a small thing but strings are common enough that the little ergonomic costs add up.