Ah, I see. So I think that is a statement that goes beyond just "we don't need string support in the language" and toward "we don't need some class of string support in the standard library." I also think there's definitely some misunderstanding here, certainly at a minimum on my side, but perhaps on yours as well. I'll try to clarify a bit by being more concrete.
> assuming we have a package manager that works well, is it worth it for the standard library to take on
I am quite sympathetic to this line of thinking! We tend to think similarly in Rust-land. The library ecosystem coupled with a good package manager lets us get away with having a small standard library.
> large, volatile dependency of the entire Unicode database
OK, so here is maybe where there is some misunderstanding. The Unicode database is not an all-or-nothing proposition. You can (and most people do) pick and choose the parts you want. Both of the Rust and Go standard libraries, for example, only embed a very very small subset of the UCD (Unicode Character Database) into the standard library. Off the top of my head, probably the most obvious example is Unicode-aware lower/upper casing. But there are some other things, like, "split on whitespace, where whitespace is defined by Unicode." With that said, this only addresses the size aspect of your concern. No matter which subset you take, it's still, at least in theory, volatile. (With some parts being more volatile than others.)
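To make that concrete, here's a rough sketch (in Rust, since that's the standard library I know best) of the kind of Unicode functionality that is baked into std via a small slice of the UCD:

```rust
fn main() {
    // Unicode-aware case mapping, using case tables embedded in std.
    assert_eq!("straße".to_uppercase(), "STRASSE"); // ß uppercases to SS

    // "Split on whitespace, where whitespace is defined by Unicode":
    // U+00A0 (NO-BREAK SPACE) has the White_Space property, so it splits here too.
    let words: Vec<&str> = "a\u{00A0}b c".split_whitespace().collect();
    assert_eq!(words, ["a", "b", "c"]);
}
```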
But yeah, there is a whole other world of Unicode that neither Rust nor Go provides by default. Grapheme/word/sentence/line segmentation. Normalization. More elaborate case folding. Locale-specific tailoring. And on and on.
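As a small illustration of what "not provided by default" looks like in practice: in Rust, grapheme segmentation typically comes from the third-party unicode-segmentation crate rather than std. A minimal sketch:

```rust
// Depends on the third-party `unicode-segmentation` crate; std has no grapheme support.
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // 'e' followed by U+0301 (combining acute accent): two code points, one grapheme cluster.
    let s = "e\u{301}";
    assert_eq!(s.chars().count(), 2);
    assert_eq!(s.graphemes(true).count(), 1);
}
```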
With that said, there are oodles of "string" operations that don't really need to care about Unicode at all: substring searching, splitting on a delimiter, string replacement, and trimming all come to mind. This is where my ignorance might be biting me: the Zig standard library might already support most or all of that. And pushing the more sophisticated aspects of Unicode off to a library is definitely totally reasonable! When I said I'd be "surprised" above, this is what I was referring to. That is, I'd be surprised if the standard library didn't have elementary, encoding-agnostic string functions.
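Here's a minimal Rust sketch of the encoding-agnostic operations I mean; all of these treat the string as a plain byte sequence and need no Unicode tables at all:

```rust
fn main() {
    let path = "src/main.zig";

    // Substring searching: a plain byte-wise search, no decoding required.
    assert!(path.contains(".zig"));

    // Splitting on a delimiter.
    let parts: Vec<&str> = path.split('/').collect();
    assert_eq!(parts, ["src", "main.zig"]);

    // String replacement.
    assert_eq!(path.replace("main", "test"), "src/test.zig");

    // Trimming a known byte (as opposed to Unicode-defined whitespace).
    assert_eq!("  hello  ".trim_matches(' '), "hello");
}
```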
> One topic is that there is an entire domain of software that never needs to deal with decoding strings and manipulating them. That is, it can deal only with strings in their entirety, and only care about the size of encoded bytes. One such piece of software is the Zig compiler itself.
Oh absolutely. As a brief case study, take something like ripgrep. I'm speaking somewhat off the cuff here, but I believe it would be accurate (or mostly accurate) to say that it depends on literally no aspect of the Unicode support provided by Rust's standard library. I don't even think it uses any of std's UTF-8 decoding anywhere. This is despite the fact that Unicode is one of ripgrep's headlining features! Almost all of ripgrep's Unicode support comes from the regex engine, which provides its own Unicode tables.
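To illustrate what I mean by "provides its own Unicode tables" (a hedged sketch, not actual ripgrep code): the regex crate makes `\w` Unicode-aware by default using tables it bundles itself, without leaning on std's Unicode support.

```rust
// Uses the third-party `regex` crate, which ships its own Unicode tables.
use regex::Regex;

fn main() {
    // \w is Unicode-aware by default here, so it matches far more than [A-Za-z0-9_].
    let re = Regex::new(r"\w+").unwrap();
    let words: Vec<&str> = re.find_iter("Grüße, 世界!").map(|m| m.as_str()).collect();
    assert_eq!(words, ["Grüße", "世界"]);
}
```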
My advice would be to mentally prepare for a drove of low-effort "zig doesn't even have Unicode support in its standard library"-style criticisms. Or similar things like, "Zig needs a dependency for 'basic' string handling, so it's just going to be like the Node ecosystem." Rust gets these kinds of complaints all the time.
Also, if you'd ever want to talk about Unicode and its role in Zig (or in its ecosystem), I'd be happy to chat. There are likely some things worth learning from Rust's experience (both the good and the bad).
Thank you so much for this detailed and helpful reply! We will certainly be taking clues and inspiration from Rust's excellent standard library as it becomes time to finalize Zig's std lib API. And I will definitely take you up on that chat about Unicode someday :)