
I was fairly hyped up about Zig after the various HN pieces I've read and wanted to check it out. After reading about the status of strings in Zig in this post, and then reading this ludicrous exchange [1], I wouldn't touch it with a 100-ft pole.

Half of the breakage of C is in the string handling (safety aside), and this is what passes for a 'better C' candidate in 2021?

And don't get me started on their discussion on Unicode support. It's like some bizarro alternative universe, where ASCII or a bag of bytes is good enough...

[1] https://github.com/ziglang/zig/issues/234




What specifically looks bad there? The only decision I see is that string handling doesn't need to be in the language and that it should be a concern of the standard library. That doesn't seem unreasonable to me. Or at least, not as unreasonable as you make it sound. It's true that Rust and Go have language level support for Unicode strings, but the actual language features for it are quite light. It buys a little bit of convenience maybe. But the vast majority of the burden of those string types is carried by the standard library.


People who act like this are always the same ones that try to implement string handling and do it incorrectly. Zig is saving people like you from handling strings incorrectly.


How so, though, as the discussion seems to show a lack of knowledge on Unicode on the part of the language's team - bigger than even a "guy like me" has.

As an example, there's reading and outputting the same byte sequence unmodified and concluding: "See? unicode totally works with a byte array, why would we ever need anything else?"

(meanwhile the byte array could just as well have any crap that's not a valid utf-8 string for example, and 100s of ways to mess it up further, with the existing byte manipulation functions).
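To make that point concrete, here is a minimal sketch in Rust (picked only for illustration; the helper names `is_valid_utf8` and `truncate_bytes` are hypothetical, not from any library). It shows that a byte array carries no guarantee of being valid UTF-8, and that an ordinary byte-level operation such as truncation can turn a valid string into an invalid one.

```rust
/// Returns true when `bytes` is well-formed UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Naive byte-level truncation (hypothetical helper): keeps at most `max`
/// bytes, with no regard for where one encoded code point ends and the
/// next begins.
fn truncate_bytes(bytes: &[u8], max: usize) -> &[u8] {
    &bytes[..bytes.len().min(max)]
}
```

With these, `"héllo".as_bytes()` validates, but truncating it to 2 bytes cuts the two-byte encoding of `é` in half and the result no longer validates. Echoing bytes back unmodified never exercises this case.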

As for delegating it to the std lib, that at least would be something (though not as convenient as a first class type for something as important as strings), well, where is it, for a language nearing 1.0?

I see Zig as inviting everyone to make their own string handling and mess it up in unique and wonderful ways, or have several string libs with subtly incompatible handling used in different third party projects...

Now, as always I came to regret my whining when the other side replies in person (damn you, HN), and I would like to say that I am a huge fan of most design decisions in Zig nonetheless and I was very excited to evaluate it and put it to use.

But the string situation seems like a major oversight or under-estimation, when I saw the string manipulations (or lack thereof) in the example code, and this thread.

BurntSushi seems to think it can be fixed in the std lib alone (and if someone knows about strings, he would be it) -- but I think that even for that, a more serious consideration towards strings should be taken compared to the one shown in the thread.


> as always I came to regret my whining when the other side replies in person

The trick is to avoid approaching communication that way in the first place.

I see here the same pattern that creates haters: you start by idealizing X too much and end up excessively frustrated when you come into disagreement with one specific decision, making you explode in exaggerated outbursts that are obviously not going to be well received by the other party.

In all honesty we don't need this kind of behavior, and on your side you need to let go and be more dispassionate on the subject before you can have an honest discussion about it.

Idealize things less, hate them less, stop making things part of your identity.

Reach out once you've figured that out, I'm sure people in the Zig community will be happy to explain to you the good and bad things about Zig's approach to strings.


I kind of think you're maybe reading a bit more into that thread than what's there. From what I can tell, the only thing that came out of that thread is that they aren't convinced that language support for string types is necessary.

I've seen the Zig folks acknowledge that the standard library needs work/polish, and that they're focusing on the compiler right now. If the standard library doesn't have good string support (I don't know, I haven't looked that closely), then I would basically assume that they'll get that figured out before 1.0 unless I see a really explicit declaration otherwise. Because I think that would be surprising to me if they didn't.

But even with Rust, there are multiple string libraries. I even made one. :P https://docs.rs/bstr is really just a bunch of API functions defined on a bundle of bytes. There's no language support to help make it work, but it doesn't really need it. The only language support Rust really gives to strings is maybe carving out its own type and string literals themselves. Rust demands that its string type be UTF-8, whereas Go doesn't, for example. Either choice is reasonable. But that's pretty much where the language level support ends. Everything else is just library goop.

But yes, if Zig's standard library doesn't have the kind of bare bones string handling that already exists in places like Rust's and Go's standard library, then I would hope/expect they will get to that at some point. If they didn't do that at least, then yeah, okay, I'll either be surprised or I'm misunderstanding something.


One topic is that there is an entire domain of software that never needs to deal with decoding strings and manipulating them. That is, it can deal only with strings in their entirety, and only care about the size of encoded bytes. One such piece of software is the Zig compiler itself. So the question is, assuming we have a package manager that works well, is it worth it for the standard library to take on a large, volatile dependency of the entire Unicode database, for the sole purpose of ensuring the ecosystem has a canonical string handling library? I don't think the answer is obviously "yes".


Ah I see, so I think that is a statement that goes beyond just "we don't need string support in the language" and towards "we don't need some class of string support in the standard library." I also think there's definitely some misunderstanding here. Certainly, at minimum, on my side, but perhaps on yours as well. I'll try to clarify a bit by being more concrete.

> assuming we have a package manager that works well, is it worth it for the standard library to take on

I am quite sympathetic to this line of thinking! We tend to think similarly in Rust-land. The library ecosystem coupled with a good package manager lets us get away with having a small standard library.

> large, volatile dependency of the entire Unicode database

OK, so here is maybe where there is some misunderstanding. The Unicode database is not an all-or-nothing proposition. You can (and most people do) pick and choose the parts you want. Both the Rust and Go standard libraries, for example, embed only a very small subset of the UCD (Unicode Character Database). Off the top of my head, probably the most obvious example is Unicode-aware lower/upper casing. But there are some other things, like, "split on whitespace, where whitespace is defined by Unicode." With that said, this only addresses the size aspect of your concern. No matter which subset you take, it's still, at least in theory, volatile. (With some parts being more volatile than others.)
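As a rough illustration of that small subset, the sketch below uses only Rust's standard library. The wrapper function names are hypothetical and exist just to put the Unicode-aware calls next to their ASCII-only counterparts.

```rust
/// Uppercases using the Unicode case mappings built into Rust's std.
fn unicode_upper(s: &str) -> String {
    s.to_uppercase()
}

/// Uppercases ASCII letters only; every other character is left untouched.
fn ascii_upper(s: &str) -> String {
    s.to_ascii_uppercase()
}

/// Splits on Unicode whitespace (the White_Space property), not just ASCII.
fn unicode_words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

/// Splits on ASCII whitespace only.
fn ascii_words(s: &str) -> Vec<&str> {
    s.split_ascii_whitespace().collect()
}
```

For example, `unicode_upper("straße")` yields `"STRASSE"` while `ascii_upper("straße")` yields `"STRAßE"`, and a string whose two letters are separated by U+2003 (EM SPACE) splits into two words only with the Unicode-aware version. Both behaviours need a slice of the Unicode tables, but nothing close to the whole database.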

But yeah, there is a whole other world of Unicode that neither Rust nor Go provide by default. Grapheme/word/sentence/line segmentation. Normalization. More elaborate case folding. Locale specific tailoring. And on and on.
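A small sketch of where that boundary sits in Rust's standard library (the function names are hypothetical wrappers): std can count bytes and Unicode scalar values, but it has no notion of user-perceived characters and does no normalization.

```rust
/// Number of bytes in the UTF-8 encoding.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values. This is not a grapheme count.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}
```

The string `"e\u{301}"` (an `e` followed by a combining acute accent) renders as a single `é`, yet it has 3 bytes and 2 scalar values. It also compares unequal to the precomposed `"\u{e9}"`, because comparison is bytewise and nothing normalizes the two forms. Getting "one character" or "equal" out of these requires segmentation and normalization tables, which is the part left to libraries.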

With that said, there are oodles of "string" operations that don't really need to care about Unicode at all: substring searching, splitting on a delimiter, string replacements and trimming all come to mind. This is where my ignorance might be biting me. The Zig standard library might already support most/all of that. And pushing off the more sophisticated aspects of Unicode to a library is definitely totally reasonable! When I was saying I'd be "surprised" above, this is what I was referring to. That is, I'd be surprised if the standard library didn't have elementary encoding-agnostic string functions.
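For a sense of what "encoding-agnostic" means here, a minimal sketch over plain byte slices follows. The names `find_bytes` and `split_on` are hypothetical, and the search is the naive quadratic one, not what a tuned library would use.

```rust
/// Finds the first occurrence of `needle` in `haystack` by comparing raw
/// bytes. Returns the byte offset of the match, if any.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // `windows(0)` would panic, so treat the empty needle as a match at 0.
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Splits `input` on a single delimiter byte, keeping empty fields.
fn split_on(input: &[u8], delim: u8) -> Vec<&[u8]> {
    input.split(|&b| b == delim).collect()
}
```

Neither function decodes anything, yet both behave sensibly on UTF-8 input: `find_bytes("naïve café".as_bytes(), "café".as_bytes())` returns `Some(7)`, the byte offset of the match. Because UTF-8 is self-synchronizing, a valid needle cannot match starting in the middle of an encoded character of a valid haystack.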

> One topic is that there is an entire domain of software that never needs to deal with decoding strings and manipulating them. That is, it can deal only with strings in their entirety, and only care about the size of encoded bytes. One such piece of software is the Zig compiler itself.

Oh absolutely. As a brief case study, take something like ripgrep. I'm speaking somewhat off the cuff here, but I believe it would be accurate (or mostly accurate) to say that it depends on literally no aspect of any of the Unicode support provided by Rust's standard library. I don't even think it uses any of std's UTF-8 decoding anywhere. This is despite the fact that Unicode is one of ripgrep's headlining features! Almost all of it comes from the regex engine, which provides its own Unicode tables.

I'd say my advice would be to mentally prepare oneself for a drove of low-effort "zig doesn't even have Unicode support in its standard library"-style criticisms. Or similar things like, "Zig needs a dependency for 'basic' string handling, so it's just going to be like the Node ecosystem." Rust gets these kinds of complaints all the time.

Also, if you ever want to talk about Unicode and its role in Zig (or in its ecosystem), I'd be happy to chat. There are likely some things worth learning from Rust (perhaps in terms of good and bad things).


Thank you so much for this detailed and helpful reply! We will certainly be taking cues and inspiration from Rust's excellent standard library when it comes time to finalize Zig's std lib API. And I will definitely take you up on that chat about Unicode someday :)


> where ASCII or a bag of bytes is good enough...

You are definitely misunderstanding both Zig's stance on this, and how unicode works in a more general sense.


Well, everything is a "big bag of bytes" in computing in the "more general sense", so there's that.

I don't think handling unicode (and even more importantly, specific encodings) at this level of abstraction, and delegating things to second-tier lib support is the way to go. But I don't even see this second-tier support prioritized - for something one would think is half of programming: dealing with strings.


Then you never heard of ropes. Strings are not just vectors, any efficient handling of strings must also deal with small vector optimizations, prefix trees and ropes, besides the neglected unicode nightmares.

You don't want to have that in core, only in the stdlib. Core is fine with vectors and hashes, the rest is built on top of that, plus lists and trees. Do you insist on lists in core? Nobody does that.

Not even the libc has proper strings, you need gnulib or libunistring for that. Neither the libc++ has proper strings.


>Not even the libc has proper strings, you need gnulib or libunistring for that. Neither the libc++ has proper strings.

Yes, but those are the kind of mistakes we want to avoid with newer languages, no?



