
Not to bring up one of my favorite languages or anything, but I do think D got this completely right.

https://tour.dlang.org/tour/en/basics/alias-strings

> This means that a plain string is defined as an array of 8-bit Unicode code units. All array operations can be used on strings, but they will work on a code unit level, and not a character level. At the same time, standard library algorithms will interpret strings as sequences of code points, and there is also an option to treat them as sequence of graphemes by explicit usage of std.uni.byGrapheme.
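To make the three levels in that passage concrete, here is a minimal sketch, assuming a recent Phobos with string auto-decoding still enabled. `Counts` and `countUnits` are hypothetical names made up for illustration; `walkLength` and `byGrapheme` are the standard library pieces.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical holder for the three counts the tour passage distinguishes.
struct Counts
{
    size_t codeUnits;   // raw array length: UTF-8 code units
    size_t codePoints;  // what range algorithms see after auto-decoding
    size_t graphemes;   // user-perceived characters, opt-in via byGrapheme
}

/// Hypothetical helper: measure one string at all three levels.
Counts countUnits(string s)
{
    return Counts(s.length, s.walkLength, s.byGrapheme.walkLength);
}

void main()
{
    // 'e' followed by U+0301 (combining acute accent): one visible character,
    // two code points, three UTF-8 code units.
    assert(countUnits("e\u0301") == Counts(3, 2, 1));

    // Plain ASCII: all three levels agree.
    assert(countUnits("abc") == Counts(3, 3, 3));
}
```

Plain array operations (`.length`, slicing, indexing) stay at the code unit level, so slicing in the middle of a multi-byte sequence is still possible; the compiler will not stop you.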

And perhaps my favorite part (https://tour.dlang.org/tour/en/gems/unicode):

> According to the spec, it is an error to store non-Unicode data in the D string types; expect your program to fail in different ways if your string is encoded improperly.

I should note that what I really like about this approach is the total lack of ambiguity. There is no question about what belongs in a string, and if it's not UTF then you had better be using a byte or ubyte array or you are doing it wrong by definition.




So in D it's impossible to work with files if their filename is not Unicode?


Rather, it would be an error to grab a Unix filename, figure your job was done, and store it directly into a string. So you'd... handle things correctly. Somehow. I admit I've never had the bad luck of encountering a non-UTF-8 filename under Linux and can't claim with any confidence that my code would handle it gracefully. In any language, assuming you're using the standard library facilities it provides, things will hopefully mostly be taken care of behind the scenes anyway.

What I like about the D approach isn't that declaring it an error actually solves anything directly (obviously it doesn't), but that it removes any ambiguity about how things are expected to work. If the encoding of strings isn't well defined and you're writing a library, which encodings are your users going to expect you to accept? Or worse, which encodings does that minimally documented library function you're about to call accept?
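For what "handle things correctly" might look like, here is a rough sketch and not a claim about how Phobos treats filenames: keep the raw bytes in a `ubyte[]` and only treat them as a `string` after checking them. `isValidUtf8` is a hypothetical helper; `std.utf.validate` is the real library call, and it throws `UTFException` on malformed input.

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `raw` is well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] raw)
{
    try
    {
        // Reinterpret the bytes as UTF-8 code units only for the check.
        validate(cast(const(char)[]) raw);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "café" with the é encoded as UTF-8 (0xC3 0xA9) versus Latin-1 (0xE9).
    immutable ubyte[] utf8Name = [0x63, 0x61, 0x66, 0xC3, 0xA9];
    immutable ubyte[] latin1Name = [0x63, 0x61, 0x66, 0xE9];

    assert(isValidUtf8(utf8Name));    // safe to treat as a string
    assert(!isValidUtf8(latin1Name)); // keep as ubyte[] or transcode first
}
```

The point is only that the type tells you which side of the check you are on: `ubyte[]` means "unknown bytes", `string` means "already known to be UTF-8".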



