
Working with Strings in Rust - agluszak
https://fasterthanli.me/blog/2020/working-with-strings-in-rust/
======
compi
The first half was a surprisingly OK intro to Unicode and UTF encodings in C.
It very clearly illustrated a handful of pitfalls of rolling your own
UTF-8 handling code in C.

But it irks me when Rust blogs misrepresent C to show that Rust is better:

    
    
        How nice. We even use const in all
        the right places! I think! Except
        maybe argv! Who knows? The compiler
        sure doesn't seem to care much. I
        guess casting non-const to const is
        pretty harmless. Fair enough, GCC,
        fair enough.
    
        Lucky toupper has no way to return
        an error and just returns 0 for 0,
        right? Or maybe 0 is what it returns
        on error? Who knows! It's a C API!
        Anything is possible.
    

Both of these have defined answers written into the C standard itself.

Also, you can have const memory in C that will crash if you cast away the
const and write to it. You can ask the OS to mark a chunk of memory as
read-only. The reason this isn't as built into C as it is into Rust is that C
was partly designed for, and still runs on, hardware with no read-only memory
outside of a physical ROM chip.
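The following sketch shows the OS route and assumes a POSIX system. `mmap` and `mprotect` come from POSIX rather than ISO C. `MAP_ANONYMOUS` is a widely supported extension (Linux, the BSDs, macOS), so not every platform guarantees it. `make_readonly_copy` is a hypothetical helper:

```c
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy s into its own page, then ask the OS to make
   that page read-only. Returns NULL if s doesn't fit in one page or if
   any system call fails. */
const char *make_readonly_copy(const char *s) {
    assert(s != NULL);
    long page = sysconf(_SC_PAGESIZE);
    size_t len = strlen(s) + 1;
    if (page <= 0 || len > (size_t)page)
        return NULL;

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, s, len);

    /* From here on, the hardware rejects writes to this page. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;
}
```

On typical systems, a write through the returned pointer should fault even after you cast away `const`. Expect `SIGSEGV` (or `SIGBUS` on some platforms) rather than silent memory corruption.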

    
    
        So the problem here is that… we
        allocate enough room for “DOG\0”,
        but we end up converting “doggo
        override\0” to uppercase, so we end
        up writing past the area that malloc
        allocated for us.  I believe the
        technical term is a fucky wucky
        buffer overflow.
    

Something about this sentence screams Rust programmer but I can't put my
finger on why.
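For comparison, a C version that can't overflow the way the quoted code does has to carry the destination's capacity around and stop writing when it reaches it. Here is a sketch using a hypothetical `upcase_into` helper, which borrows `snprintf`'s convention of returning the length it would have needed:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase src into dst, never writing more than
   cap bytes (including the terminating NUL). Returns strlen(src), so a
   return value >= cap means the output was truncated. */
size_t upcase_into(char *dst, size_t cap, const char *src) {
    assert(src != NULL && (dst != NULL || cap == 0));
    size_t i = 0;
    for (; src[i] != '\0'; i++) {
        if (i + 1 < cap)
            dst[i] = (char)toupper((unsigned char)src[i]);
    }
    if (cap > 0)
        dst[i < cap ? i : cap - 1] = '\0';
    return i;
}
```

With `char buf[4]`, the call `upcase_into(buf, sizeof buf, "doggo override")` leaves `"DOG"` in `buf` and returns 14. The caller can then detect the truncation instead of trampling the heap.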

~~~
_bxg1
> Both of these have defined answers written into the C standard itself.

I think the point wasn't that they're unknowable, but that the tooling doesn't
give you any help in knowing, much less enforcing, the answers. Really basic
things are left up to convention, which, especially with third-party code,
forces you to be distrustful of what you're working with and maybe even have
your code do extra work (eager cloning, for example) just in case an interface
behaves in a way you hope it doesn't.
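In C, the "eager cloning" point might look like the following. A function takes a plain `char *`, and its docs don't say whether it writes to the buffer, so the caller hands it a private copy. Both `third_party_process` and `process_copy` are hypothetical names for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical third-party API: the signature takes a plain char *, so
   nothing tells the caller whether it modifies the buffer. This one
   happens to uppercase it in place. */
static void third_party_process(char *s) {
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Defensive wrapper: give the library a private copy so the caller's
   string is never touched. The caller must free() the result. */
char *process_copy(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    third_party_process(copy);
    return copy;
}
```

The copy costs an allocation on every call, and that cost is exactly the kind of "just in case" work described above.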

~~~
compi
I did miss that until I read your post. Looking up the behavior your program
relies on in the language standard seems easy to me because, as a C
programmer, I _have_ to do it a lot, for the reasons you and the post
mentioned. I totally agree with you, and with nearly all of the C footguns in
the article.

That being said, I'm still doubling down on my post:

"Who knows" if argv is const? Me, because I program to a C standard that
guarantees it's not const, and that argv and the strings it points to are
modifiable.
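The guarantee in question is C11 §5.1.2.2.1p2: "The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program". That rule is what makes it legal to parse arguments in place, which is a common idiom. A sketch, with `split_key_value` as a hypothetical helper:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split an argument like "name=doggo" in place by
   overwriting the '=' with '\0'. Afterwards arg holds "name" and the
   returned pointer holds "doggo". Returns NULL if there is no '='.
   Calling this on argv[i] is allowed because the argv strings are
   modifiable by the program. */
char *split_key_value(char *arg) {
    assert(arg != NULL);
    char *eq = strchr(arg, '=');
    if (eq == NULL)
        return NULL;
    *eq = '\0';
    return eq + 1;
}
```

In `main`, a call like `char *value = split_key_value(argv[i]);` then leaves the key in `argv[i]` itself.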

And "who knows" whether toupper returns an error, because "It's a C API!
Anything is possible!"?

Also me, because I program to a C standard where toupper either returns what
you gave it or gives you back a single character that _might_be_different_
depending on the host computer's locale.

I'm not defending toupper and towupper, but their unreliability is spelled out
explicitly in the C standard I program to.

If I naively wrote a case-insensitive file manager in Rust, in C with wchar_t,
or in C with ICU, would operations on the following files be reliable on both a
German computer and an English computer? If they are reliable, are they correct
to the point where neither person would overwrite a file by accident? If the
Rust one overwrites files, is it a smaller fucky wucky since Rust programmers
value "safety, correctness, and performance"?

    
    
        ẞ.txt
        β.txt
        阝.txt
        ß.txt
        ẞ.txt
        SS.txt
        ss.txt

