
The Sad State of C Strings - hyc_symas
https://symas.com/the-sad-state-of-c-strings/
======
huhtenberg
[http://webcache.googleusercontent.com/search?q=cache:PptRk2k...](http://webcache.googleusercontent.com/search?q=cache:PptRk2k3FKIJ:https://symas.com/the-sad-state-of-c-strings/&num=1&hl=en&gl=ch&strip=1&vwsrc=0)

This topic is a triviality of C programming. You run into performance (or
usability) issues with standard strings - you roll out your own. It takes...
what?... 20 minutes? OK, maybe an hour. Would it help to have an alternative
standardized? Perhaps. Is it _critical_ to have it? Hell, no.

~~~
draw_down
In what other context anywhere in programming is it considered a good idea for
everyone to hand-roll their own, separate, subtly-incompatible, surely-bug-
ridden implementation of functionality that everyone needs? Gahhh.

~~~
hyc_symas
Exactly. We can't kill strcpy/strcat, for backward compatibility reasons. But
the standard ought to have a spec that is actually useful and not ridiculously
sub-optimal. strlcpy() doesn't fit the bill either.

~~~
nkurz
I'm late to this discussion, but did you possibly miss stpcpy() and its
variants? Or is there something about them that doesn't meet your needs?

    
    
      STPCPY(3)      Linux Programmer's Manual                                      
    
      NAME
           stpcpy - copy a string returning a pointer to its end
    
      SYNOPSIS
           #include <string.h>
    
           char *stpcpy(char *dest, const char *src);
    
       Feature Test Macro Requirements for glibc (see feature_test_macros(7)):
    
           stpcpy():
               Since glibc 2.10:
                   _XOPEN_SOURCE >= 700 || _POSIX_C_SOURCE >= 200809L
               Before glibc 2.10:
                   _GNU_SOURCE
    
      DESCRIPTION
           The  stpcpy()  function  copies the string pointed to
           by src (including the terminating null byte ('\0')) to the
           array pointed to by dest.  The strings may not overlap,
           and the destination string dest must be large enough  to
           receive the copy.
    
      RETURN VALUE
           stpcpy()  returns  a  pointer  to the end of the string
           dest (that is, the address of the terminating null byte)
           rather than the beginning.
    
      CONFORMING TO
           This function was added to POSIX.1-2008.  Before that, it
           was not part of the C or POSIX.1 standards,  nor  customary
           on UNIX systems, but was not a GNU invention either. Perhaps 
           it came from MS-DOS.  It is also present on the BSDs.
    
      BUGS
           This function may overrun the buffer dest.
    
      EXAMPLE
           For example, this program uses stpcpy() to concatenate foo 
           and bar to produce foobar, which it then prints.
    
               #define _GNU_SOURCE
               #include <string.h>
               #include <stdio.h>
    
               int
               main(void)
               {
                   char buffer[20];
                   char *to = buffer;
    
                   to = stpcpy(to, "foo");
                   to = stpcpy(to, "bar");
                   printf("%s\n", buffer);
               }
    
      SEE ALSO
           memccpy(3), stpncpy(3), wcpcpy(3)

~~~
hyc_symas
Added to POSIX is good, but not sufficient. E.g., there is no reason for
Windows to implement a function merely because it's in the POSIX spec. It
needs to be in the C spec to actually be viable.

stpncpy is equivalent to strncpy - it NUL-pads dest if the src is shorter than
N. Frankly I have never found a use for this behavior. The desired behavior is
to simply copy src without additional padding. That is what my proposed
strecopy() does, and that's what would be required of a replacement for
strlcpy().
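
To make the padding difference concrete, here is a minimal sketch of a bounded copy that stops at the terminator instead of NUL-filling the rest of the buffer. The name `copy_bounded` and its `(dst, src, end)` signature are assumptions made for illustration. They follow the behavior described above, not the article's strecopy() or any standard function.

```c
#include <stddef.h>

// Hypothetical helper (name and signature are assumptions, not a standard API).
// Copies src into the buffer [dst, end), truncating if needed, writes exactly
// one terminating NUL and no padding. Returns a pointer to that NUL so calls
// can be chained, or NULL if dst is NULL or the buffer has no room at all.
//
// Example:
//   char buf[8];
//   char *end = buf + sizeof buf;
//   char *p = copy_bounded(buf, "foo", end);
//   p = copy_bounded(p, "barbaz", end);   // buf now holds "foobarb"
char *copy_bounded(char *dst, const char *src, char *end)
{
    if (dst == NULL || dst >= end)
        return NULL;
    while (dst < end - 1 && *src != '\0')
        *dst++ = *src++;
    *dst = '\0';   // bytes after the terminator are left untouched
    return dst;
}
```

Because the return value is the new end of the string, each call in a chain starts where the previous one stopped, and nothing beyond the terminator is ever written.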

------
anon4
Eh...

    
    
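      // This code is licensed under CC0.
      // A copy of the license can be obtained at https://creativecommons.org/publicdomain/zero/1.0/
      // May you forever catenate in peace.
      #include <stdarg.h>
      #include <string.h>

      // The macro appends the NULL sentinel that ends the argument list.
      // No trailing semicolon, so it can be used inside expressions.
      #define strcatm(...) strcat_multi(__VA_ARGS__, (const char *)NULL)

      // Appends each source to dest. dest must be large enough (no bounds check).
      // Returns a pointer to the terminating NUL of the result.
      char* strcat_multi(char *dest, ...) {
        va_list srcs;
        char *dest_end;
        const char *src;
        size_t src_sz;
        va_start(srcs, dest);
        // Find the current end of dest once, up front.
        for (dest_end = dest; *dest_end; dest_end++);
        for (src = va_arg(srcs, const char *); src; src = va_arg(srcs, const char *)) {
          src_sz = strlen(src);
          memmove(dest_end, src, src_sz);
          dest_end += src_sz;
          *dest_end = '\0';
        }
        va_end(srcs);
        return dest_end;
      }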
    

There. Now go and catenate, children.

~~~
efaref

      // This code is licensed under CC0.
      // A copy of the license can be obtained at https://creativecommons.org/publicdomain/zero/1.0/
      // May you forever catenate in peace.
      #include <stdarg.h>
      #include <string.h>

      // The macro appends the NULL sentinel that ends the argument list.
      #define strncatm(dest, sz, ...) strncat_multi((dest), (sz), __VA_ARGS__, (const char *)NULL)

      // Appends each source to dest without writing past dest_sz bytes.
      // Note: if the buffer fills up exactly, no terminating NUL is written.
      char* strncat_multi(char *dest, size_t dest_sz, ...) {
        va_list srcs;
        char *dest_end;
        const char *src;
        size_t src_sz;
        size_t dest_off;
        size_t copy_sz;
        va_start(srcs, dest_sz);
        // Find the current end of dest, advancing the offset (not the size).
        for (dest_end = dest, dest_off = 0; dest_off < dest_sz && *dest_end; dest_end++, dest_off++);
        for (src = va_arg(srcs, const char *); src && dest_off < dest_sz; src = va_arg(srcs, const char *)) {
          src_sz = strlen(src);
          copy_sz = src_sz < (dest_sz - dest_off) ? src_sz : (dest_sz - dest_off);
          memmove(dest_end, src, copy_sz);
          dest_end += copy_sz;
          dest_off += copy_sz;
          if (dest_off < dest_sz)
          {
            *dest_end = '\0';
          }
        }
        va_end(srcs);
        return dest_end;
      }

~~~
anon4
Good, but I'd do one modification -- always write a null byte at the end.
Otherwise you end up with a non-null-terminated string and that's bad.

So,

    
    
      copy_sz = src_sz < (dest_sz - dest_off) ? src_sz : (dest_sz - dest_off - 1);
      ...
      *dest_end = '\0';
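
Folding that change into the bounded version gives roughly the sketch below. The name `strncat_multi_z` is made up here to keep it distinct from the code above, and it assumes the caller ends the argument list with `(const char *)NULL` and that the sources do not overlap `dest`.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

// Hypothetical variant (the name is an assumption): appends each source in
// turn, truncating as needed, and always leaves dest NUL-terminated.
// Returns a pointer to the terminating NUL, or NULL if dest holds no
// terminator within its first dest_sz bytes.
char *strncat_multi_z(char *dest, size_t dest_sz, ...)
{
    va_list srcs;
    const char *src;
    char *dest_end = memchr(dest, '\0', dest_sz);  // current end of dest
    size_t room;

    if (dest_end == NULL)
        return NULL;
    // Bytes still free, keeping one slot reserved for the terminator.
    room = dest_sz - (size_t)(dest_end - dest) - 1;

    va_start(srcs, dest_sz);
    while (room > 0 && (src = va_arg(srcs, const char *)) != NULL) {
        size_t n = strlen(src);
        if (n > room)
            n = room;          // truncate the piece that does not fit
        memcpy(dest_end, src, n);
        dest_end += n;
        room -= n;
    }
    va_end(srcs);
    *dest_end = '\0';
    return dest_end;
}
```

Reserving the terminator's slot up front means the size arithmetic never needs a separate `- 1` special case inside the loop.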

------
AdmiralAsshat
Reading this made me remember Zed Shaw's rant on C-strings and his
recommendation to use a "Safer" string library:

[http://c.learncodethehardway.org/book/ex36.html](http://c.learncodethehardway.org/book/ex36.html)

I'm not an experienced enough programmer to be able to evaluate how much
"safer" the bstrlib library actually is. I'm sure the various HN Engineers can
chime in with their insights.

------
reacweb
I am pretty convinced by the argument of this article, but this looks too good
to be true. Are there some drawbacks I cannot see? Why has nobody thought
about this simple API?

~~~
ktRolster
Of all the problems with C strings, the one mentioned in the article
(inefficient strcat() function) is really small.

I think unicode support would be much more important, but not everyone agrees
(for example, on embedded devices unicode support can be dead-weight). And
adding to C without consensus leads to lousy decisions.

~~~
reacweb
I have already noticed some performance issues with strncpy (and big
destination buffers) vs strlcpy. The two functions introduced in the article
have good complexity and seem handier than the usual API.

~~~
ktRolster
If you need to copy big strings, memcpy is worth checking out, too.
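
For what it's worth, the strncpy cost with big buffers comes from it zero-filling the unused remainder of the destination. When the source length is already known, a memcpy-based copy avoids both that and the per-byte terminator check. A minimal sketch, assuming the caller tracks lengths itself; `copy_known_len` is a name invented for this example:

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies a string whose length is already known.
// Returns 0 on success, or -1 (writing nothing) if src_len bytes plus the
// terminator do not fit in dst_sz bytes.
int copy_known_len(char *dst, size_t dst_sz, const char *src, size_t src_len)
{
    if (src_len >= dst_sz)
        return -1;
    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
    return 0;
}
```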

------
asgfoi
That's nothing, you should see the sad sad state of strings in assembly... Oh
you're supposed to manage them yourself? Huh, what a concept.

Seriously though, a C programmer should know the downsides of C strings, and
act accordingly, i.e. use a library if necessary.

~~~
hyc_symas
"should" - yes. "does" - apparently not. "use a library if necessary" -
that's really the point of this - the standard library's offerings suck. What
is the point of a standard that is actually unusable?

strlcpy() also sucks. The solutions I proposed solve both the overflow
protection problem that strlcpy aims to solve _and_ the inefficiency
problems of strcpy/strcat/memcpy.

------
endymi0n
Somehow, I was expecting a piece about musical instrument tuning.

~~~
hyc_symas
Heh, maybe next time. I could tell ya about the viola C string I tried on my
baritone fiddle - it really wasn't very usable.

------
sirmiller
The premise is wrong: "constructing a larger string out of multiple smaller
strings is a pretty common programming task"

No, it's not.

"strcat(strcat(strcat(strcpy(buf, "This "),"is "),"a long "),"string.");"

Just checked 20 years of C/C++ repositories ... not a single strcat. Actually
I can't remember ever using c-strings for more than args parsing or debug
logging.

~~~
sethrin
I just did a quick GitHub search, strcat seems to have been used about 5
million times across all repositories, as compared to (e.g.) memcpy at about
80 million. Clearly somebody is using it. I would hate to try to get a measure
of how often it's used in other languages; it's not the sort of thing anyone
would think twice about. I believe you have made an inductive error based on
your own sample and that the original premise is entirely correct.

------
sedatk
The missing length field was a selling point back then: you could have strings
of unlimited length unlike Pascal's 255-byte strings. People liked overcoming
that limit with the same storage overhead of an extra byte.
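
For comparison, a length-prefixed layout in the Pascal style might look like the sketch below. The struct and function names are illustrative assumptions, not taken from any particular compiler or library.

```c
#include <stddef.h>
#include <string.h>

// Illustrative layout only: a one-byte length prefix caps the string at
// 255 bytes, whereas a NUL terminator costs the same single byte but puts
// no limit on the length.
struct short_pstring {
    unsigned char len;
    char data[255];
};

// Returns 0 on success, -1 if s is longer than the prefix can describe.
int pstring_set(struct short_pstring *p, const char *s)
{
    size_t n = strlen(s);
    if (n > 255)
        return -1;
    p->len = (unsigned char)n;
    memcpy(p->data, s, n);   // no terminator is stored
    return 0;
}
```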

------
pkolaczk

      strcat(strcat(strcat(strcpy(buf, "This "),"is "),"a long "),"string.");
      len = strlen(buf);
    

> The above example executes in exponential time with the length of the
> strings.

This statement is obviously not true. The example is linear with the length of
the strings (assuming the number of strings is constant) and O(n^2) with the
number of strings (assuming each string has non-zero length).
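
The quadratic term can be checked by counting bytes touched. This is a rough model, assuming k pieces of m bytes each, a strcat that rescans the destination from the start on every call, and ignoring terminator bytes. The function names are made up for this illustration.

```c
#include <stddef.h>

// Bytes examined when appending k pieces of m bytes with repeated strcat:
// piece i (0-based) first scans the i*m bytes already in place, then copies m.
size_t cost_rescan(size_t k, size_t m)
{
    size_t total = 0;
    for (size_t i = 0; i < k; i++)
        total += i * m + m;
    return total;   // equals m * k * (k + 1) / 2
}

// Bytes examined when the end pointer is carried forward (stpcpy style).
size_t cost_end_pointer(size_t k, size_t m)
{
    return k * m;
}
```

Doubling m doubles both results, while doubling k roughly quadruples the rescanning cost and only doubles the end-pointer cost.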

------
draw_down
What a nightmare, I know C but wouldn't want to touch it for anything that
does any type of anything with strings.

And of course you have the crowd that's been writing C for 20 years saying
"it's not that bad". Folks, the god damn plane has crashed into the mountain.

~~~
Grishnakh
Exactly. Any time I start a project, if it's going to be working with strings, I
automatically pick something else, anything else, besides C, no matter how
great C would have been for the task for performance, because its string-
handling is just that bad. C++ isn't great, but it's a lot better.

Personally, my favorite for good performance is C++ with Qt. The QString class
lets you do a lot of amazing stuff very easily.

