
The Most Expensive One-byte Mistake (2011) - eric59
http://queue.acm.org/detail.cfm?id=2010365
======
DanBC
See also discussion here:
[https://hn.algolia.com/?q=The+Most+Expensive+One-byte+Mistak...](https://hn.algolia.com/?q=The+Most+Expensive+One-byte+Mistake#!/story/forever/0/The%20Most%20Expensive%20One-byte%20Mistake)

------
lysium
I find this article very interesting, but I'd still like to point out it is
from 2011.

------
kazinator
Betteridge's Law strikes again.

No, null-terminated strings are fine. Rants against null-terminated strings are
a good way to spot nutjobs.

Null-terminated strings have virtues, such as being recursively defined: the
tail of a string is a string. So strchr could be written like this (let's drop
the const for simplicity):

    
    
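       char *strchr(char *s, int ch)
       {
         /* Like the standard strchr, compare against ch converted to char. */
         if (*s == (char) ch)
           return s;     /* also matches the terminator when ch is 0 */
    
         if (*s == 0)
           return NULL;  /* reached the end without a match */
    
         return strchr(s + 1, ch);  /* the tail of a string is a string */
       }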
    

It's easy to break a string with delimiters into the individual pieces in
place, just by writing nulls over the separating characters, and keeping a
vector of pointers to the pieces. This can't be done with some other string
representations like length + data.
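
A minimal sketch of that in-place splitting, assuming a single-character delimiter and a caller-supplied pointer vector. The name `split_in_place` and its signature are made up for illustration, not taken from any library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split `s` in place on `delim`, storing up to `max`
   piece pointers in `out`. Returns the number of pieces stored. */
size_t split_in_place(char *s, char delim, char **out, size_t max)
{
    size_t n = 0;

    assert(s != NULL && out != NULL);

    if (max == 0)
        return 0;

    out[n++] = s;                 /* first piece starts at the beginning */
    for (; *s != '\0'; s++) {
        if (*s == delim) {
            if (n == max)
                break;            /* vector full: leave the rest unsplit */
            *s = '\0';            /* terminate the current piece */
            out[n++] = s + 1;     /* next piece starts after the delimiter */
        }
    }
    return n;
}
```

Called on a writable buffer holding `usr:local:bin` with `':'` as the delimiter, this stores three pointers into the same buffer and allocates nothing. It must not be called on a string literal, since it writes to the buffer.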

When one null terminated string is a suffix of another (and ideally both are
treated as immutable), then they can share storage.

    
    
      char *excon = "excon", *con = excon + 2;
    

Catenating null-terminated strings is efficient if you keep a tail pointer. A
repeated strcat-like operation will be O(N*N) of course, because each call
rescans the destination from the start, so don't do that in some critical
inner loop, with a large amount of text.
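
A rough sketch of the tail-pointer idea, assuming a fixed-size destination buffer. `append_at_tail` is a hypothetical name; it is similar in spirit to POSIX `stpcpy`, with a bound added:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy `src` to `tail`, which must point at the
   terminating null of the destination, and return the new tail.
   `end` is one past the last usable byte of the buffer. */
char *append_at_tail(char *tail, const char *end, const char *src)
{
    assert(tail != NULL && src != NULL && tail < end);

    while (*src != '\0' && tail + 1 < end)
        *tail++ = *src++;   /* copy, always leaving room for the terminator */
    *tail = '\0';
    return tail;            /* next append starts here, with no rescan */
}
```

Each call does work proportional to the piece being appended, not to everything already in the buffer, so appending N pieces is linear in the total length. Input that does not fit is silently truncated in this sketch.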

Length + data strings have various disadvantages. For one thing, how wide
should be the length? Two bytes? Four? Eight? If you make it two bytes today
and store binary data somewhere, it will be incompatible with tomorrow's
four-byte length. And then there is endianness. A binary file with strings
produced on one machine will have byte-swapped lengths on another. Null
terminated strings can be blasted over a serial line or network, or written to
disks, as they are; they are already marshaled and ready to go! (Though I must
hastily acknowledge that this isn't true of wide character null term'd
strings, of course.)
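
To make the marshaling point concrete, here is a sketch under an assumed wire format (4-byte big-endian length followed by the bytes). Both function names and the format itself are hypothetical, chosen only to show the extra step a length prefix requires:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: 4-byte big-endian length, then the bytes.
   Returns the number of bytes written, or 0 if `out` is too small. */
size_t put_lstr_be32(unsigned char *out, size_t cap, const char *s)
{
    size_t len = strlen(s);

    assert(out != NULL);
    if (len > UINT32_MAX || cap < 4 || cap - 4 < len)
        return 0;

    /* Write the length byte by byte so the result does not depend on
       the host's byte order. */
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    memcpy(out + 4, s, len);
    return 4 + len;
}

/* The null-terminated form needs no such step: the bytes are the format. */
size_t put_cstr(unsigned char *out, size_t cap, const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminator */

    assert(out != NULL);
    if (cap < n)
        return 0;
    memcpy(out, s, n);
    return n;
}
```

The length-prefixed writer has to commit to a width and a byte order up front; the null-terminated writer is a plain copy.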

Dynamic strings (management record plus pointer to data) are heavyweight
representations that will show their weaknesses at virtual machine boundaries.
You cannot pass them between address spaces or share them without marshaling
to some flat form and back.

Also, regarding another point in the article, MS-DOS did not invent the
backslash as a path separator instead of the slash. This is a common
misconception. MS-DOS supports both forward and backslash as separators! And
so does Windows (every version between then and now). Early versions of
COMMAND.COM had a variable whereby you could set this as a preference: whether
you want to display and input path separators as slash or as backslash. This
was later removed.

Today, when you have trouble with forward slashes in Windows, this is due to
the application you are using (including, sadly, that application known as
Windows Explorer, and its "Shell API"). The underlying kernel handles the
slashes just fine.

~~~
hyperliner
Bingo: "Length + data strings have various disadvantages. For one thing, how
wide should be the length? Two bytes? Four? Eight? "

~~~
andreasvc
That's easy: it should be of type size_t. Yes, that is wasteful for short
strings, but then again I believe null-terminated strings are the most
widespread and worst case of premature optimization, and correctness & safety
should take precedence.
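
A minimal sketch of what a size_t-length string could look like, as a non-owning pointer-plus-length view. `struct strview`, `sv_from_cstr`, and `sv_find` are hypothetical names for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning string view: pointer plus explicit length. */
struct strview {
    const char *data;
    size_t len;
};

/* Build a view over an existing null-terminated string. */
struct strview sv_from_cstr(const char *s)
{
    struct strview v;

    assert(s != NULL);
    v.data = s;
    v.len = strlen(s);
    return v;
}

/* Search bounded by the stored length, so embedded nulls are ordinary
   bytes and the scan cannot run past the end of the data. */
const char *sv_find(struct strview v, char ch)
{
    if (v.len == 0)
        return NULL;
    return memchr(v.data, (unsigned char)ch, v.len);
}
```

With a view of length 5 over the bytes `ab\0cd`, `sv_find` locates `'c'` at offset 3, where `strchr` would stop at the embedded null. The cost is one size_t per string.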

