
Solving the strcat() woes - haneefmubarak
https://haneefmubarak.com/2015/12/13/solving-the-strcat-woes/
======
makecheck
Great observations.

Really, it's unfortunate that null-termination was used at all. The "unlimited
length" feature did not have to be implemented in such an expensive way.

As an example, I think it would be possible to use leading bytes to encode
strings in one of three ways, using a Pascal-style string as the base case but
limited to 253 characters:

    
    
        [0]      [1...253]                  (short strings)
        length   data
    
        [0]      [1...254]   [end+1]        (concat. string chains in large contiguous block)
        254      len+data    start of next
    
        [0]      [1...8]     [9...255]      (concat. string chains in non-contiguous blocks)
        255      next ptr.   segment len+data
    

In other words, the values 254 and 255 could act as special values to dictate
how to chain together multiple string fragments, and any shorter complete
string would degrade to a Pascal-style string.
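
As a rough illustration of the contiguous case, here is a minimal C sketch that writes a byte buffer as a run of 254-tagged segments followed by a final Pascal-style segment. The names `encode_contig`, `SEG_MAX`, and `TAG_CONTIG` are made up for this example. It assumes the chain always ends with a plain length-prefixed segment, which is one reading of the layout above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEG_MAX    253  /* most data bytes one segment can carry (assumed) */
#define TAG_CONTIG 254  /* marker: another segment follows immediately */

/* Encode src[0..n) into out as a contiguous chain and return the number
 * of bytes written. The caller must supply a large enough buffer:
 * n data bytes, plus 2 bytes per full 254-tagged segment, plus 1 byte
 * for the final length prefix. */
static size_t encode_contig(uint8_t *out, const char *src, size_t n)
{
    size_t w = 0;

    /* Emit full segments while more than one segment's worth remains. */
    while (n > SEG_MAX) {
        out[w++] = TAG_CONTIG;
        out[w++] = SEG_MAX;
        memcpy(out + w, src, SEG_MAX);
        w   += SEG_MAX;
        src += SEG_MAX;
        n   -= SEG_MAX;
    }

    /* Final (or only) segment: plain Pascal-style length + data. */
    out[w++] = (uint8_t)n;
    memcpy(out + w, src, n);
    return w + n;
}
```

Anything up to 253 bytes comes out as an ordinary Pascal-style string with one byte of overhead. Longer input costs two extra bytes per additional segment.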

~~~
rootbear
Interesting idea. It makes me think of the indirect blocks used in Unix-style
file systems. But I'm unclear on some points. First, should the 253 in the
second case (s[0] == 254) be 255? It's not clear why it ends at 253. Second,
how is the end of the string found in cases two and three? I'm assuming the
last part of the chain would be a string segment of type 1, with the length of
the final segment given in the first byte.

~~~
makecheck
Actually in cases 2 and 3, since the first byte is "special", the length of
the actual string must be the next byte. I've updated it to show that. This is
why there is no length-254 string. And in the 3rd case, the maximum segment
length is 10 bytes shorter.

For a multi-segment setup, the length of the entire string would be found by
summing the lengths of each segment. This is still "fast" because finding the
total length means testing at most one byte per segment to locate the next
length byte.
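
A sketch of that length walk in C, under some assumptions that the thread does not settle: the chain ends with a plain type-1 segment (as rootbear guessed), the 8-byte "next ptr." field holds a native pointer, and its length byte sits at offset 9. The name `chain_length` and the tag constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TAG_CONTIG = 254, TAG_LINKED = 255 };

/* Sum the per-segment length bytes of a chained string.
 * Assumes a well-formed chain that ends in a type-1 segment. */
static size_t chain_length(const uint8_t *s)
{
    size_t total = 0;

    for (;;) {
        if (s[0] == TAG_CONTIG) {
            /* [254][len][data...] and the next segment follows directly. */
            total += s[1];
            s += 2 + (size_t)s[1];
        } else if (s[0] == TAG_LINKED) {
            /* [255][8-byte next pointer][len][data...] */
            const uint8_t *next;
            memcpy(&next, s + 1, sizeof next); /* assumes pointers fit in 8 bytes */
            total += s[9];
            s = next;
        } else {
            /* Plain Pascal-style segment terminates the chain. */
            return total + s[0];
        }
    }
}
```

The loop reads one tag byte and one length byte per segment and never touches the character data, so the cost grows with the number of segments rather than the number of characters.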

