
Efficient string copying and concatenation in C - beefhash
https://developers.redhat.com/blog/2019/08/12/efficient-string-copying-and-concatenation-in-c/
======
antirez
I can't see how incrementally improving the approach of the C standard library
is the right design decision here. They should just add a standard, well
designed dynamic strings library to C, one that stores the length alongside
the (binary safe) string itself, and that's all. Every serious C program uses
one; null terminated strings are a joke. However, because of the huge legacy
of null terminated strings in C, such a library should make sure to always
automatically terminate strings with a null byte, so that people can
trivially print them, call strlen() on them and so forth when needed and
when there is no binary data inside.
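
The design described here can be sketched in a few lines. This is a
hypothetical illustration (the names `dstr`, `dstr_new`, and `dstr_cat` are
made up, not any real library's API): the length travels with the (possibly
binary) data, and a '\0' is always appended so the buffer can still be handed
to printf() or strlen() when it contains no embedded NULs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string; a sketch, not the real sds API. */
typedef struct {
    size_t len;   /* bytes in buf, excluding the trailing '\0' */
    char   buf[]; /* always '\0'-terminated for C interop */
} dstr;

/* Create a string from arbitrary (possibly binary) data. */
static dstr *dstr_new(const void *data, size_t len) {
    dstr *s = malloc(sizeof *s + len + 1);
    if (!s) return NULL;
    s->len = len;
    memcpy(s->buf, data, len);
    s->buf[len] = '\0'; /* automatic terminator */
    return s;
}

/* Concatenation knows both lengths, so no strlen() scans are needed. */
static dstr *dstr_cat(const dstr *a, const dstr *b) {
    dstr *s = malloc(sizeof *s + a->len + b->len + 1);
    if (!s) return NULL;
    s->len = a->len + b->len;
    memcpy(s->buf, a->buf, a->len);
    memcpy(s->buf + a->len, b->buf, b->len);
    s->buf[s->len] = '\0';
    return s;
}
```

Because the terminator is maintained automatically, `printf("%s", s->buf)`
works whenever the data contains no embedded NULs, while `s->len` remains
correct even when it does.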

~~~
Steven_Vellon
A lot of C developers I know see value in keeping the C standard library
small. I implemented my own `l_string` (length-string) struct and associated
libraries for a class (we weren't allowed to use external libraries) and it
was not prohibitively difficult. Something like this probably covers most use
cases: [https://github.com/antirez/sds](https://github.com/antirez/sds)

~~~
einpoklum
> A lot of C developers I know see value in keeping the C standard library
> small.

And what value would that be?

... if you're going to say "easier to fit a small implementation into the
memory of some microcontroller" - that's not a sufficient argument. There's
always some smaller system without enough memory, and larger systems with
more than enough.

~~~
kjeetgill
This is C land. You're not going to make it far by making the case that bloat
is acceptable.

People have a range of technical and aesthetic reasons for hating bloat and C
attracts a lot of them.

Off the top of my head:

- Smaller memory footprints

- Easier reimplementation

- Lower attack surface

- Less room for breaking changes

- Less baggage for when we inevitably wish we could deprecate things

~~~
AceJohnny2
> _- Lower attack surface_

wut?

~~~
Steven_Vellon
String formatting libraries, and other things that work with buffers are
frequent attack surfaces for buffer overflow attacks.

Granted, using C means developers often implement these operations themselves
which introduces the possibility of creating more attack surfaces. But it's
less likely that the standard library presents an attack surface when the
standard library is tiny.

~~~
AceJohnny2
I mean, C's attack surface is like that of activated charcoal. I'm not sure
that C's small standard library gives it a smaller attack surface,
specifically because it means programmers who have better things to do are
forced to reinvent the wheel, poorly [1]. But mostly, because C's lack of
guardrails means it takes active effort on even trivial operations to be safe.

I've been working with it for nearly two decades, and every year I think more
that C programs should be confined to a well-guarded quarantined area with
hazard trefoils and a "beware of the leopard" sign.

[1]
[https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule](https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule)

edit: people reimplementing their own "safe" string library isn't something
to brag about; it's something our entire industry should be ashamed of.

------
dwheeler
I think memccpy is an improvement, so I support it, but it's still complicated
to use in practice. The article gives this example for a copy followed by
concatenation:

    
    
        char *p = memccpy (d, s1, '\0', dsize);
        dsize -= (p - d - 1);
        memccpy (p - 1, s2, '\0', dsize);
    

Notice that you have to recalculate dsize (correctly, without off-by-one
errors!), it assumes you have dsize available, and this doesn't detect
overruns (which in many cases you should do).

So real-world code would look more like this:

    
    
        // dsize is the space *available* in d, including \0
        size_t dsize = sizeof(d); // if d is an array
        // ...
        char *p = memccpy (d, s1, '\0', dsize);
        if (p == NULL) goto overflow; // s1 did not fit: handle overflow
        dsize -= (p - d - 1);
        char *q = memccpy (p - 1, s2, '\0', dsize);
        if (q == NULL) goto overflow; // s2 did not fit: handle overflow
        dsize -= (q - (p - 1) - 1);
    

It's a little easier to understand than strncat/strncpy versions, it doesn't
unnecessarily read its inputs past where they are needed like strlcat/strlcpy
do, and it's more efficient than snprintf. So yes, it's an improvement and I
support it. However, this is still rather complex; in particular, it's _way_
harder to understand compared to code that uses snprintf, and certainly harder
to understand than pretty much any other programming language higher level
than assembly.

So let's accept this improvement, and keep striving to do better.
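
For comparison, the snprintf version mentioned above can be sketched like
this (a hypothetical helper; the name `concat2` and the exact signature are
made up for illustration). It is much easier to audit, at the cost of
format-string parsing overhead:

```c
#include <stdio.h>
#include <stddef.h>

/* Sketch: concatenate s1 and s2 into d (dsize bytes total), detecting
 * truncation from snprintf's return value. Returns the number of bytes
 * written (excluding the '\0'), or -1 on error or overflow. */
static int concat2(char *d, size_t dsize, const char *s1, const char *s2) {
    int n = snprintf(d, dsize, "%s%s", s1, s2);
    if (n < 0 || (size_t)n >= dsize)
        return -1; /* encoding error, or the result would not fit */
    return n;
}
```

The truncation check is a single comparison against the return value, with
no pointer arithmetic for the caller to get wrong.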

~~~
camgunz
Yeah I think strlcpy is way better exactly for this reason. I'm sure the idea
behind returning the pointer was call chaining, but you shouldn't ever be
doing that anyway, and with strlcpy you basically can't do an off-by-one.
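
The idiom alluded to here can be sketched with a local fallback, since glibc
doesn't ship strlcpy (this mirrors the documented OpenBSD semantics, not the
actual OpenBSD source; the name `my_strlcpy` is made up): the return value is
the full source length, so truncation detection is one comparison.

```c
#include <string.h>

/* Fallback with OpenBSD-documented semantics (sketch). Copies at most
 * dsize-1 bytes, always '\0'-terminates (when dsize > 0), and returns
 * strlen(src) so the caller can detect truncation. */
static size_t my_strlcpy(char *dst, const char *src, size_t dsize) {
    size_t srclen = strlen(src); /* note: always scans all of src */
    if (dsize != 0) {
        size_t n = srclen < dsize - 1 ? srclen : dsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Callers then write `if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf)
{ /* truncated */ }`, with no size arithmetic to get wrong.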

~~~
dwheeler
It's true that strlcpy is easier to use. One challenge is that strlcpy always
reads its source string to completion, even when that is not needed to make
the copy, because it has to compute the total length of the source regardless
of the limit. In some circumstances that is a problem.

------
wkz
I feel like one alternative is missing from the list:

    
    
      fp = fmemopen(buf, sizeof(buf), "w");
    
      fputs("one", fp);
      fputs("two", fp);
      fputs("three", fp);
    
      fclose(fp);
    

Not sure how it performs, but it reads pretty well IMHO.

------
papermachete
Why are embedded developers unnerved by the concept of a featureful standard
library? There are Linux distributions aimed at statically compiled, musl-
based packages, so you can very much choose what you need for your project.

Look at C++'s std::string in GCC's, Clang's, and MSVC's standard libraries
and their respective development histories. Of course you can make a
minimalistic standard string and also eliminate null-pointer checks, trailing
\0 checks (everyone passes a size_t len anyway), and allocation issues at
runtime.

The only standard thing about C strings is vulnerabilities.

~~~
umvi
I used to work on routers that essentially ran embedded Linux inside, and
after a while our new projects all started being C++. It's amazing: after the
switch to C++ we basically stopped seeing string/array-related segmentation
faults, now that developers use std::string/vector/etc. by default. Yes, it's
more resource-expensive, but we have plenty of RAM on these boards now, and
it's totally worth not having a maintenance nightmare like our legacy C
projects, which I swear get a new bug report every other month about a newly
discovered string/array-related segmentation fault. (I'm not saying C++ can't
ever be a maintenance nightmare, to be clear, but as far as memory-related
issues go, they disappeared the moment we started using the std library.)

~~~
papermachete
Interesting, did you make your own allocators? Can you tell me the company?

------
loeg
The refusal of glibc (and I suppose, POSIX) to adopt strlcat/cpy is
continually obnoxious.

That said, if you actually need efficient string operations, you probably want
a Rope data structure rather than any libc primitive.

~~~
ncmncm
strlcat and strlcpy are not adopted because they are a really bad design.
Using them correctly takes more code than not using them. In practice they
are never used correctly, making them an "attractive nuisance": a feature
that causes more trouble than its absence would.

~~~
loeg
This argument doesn't hold water. They're absolutely no worse than strcat/cpy
and strncat/cpy, which glibc implements. I totally disagree with your premise
that truncation is an incorrect use.

In reality, the alternative to strlcpy/cat isn't "force programmers to write
correct code," it's "programmers will just use the crappier available
functions with even worse behavior on overrun."

~~~
ncmncm
Perhaps you have some better reason why Posix has rejected it, again and
again? Some sort of conspiracy is conceivable, but in service of what?

I have seen much, much better designs that take into account that these
functions are rarely called in isolation. In those designs, calls cooperate
with previous and subsequent calls to share the burden of maintaining
correctness and safety.
~~~
brynet
POSIX never rejected strlcpy/strlcat, because they were never submitted for
inclusion. Also, POSIX doesn't control the str* namespace, ISO C does. Not
that it matters, as it wasn't submitted there either.

But that's beside the point. Every major OS's libc has an implementation of
strlcpy/strlcat, and OpenBSD's can be readily lifted into a project's source
tree as it is portable; a simple code search will reveal the breadth of
adoption. The /only/ exception now is glibc. And glibc is not a standards
committee; for years the primary objections came from one person.

You're being dishonest.

[https://github.com/search?q=strlcpy&type=Code](https://github.com/search?q=strlcpy&type=Code)

------
zokier
C coders of HN, do you use plain vanilla C strings in your project(s)? I was
under the impression that most (at least the bigger ones) use some custom
length-carrying string type to avoid exactly this sort of problem.

~~~
kstenerud
It depends. If I'm writing a library, it's bare pointers. If I'm writing
something not a library that's big enough, I'll use a struct {size_t length;
char* string;} where the length is the string length, and string contains
(length) characters + a nul byte. I might even mix in allocation data for the
total allocated size of the buffer if it's important enough.

Simple to implement and use (and also backwards compatible), provided you have
a library of common functions for allocating, copying, etc.

If I'm size constrained, I'll consider uint16_t for the length field. If I'm
REALLY size constrained, I'll use a VLQ [1] for the length field and take the
slight performance hit.

[1] [https://github.com/kstenerud/vlq/blob/master/vlq-
specificati...](https://github.com/kstenerud/vlq/blob/master/vlq-
specification.md)
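
The struct described above, with the optional allocation size mixed in, can
be sketched like this (all names here are hypothetical illustrations, not
code from any particular project):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: length plus allocated capacity, with the
 * buffer kept '\0'-terminated for backwards compatibility. */
typedef struct {
    size_t length;   /* bytes in string, excluding the '\0' */
    size_t capacity; /* total bytes allocated for string */
    char  *string;
} lstring;

static int ls_init(lstring *s, size_t cap) {
    s->string = malloc(cap ? cap : 1);
    if (!s->string) return -1;
    s->length = 0;
    s->capacity = cap ? cap : 1;
    s->string[0] = '\0';
    return 0;
}

/* Append n bytes, growing the buffer by doubling when needed. */
static int ls_append(lstring *s, const char *src, size_t n) {
    if (s->length + n + 1 > s->capacity) {
        size_t cap = s->capacity;
        while (s->length + n + 1 > cap) cap *= 2;
        char *p = realloc(s->string, cap);
        if (!p) return -1;
        s->string = p;
        s->capacity = cap;
    }
    memcpy(s->string + s->length, src, n);
    s->length += n;
    s->string[s->length] = '\0';
    return 0;
}
```

Appends are amortized O(1), and because the terminator is maintained,
`s->string` can still be passed to any function expecting a C string.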

------
yyyk
"The strlcpy and strlcat functions are available on other systems besides
OpenBSD, including Solaris and Linux (in the BSD compatibility library) but
because they are not specified by POSIX, they are not nearly ubiquitous."

This ignores how often they are (re)implemented in userland: glib, X, and
even the Linux kernel have implementations. Perhaps we could just standardize
what programmers have already chosen rather than allow glibc an unjustified
veto?

------
dwheeler
memccpy definitely has some advantages, so it definitely should be in the ISO
standard.

But memccpy has its own problems. In particular, when concatenating you have
to constantly recalculate the "space remaining"; that is just _asking_ for an
off-by-one error that leads to a buffer overflow, and makes it more
complicated to use. The discussion here doesn't detect attempted overflows,
and that's a mistake; you often need to not just _prevent_ an overflow, but
you also need to _detect_ an attempted overflow and do something different.
You also have to pass \0, which makes the function call more complex (and
perhaps under-optimized), since \0 would be the parameter passed in nearly
all cases.

So I'm glad this is being added, but it's at most a small step to improving
simple string copying and concatenation in C.
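
The "constantly recalculate the space remaining" concern can be contained in
one small helper. This is a hypothetical sketch (the name `append_str` is
made up), in which a NULL return from memccpy is the overflow signal:

```c
#define _XOPEN_SOURCE 700 /* memccpy is XSI/POSIX, not ISO C (pre-C23) */
#include <string.h>
#include <stdbool.h>

/* Hypothetical helper: append src at buffer position *dst, with *remaining
 * bytes left (including space for the '\0'). Returns false on overflow, in
 * which case the buffer may be left unterminated and the caller must
 * recover. */
static bool append_str(char **dst, size_t *remaining, const char *src) {
    char *end = memccpy(*dst, src, '\0', *remaining);
    if (end == NULL)                     /* '\0' not copied: src didn't fit */
        return false;
    *remaining -= (size_t)(end - *dst) - 1;
    *dst = end - 1;                      /* next append overwrites the '\0' */
    return true;
}
```

A caller then keeps one cursor and one counter, and every size update lives
in a single audited place:

```c
char buf[16]; char *p = buf; size_t left = sizeof buf;
if (!append_str(&p, &left, "hello, ") || !append_str(&p, &left, "world"))
    { /* handle overflow */ }
```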

------
wrs
The Microsoft approach to this was to make a set of replacement functions
(strsafe.h [1]) that are very explicit and not at all “clever”, as the
strsplcasdfcpy functions seem to want to be. They return an error code so it’s
obvious when the operation did what was expected, or ran out of space.

[1] [https://docs.microsoft.com/en-
us/windows/win32/menurc/strsaf...](https://docs.microsoft.com/en-
us/windows/win32/menurc/strsafe-ovw)

~~~
dwheeler
But most people want to use standard, portable calls.

The C standard tried to add such functions in "Annex K", but unfortunately
Annex K hasn't seen much uptake (for various reasons).

So in many places the problem continues.

------
GuB-42
This is one of the reasons I tend to avoid str* functions in the first place,
except for one strlen() per string.

The way I copy and concatenate strings typically looks like:

    
    
      int len1 = strlen(str1);
      int len2 = strlen(str2);
      char *buf = malloc(len1 + len2 + 1);
      if (buf) {
          memcpy(buf, str1, len1);
          memcpy(buf + len1, str2, len2);
          buf[len1 + len2] = 0;
      }
    

Of course, not memcpy()ing data around is even better if I can avoid it.

~~~
js2
s/int/size_t/ and check for overflow.

~~~
GuB-42
Totally right about size_t, my bad; hopefully the compiler will raise a
warning.

As for integer overflow, I don't actually know how to handle it properly. In
normal conditions, it is unlikely to be a problem. If the two strings can fit
in memory, the sum of their size should fit in a size_t, but I agree that
making such assumptions can be a bad idea.

Maybe the best way is to limit the size of the input strings to a reasonable
value. That would prevent many out-of-memory situations and potential DoS
attacks too.

~~~
js2
> As for integer overflow, I don't actually know how to handle it properly.

Something like:

    
    
       if (str1 && str2) {
          size_t len1 = strlen(str1);
          size_t len2 = strlen(str2);
          size_t buf_len = len1 + len2 + 1;
          if (len1 < buf_len && len2 < buf_len) {
             char *buf = malloc(buf_len);
             if (buf) {
                memcpy(buf, str1, len1);
                memcpy(buf + len1, str2, buf_len - len1 - 1);
                buf[buf_len - 1] = '\0';
             }
          }
       }
    

(I probably made a mistake above.)

As you suggest, you'll probably run out of memory before you'll overflow, so
in reality, you want to check len1 and len2 are some sane value, but of
course, library functions don't usually have that luxury. Take a hint from
git:

    
    
        #define unsigned_add_overflows(a, b) \
            ((b) > maximum_unsigned_value_of_type(a) - (a))
    
        if (unsigned_add_overflows(extra, 1) ||
            unsigned_add_overflows(sb->len, extra + 1))
              die("you want to use way too much memory");
    

[https://github.com/git/git/blob/6d5b26420848ec3bc7eae46a7ffa...](https://github.com/git/git/blob/6d5b26420848ec3bc7eae46a7ffa54f20276249d/git-
compat-util.h)

[https://github.com/git/git/blob/9d418600f4d10dcbbfb0b5fdbc71...](https://github.com/git/git/blob/9d418600f4d10dcbbfb0b5fdbc71d509e03ba719/strbuf.c#L90)

> making such assumptions can be a bad idea

It's always a bad idea, especially in an unsafe language. Never trust user
input.

------
johnisgood
> Of the solutions described above, the memccpy function is the most general,
> optimally _efficient_ [...]

This does not seem to be the case for me AT ALL. _strcpy_, for example, is a
lot faster than _memccpy_. Here are my results:

    
    
      $ gcc -O0 bench.c && ./a.out
      memccpy: 0.008405
       strcpy: 0.002913
      
      $ gcc -O3 bench.c && ./a.out
      memccpy: 0.007933
       strcpy: 0.002590
      
      $ clang -O0 bench.c && ./a.out
      memccpy: 0.008771
       strcpy: 0.003225
    
      $ clang -O3 bench.c && ./a.out
      memccpy: 0.007966
       strcpy: 0.000383
      
      $ musl-gcc -O0 -static bench.c && ./a.out
      memccpy: 0.007849
       strcpy: 0.005647
    
      $ musl-gcc -O3 -static bench.c && ./a.out
      memccpy: 0.005754
       strcpy: 0.005625
      
      $ tcc bench.c && ./a.out
      memccpy: 0.014252
       strcpy: 0.004045
    

Source code can be found here:
[https://slexy.org/view/s2EHngPvDh](https://slexy.org/view/s2EHngPvDh)

---

The differences seem to be quite interesting. Did I mess up the code? Compare
_gcc -O3_'s _strcpy_ and _clang -O3_'s _strcpy_: _0.002590_ vs _0.000383_!
_musl-gcc_, on the other hand, has much more similar results.

---

    
    
      $ gcc --version
      gcc (GCC) 9.1.0
      Copyright (C) 2019 Free Software Foundation, Inc.
      This is free software; see the source for copying conditions.  There is NO
      warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
      $ clang --version
      clang version 8.0.1 (tags/RELEASE_801/final)
      Target: x86_64-pc-linux-gnu
      Thread model: posix
      InstalledDir: /usr/bin
      
      $ tcc -v
      tcc version 0.9.27 (x86_64 Linux)

~~~
eMSF
>Did I mess up the code?

You certainly did; both your benchmark functions overflow their internal
buffers.

However, it's not all that improbable for memccpy to be slower than strcpy in
this case, even if it was used correctly; after all, it does more, and by
doing so would prevent the buffer overflow if supplied with correct arguments
(specifically, the last one). As to how much slower, I cannot tell. Also,
you're skipping over half the point of the article by using a (fixed) source
string, the size of which is known in advance.

In general, I would not benchmark library functions on local variables
(buffers) without observing their results. It's far too easy for the compiler
to remove the call altogether when said removal doesn't make any difference on
the output.

~~~
johnisgood
> You're skipping over half the point of the article by using a (fixed) source
> string, the size of which is known in advance.

You are right. I should have focused more on that instead, that seems more
relevant to why the author of the article is suggesting _memccpy_. I am
curious as to whether or not it really is the case that _memccpy_ is
"optimally efficient" in practice over the alternatives. Would you like to
prove or disprove that statement yourself? I modified the code a bit; it uses
_strlen_ to calculate the length of the string passed, and the string is
_argv[1]_. _memccpy_ is still just as slow. Is this a more acceptable approach
to you? In this case we know neither the string nor its length in advance.
_strcpy_ still outperforms _memccpy_. Is this sufficient to disprove
the claim that _memccpy_ is "optimally" efficient to other alternatives? The
other criterion was being widely adopted, in which case, well, _strcpy_ also
looks good. Moreover, as dwheeler pointed out, _memccpy_ is a tad difficult to
use in practice. I will give _strlcpy_ a try, too, since I prefer that over
_strcpy_. In any case, I am not convinced that these criteria hold true for
_memccpy_ over the alternatives.

> on local variables (buffers) without observing their results

What do you mean exactly? I did observe the results of the buffer. See the
_printf_, or are you not referring to that?

> both your benchmark functions overflow their internal buffers.

Would you please elaborate on it, and its relevance? Are you referring to N
being too high?

~~~
eMSF
>Would you please elaborate on it, and its relevance? Are you referring to N
being too high?

No, the stack size is implementation-defined anyway. Instead, you have a
classic off-by-one error because you didn't reserve any space for the final
null terminator.

Correctly used memccpy would protect against an issue like this, although the
destination string would not be correctly terminated, as it's not a safe
string function. Also, if your memccpy version had the correct arguments
inside the loop, you wouldn't have needed the extra call before the loop to
hide the issue, as the memccpy call would have been functionally identical to
strcpy except for the last pass of the loop.

>What do you mean exactly? I did observe the results of the buffer. See the
printf, or are you not referring to that?

At least in the link you provided, all the printf's that would observe the
contents of _buf_ after the loop are commented out. No observable change
happens in the execution of the program even if your compiler decides to just
remove any calls to strcpy or memccpy.

\--

That being said, strcpy is quite efficient at what you're benchmarking; that
is, "multiplying" short strings. The task doesn't highlight its shortcomings.
(strcpy wouldn't be too bad even if the strings were longer, although memcpy
might be slightly faster.)

But consider the following silly example (not checked for errors) that does
highlight the issue:

    
    
      char *next_insert;
      size_t remaining_size;
    
      void append_memccpy(const char *str)
      {
        char *tmp = memccpy(next_insert, str, '\0', remaining_size); // single pass over str
        if (tmp) {
          --tmp; // move pointer to terminator from one past it
          remaining_size -= tmp - next_insert;
          next_insert = tmp;
        } else { // insufficient size remaining
          str += remaining_size; // first remaining_size bytes are already copied
          allocate_more();
          append_memccpy(str);
        }
      }
    
      void append_strcpy(const char *str)
      {
        size_t len = strlen(str); // first pass over str
        if (len + 1 < remaining_size) {
          strcpy(next_insert, str); // second pass over str
          remaining_size -= len;
          next_insert += len;
        } else {
          allocate_more();
          append_strcpy(str);
        }
      }
    

Now, even though the latter version is extra silly (just to resemble the
former more), it doesn't change the fact that with strcpy, we have to process
each byte in _str_ twice. If _str_ is long enough, that might not be exactly
free.

~~~
johnisgood
Thank you for the reply. I did think about the significance of the length of
the string, but I was too lazy to benchmark that. Perhaps another time.
Theoretically, for the reasons you mentioned, _memccpy_ should perform better
on larger strings, but I am not sure that is really the case in practice
(slow implementations of _memccpy_, lack of compiler optimizations, etc.),
and it seems that "stick to _memccpy_" is not a universal rule (obviously).
:D

~~~
eMSF
Note that I mentioned the ordinary memcpy (with a single c) there briefly.
memccpy should under no circumstances be faster than strcpy for "string
multiplying", as it holds no advantages over it in that use.

~~~
johnisgood
Ouch, my mistake.

In any case, could we sum it up? In what cases should _memccpy_ be used over,
say, _str{n,l}cpy_, or even _memcpy_, and is it in conflict with the
article's recommendation or its statement on performance regarding _memccpy_
vs. the alternatives?

------
pksadiq
> The committee chose to adopt memccpy but rejected the remaining proposals.

Is that the case? Reading the updated standard draft[0], they also included
strdup and strndup. Maybe those were rejected at first, then added later.

[0] [http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n2385.pdf](http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n2385.pdf)

------
legulere
Why not abandon \0-terminated strings and pass the length of strings as an
additional parameter?

~~~
carapace
[https://stackoverflow.com/questions/25068903/what-are-
pascal...](https://stackoverflow.com/questions/25068903/what-are-pascal-
strings)

~~~
legulere
Pascal strings prefix the string with the length. What I'm arguing for is more
like fat pointers, where the length is stored at the same location as the
pointer, which is already standard practice in C for binary data.
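
The fat-pointer approach described here can be sketched as follows (a
minimal, hypothetical illustration; the names `strview`, `sv_from_cstr`,
`sv_sub`, and `sv_eq` are made up): the length lives next to the pointer,
which makes length queries O(1) and zero-copy substrings trivial.

```c
#include <stddef.h>
#include <string.h>

/* Fat pointer to character data; not necessarily '\0'-terminated. */
typedef struct {
    const char *data;
    size_t      len;
} strview;

static strview sv_from_cstr(const char *s) {
    return (strview){ s, strlen(s) };
}

/* Substring without copying: just move the pointer and shrink the length. */
static strview sv_sub(strview s, size_t start, size_t n) {
    return (strview){ s.data + start, n };
}

static int sv_eq(strview a, strview b) {
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

The trade-off is that such views cannot be passed to functions expecting
'\0'-terminated strings without a copy, which is why C string literals don't
interoperate with them directly.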

------
codehero
Efficiency looks past current deficiency.

We have the empty string: "\0"

We have the null string: NULL

There is no concept of an INVALID string, as float has NaN.

This would be the result of trying to copy a string to a buffer that is too
small.

Or sprintf() into a small buffer.

Or a raw string parsed as UTF-8 and is invalid.

Correctness over efficiency.

~~~
charliesome
I'd argue that an invalid string concept would be neither correct nor
efficient. Why should all code that deals with strings carry the burden of
fallibility of a subset of string functions?

You've mentioned NaN propagation in another comment and I think that's a
perfect example of the problem with this approach. Sorting a vector of
arbitrary floats is a notoriously thorny problem because any float could be
NaN, and as NaN is incomparable to any other float, there is no total ordering
of floats. There is no general solution to this problem that doesn't involve
making assumptions that could be faulty for some applications.

~~~
codehero
Please support your argument against correctness by providing an example where
an INVALID string as input to a suitable modified generic string function
would result in a valid string.

~~~
RhysU
What is the length of an invalid string? What is the length of the
concatenation of two invalid strings?

There are sensible answers. But they are weird.

~~~
codehero
Is it more sensible to cat 2 strings, but cut off the second one, then pass
off the result as valid?

I would say let an INVALID string be length 0. Then accept that catting a
valid and invalid string would result in a shorter length.

Which one do you think is safer?

~~~
RhysU
I would expect an invalid string to have an invalid length. For integer-valued
lengths you'd have to use a negative number to differentiate from a valid,
empty string. But then the sum of the invalid-string lengths differs from the
length of the concatenated invalid strings. Which is wonky.

~~~
codehero
Safe string manipulation never exceeds the bounds of the buffer. So negative
values are dangerous, as are any additions that would exceed the maximum
size.

Negative lengths are not compatible with unsigned representation.

A system implementing invalid string values must choose a text encoding such
as UTF-8 that supports the concept of an invalid character. Null termination
is too flexible; so is simple length prepending.

------
SignalsFromBob
That web page's color choices have made it very difficult to read. I don't
know who thought putting light grey text on white was a good idea. I had to
copy and paste the text to a text editor in order to read it.

~~~
dmortin
I usually just press Ctrl+A to select all in these situations to make the text
readable.

------
ChrisSD
Does every org have their own C string library or does it just feel like it?

~~~
camgunz
The last job I had working in C didn't--we leaned heavily on libraries for
stuff like strings, logging, hashtables, and serialization. Implementing that
stuff yourself is either a big timesink, or just asking for bugs and security
issues.

------
lota-putty
OT: Is there any API out there that implements, say, a length* prefix in all
C strings?

* 2 or 4 octets in size

~~~
ninkendo
What you're talking about typically goes by the name of "pascal strings", and
while they're possible to do, C's string literals are not compatible with
them, so nobody does it.

~~~
childintime
It is certainly possible to declare Pascal string literals without much
hassle: [https://stackoverflow.com/questions/7648947/declaring-
pascal...](https://stackoverflow.com/questions/7648947/declaring-pascal-style-
strings-in-c)

One of the answers states that GCC and Clang do have support for Pascal
strings.

These strings probably do not work as well in #defines, i.e. they don't
concatenate like regular literals.
------
komali2
Obligatory sidenote on the website itself rather than the content: please
don't set such a low contrast between font color and background color. I had
to copy/paste this article to read it.

~~~
segfaultbuserr
Use the developer tool to inspect CSS.

1. Uncheck

    
    
        article .entry-content {
            color: #646464;
        }
    

And a new CSS color rule appears.

2. Uncheck

    
    
        body {
            color: #333;
        }
    

Done!

I used to think it was ridiculous to manipulate a webpage like this manually,
but now I believe: if it's helpful for a one-time read of a broken webpage,
why not?

