If the standard libs are that bad, the community should really lobby for the inclusion of a "safe" string copy into the C standard. Failing that, use a battle-tested third-party library like bstrlib.
If you must write C, you should use a real string library, and not string.h.
I agree that a standard fix for this would be great. It's been five years since I wrote the post, the solution has been available for many years longer than that, and the standards committee has shown no interest in fixing this issue. In the interim I think it would be irresponsible for developers to continue using strncpy or any of the other unsafe options when strcpy_safe is so simple and effective.
Also, it doesn't appear to be specified what happens if the count argument is too large to be represented as a ssize_t. The destination buffer would have to be extremely large, so it probably doesn't happen in practice, but it'd be good to specify it, or at least explicitly state it's unspecified / undefined.
But this year I decided to make them safe instead. In this case the spec is broken.
"C strings and string functions aren't designed to be safe. If you want safe strings use a library."
Let's not forget that it's also not natively available in glibc. Though you'll find it in every BSD you can think of.
Like snprintf(3), the strlcpy() and strlcat() functions return the total length of the string they tried to create. For strlcpy() that means the length of src. For strlcat() that means the initial length of dst plus the length of src.
In fairness, that is probably just not the target "market" for strlcpy. Presumably, it is meant for "please copy this whole string which I expect to fit in the target (but catch me in the rare case that it does not)".
They made it this way to make it more of a drop-in replacement for strncpy, but IMHO they should have changed the return value to be the number of characters copied. If the return value is less than n, your string was copied completely; if it is equal to n, your string was truncated.
There are good reasons to use strlcpy over memcpy. For example, say you have a text parser where the most common case is that each string is only a few bytes long, but you need to handle odd cases where they are much longer. So you have a large buffer of which you usually use only a tiny chunk. strlcpy stops at the terminator, so it will be quick, but memcpy will chug through the whole buffer each time.
If efficiency is not a concern for you and you rather like the convenience, good for you.
Personally I feel plain const char pointers (plus length, depending on expected size) are good enough to emulate the immutable string type. For modifiable strings, I still prefer viewing them as chunks of memory, using char pointer + length + capacity. Nice thing is, you can easily go from the former to the latter in the lifetime of any one string. You can also pool many strings in large chunks if you build only one string at any one time.
Micro-optimizing each line of C code based on gut feeling is not an option if quality matters.
So I expect a good software engineer on my team to be able to design a data structure where the fact that a linked list is being used doesn't hinder to change it afterwards to an array, a B-Tree or what have you.
Without changing a single line of client code I might add.
Such a good software engineer will certainly have attended an algorithms and data structures class during their CS degree and be aware of architecture designs based on Abstract Data Types.
Yes, even in C with its flaky translation units instead of a proper module system, it is possible to implement application architectures based on Abstract Data Types, which any good software engineer can easily tackle.
And for those that aspire to reach that level, here are a few examples.
"Algorithms + Data Structures = Programs" by Niklaus Wirth,
the 1st edition uses Modula-2, the 2nd Oberon
"Introduction to Algorithms" by Thomas Cormen, Clifford Stein, Ronald Rivest, Charles Leiserson
"Abstract data types and Modula-2: a worked example of design using data abstraction" by Richard Mitchell
"Abstract Data Types in Modula-2" by Rachel Harrison
"Abstract Data Types and Algorithms" by Manoochehr Azmoodeh
Software quality and meeting delivery SLAs should trump micro-optimizing cache access per code line.
So if SLAs define 100ms per access, no need to go crazy making everything under 5ms.
Thankfully, the Linux Foundation has another view on the matter.
Naturally, when one creates structs and accesses the fields directly, there is no way out of a bad design.
Other more far reaching design choices could be: when and how much precomputation is done, data loading strategies, data-flow oriented vs control flow oriented, threading strategies, etc. You can't hide these choices behind a name. They deeply affect the structure of the program.
If you have the length of both, simply assert that the length of the source is less than the size of the destination. Problem solved (bugs become obvious in testing).
Asserts are necessary, but not sufficient.
If the destination buffer doesn't have a fixed length then it will fail to compile.
This is similar to your suggestion with these improvements:
- Template deduction of the destination size is done - this is essential to avoid bugs
- Detection of overwrites is done in release builds as well as during testing
- Detection of overwrites is automatic - it doesn't require manually adding asserts to call sites
- strcpy_s exists in some standard libraries
I do have some maintainability concerns though about the promulgation of numerous competing home-brew solutions to a near trivial problem throughout a code base.
With `strncpy`, the caller assumes responsibility for the buffer size (by specifying that size explicitly every single time). I thought that responsibility was not being debated within the current context.
I bring it up because the article makes programming errors on the part of the caller part of the case for its "safe" solution.
TL;DR - I've reviewed enough code to know that software developers pass the wrong size to strncpy and friends.
I am not saying that is bad. But solutions already exist: one is using a string class, another is moving to a dynamic language (having the size as part of the data structure and checking it at every usage is one step toward dynamic types).
Alternatively, the compiler can take over this responsibility by statically checking every call (the Rust solution). That is a different language. On that, I have a question: how can we pass the syntactic buffer size into a static library without embedding the size into the data? If we do embed the size, then it is still dynamic within the context of the library.
Either way, one should not still be discussing the safety of strncpy here.
Now, as long as we are discussing strncpy, I assume it is a given that we do not want the buffer size to be a permanent part of the simple string type, and that we are not looking for solutions that demand an overhaul of the language or libraries. I assume it is a given that the caller takes responsibility for the buffer size: trading a bit of responsibility for a simpler system and greater control.
You won't have a good solution (or a good discussion) without confining the context first.
* We're going to stick with C and its standard library: lots of room to continue to make errors.
* We're going to stick with C, but it's OK to consider an alternate implementation of strings: the C standard, the standard library, and lots of other C libraries are full of functions that deal in char* strings, so you're stuck with them. Also, runtime overhead increases.
* We're willing to consider other languages: other languages have compiler overhead, semantic overhead, runtime overhead, or all three. Also, you still have to interoperate with C.
So the main issue is how to make C devs actually adopt some kind of safer C dialect and migrate towards it.
I can't quite tell why, but I found it kind of insightful and it made me feel a little better about my own inclination to work on lower levels.
And I also find the remark funny. :)
Really, stop using C. To quote Hayao Miyazaki, C was a mistake.
#define BUFSIZE 100
buf[BUFSIZE]=0 // for global var, it is already zeroed, right?
How does this work if the size of the destination isn't known at compile time? I.e., it's not "buffer"?
And this is progress. I have fixed code that looked something like this:
strncpy(dst, sizeof(dst), src);
data[sizeof(dst)-1] = 0;
Every mistake that can be made, will be made.
> The focus of this post is on fixed-size string buffers, but the technique applies to any type of fixed-length buffer.
I'm a big fan of using strategies to avoid bugs. Another strategy that can be a big help is to spend a little time thinking about what kinds of strings you are going to be using and how big they might be, and then sanitizing your program input based on those sizes. Find the too-long strings before they get into your program and before you start copying them around.
The system I'm responsible for came into being as a set of programs on a Univac II and was a bunch of C and FORTRAN when I joined. Because of the need to pass data between C and FORTRAN programs/code, we had a set of 10-20 standard string sizes, based on what kind of text you had. We no longer write new code in either C or FORTRAN, but maintain the concept of standard sizes for our text. By specifying the kind of text (we also do this with numbers) in the design steps, we save far more time and effort than we spend over the entire SDLC.