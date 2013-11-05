https://www.sudo.ws/todd/papers/strlcpy.html (USENIX '99)
Further reading, including examples:
http://man.openbsd.org/strlcpy
http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/lib/...
Barring quibbles about return values, strlcpy is essentially:
snprintf(dst, sizeof(dst), "%s", src);
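Here is a minimal sketch of that one-liner as a function. The name copy_with_snprintf is hypothetical. It takes the buffer size as a parameter because sizeof(dst) only gives the buffer size when dst is an array. Inside a function, or whenever dst is a char *, sizeof(dst) is the size of a pointer. Like strlcpy, snprintf returns the length it tried to produce, so a result >= size signals truncation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity size) via snprintf.
   Always NUL-terminates when size > 0. Returns false if src was truncated
   or if snprintf reported an error (negative return). */
bool copy_with_snprintf(char *dst, size_t size, const char *src)
{
    int n = snprintf(dst, size, "%s", src);
    /* n is the length snprintf wanted to write, not counting the NUL. */
    return n >= 0 && (size_t)n < size;
}
```

Usage: `if (!copy_with_snprintf(buf, sizeof buf, src)) { /* handle truncation */ }`, where buf is an array in the caller's scope.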
https://blog.mozilla.org/nfroyd/2013/11/05/the-performance-i...
Put another way, the following either maintains the string invariant and does not silently truncate out, or it fails on a null pointer dereference. If you forget the + 1, it will certainly fail in valgrind when out is used, and will almost certainly crash your unit tests.
size_t len = strlen(in)+1;
char* out = malloc(len);
memcpy(out, in, len);
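The three lines above can be wrapped in a helper, here called copy_string (a hypothetical name). This version checks malloc's result explicitly, which is essentially what POSIX strdup does. The variable is named size_with_nul so the purpose of the + 1 is visible at the point where it is computed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of a C string.
   The + 1 reserves room for the terminating NUL, which memcpy then copies
   along with the characters. Returns NULL if allocation fails instead of
   dereferencing a null pointer. */
char *copy_string(const char *in)
{
    size_t size_with_nul = strlen(in) + 1;
    char *out = malloc(size_with_nul);
    if (out == NULL)
        return NULL;
    memcpy(out, in, size_with_nul);
    return out;
}
```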
Open challenge: using the strn* functions, produce something that is correct (i.e., it cannot silently corrupt the string), just as simple, and less error-prone.
String manipulation isn't that hard, and strcpy and friends aren't very good. I'm a fan of creating a struct that has a uint8_t* and a length, and using that for all buffers, including strings.
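A minimal sketch of such a struct follows. The names struct bytes, bytes_from and bytes_append are hypothetical and not taken from any particular library. The length is stored rather than inferred from a terminator, so every operation knows the exact size of the data.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted buffer. */
struct bytes {
    uint8_t *data;
    size_t len;
};

/* Allocate a buffer holding a copy of len bytes from src.
   On allocation failure, data is NULL and len is 0. */
struct bytes bytes_from(const void *src, size_t len)
{
    struct bytes b = { malloc(len ? len : 1), len };
    if (b.data == NULL)
        b.len = 0;
    else if (len > 0)
        memcpy(b.data, src, len);
    return b;
}

/* Append src to *dst, growing it. src must not alias dst.
   Returns 0 on success, -1 on size overflow or allocation failure,
   in which case *dst is left unchanged. */
int bytes_append(struct bytes *dst, struct bytes src)
{
    if (src.len > SIZE_MAX - dst->len)
        return -1;
    size_t new_len = dst->len + src.len;
    uint8_t *p = realloc(dst->data, new_len ? new_len : 1);
    if (p == NULL)
        return -1;
    if (src.len > 0)
        memcpy(p + dst->len, src.data, src.len);
    dst->data = p;
    dst->len = new_len;
    return 0;
}
```

Every size computation happens in one place (bytes_append), so callers never do the "+ 1 for the NUL" arithmetic themselves.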
Thanks for the pointer though! I'll definitely give it a careful read.
Unsurprisingly, "record strings" are what almost every language even slightly higher-level than C uses: C++, Rust, Swift, Go, …
Writing C code with well-defined semantics in the face of existing heap corruption is harder. The strn* functions don't do that either, though.
For real programs you basically have to put it in the initial computation, or it will be forgotten somewhere later (maybe in a later commit).
Mayhaps "len" should be renamed to something clearer?
Not of any use: few libcs support Annex K, which essentially "standardises" Microsoft extensions, and GNU libc has repeatedly rejected it.
You're probably better off using the portable BSD functions (strl*).
The only safe thing to do with a string that doesn’t fit in the available space is to abort with an error. Check those return values!
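For this usage, strncpy is perfectly suitable (though the performance concern may still slightly favor another approach). Note that strncpy returns dst, not a length, so truncation has to be detected from the last byte:
strncpy(dst, src, size);
if (dst[size - 1] != '\0') {   /* src did not fit in size - 1 characters */
    abort();
}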
I don't disagree that return values should be checked, or that things should maybe be more explicit if you want truncation, but calling aborting "the only safe thing" is wrong.
As a bonus, your strings can now contain interior NUL bytes, as strings in just about any non-C language can, and so you get improved interoperability.
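A small illustration of the difference, assuming a string literal with an embedded NUL. The helper names c_string_length and counted_length are hypothetical. strlen stops at the first NUL byte, while a stored length keeps every byte.

```c
#include <stddef.h>
#include <string.h>

/* Five characters, the third of which is an interior NUL. */
static const char data[] = "ab\0cd";

/* A C-string API sees only "ab". */
size_t c_string_length(void) { return strlen(data); }      /* 2 */

/* A stored length (sizeof minus the literal's trailing NUL) keeps all 5 bytes. */
size_t counted_length(void) { return sizeof data - 1; }    /* 5 */
```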
You can do this in C, with various third-party string libraries like http://bstring.sourceforge.net/ or with third-party frameworks that include string handling like GLib (https://developer.gnome.org/glib/stable/glib-Strings.html). You can also do this in easily-C-compatible languages like C++ or Objective-C, or less C-like compiled languages like Go or Rust or Ada, or other languages like Java / other JVM languages (which will often give you faster performance than C), Python, Ruby, C#, the works.
The only good reason to write code in C that uses C strings is compatibility with some existing code that requires it (either you're making minimal changes to an existing codebase, or you're writing a small amount of code that interoperates with an API that uses C strings). You can't have both safety for large-scale programs and C strings.
Take the strlcpy() function from OpenBSD and incorporate it into your source if your libc doesn't provide it. strlcpy() is a well-designed substitute for strcpy().
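If copying OpenBSD's file is not an option, the documented contract is small enough to sketch. This is not OpenBSD's source, just a straightforward loop implementing the behavior described in its man page. It is named my_strlcpy (a hypothetical name) to avoid clashing with a libc that already has strlcpy.

```c
#include <stddef.h>

/* Copies at most size - 1 bytes, always NUL-terminates when size > 0,
   and returns strlen(src), so a result >= size means truncation. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t i = 0;
    if (size > 0) {
        for (; i < size - 1 && src[i] != '\0'; i++)
            dst[i] = src[i];
        dst[i] = '\0';
    }
    while (src[i] != '\0')   /* keep counting to report the full length */
        i++;
    return i;
}
```

Usage: `if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf) { /* truncated */ }`. The return value has to walk all of src even when only a prefix fits, which is the performance cost sometimes raised against strlcpy.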
> Now having a function like this in the standard library isn't such a bad thing in itself. It's designed to deal with a specialized data structure ...
> The problem is that the name strncpy() strongly implies that it's a "safer" version of strcpy(). It isn't. ....
> It's because strncpy()'s name implies something that it isn't that it's such a trap for the unwary. It's not a useless function, but I see far more incorrect uses of it than correct uses. This article is my modest attempt to spread the word that strncpy() isn't what you probably think it is.
This argument seems correct to me. "Safer," here, does not mean "If you swap out every use of strcpy for a correct use of strncpy, you'll have fewer bugs." As the author is defining it, "safer" includes the risk that someone will be less rigorous with their code by assuming strncpy will solve all their problems. It won't. It will solve some of their problems, yes, but they still have to analyze their problem almost as rigorously as if they were using strcpy.
If a code reviewer cares deeply about strcpy and less deeply about strncpy (and anecdotally this is a thing code reviewers do), then in practice, the use of strncpy is not safer.
Why rigorously? A blind
dest[n-1] = '\0';
after the call is all it takes.
Any use of strncpy that breaks would also have broken if strcpy had been used instead, and there exist some situations where strncpy would not break while strcpy would; for instance, if we just inspect a prefix of the resulting string.
That said, it's certainly the case that most breaking uses of strcpy, if naively replaced with strncpy, still break.
Yes, I agree that this statement is true. But my claim is that this is not what the article means by "safer", and the article's definition of "safer" is more useful.
That is, people who are saying "The article is wrong because it is possible to use strncpy correctly in cases where strcpy was not being used correctly" are not disagreeing with the article so much as talking past it.
strncpy will /terminate/ at the end of the string if it's shorter than the allocated buffer.
If, somehow, the source string is invalid, it will then stop at the specified cutoff point.
However, one could argue that if the source has overflowed its bounds, i.e. it isn't a valid string, you've already violated the contract of a secure environment: something has gone wrong, and it should be handled at a higher level than the choice of string-copy function.
Use of strncpy, however, assumes that the source string /is/ valid but might overflow the target buffer. In /that/ case truncation may be what you want; if it isn't, the program flow should react accordingly. (This is C: you don't /throw/ errors, you write explicit paths.)
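One way to write such an explicit path is a hypothetical copy_checked() that terminates the result and reports truncation to its caller instead of deciding for it. It assumes size > 0.

```c
#include <string.h>

/* Hypothetical result codes for the caller to branch on. */
enum copy_result { COPY_OK, COPY_TRUNCATED };

/* Copy src into dst (capacity size, which must be > 0) with strncpy.
   The result is always a valid string; the return value says whether
   it had to be truncated. */
enum copy_result copy_checked(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size);
    if (dst[size - 1] != '\0') {   /* strncpy wrote no terminator */
        dst[size - 1] = '\0';      /* keep a valid, truncated string */
        return COPY_TRUNCATED;
    }
    return COPY_OK;
}
```

The caller then chooses: accept the truncated value, log and continue, or fail.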
This would be like claiming that malloc is unsafe because it doesn't return a null terminated string.
And yes, I would absolutely argue that malloc is unsafe. Certainly it's "less safe" in the language of this thread than calloc, and certainly it's completely type-unsafe. But more than that, if you want a string, there should be a function that allocates and returns an empty but well-formed string in one action, and malloc should be the lower-level unsafe API that this function uses. The myriad alternatives to C strings that I mentioned in my comment all support this - bstring has bfromcstr(""), GLib has g_string_new(""), Rust has String::new(), Ruby has "", and so forth.
Think of it as the argument that sharper knives are safer than duller ones. Certainly not true for untrained users. But widely true for trained ones.
Well, no, because this behavior is very close to the more commonly desired behavior and often indistinguishable therefrom. That makes it more of a "gotcha".
And thinking that strncpy will enforce a null at the end actually seems nonsensical to me. Now. I agree that it may not have at some point in my life. But this is again arguing that it is not safer than strcpy. It most certainly is. Just not completely safe. Which again, is no surprise. Why not point out that it does no checking that you passed it pointers you are allowed to write into?
I mean, I get that mistakes can be made. I even agree it is not safe. However, to claim it is not safer is akin to the people that claim Java is not safer than C because example.
> And thinking that strncpy will enforce a null at the end actually seems nonsensical to me.
That's bizarre, seeing as there are similar functions that do exactly that. I certainly understand having internalized that strncpy doesn't happen to, but I don't see how you can call it nonsensical.
It is nonsensical because I know it doesn't. I grant that that is a learned thing.
I do question how folks thought that it could do this. If you pass it n bytes with no null, how should it pick where to start putting the nulls? Last byte is somewhat intuitive, but then the behaviour will lead to string corruption pretty quickly.
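For reference, the C standard's answer is that strncpy writes exactly n bytes. If src is shorter, it pads the remainder with NULs. If strlen(src) >= n, it writes no NUL at all. The hypothetical probe below pre-fills a scratch buffer with 'X' so that every byte strncpy touches is visible.

```c
#include <string.h>

/* Copy src into an 8-byte scratch buffer pre-filled with 'X', then count
   how many of the first n bytes strncpy left as NUL (n is capped at 8). */
size_t nul_bytes_after_strncpy(const char *src, size_t n)
{
    char buf[8];
    size_t count = 0;

    if (n > sizeof buf)
        n = sizeof buf;           /* never let strncpy write past buf */
    memset(buf, 'X', sizeof buf);
    strncpy(buf, src, n);         /* writes exactly n bytes */
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

With src = "ab" and n = 8, six padding NULs are written. With "abcdefg", one is written. With "abcdefgh", none are written: the buffer holds eight characters and no terminator.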
You seem to be missing a distinction I would draw between "counterfactual" and "nonsensical".
The assumption that a function prefaced with "str" makes a well-formed str seems reasonable (if, it happens, misguided). When given too long a string, the function will truncate - there's not really anything else it could do. The question is only where it truncates.
> the behaviour will lead to string corruption pretty quickly.
It will lead immediately to string truncation - whether that's acceptable depends on context, and can be checked for. Is there something you're worried about that I'm missing?
My sibling comment about treating this like knives is ultimately my view. Strncpy is a very sharp tool. This makes it safer than strcpy. It is by no means safe, though. Just like sharper knives are safer, but still only safe with training and proper use.
So, that a function prefixed with str has some special treatment for strings makes sense. Otherwise just use memcpy. :)
The question is natural for how it truncates. It so happens the answer is by stopping. You imply it should do so by altering the string. I can see arguments either way.
For the rest, Truncation is a form of corruption. So I'm not sure where you are taking that line. Was it just my rhetoric? (I'm mainly typing on my phone, so using loose and likely mistyped phrases.)
> The question is natural for how it truncates. It so happens the answer is by stopping. You imply it should do so by altering the string. I can see arguments either way.
So can I, and I think that's the distinction I draw. Something is nonsensical if I really can't see any arguments for it. It's counterfactual if things could reasonably have gone that way but didn't.
> For the rest, Truncation is a form of corruption. So I'm not sure where you are taking that line. Was it just my rhetoric?
Truncation is, indeed, a form of corruption. It is perhaps acceptable in more circumstances than other forms of corruption. It wasn't exactly that I thought, from your rhetoric, that you were referring to something else. But on the off chance you were, I definitely wanted to know about it.
Agreed.
> arrays were zero-indexed to save one or two instructions in a compiler
Actually, I find zero-indexing more consistent whenever you need to do math with the indexes.
It's a primitive and gives you choices. You can either:
A) call it with sizeof(target) - 1 to explicitly tell it NOT to overwrite the guard byte at the end, OR
B) add an explicit operation that always overwrites the byte at the end
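A sketch of option A, assuming a hypothetical fixed-size record field that is zero-initialized before first use. The sketch relies on that initialization, because strncpy is only ever told about sizeof(name) - 1 bytes and so never writes the guard byte.

```c
#include <string.h>

/* Hypothetical record with a fixed-size name field. */
struct record {
    char name[8];   /* last byte is the guard; zeroed by the initializer */
};

/* Copy at most 7 characters; name[7] is never touched, so it stays '\0'
   as long as the record was zero-initialized. */
void set_name(struct record *r, const char *src)
{
    strncpy(r->name, src, sizeof r->name - 1);
}
```

Usage: `struct record r = {{0}}; set_name(&r, user_input);`. If r were left uninitialized, option A would silently provide no guarantee, which is one argument for option B.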
Since treating the destination buffer as a string is very tempting, given that it came from a function named "str*cpy()", you've basically added another gotcha to strcpy() without gaining any safety. You're trading one form of undefined behavior that's explicit in strcpy for another form that appears when a developer mishandles the result of strncpy.
strncpy(newBuf, oldBuf, size);
newBuf[size-1] = 0;
That way you always end up with a null. It adds the null even when you don't need it, but that's more efficient than adding an 'if' statement.
Yep. Or perhaps return an error.
All it takes to avoid a missing NUL terminator is something like
#define safer_strncpy(dest, src, n) \
do { \
if ((n) > 0) \
strncpy((dest), (src), (n))[(n) - 1] = '\0'; \
} while (0)
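The macro evaluates n up to three times, so an argument with side effects (say, `len--`) misbehaves. A function version avoids that and gets type checking for free. It is sketched here under the hypothetical name safer_strncpy_fn.

```c
#include <string.h>

/* Like strncpy, but the result is always NUL-terminated when n > 0.
   n is evaluated once; returns dest, as strncpy does. */
static inline char *safer_strncpy_fn(char *dest, const char *src, size_t n)
{
    if (n > 0) {
        strncpy(dest, src, n);
        dest[n - 1] = '\0';
    }
    return dest;
}
```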
Use snprintf() or strncat(), as suggested by the article.
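The strncat variant looks like this. strncat appends at most n characters and then always writes a terminating NUL, so for a copy into an emptied buffer n must be size - 1. The helper name copy_with_strncat is hypothetical, and the sketch assumes size > 0.

```c
#include <string.h>

/* Copy src into dst (capacity size > 0); always NUL-terminated. */
void copy_with_strncat(char *dst, size_t size, const char *src)
{
    dst[0] = '\0';                  /* start from an empty string */
    strncat(dst, src, size - 1);    /* at most size - 1 chars, plus the NUL */
}
```

Unlike strncpy, strncat never pads. However, it also gives no indication of truncation, so a caller who cares still has to compare strlen(src) with size - 1.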
