For someone being so pedantic, it is puzzling that he doesn't address the actual title of his post. No one claims that strncpy is 'safe'; they just say 'safer', that is, less likely than plain strcpy to have catastrophic bugs. And I think that claim stands up: make sure your n in strncpy is the length of your buffer, do the strncpy, write a null at the end of the buffer. Voila, all those strcpy calls on unexpectedly large strings scribbling over your heap are gone. No need to constantly strlen source strings to check their length -- which can itself cause a segfault if they are not null terminated.
So yes, it is safer. You would be mad not to use it.
This can be put into a #define macro or inline function.
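A minimal sketch of that recipe as an inline function (the name safe_strncpy is just illustrative, and size is assumed to be the full size of the destination buffer):

    #include <string.h>

    /* Copy src into dst, never writing more than size bytes and always
       null-terminating, even when the source gets truncated. */
    static inline void safe_strncpy(char *dst, const char *src, size_t size)
    {
        strncpy(dst, src, size);
        dst[size - 1] = '\0';   /* write the null at the end of the buffer */
    }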
If what you want is a null terminated, truncated copy that fits into a given buffer size, and snprintf is available, you are mad to still use strncpy.
Also, strncpy pads the destination with nulls, if there is remaining space. Because of this, the evident purpose of strncpy is to fill fixed-width text fields in "old school" fixed-field database records. Writing the extra nulls is wasteful if the data type in question is nothing but a C string.
I dislike the strncat approach because if the prior null termination gets lost then problems ensue, whereas with snprintf that won't happen. The BSD man page mentions that snprintf may not be safe for use in signal handlers on some systems; obviously snprintf is a much bigger piece of code.
Personally I dislike extensive use of macros, but I'll leave that for the flame war.
With fixed-width records, writing the extra nulls may be important for data privacy reasons, and in any case it allows comparison using memcmp(3), diff(1), and checksums.
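For illustration only, a sketch of that kind of fixed-width record (the field names are invented): the zero padding strncpy performs makes two records with the same contents byte-for-byte identical, so memcmp(3) and checksums work without extra effort.

    #include <string.h>

    struct record {
        char name[32];              /* fixed-width text field, NUL-padded */
        char city[32];
    };

    void set_name(struct record *r, const char *name)
    {
        /* strncpy pads the remainder of the field with '\0' bytes */
        strncpy(r->name, name, sizeof r->name);
        r->name[sizeof r->name - 1] = '\0';   /* force termination on truncation */
    }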
I tried to give alternatives. Calling strncpy and then manually null-terminating is not a good alternative. Creating a wrapper that does this is better. That wrapper could optionally abort the program on string truncation.
And, probably more importantly, a template wrapper can infer the destination size, thus getting rid of the error-prone specification of the destination size (I have seen it specified incorrectly many times).
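The same inference can be done in plain C with a sizeof-based macro -- a minimal sketch, with an invented name, that only works when the destination is a true array (not a pointer) and that silently truncates rather than aborting:

    #include <string.h>

    #define STR_COPY(dst, src) \
        do { \
            strncpy((dst), (src), sizeof(dst)); \
            (dst)[sizeof(dst) - 1] = '\0'; \
        } while (0)

    void example(const char *input)
    {
        char buf[16];
        STR_COPY(buf, input);   /* destination size inferred, nothing to specify incorrectly */
    }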
You'd be mad to use strncpy when better alternatives are so easy to find or create.
It's worth noting that, quite often when this subject comes up, people suggest a straight replacement with strncpy, strlcpy, snprintf, or something similar. This is safer because it doesn't overwrite memory or cause a segfault, but that's not always enough. If the code just carries on with the truncated string, that's data loss, and it might cause trouble later.
For example, imagine a system designed for 8.3 filenames: if you pass a long filename and it gets truncated, it may then refer to a completely different file.
The man page for strlcpy (IIRC) reminds you of this.
strcpy() can be used safely if you ensure that the destination array is big enough to contain the source string. (Yes, that can be non-trivial.)
strncpy() by itself doesn't write past the end of the destination array (assuming you call it correctly) -- but it can easily leave the destination without a terminating null character. This causes undefined behavior if you then pass the destination array to another string function. strncpy() is not dangerous by itself, but it can leave a booby-trap that causes other string functions to be dangerous.
This:
strncpy(dest, source, size);
can be replaced by this:
dest[0] = '\0';
strncat(dest, source, size - 1);
which will leave the destination properly null-terminated. (Note the size - 1: strncat appends up to that many characters plus a terminating null, so passing the full buffer size unchanged could write one byte past the end.)
Note that this can still quietly truncate the string. Ignoring that possibility is rarely the right thing to do.
Replacing one function call with a call to a function that does almost the right thing in many cases but potentially wastes time by pointlessly zero-filling a buffer, plus an extra assignment statement, is not an improvement. You would be mad to do that.
C people, isn't it perhaps time to quietly put the existing standard library out to pasture and create a new one, from scratch, that isn't chock full of booby traps in the small print?
Opinion: Some not so quietly; in any case, most of them are not worth reusing. But there are exceptions, e.g., stralloc.h
Opinion: Personally, because I strive to learn more than to be "productive", I prefer languages that force the author to write her own libraries. As I see it, the alternative is languages that discourage this, effectively coercing the author to use other people's libraries, the quality over which she has no control.
Opinion: For me, the bad part is that there are so many poorly thought out C functions in the wild (even including the "standard" ones); the good part is that the C language encourages authors to write their own libraries.
Given the choice, I still prefer assembly to C. I guess this is because I prefer to write small programs; perhaps I am not smart enough to write large ones.
Are you not making an assumption about your own code then? The strength of open software is that you don't have to rely on one person to have solved everything. I consider myself to be a good developer, better than most for some areas, but I know there are others who have written better code for other areas.
I prefer not to reinvent the wheel when I know my implementation won't be any better; and even if I were able to make a better wheel, that would be time spent not reinventing the carriage.
The real problem would still persist: how C sees strings, arrays and pointers.
Implicit conversion between vectors and pointers should never have happened. I don't get why people find it so hard to write &vector[0] for those cases where a pointer is needed.
Also, if other languages for other OSes, older than C, could do bounds-checking elision, then C not doing bounds checking at all only saved work for the compiler developer.
If &array[0] were required, it wouldn't solve anything. You could still take a pointer to the first element, pass it around, and treat it as an array, with no metadata connecting it to the original properties of the array.
The requirement for a cast would be better, because a cast is the existing notation for explicitly requesting conversions that are identified as unsafe.
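Purely as a hypothetical sketch (the cast-required rule is not real C; this is just the proposal made concrete):

    void example(void)
    {
        int a[10];
        int *p = (int *) a;   /* explicitly request the array-to-pointer conversion */
        int *q = a;           /* under the proposed rule, this line would not compile */
    }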
It didn't teach me enough to recognize the silliness in C.
I believe that this is an aspect of C's popularity that hasn't received a lot of recognition. Programmers were sucked away from cleaner languages into C, because the users of the cleaner languages had neglected to acquire the body of rationalizations for why those languages are the way they are -- rationalizations that could have helped them stick.
C looked good in naive ways. "Hey, look at all the things you can easily do with memory and pointers! Wow, look at that slick syntax: you can stuff several increments and an assignment into an expression, and use that as the test of a loop, ... man I'm never writing foo := foo + 4; again!"
Modula 2's memory management is basically malloc/free though. While the language could save you from array overruns, it is basically defenseless against leaks and use-after-free bugs, as well as uses of unchecked null pointers under OOM:
I came to C from Turbo Pascal (3, 5.5 and 6.0), so it already looked primitive to me back then.
Fortunately, in the same year I got to learn C++, which allowed me to use most of the features I already knew from Turbo Pascal, alongside better safety if one cared about it.
Eventually I only used pure C when required to do so, at university and in my first job afterwards. Everywhere else, given the choice between the two languages, it was always C++.
Regarding C, when coding in it I always adopted a Turbo Pascal style, using translation units as if they were modules, with data structures designed as ADTs without any direct access to their internals, programming defensively, and eventually with a little bit of design by contract.
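A rough sketch of that style (the file and type names are invented): the header exposes only an opaque type and functions, and the translation unit owns the representation.

    /* stack.h -- the "module" interface: callers never see the internals */
    typedef struct Stack Stack;               /* opaque ADT */
    Stack *stack_new(void);
    int    stack_push(Stack *s, int value);   /* 0 on success, -1 when full */
    void   stack_free(Stack *s);

    /* stack.c -- the "module" body owns the representation */
    #include <stdlib.h>

    struct Stack { int items[64]; int top; };

    Stack *stack_new(void) { return calloc(1, sizeof(Stack)); }   /* caller checks for NULL */

    int stack_push(Stack *s, int value)
    {
        if (s->top >= 64) return -1;          /* defensive check instead of trusting callers */
        s->items[s->top++] = value;
        return 0;
    }

    void stack_free(Stack *s) { free(s); }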
> Modula 2's memory management is basically malloc/free though. While the language could save you from array overruns, it is basically defenseless against leaks and use-after-free bugs, as well as uses of unchecked null pointers under OOM:
Yes, but there is already a big list of errors that Modula-2 and similar languages save one from:
- Buffer overruns (unless disabled, of course)
- Out parameters being null
- Implicit conversions of enumerations into integers
- Implicit conversions between numeric types
- Strings without terminating null
> Is that just "C envy", or is there perhaps need for those things, know what I mean?
Of course those features are needed in systems programming, but there is a big difference between being safe by default with explicit opt-in to unsafe code, and just being unsafe everywhere.
For example, in Modula-2 if a module imports SYSTEM you already know that module is doing something fishy. In Oberon and Modula-3, like .NET, one could forbid unsafe modules from being loaded.
Whereas in C, even code that looks harmless can be doing strange things to the memory state.
You remember the Holy Grail scene from Indiana Jones? The problem is not the absence of the grail, somewhere on the shelf. The problem is all the others and the fact that if you pick the wrong one, your program desiccates, crumbles to dust, and is blown away by a conveniently dramatic wind.
A grail shelf that wasn't intended as a booby trap would contain one cup, the right one.
strndup is not a replacement for strncpy: it allocates memory, whereas strncpy takes an existing buffer. In-place memory usage is very important for preparing anything of a fixed size (structs) to be written out to disk or the network.
To be fair, maintaining compatibility with existing software (including libc) is the reason, either directly or indirectly, for many (most?) of the gotchas in C++.
Then again "your C code is valid C++ code!" was perhaps the biggest selling point of C++ when it came out.
std::string is a complete replacement for C strings. That is, fixed character width ASCII-or-equivalent strings. That's what it was designed for, and that's what it is.
People often complain about the lack of a real "string class" in the standard C++ library. Well, guess what? No language today really has primitives that handle unicode absolutely perfectly, and there's still intense debate about what they should be doing with respect to encodings, length() functions etc.
Maybe the answer is that Unicode itself should be simplified so that every word in a language has one representation in any particular encoding.
A C string could be just as well replaced with a std::vector<char> (or a wide variant thereof) without any significant loss of functionality.
std::string is an embarrassment. It's both bloated (why, we need to have both iterator-based and index-based access interfaces!) and lacking in basic string manipulation functionality, and it's encoding-unaware.
- std::string is old. It predates the rest of the STL, and had an index interface prior to the iterator based one. The old API was left in for compatibility and imposes no runtime overhead
- It has all the string manipulation capability of the C standard library
- Being encoding unaware was probably a blessing given how unicode has evolved since std::string was introduced (20-30 years ago?)
If you want to use std::string operations on an array of characters that you don't own or that are part of an indivisible larger structure, you're out of luck.
It's not difficult to craft a slice-like replacement, but range types will be most welcome when they finally hit the standard.
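A sketch of the slice idea in plain C terms (the names are invented): a non-owning view is just a pointer plus a length, so it can refer into a larger buffer without copying or taking ownership of it.

    #include <stddef.h>
    #include <stdio.h>

    struct strview {              /* non-owning view into someone else's characters */
        const char *data;
        size_t      len;
    };

    static void print_view(struct strview v)
    {
        printf("%.*s\n", (int) v.len, v.data);   /* no NUL terminator required */
    }

    int main(void)
    {
        const char record[] = "NAME=Alice;CITY=Oslo";
        struct strview name = { record + 5, 5 };   /* "Alice", sliced out in place */
        print_view(name);
    }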
C was the first programming language I learned well. I've co-written two books on C programming. I've built software ranging from text adventure games to flight management networking software in C.
Backwards-incompatible change breaks code. And it may do so silently. And the C standardisers don't really have the political clout to do this. Trying to force people to "new C" would be a lot of work and would probably result in a culture split like the backwards-incompatible Perl and Python splits. OpenBSD have managed to move people to strlcpy(), but that's about it.
People are just as likely to migrate to another language (Rust? Go? Even C++) as upgrade the old C code.
You add them alongside, and you define them as the "modern" standard, and you discourage, but do not forbid, mixing them up. Ideally, there would be no temptation to mix them up in new code, because the new ones would just be better. Including them in old, established code (which can't realistically be rewritten) is harmless.
At the very least, there are some corners that would need to be addressed. A new, better standard library would provide safer string handling and omit the traps that were left in for years to preserve compatibility.
The C++ standard seems to get much more attention than C. C seems to be the step-daughter of programming languages, but still so many projects rely simply on standard C (and not C++).
Because, much like reallocarray, strlcpy is not part of the C standard library.
The correct solution, assuming you can't just import libbsd or include its functions, is probably to use snprintf. It does guarantee null-termination of the target. So
strncpy(dst, src, size_dst);
becomes
snprintf(dst, size_dst, "%s", src);
Checking for a partial copy is not ideal though: snprintf returns the number of bytes it would have written had the destination buffer been unlimited, excluding the final NUL byte. So a copy is complete only if `ret < size_dst`.
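Putting that together, a small sketch of a checked copy built on snprintf (the wrapper name is invented):

    #include <stdio.h>

    /* Copy src into dst (of size dst_size), always NUL-terminated.
       Returns 0 on success, -1 if the result was truncated. */
    int copy_str(char *dst, size_t dst_size, const char *src)
    {
        int ret = snprintf(dst, dst_size, "%s", src);
        if (ret < 0 || (size_t) ret >= dst_size)
            return -1;   /* encoding error or truncation */
        return 0;
    }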
If, for whatever reason, you don't want to have anyone else's copyright on your product, or don't want to keep track of which licenses have to be displayed where.
> Just use strncpy and NUL-terminate the buffer. It's not hard.
Just as it's not hard to not dereference null pointers, not double-free allocations, and a truckload of other mindless drudgery which history shows humans won't reliably get right all the time. Not to mention that checking for copy truncation with strncpy is error-prone: you've got to check whether the last byte is `\0` before you manually null-terminate it, which provides more chances to get it wrong.
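For reference, that check looks roughly like the sketch below (the function name is invented), and the easy mistake is to terminate before inspecting the last byte:

    #include <string.h>

    /* Returns 1 if src fit entirely, 0 if the copy was truncated. */
    int copy_checked(char *dst, size_t dst_size, const char *src)
    {
        strncpy(dst, src, dst_size);
        if (dst[dst_size - 1] != '\0') {   /* must look at this byte *before* terminating */
            dst[dst_size - 1] = '\0';
            return 0;                      /* truncated */
        }
        return 1;
    }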
> snprintf is also slower. It has to parse the format string.
How about worrying about that when snprintf actually shows up in profiles and there's no algorithmic way to improve the situation and strncpy is a significant improvement?
I'm not a C programmer, but even I knew that strlcpy() and snprintf() would come up in this discussion. How fucked up is that?
Is there a resource where these issues are answered once and for all, for the novice C programmer? Also, are there deprecation warnings emitted by the compiler for non-safe C stdlib functions?
Not a C programmer either. But I've seen projects banning strcpy. The code was littered with strncpy(a, b, strlen(b)).
So I'm not sure how useful these warnings would be.
On OpenBSD the linker whines about uses of unsafe functions. I don't see it becoming mainstream though, as there are so many people who disagree about what's unsafe.
What do people get out of pedantic rants such as this? Anyone who's written anything in C is quite aware of the issue here.
And then there's this gem:
"...If the source string is 5 characters long, and the target is a 1024-byte buffer, and you set n to the size of the target, strncpy will copy those 5 characters and then fill all 1019 remaining bytes in the target with null characters. Since all it takes to terminate a string is a single null character, this is almost always a waste of time.
Ok, so that's not so bad. CPUs are fast these days, and filling a buffer with zeros is not an expensive operation, right? Unless you're doing it a few billion times, but let's not worry about premature optimization."
Does he think languages with built-in String types have some magical optimization juice that makes string operations fast? How long does it take to instantiate a new String object, run its copy constructor, blah blah.
Any C programmer looks to avoid these types of situations.
> What do people get out of pedantic rants such as this? Anyone who's written anything in C is quite aware of the issue here.
Unfortunately, that's not the case.
I presume this link was posted here as a result of my recent comment on the "Don't Learn C the Wrong Way" article, which recommends strncpy() and strlcpy() as safer alternatives to strcpy().
I've seen a number of calls to strncpy(). I've rarely seen such calls explicitly null-terminate the destination array. I've even seen things like
strncpy(dest, src, strlen(src));
which is certainly no safer than strcpy().
What I get out of this particular pedantic rant is an opportunity to let C programmers know that strncpy() isn't what they might think it is (as well as a mention on the front page of Hacker News!).
> Does he think languages with built-in String types have some magical optimization juice that makes string operations fast? How long does it take to instantiate a new String object, run its copy constructor, blah blah.
No, I don't think that. I didn't mention it because I wasn't talking about languages with built-in String types; I was talking about C.
There is a very simple, efficient solution: always put a null terminator in the last position of the destination after the strncpy(). If the source is longer than the destination, this properly terminates the (truncated) destination. If it isn't longer, it costs hardly anything.
Unfortunately, it isn't automatic, so sometimes humans will forget to do it.
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char *src = "Gobblygook is a long string";
    char dest[5];

    strncpy(dest, src, sizeof(dest));
    dest[sizeof(dest) - 1] = '\0';   /* force termination; truncates if src is too long */
    printf("src = %s\ndest = %s\n", src, dest);
}
make test
./test
src = Gobblygook is a long string
dest = Gobb
While I've done this myself, it's not a 100% solution, because now you get a truncated string that further processing may not be expecting. For example, if you validate the un-truncated string somehow then take a copy of it, the copy may be invalid.
> That second paragraph means that if the string pointed to by s2 is shorter than n characters, it doesn't just copy n characters and add a terminating null character, which is what you'd expect.
But what I would expect is that it would copy len(s2) characters, not n.