For someone being so pedantic, it is puzzling that he doesn't address the actual title of his post. No one claims that strncpy is 'safe'; they just say 'safer', that is, less likely than plain strcpy to have catastrophic bugs. And I think that claim stands up: make sure your n in strncpy is the length of your buffer, do the strncpy, write a null at the end of the buffer. Voila, all those strcpy calls on unexpectedly large strings scribbling over your heap are gone. No need to constantly strlen source strings to check their length -- which can itself cause a segfault if they are not null terminated.
So yes, it is safer. You would be mad not to use it.
This can be put into a #define macro or inline function.
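A minimal sketch of that recipe as an inline function (the name safe_strncpy is just illustrative, and size is assumed to be the full size of the destination buffer):

    #include <string.h>

    /* Copy src into dst, never writing more than size bytes and always
       null-terminating, even when the source gets truncated. */
    static inline void safe_strncpy(char *dst, const char *src, size_t size)
    {
        strncpy(dst, src, size);
        dst[size - 1] = '\0';   /* write the null at the end of the buffer */
    }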
If what you want is a null terminated, truncated copy that fits into a given buffer size, and snprintf is available, you are mad to still use strncpy.
Also, strncpy pads the destination with nulls, if there is remaining space. Because of this, the evident purpose of strncpy is to fill fixed-width text fields in "old school" fixed-field database records. Writing the extra nulls is wasteful if the data type in question is nothing but a C string.
I dislike the strncat approach because if the prior null termination gets lost then problems ensue, whereas with snprintf that won't happen. The BSD man page mentions that snprintf may not be safe for use in signal handlers on some systems; obviously snprintf is a much bigger piece of code.
Personally I dislike extensive use of macros, but I'll leave that for the flame war.
With fixed-width records, writing the extra nulls may be important for data privacy reasons, and in any case it allows comparison using memcmp(3), diff(1), and checksums.
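For illustration only, a sketch of that kind of fixed-width record (the field names are invented): the zero padding strncpy performs makes two records with the same contents byte-for-byte identical, so memcmp(3) and checksums work without extra effort.

    #include <string.h>

    struct record {
        char name[32];              /* fixed-width text field, NUL-padded */
        char city[32];
    };

    void set_name(struct record *r, const char *name)
    {
        /* strncpy pads the remainder of the field with '\0' bytes */
        strncpy(r->name, name, sizeof r->name);
        r->name[sizeof r->name - 1] = '\0';   /* force termination on truncation */
    }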
I tried to give alternatives. Calling strncpy and then manually null-terminating is not a good alternative. Creating a wrapper that does this is better. That wrapper could optionally abort the program on string truncation.
And, probably more importantly, a template wrapper can infer the destination size, thus getting rid of the error-prone specification of the destination size (I have seen it specified incorrectly many times).
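The same inference can be done in plain C with a sizeof-based macro -- a minimal sketch, with an invented name, that only works when the destination is a true array (not a pointer) and that silently truncates rather than aborting:

    #include <string.h>

    #define STR_COPY(dst, src) \
        do { \
            strncpy((dst), (src), sizeof(dst)); \
            (dst)[sizeof(dst) - 1] = '\0'; \
        } while (0)

    void example(const char *input)
    {
        char buf[16];
        STR_COPY(buf, input);   /* destination size inferred, nothing to specify incorrectly */
    }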
You'd be mad to use strncpy when better alternatives are so easy to find or create.
It's worth noting that, quite often when this subject comes up, people suggest a straight replacement with strncpy, strlcpy, snprintf, or something similar. This is safer because it doesn't overwrite memory or cause a segfault, but that's not always enough. If the code just carries on with the truncated string, that's data loss, and it might cause trouble later.
For example, imagine a system designed for 8.3 filenames: if you pass a long filename and it gets truncated, it may then refer to a completely different file.
The man page for strlcpy (IIRC) reminds you of this.
strcpy() can be used safely if you ensure that the destination array is big enough to contain the source string. (Yes, that can be non-trivial.)
strncpy() by itself doesn't write past the end of the destination array (assuming you call it correctly) -- but it can easily leave the destination without a terminating null character. This causes undefined behavior if you then pass the destination array to another string function. strncpy() is not dangerous by itself, but it can leave a booby-trap that causes other string functions to be dangerous.
This:
strncpy(dest, source, size);
can be replaced by this:
dest[0] = '\0';
strncat(dest, source, size - 1);
which will leave the destination properly null-terminated. (Note the size - 1: strncat appends up to that many characters plus a terminating null, so passing the full buffer size unchanged could write one byte past the end.)
Note that this can still quietly truncate the string. Ignoring that possibility is rarely the right thing to do.
Replacing one function call with a call to a function that does almost the right thing in many cases but potentially wastes time by pointlessly zero-filling a buffer, plus an extra assignment statement, is not an improvement. You would be mad to do that.
C people, isn't it perhaps time to quietly put the existing standard library out to pasture and create a new one, from scratch, that isn't chock full of booby traps in the small print?
Opinion: Some not so quietly; in any case, most of them are not worth reusing. But there are exceptions, e.g., stralloc.h
Opinion: Personally, because I strive to learn more than to be "productive", I prefer languages that force the author to write her own libraries. As I see it, the alternative is languages that discourage this, effectively coercing the author to use other people's libraries, the quality over which she has no control.
Opinion: For me, the bad part is that there are so many poorly thought out C functions in the wild (even including the "standard" ones); the good part is that the C language encourages authors to write their own libraries.
Given the choice, I still prefer assembly to C. I guess this is because I prefer to write small programs; perhaps I am not smart enough to write large ones.
Are you not making an assumption about your own code then? The strength of open software is that you don't have to rely on one person to have solved everything. I consider myself to be a good developer, better than most for some areas, but I know there are others who have written better code for other areas.
I prefer not to reinvent the wheel when I know my implementation won't be any better; and even if I were able to make a better wheel, that would be time spent not reinventing the carriage.
The real problem would still persist: how C sees strings, arrays and pointers.
Implicit conversion between vectors and pointers should never have happened. I don't get why people find it so hard to write &vector[0] for those cases where a pointer is needed.
Also, if other languages for other OSes, older than C, could do bounds-checking elision, then C not doing bounds checking at all only saved work for the compiler developer.
If &array[0] were required, it wouldn't solve anything. You could still take a pointer to the first element, pass it around, and treat it as an array, with no metadata connecting it to the original properties of the array.
The requirement for a cast would be better, because a cast is the existing notation for explicitly requesting conversions that are identified as unsafe.
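Purely as a hypothetical sketch (the cast-required rule is not real C; this is just the proposal made concrete):

    void example(void)
    {
        int a[10];
        int *p = (int *) a;   /* explicitly request the array-to-pointer conversion */
        int *q = a;           /* under the proposed rule, this line would not compile */
    }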
It didn't teach me enough to recognize the silliness in C.
I believe that this is an aspect of C's popularity that hasn't received a lot of recognition. Programmers were sucked away from cleaner languages into C, because the users of the cleaner languages had neglected to acquire the body of rationalizations for why those languages are the way they are -- rationalizations that could have helped them stick.
C looked good in naive ways. "Hey, look at all the things you can easily do with memory and pointers! Wow, look at that slick syntax: you can stuff several increments and an assignment into an expression, and use that as the test of a loop, ... man I'm never writing foo := foo + 4; again!"
Modula 2's memory management is basically malloc/free though. While the language could save you from array overruns, it is basically defenseless against leaks and use-after-free bugs, as well as uses of unchecked null pointers under OOM:
I came to C from Turbo Pascal (3, 5.5 and 6.0), so it already looked primitive to me back then.
Fortunately, in the same year I got to learn C++, which allowed me to use most of the features I already knew from Turbo Pascal, alongside better safety if one cared about it.
Eventually I only used pure C when required to do so, at university and in my first job afterwards. Everywhere else, given the choice between the two languages, it was always C++.
Regarding C, when coding in it I always adopted a Turbo Pascal style, using translation units as if they were modules, with data structures designed as ADTs without any direct access to their internals, programming defensively, and eventually with a little bit of design by contract.
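A rough sketch of that style (the file and type names are invented): the header exposes only an opaque type and functions, and the translation unit owns the representation.

    /* stack.h -- the "module" interface: callers never see the internals */
    typedef struct Stack Stack;               /* opaque ADT */
    Stack *stack_new(void);
    int    stack_push(Stack *s, int value);   /* 0 on success, -1 when full */
    void   stack_free(Stack *s);

    /* stack.c -- the "module" body owns the representation */
    #include <stdlib.h>

    struct Stack { int items[64]; int top; };

    Stack *stack_new(void) { return calloc(1, sizeof(Stack)); }   /* caller checks for NULL */

    int stack_push(Stack *s, int value)
    {
        if (s->top >= 64) return -1;          /* defensive check instead of trusting callers */
        s->items[s->top++] = value;
        return 0;
    }

    void stack_free(Stack *s) { free(s); }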
> Modula 2's memory management is basically malloc/free though. While the language could save you from array overruns, it is basically defenseless against leaks and use-after-free bugs, as well as uses of unchecked null pointers under OOM:
Yes, but there is already a big list of errors that Modula-2 and similar languages save one from:
- Buffer overruns (unless disabled, of course)
- Out parameters being null
- Implicit conversions of enumerations into integers
- Implicit conversions between numeric types
- Strings without terminating null
> Is that just "C envy", or is there perhaps need for those things, know what I mean?
Of course those features are needed in systems programming, but there is a big difference between being safe by default with explicit opt-in to unsafe code, and just being unsafe everywhere.
For example, in Modula-2 if a module imports SYSTEM you already know that module is doing something fishy. In Oberon and Modula-3, like .NET, one could forbid unsafe modules from being loaded.
Whereas in C, even code that looks harmless can be doing strange things to the memory state.
You remember the Holy Grail scene from Indiana Jones? The problem is not the absence of the grail, somewhere on the shelf. The problem is all the others and the fact that if you pick the wrong one, your program desiccates, crumbles to dust, and is blown away by a conveniently dramatic wind.
A grail shelf that wasn't intended as a booby trap would contain one cup, the right one.
strndup is not a replacement for strncpy: it allocates memory, whereas strncpy takes an existing buffer. In-place memory usage is very important for preparing anything of a fixed size (structs) to be written out to disk or the network.
To be fair, maintaining compatibility with existing software (including libc) is the reason, either directly or indirectly, for many (most?) of the gotchas in C++.
Then again "your C code is valid C++ code!" was perhaps the biggest selling point of C++ when it came out.
std::string is a complete replacement for C strings. That is, fixed character width ASCII-or-equivalent strings. That's what it was designed for, and that's what it is.
People often complain about the lack of a real "string class" in the standard C++ library. Well, guess what? No language today really has primitives that handle unicode absolutely perfectly, and there's still intense debate about what they should be doing with respect to encodings, length() functions etc.
Maybe the answer is that Unicode itself should be simplified so that every word in a language has one representation in any particular encoding.
A C string could be just as well replaced with a std::vector<char> (or a wide variant thereof) without any significant loss of functionality.
std::string is an embarrassment. It's both bloated (why, we need to have both iterator-based and index-based access interfaces!) and lacking in basic string manipulation functionality, and it's encoding-unaware.
- std::string is old. It predates the rest of the STL, and had an index interface prior to the iterator based one. The old API was left in for compatibility and imposes no runtime overhead
- It has all the string manipulation capability of the C standard library
- Being encoding unaware was probably a blessing given how unicode has evolved since std::string was introduced (20-30 years ago?)
If you want to use std::string operations on an array of characters that you don't own or that are part of an indivisible larger structure, you're out of luck.
It's not difficult to craft a slice-like replacement, but range types will be most welcome when they finally hit the standard.
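A sketch of the slice idea in plain C terms (the names are invented): a non-owning view is just a pointer plus a length, so it can refer into a larger buffer without copying or taking ownership of it.

    #include <stddef.h>
    #include <stdio.h>

    struct strview {              /* non-owning view into someone else's characters */
        const char *data;
        size_t      len;
    };

    static void print_view(struct strview v)
    {
        printf("%.*s\n", (int) v.len, v.data);   /* no NUL terminator required */
    }

    int main(void)
    {
        const char record[] = "NAME=Alice;CITY=Oslo";
        struct strview name = { record + 5, 5 };   /* "Alice", sliced out in place */
        print_view(name);
    }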
C was the first programming language I learned well. I've co-written two books on C programming. I've built software ranging from text adventure games to flight management networking software in C.
Backwards-incompatible change breaks code. And it may do so silently. And the C standardisers don't really have the political clout to do this. Trying to force people to "new C" would be a lot of work and would probably result in a culture split like the backwards-incompatible Perl and Python splits. OpenBSD have managed to move people to strlcpy(), but that's about it.
People are just as likely to migrate to another language (Rust? Go? Even C++) as upgrade the old C code.
You add them alongside, and you define them as the "modern" standard, and you discourage, but do not forbid, mixing them up. Ideally, there would be no temptation to mix them up in new code, because the new ones would just be better. Including them in old, established code (which can't realistically be rewritten) is harmless.
At the very least, there are some corners that would need to be addressed. A new, better standard library would provide safer string handling and omit the traps that were left in for years to preserve compatibility.
The C++ standard seems to get much more attention than C. C seems to be the step-daughter of programming languages, but still so many projects rely simply on standard C (and not C++).
Because, much like reallocarray, strlcpy is not part of the C standard library.
The correct solution, assuming you can't just import libbsd or include its functions, is probably to use snprintf. It does guarantee null-termination of the target. So
strncpy(dst, src, size_dst);
becomes
snprintf(dst, size_dst, "%s", src);
Checking for a partial copy is not ideal though: snprintf returns the number of bytes it would have written had the destination buffer been unlimited, excluding the final NUL byte. So a copy is complete only if `ret < size_dst`.
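Putting that together, a small sketch of a checked copy built on snprintf (the wrapper name is invented):

    #include <stdio.h>

    /* Copy src into dst (of size dst_size), always NUL-terminated.
       Returns 0 on success, -1 if the result was truncated. */
    int copy_str(char *dst, size_t dst_size, const char *src)
    {
        int ret = snprintf(dst, dst_size, "%s", src);
        if (ret < 0 || (size_t) ret >= dst_size)
            return -1;   /* encoding error or truncation */
        return 0;
    }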
If, for whatever reason, you don't want to have anyone else's copyright on your product, or don't want to keep track of which licenses have to be displayed where.
> Just use strncpy and NUL-terminate the buffer. It's not hard.
Just as it's not hard to not dereference null pointers, not double-free allocations, and a truckload of other mindless drudgery which history shows humans won't reliably get right all the time. Not to mention that checking for copy truncation with strncpy is error-prone: you've got to check whether the last byte is `\0` before you manually null-terminate it, which provides more chances to get it wrong.
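For reference, that check looks roughly like the sketch below (the function name is invented), and the easy mistake is to terminate before inspecting the last byte:

    #include <string.h>

    /* Returns 1 if src fit entirely, 0 if the copy was truncated. */
    int copy_checked(char *dst, size_t dst_size, const char *src)
    {
        strncpy(dst, src, dst_size);
        if (dst[dst_size - 1] != '\0') {   /* must look at this byte *before* terminating */
            dst[dst_size - 1] = '\0';
            return 0;                      /* truncated */
        }
        return 1;
    }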
> snprintf is also slower. It has to parse the format string.
How about worrying about that when snprintf actually shows up in profiles and there's no algorithmic way to improve the situation and strncpy is a significant improvement?
I'm not a C programmer, but even I knew that strlcpy() and snprintf() would come up in this discussion. How fucked up is that?
Is there a resource where these issues are answered once and for all, for the novice C programmer? Also, are there deprecation warnings emitted by the compiler for non-safe C stdlib functions?
Not a C programmer either. But I've seen projects banning strcpy. The code was littered with strncpy(a, b, strlen(b)).
So I'm not sure how useful these warnings would be.
On OpenBSD the linker whines about uses of unsafe functions. I don't see it becoming mainstream though, as there are so many people who disagree about what's unsafe.
What do people get out of pedantic rants such as this? Anyone who's written anything in C is quite aware of the issue here.
And then there's this gem:
"...If the source string is 5 characters long, and the target is a 1024-byte buffer, and you set n to the size of the target, strncpy will copy those 5 characters and then fill all 1019 remaining bytes in the target with null characters. Since all it takes to terminate a string is a single null character, this is almost always a waste of time.
Ok, so that's not so bad. CPUs are fast these days, and filling a buffer with zeros is not an expensive operation, right? Unless you're doing it a few billion times, but let's not worry about premature optimization."
Does he think languages with built-in String types have some magical optimization juice that makes string operations fast? How long does it take to instantiate a new String object, run its copy constructor, blah blah.
Any C programmer looks to avoid these types of situations.
> What do people get out of pedantic rants such as this? Anyone who's written anything in C is quite aware of the issue here.
Unfortunately, that's not the case.
I presume this link was posted here as a result of my recent comment on the "Don't Learn C the Wrong Way" article, which recommends strncpy() and strlcpy() as safer alternatives to strcpy().
I've seen a number of calls to strncpy(). I've rarely seen such calls explicitly null-terminate the destination array. I've even seen things like
strncpy(dest, src, strlen(src));
which is certainly no safer than strcpy().
What I get out of this particular pedantic rant is an opportunity to let C programmers know that strncpy() isn't what they might think it is (as well as a mention on the front page of Hacker News!).
> Does he think languages with built-in String types have some magical optimization juice that makes string operations fast? How long does it take to instantiate a new String object, run its copy constructor, blah blah.
No, I don't think that. I didn't mention it because I wasn't talking about languages with built-in String types; I was talking about C.
There is a very simple, efficient solution: always put a null terminator in the last position of the destination after the strncpy(). If the source is longer than the destination, this properly terminates the (truncated) destination. If it isn't longer, it costs hardly anything.
Unfortunately, it isn't automatic, so sometimes humans will forget to do it.
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char *src = "Gobblygook is a long string";
    char dest[5];

    strncpy(dest, src, sizeof(dest));
    dest[sizeof(dest) - 1] = '\0';   /* force termination; truncates if src is too long */
    printf("src = %s\ndest = %s\n", src, dest);
}
make test
./test
src = Gobblygook is a long string
dest = Gobb
While I've done this myself, it's not a 100% solution, because now you get a truncated string that further processing may not be expecting. For example, if you validate the un-truncated string somehow then take a copy of it, the copy may be invalid.
> That second paragraph means that if the string pointed to by s2 is shorter than n characters, it doesn't just copy n characters and add a terminating null character, which is what you'd expect.
But what I would expect is that it would copy len(s2) characters, not n.