
Strcpycat - jhack
http://byuu.org/articles/programming/strcpycat
======
caf
The information in the strncpy() / strncat() section is incorrect.

Firstly, strncpy() was _not_ designed to be a "safer strcpy()" at all. It was
designed for storing strings in the manner of original UNIX directory entries:
in a fixed size buffer, null-padded only if the stored string is shorter than
the buffer. This is why it fills the buffer with nulls, and why it doesn't
necessarily null-terminate the destination.

Secondly, contrary to what the article implies, strncat() does _not_ work this
way: strncat() _always_ null-terminates the destination, and it doesn't write
any extra nulls.

~~~
byuu
Thanks, I will make the necessary corrections shortly.

The strncat() behavior is quite confusing. I can understand expecting the
buffer to already be null-padded and skipping that step, but always null-
terminating the result seems to go against strncpy()'s expected behavior. It
also looks mildly dangerous, as the NUL-write can be one after the size given.

~~~
caf
It's not confusing when you realise that strncat() is _not_ a concatenating
analogue of strncpy(). It's just a version of strcat() that accepts an upper
bound on the number of characters copied.

------
sambeau
I use this:

    
    
            char*   scatn(char* ss, …);
    
            #define scat(args...) scatn(args, 0)
    
            char* scatn(char* ss, ...)
    	{
    		char *s;
    		char *new;
    		size_t len;
    		size_t n;
    		va_list ap;
    
    		/* first find the lengths of the strings */
    		va_start(ap, ss);
    		s = ss;
    		len = 0;
    		while(s != 0){
    			len = len + strlen(s);
    			s=va_arg(ap, char *);
    		}
    		va_end(ap);
    	
    		/* make a buffer big enough for all strings + '\0" */
    		new = (char *) malloc(sizeof(char) * (len + 1));
    		if(new == NULL){
    			perror("Out of memory");
    			exit(EXIT_FAILURE);
    		}
    	
    		/* copy the strings into the buffer */
    		va_start(ap, ss);
    		s = ss;
    		n = 0;
    		while(s != 0){
    			len = strlen(s);
    			memcpy(new+n,s,len);
    			n = n + len;
    			s=va_arg(ap, char *);
    		}
    		va_end(ap);
    	
    		/* terminate the string */
    		new[n]='\0';
    	
    		/* return the string: remember to free it! */
    		return new;
    	}
    

use like this:

    
    
                  s = scat("hello","hacker","news");  
                  :
                  free(s);                                                                                                                                      
     

I did have a version with only one loop and realloc but it turned out to be
twice as quicker to go through each string twice. I have no idea why :)

~~~
dchest
If len overflows and wraps (i.e. the sum of string lengths is > SIZE_MAX-1),
you'll allocate less memory than is required and memcpy will write outside of
buffer bounds.

~~~
tedunangst
While that may be possible in some segmented systems, it's worth noting that
on just about any modern system having a set of strings bigger than all
available memory is unlikely.

------
jrockway
Yeah, don't use cstrings. If you're using C, use bstrings. If you're using
C++, use std::string. Ignore the extra three bytes of memory overhead and
enjoy the extra safety.

~~~
nas
Yeah, there might be a few cases were null terminated strings work okay but
mostly they are just a recipe for buffer overflows. Seriously, how many years
is this going to take to figure out?

~~~
apaprocki
Well move semantics were only introduced in C++11. So until C++11 is supported
natively everywhere you can't build something like:

    
    
      std::map<std::string, int>
    

without paying for a extra copy of the string on insertion. How many years
since C++ has existed before they finally allowed for a single copy?
Everything is relative. You might never have had a need to worry about that
double copy, but there are other people who have.

~~~
minimax
As long as you're worrying about the double copy, you may as well worry about
chasing the pointers in your map through the cache hierarchy. Usually I'd use
unordered_map here unless there was a good reason not to.

------
wickedchicken
'that is what the libc runtime, which is the foundation that every language
ultimately reaches down to in the end, is built around.'

Except, interestingly enough, Go code compiled with the standard compilers
(5g, 6g, 8g, see
[http://golang.org/doc/go_faq.html#How_is_the_run_time_suppor...](http://golang.org/doc/go_faq.html#How_is_the_run_time_support_implemented)).
The Plan9 expats are hell-bent on doing things their own way, I am 70% in
agreement with them :).

~~~
byuu
Well, Go code will eventually call into OS-level functions. And when your
Linux kernel is built in C, the eventual calls to open(), etc will need null-
terminated strings. So you will have to make a copy, eg std::string::c_str().

~~~
Arelius
I was under the impression that std::string::c_str() did not make an
allocation, but rather had the scope of the parent std::string, it does in
fact return a const char* which can't be freed like normal. Either that, or
nearly all code I've seen which uses it amuses memory leaks.

~~~
byuu
I suppose it could allocate an extra byte anyway, and set that byte to null.
It seems like that could cause issues with memory alignment if you go thinking
every string is 256-bytes, but it's 257-bytes because of that. Not that big of
a deal, of course.

The length value in that case would also be invalid for C usage, because
std::string does allow storing nulls inside the string, but it would eliminate
the need for allocation.

------
apaprocki
Deficiencies in strl aside, most C programmers do not realize the target-
padding that is going on with the strn family. Discussion is always good, but
it will be much more of an uphill battle to introduce new routines than it
would be to get strn users to use strl. I've seen too many people bitten by
declaring something like:

    
    
      char buf[65536]; /* 64k is enough for everything! */
    

and using strncpy w/ sizeof(buf) to populate it.

------
byuu
Oh neat, thanks for posting this.

I should note that I usually write articles to kick off discussions on my own
forum, and then revise them after all input is gathered. They almost never
appear on other sites, especially pre-revision.

I'm not saying the strmcpy/cat functions there now are the best solution, and
I'm open to hearing what's wrong with them so that we can improve upon them.

~~~
dlazar
Only commenting on the coding style: that kind of formatting is just looking
for trouble. Quick, tell me which of the functions below contains a bug?

    
    
      //return = strlen(target)
      unsigned strmcat(char *target, const char *source, unsigned length) {
        const char *origin = target;
        while(*target && length) target++, length--;
        return (target - origin) + strmcpy(target, source, length);
      }
    
      //return = strlen(target)
      unsigned strmcat(char *target, const char *source, unsigned length) {
        const char *origin = target;
        while(*target && length) target++; length--;
        return (target - origin) + strmcpy(target, source, length);
      }
    

How about this?

    
    
      //return = strlen(target)
      unsigned strmcat(char *target, const char *source, unsigned length) {
        const char *origin = target;
        while(*target && length)
          target++, length--;
        return (target - origin) + strmcpy(target, source, length);
      }
    
      //return = strlen(target)
      unsigned strmcat(char *target, const char *source, unsigned length) {
        const char *origin = target;
        while(*target && length)
          target++;
        length--;
        return (target - origin) + strmcpy(target, source, length);
      }
    

Granted, it's a tiny example, but you're going to be writing fewer bugs in the
long run if you avoid being clever. So here are two rules to start with:

1\. Never put more than one statement on one line.

2\. Never use the comma operator outside the "for" syntax.

------
grout
The best system involves keeping end pointers, not lengths. Remaining length
changes. End pointer never changes. At previous job we converted our whole
code base to this system, and it worked great.

~~~
byuu
I figured the actual length value may be useful without needing pointer
arithmetic, but this is a nice solution to the argument taking the size, and
the return being the length (size-1, eg p[return]=NUL location.)

I'll consider this idea and write some comparisons to see how it works,
thanks.

------
pixelbeat
The new proposed functions, sounds very similar to stpcpy() and stpncpy()

