
On the matter of strlcpy/strlcat acceptance by industry - jcr
http://marc.info/?l=openbsd-tech&m=138733933417096&w=2
======
anon1385
For a bit of historical context, this is Drepper's 'rationale' for not
accepting a patch for strlcpy in glibc: [http://www.sourceware.org/ml/libc-
alpha/2000-08/msg00053.htm...](http://www.sourceware.org/ml/libc-
alpha/2000-08/msg00053.html)

~~~
sitharus
Ah, there's some classic Drepper. It's so easy when you get to vet every line
of source and never have to worry about inexperienced or careless programmers.

Personally I much prefer APIs that will fail safe until you know you need the
performance of a less safe one, and can spend the time auditing the code.

~~~
saurik
These APIs do _not_ "fail safe"; as Kaz Kylheku points out in his reply to
Ulrich on that list:

> These words make sense. The problem with strlcat and strlcpy is that they
> assume that it's okay to arbitrarily discard data for the sake of preventing
> a buffer overflow. The buffer overflow may be prevented, but because data
> may have been discarded, the program is still incorrect. This is roughly
> analogous to clamping floating point overflow to DBL_MAX and merrily
> continuing in the calculation. ;)

This is still a bug in your program, and it can still specifically be a
security risk; if you want to cap a buffer, you need to fail the operation,
not just silently discard some of the data.

Further down-thread, here is Ulrich's (I will argue 100% correct) rant:

> Dammit, it is not safe. It hides bugs in programs. If a string is too long
> for an allocated memory block the copying must not simply silently stop.
> Instead the program must reallocate or signal an error. I can construct you
> cases where the use of these stupid functions is creating new security
> problem.

I realize it is fun (and sadly part of the zeitgeist at this point) to rag on
Ulrich, but a lot of the things he's said over the years have been quite
insightful, and you should at least read the full conversation before judging
him.

~~~
anon1385
You think that

    
    
      *((char *) mempcpy (dst, src, n)) = '\0';
    

is a good solution then?

Also, I don't know where this 'srlcpy silently fails' meme comes from. It's
your responsibility to check the return value. It only 'silently fails' if you
don't check. You can argue about whether it is a good idea or not to make
client code responsible for that kind of thing but it is the C philosophy. If
you think string copy functions should call abort on failure then, like Theo
says in the OP, implement that and submit patches to all those projects using
strcpy.

~~~
saurik
The reality is that both strcpy and strlcpy require the person using the API
to use it correctly in order to not end up with a buggy program; but, in the
real world, people hype strlcpy as being fundamentally safer than strcpy, and
so if you actually look into how people use strlcpy no one ever checks its
return value (even projects like OpenSSH).

For more on this (and a specific ludicrous example from OpenSSH where someone
spent the time to write a comment above the strlcpy about truncation and yet
still didn't actually bother to check the return value of the function),
please see this response I made to someone else:

[https://news.ycombinator.com/item?id=6940724](https://news.ycombinator.com/item?id=6940724)

------
byuu
strl(cpy|cat)() is quite dangerous to just use blindly as a full replacement
for other (mem|str)*(cpy|cat) functions. strlcpy's return value is the length
of the source string. So say you have a big 20MB XML document, and as you are
scanning along, you use strlcpy() to store a node name or value into another
string. Even though you only ask it to copy five bytes, it goes ahead and runs
over the rest of the 20MB document simply for a return value you don't (in
this case) care about. If you keep doing this over the entire document, your
parser will end up with quadratic overhead.

strlcat() has the same overhead issue, and its return value is downright
cryptic: it is strlen(source) + min(length, strlen(target)).

When it tries to compute strlen(), ignoring your requested length parameter,
this means it will access out-of-bounds memory if that string is not properly
null-terminated.

And if you're absolutely paranoid about security, these functions not having
both source size and target size parameters gives you a second way to go out-
of-bounds.

Overall, they're okay functions, but you can easily do a lot better. And since
these aren't universally available, if you're going to declare your own
cpy/cat functions, you may as well use a better implementation than strl.

~~~
clarry
> strl(cpy|cat)() is quite dangerous to just use blindly

Very true; people have caused serious issues by blindly doing things
(supposedly making things secure or else), without bothering to understand
what they're doing. However, for the specific (and reasonably common) scenario
of dealing with proper C strings, the strl* functions are very simple and easy
to use correctly, and their use is easy to audit.

> So say you have a big 20MB XML document, and as you are scanning along, you
> use strlcpy() to store a node name or value into another string. [..] If you
> keep doing this over the entire document, your parser will end up with
> quadratic overhead.

Right. It is going to be ridiculously slow (and that should be obvious). But
this isn't the common scenario that strl* family of functions were designed
for (and which a lot of people get wrong with standard C string functions).
But chances are in code like this C strings' shortcomings would be problematic
anyway.

> strlcat() [..] its return value is downright cryptic: it is strlen(source) +
> min(length, strlen(target)).

Sorry but that is just wrong.

> When it tries to compute strlen(), ignoring your requested length parameter,
> this means it will access out-of-bounds memory if that string is not
> properly null-terminated.

It's not "requested length", it's "this many bytes fit in my buffer". These
functions are not supposed to validate your input any more than are they
supposed to check the validity of your pointers, intent, etc. If you're
passing a pointer to something that is not a string (if you don't have a NUL
terminator, your array by definition does not contain a string), you have a
problem that you shouldn't hope _string_ handling functions to magically cope
with.

> if you're absolutely paranoid about security, these functions not having
> both source size and target size parameters gives you a second way to go
> out-of-bounds.

Target size ought to be enough for simple code dealing with actual _strings_.
But if what you're dealing with isn't a string (and you need extra
reassurances because you're not sure what it is), then how are you going to
compute the source size? But if you can just compute the source size, then you
can make sure what you have is a real string.

~~~
byuu
> But this isn't the common scenario that strl* family of functions were
> designed for

I will give you that. They definitely weren't designed for this scenario. But
in practice, people tend to ignore the return value (which is using the
functions incorrectly), and if you were used to strncpy (which has its own
performance problems), you might wrongly assume that you could use these to
pull substrings out as well.

The regrettable part is that it's easy to combine overflow detection with
substring copy functionality at the same time.

> Sorry but that is just wrong.

Sorry, but it is not wrong.

See the actual implementation source: [http://www.openbsd.org/cgi-
bin/cvsweb/src/lib/libc/string/st...](http://www.openbsd.org/cgi-
bin/cvsweb/src/lib/libc/string/strlcat.c?rev=1.13)

"Returns strlen(src) + MIN(siz, strlen(initial dst))."

Even the official man page says the return value is confusing: "While this may
seem somewhat confusing, it was done to make truncation detection simple."

I mean no disrespect, but you're proving my point. Even most proponents of
these functions do not understand all of the nuances in how they work. That's
a real problem, as if you don't understand strlcat's return value, you can't
use it safely. The functionality is too specialized to be general purpose
functions.

> It's not "requested length", it's "this many bytes fit in my buffer"

Yes. Sorry if my wording was poor, I'm not really good at explaining things.

> These functions are not supposed to validate your input any more than are
> they supposed to check the validity of your pointers, intent, etc.

Exactly. They're useful functions for certain purposes. My only point is that
they are by no means perfect. There are many use cases where they are not
appropriate. Unfortunately people tend to either praise this family as the
solution to C memory safety problems, or they dismiss them entirely.

Absolutely use them when they're appropriate, but don't simply use them
everywhere and think you're making your program safer. There's performance
implications, out-of-bounds memory violations, security issues raised by
truncating values without catching it, etc. These are all the programmer's
fault for misusing the functions, of course, but safer and/or more versatile
functions can be made.

~~~
clarry
> But in practice, people tend to ignore the return value (which is using the
> functions incorrectly)

That might be true. It would be very interesting to see some statistics.
However, with these functions, it is _very_ easy to check whether the
truncation detection is being done or not. Grep is mostly enough, but it can
even be automated for most cases.

Also, that is hardly an argument against these functions: any string utility
function will inevitably have to deal with errors somehow (or worse, ignore
them and continue!).

> if you were used to strncpy (which has its own performance problems), you
> might wrongly assume that you could use these to pull substrings out as
> well.

Yes, and sadly I've seen something like that (and worse) in the wild. These
are easy to spot though.

> Sorry, but it is not wrong. > > See the actual implementation source:
> [http://www.openbsd.org/cgi-
> bin/cvsweb/src/lib/libc/string/st...](http://www.openbsd.org/cgi-
> bin/cvsweb/src/lib/libc/string/st..). > > "Returns strlen(src) + MIN(siz,
> strlen(initial dst))."

Interesting. You're right about the code and its return value. However, the
current manual documents it a little bit differently; the weird return value
for the case when the initial destination string is _not_ a string is never
mentioned (but the fact that NUL-termination won't occur in this case is
there). So in practice you can forget the MIN() and say it is strlen(dst) +
strlen(src). From my perspective it makes all the sense, because, going back
to what I've argued before, a string handling function should at least be
allowed to assume you're giving it strings to work with; it's not there to
validate your input or check pointers, etc.

> Even the official man page says the return value is confusing: "While this
> may seem somewhat confusing, it was done to make truncation detection
> simple."

You quoted a line that is not describing the corner-case return value
mentioned above. It is describing the fact that the return value is (assuming
actual strings are processed!) the total length of the string that would
result from the operation if the destination buffer was large enough to hold
it without truncation. There is nothing confusing about that. Perhaps the
author initially thought it would be confusing for people who expect it to
resemble functions such as read() that return the number of characters
_actually_ given to you. But there is nothing to be confused for, and the
official man page has been simplified to reflect that:

    
    
      commit: http://marc.info/?l=openbsd-cvs&m=133338810915771&w=2
      current version: http://mdoc.su/o/strlcat
    

> I mean no disrespect, but you're proving my point. Even most proponents of
> these functions do not understand all of the nuances in how they work.
> That's a real problem, as if you don't understand strlcat's return value,
> you can't use it safely.

The correct way to detect truncation with strlcpy and strlcat has always been
exactly the same, and documented:

    
    
      if (strlcat(dst, src, dstsize) >= dstsize) { /* truncated */ }
    

The fact that I wasn't aware of a corner-case scenario where dst does not
contain a string to begin with has in no way impeded my ability to type the
above line correctly. The branch will still be taken if the string would be
too large for the buffer, but your code is broken and strlcat cannot fix it
for you.

> The functionality is too specialized to be general purpose functions

Yes, they perform a very specific function; they are not _Perl_. Yet these are
"general purpose" enough that the less secure variants (and not only these as
of C11) were introduced in the C standard, and are used everywhere. These
functions are convenient, available everywhere, and recognized by any C
programmer, and that is why they are used. The OpenBSD functions came to be
because of the widespread (mis)use of the C functions, to make it easier to do
it right.

> Exactly. They're useful functions for certain purposes. My only point is
> that they are by no means perfect. There are many use cases where they are
> not appropriate. Unfortunately people tend to either praise this family as
> the solution to C memory safety problems, or they dismiss them entirely.

Good to know we are in agreement. :-) Perhaps I interpreted you too strongly
when you said one may as well use another implementation; to me that sounded
like a suggestion to dismiss these entirely.

> These are all the programmer's fault for misusing the functions, of course,
> but safer and/or more versatile functions can be made.

I would be interested in a concrete example. More versatile? Yes, that is
easy. Safer? Well I'm not so sure about that. Perhaps by trading some
correctness. Perhaps in some non-critical user-land application where
unbounded allocations are OK and aborting on memory exhaustion is "safe".
Simpler and safer? Almost definitely not, but please show me I'm wrong :-)

~~~
byuu
> The fact that I wasn't aware of a corner-case scenario where dst does not
> contain a string to begin with has in no way impeded my ability to type the
> above line correctly.

True. And again I was hesitant to say that, so I hope you didn't take any
offense. These functions surprised me as well at first (I was personally
bitten by that strlcpy() running full strlen(source) issue in my own XML
parser.)

> I would be interested in a concrete example.

Well I don't really have a super-secure version. Honestly 99% of string stuff
I do is using a C++ library that keeps track of the allocation size (and also
has small-string optimization.)

What I came up with in place of strlcpy/cat, which worked better for my own
purposes, is below, which I call strmcpy/cat:

    
    
      //return = strlen(target)
      unsigned strmcpy(char *target, const char *source, unsigned length) {
        const char *origin = target;
        if(length) { while(*source && --length) *target++ = *source++; *target = 0; }
        return target - origin;
      }
    
      //return = strlen(target)
      unsigned strmcat(char *target, const char *source, unsigned length) {
        const char *origin = target;
        while(*target && length) target++, length--;
        return (target - origin) + strmcpy(target, source, length);
      }
    

Truncation is detected by seeing if source[strmcpy/cat(...)] != 0, and very
easy to wrap with another function. And in general, I find strcat functions
kind of unpleasant (eg Schlemel the painter), so for that a strpcpy() that
writes to a pointer will let you chain those, eg:

    
    
      void strpcpy(char *&target, const char *source, unsigned &length) {
        unsigned offset = strmcpy(target, source, length);
        target += offset, length -= offset;
      }
    

Now I am sure you can point out a myriad of serious problems with these
functions, just as I have for strlcpy/cat.

Hell they're probably much worse than strl, but they work for me at least. If
nothing else, they are at least easier to read =)

Full info on these functions is at the bottom of this dead article:
[http://web.archive.org/web/20130121234920/http://byuu.org/ar...](http://web.archive.org/web/20130121234920/http://byuu.org/articles/programming/strcpycat)

------
bitwize
Strlcpy and strlcat _are_ stupid.

Strcpy_s and strcat_s for the win :) They have much better error semantics, do
the right thing upon failure (set dst[0] to '\0' if dst is not NULL and return
an error code), and if you are working with C++ in Visual Studio, come with
template specializations that can infer the length of the destination string.
Which is really handy when the compiler substitutes strcpy and strcat calls
with calls to strcpy_s and strcat_s with the appropriate implied length :)

~~~
BudVVeezer
I am entirely uncertain of why this was downmodded. The _s functions are part
of the C standard (Annex K), and truly are better replacements for their
counterparts. It's unfortunate that the glibc maintainers have shown so little
interest in implementing this (optional, to be fair) part of the standard.

~~~
tobiasu
Because the argument why they are "stupid" is missing entirely. Much of the
second paragraph was edited in later.

------
jheriko
something i find more important than how much people use strlcpy, snprintf etc
in terms of safety and easy of use of strings in C is the lack of good
implementation of library functions that give you the string length upfront so
you can safely make an allocation of the right size to use such a function
with.... coupling that with the fantastic level of brokeness of wchar_t*/wcs
functions on the Apple platforms and the poor consistency of the Android NDK
(%S, %ls... can't remember) and Microsoft's general lack of respect for the
standard and desire to encourage their own brand of 'safe' functions and
deprecation I find working with strings in a useful way in C to be an enormous
pain when it really shouldn't be.

This should be embarassing when 'script kiddie' languages like Python, Ruby,
JavaScript and co. do a fantastically better job. It has nothing to do with
the level of abstraction and everything to do with poor implementation of the
standard library on /every platform/.

And I am supposed to trust 3rd party libraries when these giants make mistakes
with both trivial, yet highly important functionality? It makes me cynical
that they want to force me to use their NSStrings, Java string or
Platform::String^ crap, which I really don't like to trust because they are
bad in their own ways too... how can I trust these people with my strings?

C and C++ are ridiculously cross platform and utilitarian, but this area is
definitely one of the weakest imo :/

Still, I don't advocate rolling your own, but making things work safely and
nicely across platforms is a pain in the arse

~~~
tedunangst
What makes python strings fantastic and java strings crap?

~~~
jheriko
i think java strings are pretty fantastic too... its the C string library
function implementations which are crap.

i don't like introducing any dependency when I shouldn't have to - using Java
strings on Android is not something I've yet had to do but I suspect I will if
I want good working functionality because the wcs* functions in the C stdlib
in the NDK are crap.

there are a number of performance and aesthetic reasons for not wanting to use
java strings as well but they are largely irrelevant imo...

------
liw
The non-standard strlcpy can easily be replaced with the standard snprintf.
Both can, and almost certainly should, be replaced by a library that takes
care of memory allocation for strings. Even a simple one will obliterate all
of the problems that the usual style of C string handling has.

([http://blog.liw.fi/posts/strncpy/](http://blog.liw.fi/posts/strncpy/) is my
rant about this. Others have made the point better, of course.)

~~~
ef47d35620c1
I agree. C++ solved this problem many years ago. The solution is standard and
safe to use everywhere on all platforms.

    
    
        std::string

~~~
nly
Exactly, and it's actually MORE efficient than raw C string handling. Even in
C the solution is to give up and use a string handling library that will do a
reallocation on append, not risk truncation.

------
amadvance
Using strlcpy is like suppressing the "Division By Zero" exception and have
divisions to return a fake value just to let the program run.

In some case this could be even useful, but it's for legacy and sloppy
programs that no one bother to really fix.

For anything that requires some minimal level of quality the best option is to
make the program to intentionally crash if a buffer overflow condition is
detect. Exactly like a "Division By Zero".

~~~
clarry
For anything that requires some minimal level of robustness, the best option
is not to make the program crash when a string is too long to fit in a buffer.
This is not a buffer overflow condition. This is not at all like Division By
Zero.

EDIT: In your world the kernel could panic because someone tried to open() too
long a filename. Or should the application trying to open the file crash
instead? And an IRC server would crash because my client sent a message with
513 characters.

I would rather detect these conditions and do the right thing.

------
peterwwillis
Sounds a lot like Theo is making the assumption that most programmers care
about security, when in fact, they're probably primarily interested in
creating code that works, and making it secure later.

It doesn't matter what function you use to copy strings. Hell, you can use
strcpy in a way that avoids buffer overflows! If your application is poorly
designed, it will be insecure. In this case what we're talking is not doing
bounds checking on input before copying data, which will always end up fucking
you in one way or another, regardless of what function you use.

Going on and on over this function vs that function is a distraction from the
bigger problem, which is the overall poor design of many applications.

~~~
clarry
> Theo is making the assumption that most programmers care about security,
> when in fact, they're probably primarily interested in creating code that
> works, and making it secure later.

Insecure software is broken software; it doesn't work correctly (unless
insecurity is an intended feature, such as in a backdoor).

> It doesn't matter what function you use to copy strings.

Technically you're right, you can even roll your own loop. However, the
purpose of library functions is to make your (and your users' & co-
developers') life easier. The strl* functions provide a simple idiom that
_just_ _works_ , and it takes _less_ effort than doing arithmetic prior to
strcpy/cat calls. It makes it easy for everyone to check that it does indeed
work.

> In this case what we're talking is not doing bounds checking on input before
> copying data, which will always end up fucking you in one way or another,
> regardless of what function you use.

The strl* functions are all about _doing_ bounds checking in a simple and
compact way while copying data, and they inform you in case the input was too
large. They're all about replacing broken (or entirely missing!) hand-rolled
bounds checking that you'd write before every strcpy/strcat.

Why don't you try audit code that uses strlcat & strlcpy? Grep for these and
check that every invocation is correct. Now do the same for code using strcat,
strncpy, and strcpy. Check all the arithmetic (which may take up goodness
knows how many lines, spread all over the place, in loops, etc.). After a few
hundred of these, you should begin to understand why strlcpy and strlcat
exist.

People have done such massive audits before. And misuse of the standard string
functions has indeed caused many many bugs. The secure alternatives weren't
invented just for the heck of it; there's actual history behind it.

~~~
peterwwillis
Yes, I understand why these functions exist. That's my whole point: we
shouldn't be nit-picking these functions in regards to security, because
simply using them does not make your program secure. You have to actually
design for security, not just use strlcpy and believe somehow you have
produced secure code. At the same time, just because someone has strcpy in
their code, does not mean their code is not secure. So this whole discussion
is pointless, is what i'm saying :)

~~~
clarry
Yes it's pointless if people continue to miss the point regarding
auditability, and the fact that harder-to-use functions that are also harder
to audit are more like to end up used wrong.

I realize that many out here don't care about auditability, and never spend
hours systematically going through hundreds or thousands of invocations in
other people's code, checking for correctness of exactly the kind of code that
historically has caused lots of easy-to-exploit security problems. I realize
that most people here would probably trust themselves to be able to use
strcpy, strncpy, and strcat correctly; I would trust myself too. But that is
not the point. With the right idioms, I make it easier for other people to
verify the correctness of my code, so they don't need to depend on blind
trust.

And to be honest, it doesn't take a whole lot of "design" to use strlcat
correctly; it is _simple_. Yes you _can_ misuse it, but I would hold it very
likely that most people who actually learn about it, check the manual, and
decide to use it, are going to use it correctly.

~~~
peterwwillis
_> I would hold it very likely that most people who actually learn about it,
check the manual, and decide to use it, are going to use it correctly._

You don't think that's dangerous? If your method of auditing code is skipping
all the strlcpy's because you assume it's being used correctly, you're going
to miss the one or two times it's not, and end up with vulnerabilities.

I haven't needed to audit code professionally, but when I do it personally, I
don't look for common pitfalls in function use - I look for general scenarios
that are dangerous, like working with potentially tainted data. So when I talk
about design security, I don't mean what function you use, I mean how it fits
into the broader scheme of what the application is trying to do.

~~~
clarry
> If your method of auditing code is skipping all the strlcpy's because you
> assume it's being used correctly

That is not at all what I have said, how did you come up with that? I've been
arguing that the idiomatic use of strlcpy is easier to audit than any use of
strcpy.

> I haven't needed to audit code professionally, but when I do it personally,
> I don't look for common pitfalls in function use - I look for general
> scenarios that are dangerous, like working with potentially tainted data. So
> when I talk about design security, I don't mean what function you use, I
> mean how it fits into the broader scheme of what the application is trying
> to do.

What you're doing isn't wrong, but looking for common pitfalls is a very easy
way to get a baseline assesment of code quality. You'd be surprised at the
amount of problems you can find just by looking for arithmetic overflows near
memory allocation, misuse of realloc, and the string handling functions. And
if the search turns out lots of such problems, trying to understand the design
and "broader scheme" is not very useful (at least for me, because at this
point I would've decided the software in its current state is simply not good
enough for me).

There's another reason to look for common pitfalls: you have something very
concrete to focus on. You can work mechanically. You can work with grep. It
really does make a huge difference. And let's not forget: these common
pitfalls are common. And commonly, they are exploitable. And they are the
things many attackers would also initially look for.

If you "just look" trying to find potential security issues in design or
otherwise, but are not looking for anything very specific, you can't be
anywhere near as focused, and it will take more time to usefully process the
code. If it takes a lot of time, focus is going to get worse with the
degrading attention span.. it'll be harder (at least for me) to be certain
that the review has actually been thorough and has managed to flag things that
need attention.

Oh I do look at the broader design of software, but only after getting the low
hanging fruit out of the way.

------
nly
Everything about this post by deraadt is wrong.

First of all, all of those software projects rolling their own implementation
should consider libbsd (installed on my system as a dependency of the Samba
client libs, so it should be on most desktop Linux distros already)

[http://libbsd.freedesktop.org/wiki/](http://libbsd.freedesktop.org/wiki/)

Secondly, it's fallacious to point out that lots of people already use these
functions and therefore it must be sane and should be encouraged. People used
gets() for decades.

[https://en.wikipedia.org/wiki/Argumentum_ad_populum](https://en.wikipedia.org/wiki/Argumentum_ad_populum)

Thirdly, strlcat and srlcpy _are awful_. The fact that people have already
moved from awful to _maybe slightly less awful_ variants actually means they
have some will to improve their code, which should be encouraged. strl* should
not be encouraged. You should be strung up (pun intended) if you're not using
a modern string handling library in 2014.

~~~
sounds
Please add links/references to your preferred string handling library of 2014.

------
eliteraspberrie
_You probably all know about the mainstream users of these functions, like the
Linux kernel, or MacOS, or the other BSD 's, and Solaris._

In fact the Linux kernel uses both strcpy and strlcpy -- strcpy is used more
often. Sometimes strlcpy makes more sense than strcpy, other times not. It's
not a popularity contest.

------
yarou
Again, it's a matter of how you treat strings. Using the standard library
functions to deal with strings is lazy, and eventually will cause problems(tm)
in the future. It's better to be proactive about it and roll your own
functions to deal with it. At worst, you've wasted extra time reimplementing a
library function.

