
Good for laughs. Thanks. Now the serious follow-up: OK, what else should someone do?

Programs (often) handle text. Apparently, that pretty much means you're fucked. So what is a reasonable way to write such programs?



Make damned sure you know what you're doing. That means making sure you have enough memory allocated to avoid overflows, and that any input is validated before you copy or store it. If you're using a function that expects a null-terminated string, make SURE it's null-terminated before copying. If you're using a length-specified function, make sure you know the exact length to pass in.

The problem isn't necessarily the functions themselves, it's coders who make assumptions that don't pan out.
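
A minimal sketch of that advice, assuming you already know the source length: check it against the destination capacity first, then copy and terminate explicitly. The name `copy_bounded` and its return convention are made up for illustration, not from any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy exactly src_len bytes from src into dst,
 * which has room for dst_cap bytes, and always NUL-terminate the result.
 * Returns 0 on success, -1 if the data plus terminator would not fit.
 * It refuses instead of silently truncating, so the caller has to decide
 * what to do with oversized input. */
int copy_bounded(char *dst, size_t dst_cap, const char *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);   /* programmer error, not input error */

    if (dst_cap == 0 || src_len >= dst_cap)
        return -1;                        /* no room for data + '\0' */

    memcpy(dst, src, src_len);            /* length is known, no scanning for '\0' */
    dst[src_len] = '\0';                  /* terminate explicitly */
    return 0;
}
```

Note that `src` doesn't need to be null-terminated here, because the function never looks past `src_len` bytes.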


Do all of your string work with these guys:

    struct string {
      char *str;
      unsigned length;
    };


Let's assume your string struct is solid. Then does that mean you can safely use it with `printf, fprintf, sprintf` (e.g. printf("%s", string->str))? Or must you also write custom versions of those functions? How deep does this rabbit hole go?


You don't have to write custom versions of any of those functions; just use the char pointer in the struct instead of a bare char pointer. Keeping track of the length of your strings gives an easy way to provide the 'n' in all of those 'n' functions, and has other advantages besides. But the use of such a struct in and of itself, of course, provides no guarantees of safety. There is no such thing in C anyway :)
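
A rough sketch of what "providing the n" can look like, using the struct from upthread with the standard `memcmp` and `memchr`. The helper names `string_equals` and `string_index_of` are hypothetical, and this assumes `length` counts the bytes in `str` without any terminator.

```c
#include <assert.h>
#include <string.h>

struct string {
    char *str;
    unsigned length;   /* assumed: bytes in str, not counting any '\0' */
};

/* Hypothetical helper: 1 if both strings hold the same bytes, else 0.
 * The stored length is the 'n' handed to memcmp. */
int string_equals(const struct string *a, const struct string *b)
{
    assert(a != NULL && b != NULL);
    if (a->length != b->length)
        return 0;
    return memcmp(a->str, b->str, a->length) == 0;
}

/* Hypothetical helper: index of the first occurrence of c, or -1.
 * memchr stops after s->length bytes, so it never reads past the string
 * even if the buffer has no terminator. */
long string_index_of(const struct string *s, char c)
{
    assert(s != NULL);
    const char *p = memchr(s->str, c, s->length);
    return p ? (long)(p - s->str) : -1;
}
```

The struct can also point into the middle of a larger buffer, since nothing here depends on a terminator. None of this makes the struct safe by itself: if `length` is wrong, every call above is wrong too.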


You'd probably want

    "%.*s"
or possibly even

    "%*.*s"
if you want it space padded. In principle you can bound your space usage and avoid an snprintf with such constructs; in practice, it's probably better to still use snprintf (if you're using standard-library string functions at all).
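
For what it's worth, here is a small sketch of `"%.*s"` combined with snprintf, using the struct from upthread. The precision argument has to be an `int`, hence the range check and cast. The helper name `string_format` and the surrounding brackets are only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

struct string {
    char *str;
    unsigned length;   /* assumed: bytes in str, not counting any '\0' */
};

/* Hypothetical helper: write "[<contents of s>]" into dst.
 * Returns what snprintf returns (the length the full output would need),
 * or -1 if the length cannot be passed as an int precision. */
int string_format(char *dst, size_t dst_cap, const struct string *s)
{
    assert(dst != NULL && s != NULL);

    if (s->length > (unsigned)INT_MAX)
        return -1;   /* '*' consumes an int argument */

    /* With an explicit precision, printf writes at most that many bytes
     * from the argument, so s->str does not need to be NUL-terminated
     * as long as it holds at least s->length bytes. */
    return snprintf(dst, dst_cap, "[%.*s]", (int)s->length, s->str);
}
```

If the return value is greater than or equal to `dst_cap`, the output was truncated; snprintf still terminates `dst` as long as `dst_cap` is nonzero.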


If you're able, don't treat text as zero-terminated char arrays. Ideally, you'd have a well fleshed-out library for encoding-aware ropes.


Which libraries do you recommend?


Unfortunately, I don't know of one. My recent C work has dealt with text only in a very limited capacity (parsing and building packets in an ASCII format - for the latter, vectorized write buffers are a poor man's ropes).

Edited to add:

Apparently there is this: http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh...


I recommend bstring for length-prefixed strings in C:

http://bstring.sourceforge.net/


Bstring relies on undefined behavior for security. Don't use it if you care about security.



