Tearing apart printf() (maizure.org)
231 points by necrodome 6 months ago | 42 comments



This is a poorly written article that is convoluted and made unnecessarily complex. This is similar to writing an article titled 'tearing apart google search' and then dwelling on HTTP, TCP/IP, routers etc instead of actually talking about how google search works.

I was expecting the article to start off by 'tearing apart' the source code, then dig into the algorithms with which diverse formats are rendered, then compare a few popular implementations, and finally provide valuable insights, something along the lines of what WalterBright mentioned and the brief discussion it started. If not all this, the minimum I expected was an abridged, annotated source code.

This article left me underwhelmed.


Could you point to a better explanation? I'd love to read that instead as I thought the same.


In the 1980s, Borland made up for much of the poor code generation of their C compiler by doing the runtime library all in hand-optimized assembler, including printf.

What I did (Datalight, Zortech) was make large disk I/O buffers. It gave huge boosts to speed, and very few people cottoned on to that :-)

It was a technique I learned from Shal Farley.


As I recall, Borland's printf() was C code with the occasional pieces of inline assembly language; and this was the case for most of the C runtime library. There were occasional pieces actually implemented as fully fledged assembly language modules, but this was far from "all" of the runtime library.


Can you say more? How did it give huge boosts to speed?


A typical buffer size for DOS was 512 bytes, the size of a floppy disk sector. I used 20K. Writing bigger chunks at a time enabled the operating system to optimize it for better throughput to the disk driver. (Fewer seeks, for example.)

Fewer round trips to the operating system almost always speeds things up, too, especially if it involves a context switch.

Things got even better when the switch to hard disks was underway. Hard disks had bigger sector sizes, but the buffer size sometimes remained at 512. This meant the driver had to read the sector, fold in the 512, then write it back out. Oops!

Later on, demand paged virtual memory and memory caching also favored big buffers.


Floppy track seeking was such a pain to wait for that smartdrv.exe never left my autoexec.bat once I discovered it. I don't understand why it wasn't a default in later versions of DOS.


Hard disk sectors were 512 bytes and remained that way up until about 2010. ITYM the FAT cluster size?


If I recall correctly, disks emulated 512 byte sectors for compatibility, but actually had larger ones. But I don't have documentation on such pre-google things :-)


Disk sectors remained 512 bytes for a very long time; only in recent years (2010 sounds right) did drives move to 4K sectors, either with 512-byte emulation or native 4K.

It is true, however, that sequential access to HDDs and floppies was always much faster than random access, or even sequential access with small buffers, simply because the drive doesn't need to wait for the platter to rotate back to the right location for each sector written; it can write them all in one go.


It's interesting how similar that is to the type of optimizations gamedevs do today. E.g. sub out "floppy disk sector" for GPU memory and crank up the numbers a few orders of magnitude.


It's not just gamedevs. I/O buffering remains an important source of both performance and frustration (when I/O is buffered a bit too heavily and nothing appears in the output because the in-memory buffers are still filling), and most modern languages provide buffering wrappers or even buffer file I/O by default.


Possible trap: Some compilers may recognize the '1' literal used in printf1.c, fold it in to the string, and avoid printf() in both cases. If that happens to you, substitute an expression that must be evaluated.

Or more "robustly", an expression that comes from external input (such as getchar()). Compilers can be smart enough to evaluate expressions which are constant.

This reminds me: I briefly played around with JIT'ing printf() itself --- if you look at it the right way, it's basically a "bytecode" interpreter for the format string. I did get some pretty good performance gains, but it turned out simpler, for the project it was going into, to just switch to a binary protocol instead of text for output.

I also think that writing a (subset of) printf() is a useful exercise for those beginning programming C in general; it's not too difficult (at least if you avoid the floating-point stuff...), but also not too easy either.


I recently came across a C++ logging library that parses format strings at compile time using template necromancy:

https://blapid.github.io/cpp/2017/10/31/llcpp-a-quest-for-fa...


F# has statically type checked printf also. I know it's a completely different realm than C++, but still pretty interesting. https://fsharpforfunandprofit.com/posts/printf/


The cool kids are doing it with constexpr. See spdlog and/or fmt


fmt can check format strings at compile time, but does it generate code to do the formatting? Some quick experiments suggest it will re-parse the format string at runtime to do the formatting.


Not strictly a JIT but a preprocessor at compile time that makes sprintf calls faster by pre-splitting them: https://github.com/h2o/qrintf


Back in the day, I found P.J. Plauger's "The Standard C Library" immensely instructive, as it contains a full implementation of the (then) standard library.

Admittedly, the book is more than 25 years old now, but I'm sure it would still make an interesting read:

https://www.amazon.com/Standard-C-Library-P-J-Plauger/dp/013...


Wasn't Microsoft's STL based on that? I remember running strings on so many binaries and always wondering who P.J. Plauger was.


His libraries were licensed to a number of vendors. Specifically, the STL is covered by https://www.amazon.com/C-Standard-Template-Library/dp/013437... and there is a third book covering the rest of the standard C++ library: https://www.amazon.com/Draft-Standard-Library-P-Plauger/dp/0...

However, the "Draft Standard C++ Library" book was finished just before the standard was radically overhauled for the STL, and the STL itself and the language have evolved massively, so in my opinion, those two books have not aged as well as the C library book has.


Yep, his company is called Dinkumware.


> %n - null

> %% - complete form

Wait, what? %n writes the number of characters output so far to the int * you specify, while %% just prints a percent sign.


%n, also known as a terrible misfeature. It’s useful in Xscanf because it lets you get the number of characters read, but it’s useless and outright dangerous on the Xprintf functions - Xprintf already returns the number of characters written, and allowing %n in printf has enabled a whole class of terrible printf format string bugs that had no reason to exist.


Of course, most of the time the only reason you have a %n in your format string is that you passed arbitrary, unsanitized user input to it. If you're doing something like that, scanf isn't going to work either, because any attacker can just scribble all over your stack. And it's not totally useless: it can be used for aligning things because it writes the number of characters printed into a variable.

macOS, for one, has changed printf to SIGILL when passed a dynamic string that contains %n, though, which mitigates most of the issues present with format string attacks.


> And it's not totally useless: it can be used for aligning things because it writes the number of characters printed into a variable.

If you're going to align anything, you will (probably) need to write some code to compute the right alignment. At that point, you will need to split your printf into pieces, so why not just use printf's return code, which tells you the number of characters written?

glibc on Linux started hardening printf by replacing calls to printf with calls to __printf_chk, which will abort if the format string contains %n and sits in a writable segment of memory. The implementation is somewhat horrific - printf opens /proc/self/maps and range-checks the format string's address, but it works and does provide the necessary security.

It's much less common to pass unsanitized input to scanf than to printf, in my experience, because the most common error case of printf(buf) (instead of printf("%s", buf)) is just not something that you can do with scanf.

----

By the way, which version of macOS did they introduce the SIGILL hardening, and is it documented? I'm on 10.12.6, and the following program:

    #include <stdio.h>
    #include <string.h>
    
    int main() {
        char buf[] = "Hi %d Bye\n";
        strcpy(buf, "Hi %n Bye\n");
        int foo = 424242;
        printf(buf, &foo);
        printf("foo was set to %d\n", foo);
    }
    
runs without complaint (Apple LLVM version 9.0.0 (clang-900.0.39.2)).


> If you're going to align anything, you will (probably) need to write some code to compute the right alignment. At that point, you will need to split your printf into pieces, so why not just use printf's return code, which tells you the number of characters written?

The child comment provides a use case for %n, so I won't go into this here.

> glibc on Linux started hardening printf by replacing calls to printf with calls to __printf_chk

Only with -D_FORTIFY_SOURCE=2 set when compiling. Swapping printf calls with __printf_chk is off by default.

> The implementation is somewhat horrific - printf opens /proc/self/maps and range-checks the format string's address, but it works and does provide the necessary security

Luckily, macOS has a nicer Mach API for this, namely the vm_region/vm_region_64 family of functions: https://opensource.apple.com/source/Libc/Libc-1244.30.3/stdi... (search for __printf_is_memory_read_only).

> By the way, which version of macOS did they introduce the SIGILL hardening, and is it documented? I'm on 10.12.6

You're off by a version; this was introduced in 10.13. Your program crashes nicely with Apple LLVM version 9.1.0 (clang-902.0.39.2) on macOS High Sierra 10.13.5 (17F70a). Documented? Ha, I wish. It just ended up breaking a couple of programs that relied on this behavior, since it would just call os_crash with "%n used in a non-immutable format string".


> At that point, you will need to split your printf into pieces, so why not just use printf's return code, which tells you the number of characters written?

Suppose you are aligning subsequent output lines to the current line, and want to know the number of characters at certain points in the string. In fact this is probably the exact use-case for %.* and %n used together.


Reminds me of the time I realized that the indie multiplayer game I was playing would print gibberish whenever you typed a "%" in the chat, so it was probably doing an unsanitized sprintf somewhere. Then I realized that putting a %n in there would mess things up, and yup, it crashed the server (not just the client). It would have taken a lot more skill than I had to exploit that for anything more than randomly crashing servers.


Yeah, pulling off a successful format string attack generally requires knowledge of where the stack is or having a variable pre-initialized to the address of something on it; then you'll need the address of something useful to jump to. If you have a copy of the non-PIE/ASLR executable, this is easy, otherwise other methods are needed.


Actually, if you (the attacker) receive the printf output, then you can pretty trivially leak almost any memory you want with %N$x, allowing you to first leak the entire stack, then pivot to leak any referenced memory regions (which will almost certainly include the executable and many libraries due to return addresses on the stack). Worse, you can then use %N$n to write to any pointer in memory - if you can find your input string in memory (dereference stuff until you find the heap, for example), you can stuff pointers into it and get a write-what-where primitive (which is game over).

This is part of what makes format string vulnerabilities so nasty - they hand you a free leak in addition to %n being capable of very flexible overwrites.


> if you (the attacker) receive the printf output

This is a luxury that not all of us have ;)

> you can then use %N$n to write to any pointer in memory - if you can find your input string in memory (dereference stuff until you find the heap, for example), you can stuff pointers into it

You first have to get a valid pointer to something, which can be pretty hard when the addresses are randomized. Yes: if you can satisfy all the requirements for a printf attack, there's a whole lot you can do: overflow the stack and write to the return address, find the address of system, perform arbitrary reads or writes, etc. But as you've mentioned yourself, a successful format string exploit requires more than just "someone passed a user-controlled string to printf", at least as far as I'm aware.


In this context, we did get the sprintf output - it was the chat functionality of a video game and that's how I knew anything was amiss.


The most fascinating thing about this article for me was the copyright notice at the bottom of the page consisting primarily of asterisks and hash marks.

Can anyone explain this to me?


Probably brainfuck.


Not like, literally brainfuck, because brainfuck doesn't use #.

Anyone got thoughts on what it might be?

  ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*##*


Very, very interesting deep dive. It reminds me of the nand2tetris.org course, which does something very similar (albeit for a toy hardware/OS/language): explain what the heck is going on between a print command and something being written to the screen.

In nand2tetris, they work from a hardware simulation that represents memory where a block of that memory determines what the pixels on the screen are set to. Then you write code that can turn the pixels on or off, then a character OS library that converts binary ascii values into a combination of pixel settings that display a character.

Then you implement your OS's program loop (the "OS" is really a one-shot "here's a bunch of functions, run them from an entry point" affair that you have to set up each time you turn it on), with a specified entry point from which the rest is called.

Obviously, modern systems are a lot faster, but it was a great illustration of what has to happen in between, which this article is doing for real OSes.


I hope the bitwise project helps me see these things a little more clearly. Implementing something from ground up always feels nice.


How can compilers optimize a printf() call with no format specifiers into a simple puts() call? Sure, it would be easy in C++, roughly like

  inline int printf(const char* const s) { return puts(s); }
  int printf(const char* const s, ...); /* non-trivial implementation */
but that's not possible in C.


They treat it as a built-in function, instead of always linking it to system libraries. Built-in functions they can implement however they want. If you insist on using the library implementation (e.g. because you have your own), you have to tell the compiler not to use its internal one.

E.g. gcc docs on this: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html


Newbie question: is there one context switch per line in the "printf() execution sequence" figure?


I think it's like this: Each line with "###" in bold is a system call, and the preceding non-bold lines show the corresponding backtrace (sequence of nested active function calls at the time of the system call).

A system call is expensive, but not as expensive as a context switch between two user processes.



