
Tearing apart printf() - necrodome
http://www.maizure.org/projects/printf/index.html
======
vinayms
This is a poorly written article that is convoluted and made unnecessarily
complex. It is similar to writing an article titled 'tearing apart Google
search' and then dwelling on HTTP, TCP/IP, routers, etc., instead of actually
talking about how Google search works.

I was expecting the article to start off by 'tearing apart' the source code,
then dig into the algorithms with which the diverse formats are rendered, then
compare a few popular implementations, and finally provide valuable insights,
something along the lines of what WalterBright mentioned and the brief
discussion it started. If not all this, the minimum I expected was an
abridged, annotated source code.

This article left me underwhelmed.

~~~
jxub
Could you point to a better explanation? I'd love to read that instead as I
thought the same.

------
WalterBright
In the 1980s, Borland made up for much of the poor code generation of their C
compiler by doing the runtime library all in hand-optimized assembler,
including printf.

What I did (Datalight, Zortech) was make large disk I/O buffers. It gave huge
boosts to speed, and very few people cottoned on to that :-)

It was a technique I learned from Shal Farley.

~~~
neonate
Can you say more? How did it give huge boosts to speed?

~~~
WalterBright
A typical buffer size for DOS was 512 bytes, the size of a floppy disk sector.
I used 20K. Writing bigger chunks at a time enabled the operating system to
optimize it for better throughput to the disk driver. (Fewer seeks, for
example.)

Fewer round trips to the operating system almost always speeds things up, too,
_especially_ if it involves a context switch.

Things got even better when the switch to hard disks was underway. Hard disks
had bigger sector sizes, but the buffer size sometimes remained at 512. This
meant the driver had to read the sector, fold in the 512, then write it back
out. Oops!

Later on, demand paged virtual memory and memory caching also favored big
buffers.
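
In modern C the same trick takes a couple of lines with setvbuf(). A minimal
sketch (buffer size taken from the anecdote above; the helper name is
invented):

```c
#include <stdio.h>

/* Sketch of the big-buffer technique: give a stream a 20K fully
   buffered area so data reaches the operating system in a few large
   writes instead of many sector-sized ones. */
static char big_buf[20 * 1024];

static int use_big_buffer(FILE *f) {
    /* Must be called before the first read/write on the stream.
       _IOFBF = fully buffered; flushed when full or at fclose()/exit. */
    return setvbuf(f, big_buf, _IOFBF, sizeof big_buf);
}
```

Calling use_big_buffer(stdout) at the top of main() is enough: every printf()
after that accumulates in big_buf until a full 20K chunk is handed to the OS.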

~~~
caf
Hard disk sectors were 512 bytes and remained that way up until about 2010.
ITYM the FAT cluster size?

~~~
WalterBright
If I recall correctly, disks emulated 512 byte sectors for compatibility, but
actually had larger ones. But I don't have documentation on such pre-google
things :-)

~~~
baruch
Disk sectors remained 512 for a very long time, only in recent years (2010
sounds right) did they make the 4k sectors with emulation for 512 or just
native 4k drives.

It is true however that sequential access to hdds and floppies was always much
faster than random access, or even sequential access in small buffers, simply
because the drive does not need to wait for the platter rotation to reach the
right location again for each sector written, but can just write them all in
one go.

------
userbinator
_Possible trap: Some compilers may recognize the '1' literal used in
printf1.c, fold it in to the string, and avoid printf() in both cases. If that
happens to you, substitute an expression that must be evaluated._

Or more "robustly", an expression that comes from external input (such as
getchar()). Compilers can be smart enough to evaluate expressions which are
constant.

This reminds me, I briefly played around with JIT'ing printf() itself --- if
you look at it the right way, it's basically a "bytecode" interpreter for the
format string. I did get some pretty good performance gains, but for the
project it was intended for, it turned out simpler to just switch from text to
a binary protocol for output.

I also think that writing a (subset of) printf() is a useful exercise for
those learning C programming in general; it's not too difficult (at least if
you avoid the floating-point stuff...), but not too easy either.
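
Such a subset fits on a page. A toy sketch (invented name, handles only %d,
%s and %%) that also shows the "bytecode interpreter" structure: the format
string drives a dispatch loop:

```c
#include <stdarg.h>
#include <stddef.h>

/* Toy subset of printf(): the format string acts as the "bytecode"
   and this loop is its interpreter. Writes to a caller-supplied
   buffer; no floating point, no width/precision flags. */
static void mini_printf(char *out, size_t cap, const char *fmt, ...) {
    va_list ap;
    size_t n = 0;
    va_start(ap, fmt);
    for (; *fmt && n + 1 < cap; fmt++) {
        if (*fmt != '%') { out[n++] = *fmt; continue; }  /* literal byte */
        fmt++;
        if (*fmt == '\0') break;          /* stray trailing '%' */
        switch (*fmt) {
        case '%':                         /* %% -> a single '%' */
            out[n++] = '%';
            break;
        case 's': {                       /* %s -> copy the string arg */
            const char *s = va_arg(ap, const char *);
            while (*s && n + 1 < cap) out[n++] = *s++;
            break;
        }
        case 'd': {                       /* %d -> decimal conversion */
            int v = va_arg(ap, int);
            char tmp[24];
            int t = 0;
            unsigned long u;
            if (v < 0) { out[n++] = '-'; u = 0ul - (unsigned long)v; }
            else       { u = (unsigned long)v; }
            do { tmp[t++] = (char)('0' + u % 10); u /= 10; } while (u != 0);
            while (t > 0 && n + 1 < cap) out[n++] = tmp[--t];
            break;
        }
        default:                          /* unknown specifier: dropped */
            break;
        }
    }
    va_end(ap);
    out[n] = '\0';
}
```

For example, mini_printf(buf, sizeof buf, "x=%d, s=%s, 100%%", -42, "hi")
leaves "x=-42, s=hi, 100%" in buf.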

~~~
twic
I recently came across a C++ logging library that parses format strings at
compile time using template necromancy:

https://blapid.github.io/cpp/2017/10/31/llcpp-a-quest-for-faster-logging-part-2.html

~~~
DerDangDerDang
The cool kids are doing it with constexpr. See spdlog and/or fmt

~~~
twic
fmt can check format strings at compile time, but does it generate code to do
the formatting? Some quick experiments suggest it will re-parse the format
string at runtime to do the formatting.

------
microtherion
Back in the day, I found P.J. Plauger's "The Standard C Library" immensely
instructive, as it contains a full implementation of the (then) standard
library.

Admittedly, the book is more than 25 years old now, but I'm sure it would
still make an interesting read:

https://www.amazon.com/Standard-C-Library-P-J-Plauger/dp/0131315099/ref=la_B00IP4FPWQ_1_16?s=books&ie=UTF8&qid=1526871663&sr=1-16&refinements=p_82%3AB00IP4FPWQ

~~~
atesti
Wasn't Microsoft's STL based on that? I remember seeing his strings in so many
binaries and always wondering who P.J. Plauger is.

~~~
microtherion
His libraries were licensed to a number of vendors. Specifically, the STL is
covered by
https://www.amazon.com/C-Standard-Template-Library/dp/0134376331/ref=sr_1_3?s=books&ie=UTF8&qid=1526922642&sr=1-3&keywords=plauger+library
and there is a third book covering the rest of the standard C++ library:
https://www.amazon.com/Draft-Standard-Library-P-Plauger/dp/0131170031/ref=sr_1_4?s=books&ie=UTF8&qid=1526922642&sr=1-4&keywords=plauger+library

However, the "Draft Standard C++ Library" book was finished just before the
standard was radically overhauled for the STL, and the STL itself and the
language have evolved massively, so in my opinion those two books have not
aged as well as the C library book has.

------
saagarjha
> %n - null

> %% - complete form

Wait, what? %n writes the current number of characters outputted to the int *
you specify, while %% just prints a percent sign.
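
A quick check of that behavior (hypothetical helper name; snprintf is used so
nothing is written beyond the supplied buffer):

```c
#include <stdio.h>

/* Demonstrates the two specifiers: %% emits a literal percent sign,
   and %n stores the number of characters produced so far into the
   int* argument. */
static int chars_before_n(char *out, size_t cap) {
    int count = -1;
    snprintf(out, cap, "hello%n %%", &count);
    return count;   /* characters emitted before the %n: 5 */
}
```

After the call the buffer holds "hello %" and the returned count is 5, the
length of "hello".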

~~~
nneonneo
%n, also known as a terrible misfeature. It’s useful in Xscanf because it lets
you get the number of characters read, but it’s useless and outright dangerous
on the Xprintf functions - Xprintf already returns the number of characters
written, and allowing %n in printf has enabled a whole class of terrible
printf format string bugs that had no reason to exist.

~~~
saagarjha
Of course, most of the time the only reason you have a %n in your format
string is because you passed arbitrary, unsanitized user input to it. If
you're doing something like that, scanf isn't going to work either, because
any attacker can just scribble all over your stack. And it's not _totally_
useless: it can be used for aligning things, because it writes the number of
characters printed into a variable.

macOS, for one, has changed printf to raise SIGILL when passed a dynamic
format string that contains %n, though, which mitigates most of the issues
present in format string attacks.

~~~
nneonneo
> And it's not totally useless: it can be used for aligning things because it
> writes the number of characters printed into a variable.

If you're going to align anything, you will (probably) need to write some code
to compute the right alignment. At that point, you will need to split your
printf into pieces, so why not just use printf's return code, which tells you
the number of characters written?
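
That pattern in code (sketch; helper name and layout invented, snprintf used
so the result is checkable):

```c
#include <stdio.h>

/* Alignment without %n: snprintf() returns the number of characters
   written, which tells us how much padding is needed to start the
   value at a fixed column. */
static void aligned_row(char *out, size_t cap, const char *label,
                        const char *value, int col) {
    int n = snprintf(out, cap, "%s:", label);      /* chars so far */
    if (n < 0 || (size_t)n >= cap) return;         /* truncated/error */
    /* "%*s" with an empty string emits (col - n) spaces of padding. */
    snprintf(out + n, cap - (size_t)n, "%*s%s", col - n, "", value);
}
```

For instance, aligned_row(buf, sizeof buf, "id", "42", 10) puts the value at
column 10 regardless of the label's length.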

glibc on Linux started hardening printf by replacing calls to printf with
calls to __printf_chk, which will abort if the format string contains %n and
sits in a writable segment of memory. The implementation is somewhat horrific
- printf opens /proc/self/maps and range-checks the format string's address,
but it works and does provide the necessary security.

It's much less common to pass unsanitized input to scanf than to printf, in my
experience, because the most common error case of printf(buf) (instead of
printf("%s", buf)) is just not something that you can do with scanf.

----

By the way, which version of macOS did they introduce the SIGILL hardening,
and is it documented? I'm on 10.12.6, and the following program:

    
    
        #include <stdio.h>
        #include <string.h>
        
        int main() {
            char buf[] = "Hi %d Bye\n";
            strcpy(buf, "Hi %n Bye\n");
            int foo = 424242;
            printf(buf, &foo);
            printf("foo was set to %d\n", foo);
        }
        

runs without complaint (Apple LLVM version 9.0.0 (clang-900.0.39.2)).

~~~
saagarjha
> If you're going to align anything, you will (probably) need to write some
> code to compute the right alignment. At that point, you will need to split
> your printf into pieces, so why not just use printf's return code, which
> tells you the number of characters written?

The child comment provides a use case for %n, so I won't go into this here.

> glibc on Linux started hardening printf by replacing calls to printf with
> calls to __printf_chk

Only with -D_FORTIFY_SOURCE=2 set when compiling; replacing printf calls with
__printf_chk is off by default.

> The implementation is somewhat horrific - printf opens /proc/self/maps and
> range-checks the format string's address, but it works and does provide the
> necessary security

Luckily, macOS has a nicer Mach API for this, namely the
vm_region/vm_region_64 family of functions:
https://opensource.apple.com/source/Libc/Libc-1244.30.3/stdio/FreeBSD/vfprintf.c.auto.html
(search for __printf_is_memory_read_only).

> By the way, which version of macOS did they introduce the SIGILL hardening,
> and is it documented? I'm on 10.12.6

You're off by a version; this was introduced in 10.13. Your program crashes
nicely with Apple LLVM version 9.1.0 (clang-902.0.39.2) on macOS High Sierra
10.13.5 (17F70a). Documented? Ha, I wish. It just ended up breaking a couple
of programs that relied on this behavior, since it would just call os_crash
with "%n used in a non-immutable format string".

------
maxk42
The most fascinating thing about this article for me was the copyright notice
at the bottom of the page consisting primarily of asterisks and hash marks.

Can anyone explain this to me?

~~~
Asmod4n
Probably brainfuck.

~~~
anoonmoose
Not like, literally brainfuck, because brainfuck doesn't use #.

Anyone got thoughts on what it might be?

    
    
      ##*#**##****#*#**/\##*###*****#**#*#*#**#******#**#*#*####*#*##*

------
SilasX
Very, very interesting deep dive. It reminds me of the nand2tetris.org course,
which does something very similar (albeit for a toy hardware/OS/language):
explaining what the heck is going on between a print command and something
being written to the screen.

In nand2tetris, they work from a hardware simulation that represents memory
where a block of that memory determines what the pixels on the screen are set
to. Then you write code that can turn the pixels on or off, then a character
OS library that converts binary ascii values into a combination of pixel
settings that display a character.

Then you implement your OS's program loop (the "OS" is based around a one-shot
"here's a bunch of functions, run from an entry point" model that you have to
specify each time you turn it on), which has a specified entry point from
which the rest is called.

Obviously, modern systems are a lot faster, but it was a great illustration of
what has to happen in between, which this article is doing for real OSes.

------
abhishekjha
I hope the bitwise project helps me see these things a little more clearly.
Implementing something from the ground up always feels nice.

------
ktpsns
How can compilers optimize a printf() call with no conversion specifiers in
the format string into a simple puts() call? Sure, it would be easy in C++,
roughly like
    
    
      inline void printf(const char* const s) { puts(s); }
      void printf(const char* const s, ...); /* non-trivial implementation */
    

but that's not possible in C.

~~~
detaro
They treat it as a built-in function, instead of always linking it to system
libraries. Built-in functions they can implement however they want. If you
insist on using the library implementation (e.g. because you have your own),
you have to tell the compiler not to use its internal one.

E.g. the gcc docs on this:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

------
cedricvanrompay
Newbie question: is there one context switch per line in the "printf()
execution sequence" figure?

~~~
bloak
I think it's like this: Each line with "###" in bold is a system call, and the
preceding non-bold lines show the corresponding backtrace (sequence of nested
active function calls at the time of the system call).

A system call is expensive, but not as expensive as a context switch between
two user processes.

