
Where the Printf Rubber Meets the Road (2010) - striking
http://blog.hostilefork.com/where-printf-rubber-meets-road/
======
hostilefork
So for the record, I wrote this four years ago. No idea why this is being
linked to from here now.

Let the record show, also, that I in no way stand by this as being a sane
approach. I just walked through it, trying to answer the question. I wanted to
show the path to syscall and it was way wackier than I thought it would be.
That's why it ended up a blog entry.

(Though when my blog went down one day, someone copied the content into the
answer:

[http://stackoverflow.com/revisions/2444508/3](http://stackoverflow.com/revisions/2444508/3)

...and they did so ignoring the license difference: my blog is CC-BY-NC-SA,
not CC-BY-SA. There's been a bit of a tussle over that distinction lately,
with people selling StackOverflow books:

[http://meta.stackoverflow.com/questions/272768/is-this-
site-...](http://meta.stackoverflow.com/questions/272768/is-this-site-
violating-the-tos-of-stack-overflow-and-or-amazon-and-how-to-report)

...and I'll leave it to those with more interest in the issue to decide if
that is worth worrying about, because I don't actually care.)

Anyway, I'm sure the people involved in writing this had their reasons for
doing it this way, and it had to do with code legacy and evolution. I don't
want anyone to mistake my trace through it as endorsement. It's just what it
is.

~~~
DannyBee
You did miss one fun rabbit hole with printf in glibc:

Search for register_printf_function, and realize that printf is now a function
that can have whatever side-effects you like (which really sucks for
optimization around logging code)

~~~
jzwinck
Glibc may have register_printf_function, but GCC does not have
register_printf_attribute, so this is not as practical as it could be. That
is:

    
    
        __attribute__ ((format (printf, 1, 2)))
    

attaches to a function declaration to indicate that the function works like
printf, so GCC can warn if a (compile time) format string does not match the
variadic arguments. But there seems to be no way to extend this attribute, nor
to make entirely new attributes in the same spirit.

So you can register new format specifiers for printf, but it seems you'll then
have to disable warnings about bad format specifiers during compilation. Those
warnings have very few false positives and catch real bugs in real code.
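
For illustration, a minimal sketch of how the attribute is typically attached to a printf-like wrapper. The name `log_msg` is made up for this example; the guard just keeps the declaration compiling on compilers that lack the GNU attribute syntax.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical logging wrapper (not from glibc or GCC).
   The attribute says: argument 1 is a printf-style format string and
   the variadic arguments to check start at position 2. */
#if defined(__GNUC__)
__attribute__((format(printf, 1, 2)))
#endif
static int log_msg(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);   /* forward to the real formatter */
    va_end(ap);
    return n;                        /* number of characters written */
}
```

With this in place, a call such as `log_msg("%s items\n", 3)` should draw a format warning from GCC when `-Wformat` (part of `-Wall`) is enabled, exactly as a mismatched printf call would.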

~~~
DannyBee
"But there seems to be no way to extend this attribute, nor to make entirely
new attributes in the same spirit."

Yes, GCC knows, and we're good with that :)

The glibc extension here serves no earthly purpose (and on the gcc side, we've
considered pretending it doesn't exist before so we can actually optimize
better).

It sounds like it may be a good idea, except:

1. It modifies functions that have a standard purpose and a standard set of
format specifiers, and that are, as you point out, often error checked to
make sure they match those :)

2. Unless you see the printf handler registration, which may be in a library
linked to your program, or wherever, you'll have no idea that the code is
wrong.

------
chrisseaton
So what is the reason for all the rest of the indirection and macros? Why
__printf and then aliased to printf? Why is outchar a macro instead of being
factored into a function? If you asked me to implement printf I wouldn't
immediately do all of this - so what would my implementation be missing that
makes these necessary?

~~~
jzwinck
I can speculate: __printf(), having two leading underscores, is a name
reserved for "the implementation," meaning the compiler and standard library.
This enables at least one useful thing, which is that the standard library (or
a third-party library I guess) can use this function when it needs to print
things, and not worry that printf() has been redefined by the application.

You might think, but who in their right mind would redefine printf()? But C
has an infinite number of people using it all the time, so every possible
weird thing has been done a few times by now.

~~~
MaulingMonkey
> You might think, but who in their right mind would redefine printf()?

One cannot discount those in their wrong mind either.

I had a header dedicated to #undef ing things ruby.h redirected so I could
include it in C++ contexts without breaking the SC++L. While I don't see
printf on the list, read, write, close, fclose, sleep, and many more are.

------
lelandbatey
This is interesting and timely since just recently I attempted to make the
same journey in the name of creating a tiny obfuscated C program[0]. However,
right at the part where the author says _" you might start thinking that you
no longer care how printf works"_ is exactly where I stopped caring.

In the end, I did find out _a_ way to write to stdout without calling putchar
or including stdio directly, but there's still some mystery in the call to
_write_ :

    
    
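        // Printing to stdout without stdio or putchar
        // Originally adapted from here:
        // http://stackoverflow.com/a/14296280
        #include <unistd.h>  // declares write(); no stdio needed

        // Write the character `item` to stdout (fd 1) `len` times,
        // one write() call per character.
        void print_char(char item, int len)
        {
            for (; len > 0; --len)
            {
                write(1, &item, 1);
            }
        }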
    

[0] -
[https://github.com/lelandbatey/tiny_tree_printer](https://github.com/lelandbatey/tiny_tree_printer)

~~~
cbd1984
How is that function mysterious? It just writes the same character to stdout
len times. The only mystery is how anyone thought that would be efficient.
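
To make the efficiency point concrete, here is a rough sketch of a chunked variant, assuming a POSIX system. The name `print_char_chunked` and the 256-byte buffer size are arbitrary choices for this example, not anything from the original program.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical variant of print_char: fill a small buffer with the
   character and hand it to write() in chunks, so the number of system
   calls is roughly len / 256 instead of len.
   Returns the number of bytes written, or -1 on error. */
static long print_char_chunked(int fd, char item, int len)
{
    char buf[256];
    long total = 0;

    memset(buf, item, sizeof buf);
    while (len > 0) {
        size_t want = len < (int)sizeof buf ? (size_t)len : sizeof buf;
        ssize_t got = write(fd, buf, want);

        if (got <= 0)
            return -1;          /* error, or no progress */
        total += got;           /* write() may be partial, so loop */
        len -= (int)got;
    }
    return total;
}
```

This trades code size for fewer system calls, which is the opposite of what a size-golfed program wants.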

~~~
lelandbatey
What I mean by mysterious is that the syscall is hidden in the write
statement, vs hidden in the printf statement. Also, I wrote the above function
(actually a much smaller version) as part of a program that would print a
binary tree of any given height[0]. It wasn't meant to be efficient, it was
meant to be small, since the total size of the program was 777 bytes (later
reduced to 505 bytes).

[0] - [http://lelandbatey.com/posts/2014/09/binary-tree-
printer/](http://lelandbatey.com/posts/2014/09/binary-tree-printer/)

------
jbrichter
I had a similar question to this once, and what I mainly found out was that
FreeBSD's (and thus OSX's) libc is much, much, more readable than glibc.

~~~
jedisct1
Any other C library is more readable than glibc. Even on Linux, the Musl C
library is far far far more readable.

The OpenBSD C library is the one I usually look at when I want to understand
how a specific function works. It doesn't have insane optimizations or bloat
like glibc, but it's clean and portable.

~~~
jbrichter
The only thing you can begin to understand by reading glibc is the terrible
genius of Ulrich Drepper.

------
JoshTriplett
The original Stack Overflow question seems to have some fundamentally broken
assumptions about implementation in assembly. The design decisions of one
random compiler don't determine how all implementations work. And even for
that compiler, a lack of inline assembly doesn't mean the standard library
can't use assembly; it just has to use separate assembly files. Beyond that,
the compiler itself generates machine code, and it can (and does) do so as
part of the implementation of varargs.

Regarding the site hosting the article: ugh, it's bad enough to capture the
left arrow key and have it go to a previous article, but pay attention to the
modifiers to avoid breaking alt-left as a keyboard shortcut for "back".
(Browsers shouldn't even allow unprivileged pages to override shortcuts like
alt-left.)

~~~
hostilefork
But I'll patch the alt-left thing if it makes you happy. :-/ Still, if you
want to give feedback, write an email and make it friendly. It doesn't make
people feel good with "ugh". There's so much "ugh"-worthy stuff out there and
I don't think I deserve that.

~~~
JoshTriplett
I appreciate you taking the time to fix the alt-left problem. I've run into
too many sites doing similar things lately, including Google Blogger-based
sites; this was just the most recent one I've run into. I was attempting to
express a very mild annoyance; sorry if it came across as excessively snarky.

~~~
hostilefork
It's all right, I just try kind of hard to be the least ad-having, most
license friendly, site I can host... so it gets my goat a bit given what I
think are better "ugh" targets by far.

I'm traveling and it's not convenient to fix it tonight, but I will do it as
soon as I can.

------
UnoriginalGuy
In my more naive days I used to genuinely believe that macros were a cool
language feature. Now after spending some time in C/C++ I think macros are the
enemy of code readability, and I think this article summarises WHY nicely.

I'm sure there is some political reason why ldbl_strong_alias was used to
rename printf to __printf, but frankly I don't really care, there are macros
like that all over the C/C++ libraries that make finding the source for
anything pure hell.

The only acceptable macro to me now is a block which only runs in the DEBUG
build and is skipped over by the compiler in RELEASE.
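
A minimal sketch of that kind of debug-only macro, assuming a C99 compiler for the variadic macro syntax. `DEBUG_LOG` and `square` are names invented for this example.

```c
#include <stdio.h>

/* Hypothetical DEBUG_LOG macro: expands to an fprintf call when DEBUG
   is defined and to a no-op expression otherwise, so the arguments are
   not even evaluated in a release build. */
#ifdef DEBUG
#define DEBUG_LOG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_LOG(...) ((void)0)
#endif

static int square(int x)
{
    DEBUG_LOG("square(%d)\n", x);   /* vanishes unless built with -DDEBUG */
    return x * x;
}
```

Building with `-DDEBUG` turns the logging on; without it the call compiles away.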

PS - Yes this is why I don't code in C/C++ very often. I'm currently playing
with Go which has no concept of macros.

~~~
zem
the very few times you want the source to something to debug it etc. you'll
have a hard time. the 99% of the time when you want not the implementation
details but some semantically meaningful mapping of one token to one concept,
macros are wonderful.

another huge benefit is that they take care of fiddly bookkeeping for you - if
you have to remember some incantation that involves doing several steps in
sequence and making sure that if you change place A you need to change B and C
in certain ways, just write a macro and be happy.
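
One common form of that bookkeeping trick is the so-called X-macro, sketched here with a made-up color list. The list is written once, and both the enum and the name table are generated from it, so they cannot drift apart.

```c
/* Hypothetical list of colors, written in exactly one place. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

/* Expand the list into enum constants: COLOR_RED, COLOR_GREEN, ... */
#define AS_ENUM(name) COLOR_##name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Expand the same list into a table of names in the same order. */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };

/* Look up the name for a color; out-of-range values map to "?". */
static const char *color_name(enum color c)
{
    return (unsigned)c < (unsigned)COLOR_COUNT ? color_names[c] : "?";
}
```

Adding a color means touching only `COLOR_LIST`; the enum, the count, and the name table all follow.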

------
IgorPartola
I messed around with an MSP430 microcontroller for a while. There, I didn't
want to use a "standard" (Third-party but popular) library. I ended up
grabbing and modifying a much more direct version of printf, and I got to the
syscall part much quicker. When you only have 256 bytes of RAM and 64 KB ROM,
you don't have such luxury. You also don't need to support C++.

------
userbinator
I think printf() is one of the more interesting functions in the standard C
library to examine, since it is both so often-used and has the very visible
effect of producing output. It also shows some of the ways in which data can
be interpreted. Variants or limited-functionality versions of it can make good
assignments for programming courses.

Printing floating-point numbers is also another topic that is subtly more
difficult than it looks at first glance... here's one of the better articles
about it:
[http://www.ampl.com/REFS/rounding.pdf](http://www.ampl.com/REFS/rounding.pdf)

------
malkia
When I was actively using Turbo/Borland Pascal I always thought that the
"Write"/"Writeln" (forgive me, if misspelled) were magic functions :) - since
they took an arbitrary number of arguments, but there was no pre-processor,
or "..." as in "C"...

------
emilyst
There is a very quick overview of all this here:
[http://en.wikipedia.org/wiki/Write_(system_call)#Higher_leve...](http://en.wikipedia.org/wiki/Write_\(system_call\)#Higher_level_I.2FO_functions_calling_write)

------
asveikau
I think the question comes from a lack of understanding of varargs, not a
desire to see a bunch of crappy glue code inside glibc.

And for that type of C newbie who hasn't seen varargs... I'd say the answer is
pretty intuitive if you know how to look. First of all the answer is staring
at you in the manpage:

    
    
       int printf(const char *s, ...);
       int vprintf(const char *s, va_list ap);
    

OK, so there is the set of ordinary C functions that do it. Look up this
va_list thing and an implementation becomes obvious... printf sets up a
va_list and calls vprintf... vprintf scans the string for '%' and uses
va_arg...

But the important thing is none of this is magic and you can totally guess it
just by looking at documentation. I don't think it ever confused me when I
learned C. I don't really get why this is so shocking to people who read the
manpages.
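
As a rough sketch of that shape (not how glibc or any real libc does it), here is a toy formatter that writes into a buffer. The names `mini_snprintf` and `mini_vsnprintf` are invented for this example, and only `%d`, `%s`, `%c` and `%%` are understood.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if there is room, always counting it. */
static void put(char *out, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 < cap)
        out[*pos] = c;
    (*pos)++;
}

/* Hypothetical worker: scans fmt for '%' and pulls arguments with va_arg. */
static int mini_vsnprintf(char *out, size_t cap, const char *fmt, va_list ap)
{
    size_t pos = 0;

    for (; *fmt; fmt++) {
        if (*fmt != '%') {
            put(out, cap, &pos, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;                          /* lone '%' at end of string */
        switch (*fmt) {
        case 'd': {
            int v = va_arg(ap, int);
            unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
            char digits[16];
            int n = 0;

            if (v < 0)
                put(out, cap, &pos, '-');
            do {                            /* digits come out reversed */
                digits[n++] = (char)('0' + u % 10);
                u /= 10;
            } while (u);
            while (n)
                put(out, cap, &pos, digits[--n]);
            break;
        }
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s)
                put(out, cap, &pos, *s++);
            break;
        }
        case 'c':
            put(out, cap, &pos, (char)va_arg(ap, int));
            break;
        case '%':
            put(out, cap, &pos, '%');
            break;
        default:                            /* unknown: copy it through */
            put(out, cap, &pos, '%');
            put(out, cap, &pos, *fmt);
            break;
        }
    }
    if (cap)
        out[pos < cap ? pos : cap - 1] = '\0';
    return (int)pos;                        /* length that was needed */
}

/* The printf-style front end only sets up the va_list and delegates. */
static int mini_snprintf(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = mini_vsnprintf(out, cap, fmt, ap);
    va_end(ap);
    return n;
}
```

Swapping the buffer for a call to write() in `put` would give a toy printf; everything else stays the same.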

~~~
zem
what? the question had nothing to do with varargs; the poster quite rightly
noted that at some point in the implementation of printf, when you needed to
send an actual character to an actual output device, you would need something
conceptually lower-level than C. they were just confused about the specifics
of interfacing between C and the machine; i think one of the comment-level
answers put its finger on the nub:

> Visual Studio x64 doesn't support inline assembler. That doesn't mean you
> can't have assembler code. You can still have assembler, just not inline.

and in the answer section

> The system calls are (on platforms that I know of) internally implemented by
> a call to a function with inline asm that puts a system call number and
> parameters in CPU registers and triggers an interrupt that the kernel then
> processes.
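
For the curious, a sketch of what such a wrapper can look like with GCC-style inline asm on x86-64 Linux, where the `syscall` instruction is used rather than a software interrupt (32-bit x86 Linux traditionally used `int 0x80`). The name `raw_write` is invented for this example, and on any other platform the sketch simply falls back to the libc write().

```c
#include <unistd.h>

/* Hypothetical raw_write: issues the write system call directly on
   x86-64 Linux. Assumes GCC/Clang extended asm. On that ABI the call
   number goes in rax (1 is write) and the arguments in rdi, rsi, rdx;
   the kernel clobbers rcx and r11. A negative return is -errno. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;

    __asm__ volatile (
        "syscall"
        : "=a"(ret)
        : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
        : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback so the sketch still builds elsewhere. */
    return (long)write(fd, buf, len);
#endif
}
```

A real libc does the same job in a separate assembly file or a macro, and also translates the negative return value into errno.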

~~~
asveikau
I have a hard time seeing this as a magical part even in the eyes of a newbie.
How common is it to hear about stdio but not also know about write()? I am
pretty sure one man page refers to another. My point, just as it is with
varargs, is you should be curious enough to find these answers yourself, run
some experiments, make some good guesses.

Since Visual Studio is mentioned, I'll say that despite some extra layers
(CRT, kernel32, ntdll) the concepts are identical and when I was a kid
learning C I was able to dig through header files and documentation to
conceptually figure it out.

~~~
zem
it's not a question of anything being magical; the op just got confused by the
difference between writing inline assembly in a c program and calling from c
code into compiled and linked in assembly code.

~~~
asveikau
I'll say, somebody is a little confused.

I still say, they don't understand varargs, don't understand the difference
between stdio and write(), and most importantly, they are not curious enough
to read documentation and make smart guesses. And I am saying, the bullshit
obfuscation involved in this article, sharing so many meaningless
implementation details about a very specific libc, will not help answer this
question or gain insight more than RTFMing and being curious about the world
would.

So... If you are even asking questions like... "Does printf use assembly?" I'd
argue you are not thinking it through. You need to get yourself to a place
where you can answer that yourself. It will be a very obvious answer when you
do. Writing frivolous blog posts is not the way to get there.

~~~
hostilefork
Just noting that the article is not the _source_ of the bullshit obfuscation,
nor does it condone it. I'm part of the Red project.

[http://www.red-lang.org/p/contributions.html](http://www.red-
lang.org/p/contributions.html)

We're trying to undo this stuff. I was just answering a question.

~~~
zem
red looks awesome, and has come a lot further than I expected! (i peeked at it
a few years ago and there didn't seem to be much). I love that small
executable sizes are an explicit design goal.

~~~
hostilefork
The next release will be able to directly build and sign .APK files, with no
JDK or JarSigner required to be installed. Still under 1MB for that same
compiler executable on all platforms...that can also build PE (Windows),
Mach-O (OS/X), and ELF (Linux etc.)

It's moving slower than we'd like, but definitely moving.

