Great article, and nice to have clear examples of calling printf() from x64! *x8...

rayiner · on May 1, 2014

See: https://blog.nelhage.com/2010/10/amd64-and-va_arg ("To start, any function that is known to use va_start is required to, at the start of the function, save all registers that may have been used to pass arguments onto the stack, into the 'register save area', for future access by va_start and va_arg. This is an obvious step, and I believe pretty standard on any platform with a register calling convention. The registers are saved as integer registers followed by floating point registers. As an optimization, during a function call, %rax is required to hold the number of SSE registers used to hold arguments, to allow a varargs caller to avoid touching the FPU at all if there are no floating point arguments.")

When va_start is used, it needs to save argument registers to the stack in the prologue of the function. The program is free to use different conventions (format string, sentinel value) to signal to the callee how many arguments there are. But the code generated for va_start has no way of knowing what convention the program happens to use.
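To make the two conventions concrete, here is a minimal C sketch. The function names `sum_counted` and `count_strings` are made up for illustration. In both cases the callee learns the argument count only from the program's own convention, so the code behind va_start cannot know it in advance.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Convention 1: the first argument says how many ints follow. */
int sum_counted(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Convention 2: the list of strings ends with a NULL sentinel. */
size_t count_strings(const char *first, ...)
{
    va_list ap;
    size_t n = 0;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}
```

A call such as `sum_counted(3, 1, 2, 3)` or `count_strings("a", "b", (const char *)NULL)` looks the same to the compiler at the va_start site; only the loop condition differs.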

nkurz · on May 1, 2014

Great reference, thanks!

I guess it makes sense to keep a consistent argument passing ABI, but I still find the answer quite sad: to preserve the ability to call functions without prototypes, you pass the arguments in registers and then immediately write them back to the stack.

Putting the number of vectors in %rax/%al seems at odds with the consistent ABI argument. Once you are changing things to require this, it seems like you might as well make some other useful changes as well: like passing the number of arguments and skipping the register to stack conversion.

It would be nice if there were a contortion-free entry point to the x64 printf() that started with the values on the stack. Can vprintf(const char *format, va_list ap) be used in this way instead? Is a 'va_list' just a block of memory containing the arguments?

I guess I need to study the article you referred to (and stare at the libc source a while: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-commo...)
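On the vprintf() question, a small hedged sketch in C may help. On the System V x86-64 ABI a va_list is not a plain block of arguments but a small structure (register offsets plus pointers to the register save area and the stack overflow area), so portable code obtains one only through va_start. The helper names `say` and `format_into` below are hypothetical; they just forward a va_list to the v* functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: forwards its variable arguments to vprintf(). */
int say(const char *format, ...)
{
    va_list ap;
    int written;

    va_start(ap, format);
    written = vprintf(format, ap);
    va_end(ap);
    return written;
}

/* Same idea, but formatting into a caller-supplied buffer via vsnprintf(). */
int format_into(char *buf, size_t size, const char *format, ...)
{
    va_list ap;
    int written;

    assert(buf != NULL && size > 0);
    va_start(ap, format);
    written = vsnprintf(buf, size, format, ap);
    va_end(ap);
    return written;
}
```

Note that the wrapper is itself variadic, so the register save in its prologue still happens; vprintf() only moves where that cost is paid.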

comex · on May 2, 2014

> to preserve the ability to call functions without prototypes, you pass the arguments in registers and then immediately write them back to the stack.

"Back" to the stack is a big assumption. A lot of the time, there is no reason an argument would be on the stack in the first place; or even if it is a spilled variable, it usually wouldn't be in the right place (without some fairly uselessly clever stack layout), so it's just a matter of whether the caller or callee does the write.

userbinator · on May 2, 2014

The GP is not alone in thinking that the x86-64 ABI is "a total mishmash"; it feels to me like the designers were deliberately avoiding optimisation, when an ABI is really one thing where any optimisation wouldn't ever be "premature" - after all, it's something that's going to be used by millions if not more pieces of software, so little things really add up.

For example, it's not necessary that all the args passed in registers need be written back to the stack by va_start --- ignoring for the moment the complication of structures and SSE registers, a va_list could just contain one field, which would initially hold an index, the "current register number" where the desired argument is stored. Then va_arg (which is now implemented as a compiler primitive) could generate code that checks this index, and if the index is less than the maximum number of arguments that can be passed in registers, resolve to an access to the particular register (here's where a "register-indirect" addressing mode would be really useful, since it could just use that index). Otherwise that field becomes a pointer into the stack like the traditional implementation. In other words, depending on that index, reading a scalar va_arg turns into reading a register or reading from the stack.
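A rough model of that idea in C, as a sketch only: C cannot index real registers, so an array stands in for the six integer argument registers, and every name here (`arg_state`, `idx_va_list`, `idx_va_arg`) is hypothetical. For simplicity the model keeps a single index throughout rather than switching the field to a stack pointer once the registers are exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ARG_REGS 6 /* integer argument registers: rdi, rsi, rdx, rcx, r8, r9 */

/* Hypothetical model of the incoming arguments at function entry. */
struct arg_state {
    uint64_t regs[NUM_ARG_REGS]; /* stand-in for the argument registers */
    const uint64_t *stack;       /* stand-in for the stack-passed arguments */
};

/* Hypothetical single-field va_list: the index of the next argument. */
struct idx_va_list {
    unsigned next;
};

/* Model of the proposed va_arg: read a "register" while the index is in
   range, otherwise read from the stack area. */
uint64_t idx_va_arg(struct idx_va_list *ap, const struct arg_state *st)
{
    unsigned i;

    assert(ap != NULL && st != NULL);
    i = ap->next++;
    if (i < NUM_ARG_REGS)
        return st->regs[i];
    return st->stack[i - NUM_ARG_REGS];
}
```

In real generated code the `st->regs[i]` read has no direct equivalent, which is where the "register-indirect" addressing mode mentioned above would come in; without it the compiler would need a branch or jump table per access.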

It's not that hard to extend this to accommodate structures; and I don't see any reason why the ABI couldn't pass structures partially in registers and partially on the stack. We just have to define the rules that let us do so, as a structure is only a concept of grouping data together, an HLL construct. There's nothing that restricts a structure to being always a contiguous set of registers or memory locations. Having to "reassemble" a structure in memory is only necessary if its address is taken, and then only if operations other than accessing its members via that address are performed; otherwise its components can be operated on independently, regardless of whether they're in registers or memory. Since the compiler knows where va_arg is used with a structure type, it can decide whether it really needs to generate the code to reassemble the structure, or only a series of scalar accesses. (I touch on this point somewhat here too: https://news.ycombinator.com/item?id=7683823 )

Extending this to the SSE regs is simpler than for structures, since this is merely another bank of registers that arguments can be read from: another field in va_list to hold the SSE register index/pointer.

I find it a bit odd that the ABI would define an actual implementation of varargs rather than only where the arguments will be expected to be, since how they're actually accessed should be outside the scope of an ABI spec; now every compiler vendor is tempted to just copy this (IMHO sub-optimal) implementation instead of thinking about how it could be done, and possibly coming up with something better.

ndesaulniers · on May 1, 2014

I'm not above admitting to using printf debugging. ;) rayiner found some great answers to your question.

nkurz · on May 1, 2014

I think it's often the right tool. My personal flavor involves wrapping the printf() with a conditional that depends on the current function name and the value of an environment variable. I've been trying to figure out how best to write a generic macro for this in x64.
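For comparison, the C version of that wrapper is short. This is only a sketch: the environment variable name `DEBUG_FUNC` and the helpers `name_matches` and `debug_enabled_for` are made up for illustration, and the x64 assembly equivalent is the part still to be worked out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero when the wanted name is set and equals func. */
int name_matches(const char *want, const char *func)
{
    assert(func != NULL);
    return want != NULL && strcmp(want, func) == 0;
}

/* Nonzero when the (assumed) DEBUG_FUNC variable names the given function. */
int debug_enabled_for(const char *func)
{
    return name_matches(getenv("DEBUG_FUNC"), func);
}

/* printf-style debugging that is active only inside the selected function. */
#define DEBUG_PRINTF(...)                     \
    do {                                      \
        if (debug_enabled_for(__func__))      \
            fprintf(stderr, __VA_ARGS__);     \
    } while (0)
```

Running a program with `DEBUG_FUNC=parse_header` in the environment would then enable only the DEBUG_PRINTF calls inside a function named parse_header.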

You might be interested in Josh Haberman's post on using GDB's "breakpoint command lists" as an alternative to actually putting the variadic calls into the assembly: http://blog.reverberate.org/2013/06/printf-debugging-in-asse...