Comparing the Cost of Different Multiple-return Techniques in C

acqq · on Dec 23, 2013

I don't see the description of the most obvious (to this old C programmer) approach: if your function "returns multiple values" it probably should be rewritten to operate on a struct. Once you recognize that, you'd probably discover that you'll be able to use the same struct in more other functions, and that you'd need just to pass a single pointer to that stuct around, significantly reducing the overhead of a lot of parameter passing.

dmlorenzetti · on Dec 23, 2013

One of the methods tested was mutate-struct -- Dereference a struct pointer and mutate both its fields.

acqq · on Dec 23, 2013

Thanks. I haven't seen the description under "Three Common Techniques." And I also claim that the benchmarking isn't giving the method the due justice as the benefits of it can be visible only in the context where you really do operate with more functions on the same structures. Moreover, I think it's better to simply know how the C is translated to the assembly then to measure one very limited example. Specifically, some optimizations avoid calling functions at all. But the measurements of such examples don't properly reflect the behavior in bigger programs, where it can be bigger performance benefit to have smaller code than a fast microbenchmark.

Arnor · on Dec 23, 2013

I tend to agree with this approach and use it myself (in the rare instances that I use C). My only complaint is that as the number of functions touching the struct grow, maintainability gets harder. Can you share any rules/strategies to mitigate this problem?

yoklov · on Dec 24, 2013

You basically split it up in a similar way that you'd use in an OO language, except you don't have inheritance and only have composition. Specifically, you break apart the logically distinct portions of your structs and you make them structs themselves. E.g. compare (trivial examples, but you get the idea).

    /* okay */
    struct monster {
      unsigned id;
      int x, y, hp;
      /* etc. */
    };
    /* better */
    struct point { int x, y; };
    struct monster {
      unsigned id;
      struct point pos;
      int hp;
    };

This also allows a form of inheritance. For example if you have

    struct bad_monster {
      struct monster base;
      int badness; /* your imagination is the limit. */
    };

    /* this is safe */
    struct bad_monster bad;
    struct monster *some_monster = (struct monster *)&bad; /* equivalently &bad.base */
    some_function_expecting_a_monster(some_monster);
    /* only safe if it was originally a struct bad_monster */
    struct bad_monster *bad_ptr = (struct bad_monster *)some_monster;

Those are the ways I tend to break up structs when writing C code. Basically by using lots of composition.

acqq · on Dec 24, 2013

And the good aspect of C is the "flatness" of the structures: they are not "objects" with the hidden content and associated functions managing it. It's a way for the programmer to organize the data as efficiently as possible.

acqq · on Dec 23, 2013

Like always "divide and conquer." In a big program, I identify "subprograms" that can have only the "interface" (the very limited set of structures and functions) exposed to the rest. The subprograms aren't "libraries" in a sense that I do much faster changes to the "interface" but just having them makes everything much more maintainable. So some headers are visible only inside of the "subprogram." Second: not everything is worth to appear in headers at all: I use the functions and declarations local to the compilation unit as much as I can.

Back to the main topic, using structures properly will result in the best code you can produce for anything not trivially small. Practically every CPU I know of is specifically optimized to make addressing elements in the C structure as efficient as possible.

In your case, if there are a lot of functions to which you pass the pointer to the structure, then if you take care of the organization it means that they really do have to operate on the structure. But if you have a lot of arguments to any function, you probably need some even bigger structure and pass that around.

klipt · on Dec 23, 2013

> Once you recognize that, you'd probably discover that you'll be able to use the same struct in more other functions

And now you've reinvented (the basic premise of) OOP!

acqq · on Dec 23, 2013

I don't see that in "Fundamental features and concepts" of OOP.

http://en.wikipedia.org/wiki/Object-oriented_programming#Fun...

Let's stay on topic, talking about C. We'll avoid philosophy and hype and will keep talking about the work being done in C.

tokenrove · on Dec 24, 2013

It's the basis of encapsulation, which is on that list.

acqq · on Dec 24, 2013

This direction of discussion is exactly what I wanted to avoid.

hepek · on Dec 23, 2013

Reminded me of a very good book called "21st Century C"

http://shop.oreilly.com/product/0636920025108.do

eropple · on Dec 23, 2013

Sorry, but I seriously have to disagree: 21st Century C is one of the worst tech books I've had the misfortune to read. It's written by a butthurt C devotee whose interest is as much in sneering at other languages as it is in teaching something. It argues brace styles and text editors. It's loaded with inaccuracies. It can't be bothered to mention when it's using a header file to introduce a function. Typos abound. It seriously, without the slightest trace of humor, includes a chapter called "Becoming a Better Typist." It's just...crap. I stopped buying O'Reilly books after I couldn't get a refund for this mess.

This article is by clueful professionals. That book is everything except that.

mitchty · on Dec 24, 2013

Are you reading the same book as me, there is NO (FULL STOP) chapter named becoming a better typist. There is a note with that title at page 100 (ebook not sure on book), its also 4 short paragraphs long.

Is the entire book great? No, to be honest it touched too little on the C side for me, and yeah the punk analogies was a bit annoying at times, but it did get me to look critically at how I wrote c. Much like this blog I'm writing for clarity first and not optimization.

What would you recommend for a c programming book instead of this book?

eropple · on Dec 24, 2013

Sorry, yeah, it was a section, not a chapter. My bad. Its existence is still stupid and is a good anecdote regarding the book's lack of focus.

I haven't yet found a decent C book. I've found many good C++ books and have backported the lessons I've learned there into C, however.

mitchty · on Dec 24, 2013

Agreed as to overall lack of focus, I was annoyed that until chapter 7 there really wasn't much C to speak of, and the last chapters where he finally got into interesting things, were far too short.

That said I did learn a few tricks, and generally appreciated the books overall tone of you don't need to do crazy pointer stuff/malloc/realloc in C in general. If that makes any sense, have you read the Embedded TDD in C book by Pragmatic Programmers? http://pragprog.com/book/jgade/test-driven-development-for-e...

What C++ books would you recommend specifically? I find most to not really be of much help in straight C given the differences of the two languages (basically huge reliance on the standard library etc..).

silentbicycle · on Dec 24, 2013

Author of the top post, here. Two of my favorite C books are David Hanson's _C Interfaces and Implementations_ and Peter van der Linden's _Expert C Programming: Deep C Secrets_.

The former is ostensibly about designing reusable modules for data structures (e.g. ring buffers), but also contains a lot of wisdom about API design, particularly how to make the most of what little API design tools C gives you. The latter is all over the place, from funny anecdotes about compiler bugs and programming contests to all kinds of useful C information that is too advanced for an intro book on C-the-language, but way too practical for most books on a specific domain: valuable details about linkers, symbol visibility, different kinds of allocation, and the memory hierarchy. Also, it's called "Deep C Secrets" and has a giant fish on the cover. What more do you need to know?

I skimmed "21st Century C" briefly a couple months back and didn't have strong feelings about it one way or the other.

justincormack · on Dec 23, 2013

The other method is to change the calling convention and use a different ABI. NetBSD syscalls have multiple return values in registers, although it is little used (pipe returns two args, plus functions that return 64 bits on 32 bit platforms. If you were inventing a new ABI this could be nice. Multiple return values are very idiomatic...

mikeash · on Dec 23, 2013

From the ABI perspective, struct returns are multiple return values. ARM64 allows structs to be returned in registers if the struct members are suitable for it.

justincormack · on Dec 23, 2013

I didnt know that. Apparently amd64 lets you return two 64 bit values in registers too, but no more.

mprovost · on Dec 23, 2013

Two is still an incredibly handy feature. I would guess that the most common case for returning multiple values is data and size, like all the functions that return a string and the length of the string.

justincormack · on Dec 24, 2013

Yes. Or the value. error style as used by node.js. forcing errors unto the domain gets messy.

MichaelGG · on Dec 23, 2013

When compiling at full optimization and link-time code gen optimization, will the compiler and linker "get smart" about passing structs around? I'm inexperienced here, but it always seemed odd that a struct with two words can't just be shoved into registers (calling or returning).

Is there a reason the x86/64 ABIs don't specify such a way? Also, it seems that the 128-bit and larger registers aren't used as often to pass values. For something like a GUID, wouldn't they be a perfect fit? Or does the processor optimize stack handling so that it's almost as cheap as registers?

pcwalton · on Dec 23, 2013

In LLVM, internalized functions (which get created in LTO mode) are converted to fastcall. So yes, the ABI becomes smarter in that case.

The platform ABIs mostly are the way they are for backwards compatibility at this point. They aren't particularly efficient and have weird corner cases (e.g. complex numbers are treated differently than the equivalent struct, IIRC). On x86-64 the situation is a little better, but still not ideal.

pmjordan · on Dec 23, 2013

The x86-64 ABI does specify for multi-word structs to be returned in registers. If I remember correctly, it's up to 4 registers - rax, rdx and 2 others which I'll need to look up. This only happens if the struct fields on their own would be returned in this way, so an array of 16 chars would not typically be returned in 2 8-byte registers. To use SSE or AVX registers for anything other than float or double scalars, you'll need to use the special vector types.

I think x86 lets you return in eax and edx, but only if you're returning a single 64-bit value, not a struct with 2 32-bits. Not 100% on all of this.

stephencanon · on Dec 23, 2013

> I think x86 lets you return in eax and edx, but only if you're returning a single 64-bit value, not a struct with 2 32-bits.

Depends entirely on the platform ABI. Under the OS X calling conventions 2x32b is returned in eax:edx (even if the 32b values are floats, which is annoying).

pmjordan · on Dec 23, 2013

Ah, true. x86 ABIs are much less unified than x86-64 ones.

ruggeri · on Dec 24, 2013

vinkelhake, you seem to be dead (for no discernible reason).

Peaker · on Dec 23, 2013

IIRC the ARM ABI for C actually uses an implicit output parameter for a returned struct - so it's identical to passing the output parameters manually.

j_baker · on Dec 23, 2013

I'm curious: why are lower numbered benchmarks on the openbsd machine so much slower than higher-numbered benchmarks? Clearly, the machine is older and less powerful than the others, but I wouldn't expect that to alter the timing in this way. Is this a result of the machine, the OS, or the version of the OS?

silentbicycle · on Dec 23, 2013

I'm not sure, and digging into it would likely be a separate post. It's probably due to OpenBSD having very different cache strategies from OSX or Linux. It seemed to consistently do badly on the first run or two, then catch up on on the rest. It's often recommended to throw away the first run in a set of benchmarks, due to the added overhead of disk IO, but I wanted to retain that because something interesting is going on there.

The graphs also show more overhead for returning a struct with additional padding. OpenBSD's memory allocator favors security rather than speed, and the extra size may have costs for buffer overruns felt elsewhere. I like testing C code on OpenBSD -- it's almost as good as valgrind for detecting memory bugs, but with significantly less overhead, since the memory checking is integrated into the OS.

evmar · on Dec 23, 2013

Here's a post where I walk through the assembly generated by some different approaches. http://neugierig.org/software/blog/2011/12/return-by-value.h...

One possibility is that OpenBSD uses the alternative ABI for returning structs mentioned in that post.

silentbicycle · on Dec 27, 2013

The code for the benchmarks is up, if you want to investigate: https://github.com/atomicobject/multiple_return_in_c_benchma...

wyager · on Dec 23, 2013

How feasible is it to change the C standard to allow for implicit struct unpacking? So if I have `point foo(void)`, I can do `x, y, z = foo()`. It seems like it would work well with the many ABIs that return small structs in registers.

IsTom · on Dec 23, 2013

Was the data allocated on stack or heap when using pointers? I would assume the later as it's not stated explicitely and perhaps there is something to win here for short-lived values.

Tloewald · on Dec 23, 2013

Well, I guess the final point is almost worth it -- code for clarity.

mitchty · on Dec 23, 2013

Would've liked to see the exact code to better gauge what clarity involves specifically.

silentbicycle · on Dec 23, 2013

I'm planning on posting a repo with the benchmarks soon.

mitchty · on Dec 23, 2013

Cool beans. I saw this this morning and was excited.

I've been dealing with rewrites of a few internal utilities in C (was going to use go, but the runtime has issues) and was going to start testing how struct passing with errors in a struct for example would look in C.

I'll keep an eye out thanks!

silentbicycle · on Dec 27, 2013

The code is up: https://github.com/atomicobject/multiple_return_in_c_benchma...

silentbicycle · on Dec 24, 2013

Due to an inconveniently timed cold, it may still be another day or two. Soon.

theoh · on Dec 23, 2013

Why not just write everything in Continuation Passing Style? (joke)

silentbicycle · on Dec 25, 2013

That's why I mentioned compiling SML to C, actually. :) It's worth evaluating costs for what generated C looks like when something else (SML, Chicken Scheme, etc.) is compiling its constructs to C, just as much as it is for what people will typically write.

andrewla · on Dec 23, 2013

As with all benchmarks, without standard deviations, or, preferably, the complete test data, it is meaningless to attempt to infer anything from this data.

silentbicycle · on Dec 25, 2013

Yeah, I'm planning on posting the benchmark source, but between Christmas, a rather nasty ice storm, and a bad cold, I'm a bit behind. It will be up in a few days.