

Higher-order programming in C - sthatipamala
http://vedantk.tumblr.com/post/26182854460/higher-order-programming-in-c

======
caladri
For the set of features you have, with most calling conventions on most
platforms you don't need to synthesize all that. You could simply have your
apply function be written in assembly and handle setting up the parameters
fetching them from memory.

That also isn't higher-order programming as such. Can you create new functions
from doing that? There are ways to compose functions in C by hand, but the
distinction is that it cannot be done elegantly. (One could always just do
higher-order programming in C by constructing an abstract syntax tree, after
all.)

A good first effort in that you did correctly handle calling functions without
requiring them to fetch their arguments in an unusual way, which is what most
people would try at first. With a varargs apply function, you can handle the
case of passing values larger than a register and passing pointers without
having to cast them to a list, too. Consider the case of an apply function
which takes an unsigned integer as the size of the following argument. If it's
zero, then the argument list is finished. Otherwise, it then fetches an
argument of that size from the va_list and goes on to the next one. So then
your apply function is slightly uglier (int j = 42; apply(printf, sizeof
(const char _), "Hello, %s! %u\n", sizeof (const char_ ), "world"), sizeof j,
j, 0) but perhaps more flexible, and at least avoids relying on a few nasty
implicit casts.

Also, function pointers should not necessarily be cast implicitly to void *,
but on only a few architectures are function pointers so strange that you'll
have any trouble with your approach.

As it is, you could even avoid writing or generating any assembly by just
having the apply routine be a static inline function that does a switch on the
argument count and then casts the function pointer to a function pointer that
takes the right number of arguments. The compiler could do constant
propagation for the number of arguments and you'd have little overhead in your
function calls. You couldn't handle arbitrarily-large argument lists, but it
would work pretty well.

Keep at it!

~~~
davvid
_For the set of features you have, with most calling conventions on most
platforms you don't need to synthesize all that. You could simply have your
apply function be written in assembly and handle setting up the parameters
fetching them from memory._

Like this? <https://github.com/davvid/libapply>

unittest.c shows how it can be used:
<https://github.com/davvid/libapply/blob/master/unittest.c>

------
huhtenberg

      +-----------------------------------------------------------+
      |                                                           |
      |  C has support for dynamic typing (in the form of void*)  |
      |                                                           |
      +-----------------------------------------------------------+
    

A quotable thinker.

~~~
chrisohara
Why are people so scared of void*? Are there any VM's that don't resort to
some form of pointer tagging when implementing dynamic typing?

~~~
pmelendez
I don't think people is scared of void*. However, it is pretty easy to
introduce memory corruption bugs when it is used in big projects.

I find very wise to avoid it when it is not necessary or when there is an
equivalent safer alternative (e.g C++ templates if available and applicable).

In this case however, is not only completely fine but I don't think there is a
better way to do it.

BTW, my favourite phrase of the article:

"“… Why do this?” Because I can, of course."

The truly spirit of a hacker :)

------
ogoffart
I made it simpler (and slightly more portable ;-) ) :

    
    
        typedef void* (*f)();
        extern void* apply(void* func, void** args, size_t argc) {
            switch (argc) {
                case 0: return  ((f)(func))();
                case 1: return  ((f)(func))(args[0]);
                case 2: return  ((f)(func))(args[0], args[1]);
                case 3: return  ((f)(func))(args[0], args[1], args[2]);
                case 4: return  ((f)(func))(args[0], args[1], args[2], args[3]);
                case 5: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4]);
                case 6: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5]);
                case 7: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5], args[6]);
                case 8: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5], args[6], args[7]);
                case 9: return  ((f)(func))(args[0], args[1], args[2], args[3], args[4], args[5], args[6], args[7], args[8]);
                /*...*/
            }
        }

------
ori_b
If you ever want to do this in the real world, look at libffi:
<http://sourceware.org/libffi/>

------
tspiteri
I don't get the point. Why not use the generic function type?

    
    
        void *(*routine)(void *)
    

If you have a function already, just create a simple wrapper once. In the code
shown, the wrapping has to be done by the caller of apply().

And this apply() doesn't work on all functions anyway. Casting an array of
long into an array of void* makes quite a few assumptions and is very limited.
What would you do if your original function takes a struct as an argument?

------
mappu
I'd quite like to use bind() in C / nasm for my current project. I think you
can use a similar approach - create an executable area on the heap that pushes
an extra parameter and jmps to the target function, then just return a
function pointer to it.

EDIT: there might be a problem with arranging the stack parameters - if you
use call on the target, then you're pushing a return address onto the stack,
which messes up the arguments. But if you simply jmp, then neither the caller
or callee know exactly how to clean up the stack when the function returns...
hmm. I guess you could permanently reserve a register for this shimming but
that's hardly an elegant solution.

~~~
vedantk
I think the callee expects the return address in %rbp, and by System V
convention will pull certain arguments from above %ebp on the stack.

This might not help, but it's an interesting article on stack frames for all
interested: [http://eli.thegreenplace.net/2011/09/06/stack-frame-
layout-o...](http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-
on-x86-64/)

~~~
mappu
Looks like fastcall, or any convention on x86_64.

I think you can do bound functions if you do callee-cleanup, then your
intermediate bound trampolines can just jmp and you don't end up with (too
many) weird problems. But it means that the final return address ends up at
the bottom of the stack, rather than at the top, like it would with a
conventional push/call system.

In any case it looks like cdecl is out of the question.

Some further discussion in a slightly different context, if you're interested:
[http://stackoverflow.com/questions/11271848/implementing-
bou...](http://stackoverflow.com/questions/11271848/implementing-bound-
function-parameters-in-a-compiler/11272296#11272296)

------
forgotusername
The ability to cast a function pointer to a regular pointer is an
implementation behavior (supported on every machine you'd want to use, but
nonetheless it's not standard C)

~~~
MichaelGG
That's interesting to know. Is there no general data type that's guaranteed to
hold a function pointer?

~~~
caladri
Only function pointer types. An indirect function call on some architecture,
for example, could require that when one makes an indirect call one sets up
several registers in addition to jumping to a particular location in memory,
and none of the registers that need to be set up can be derived from the jump
target. So in practice your function pointers would need to actually be
structures containing those register values and the jump target. Or you might
need to pass around a tuple of target and calling convention, although the C
standard limits the ways you could implement that (namely by requiring that
function pointers cast to and from other function pointer types will compare
as equal, and will still work when they are cast to the correct type.)

------
crb3
The blog's github interface is annoying: in order to save-to-disk more than
just the opening paragraphs in Firefox, I had to bring up Web Developer's View
Generated Source function and save the result of that. Not cool for anybody
who prefers to study offline.

~~~
drucken
Strange. I had zero trouble saving the blog to disk with Firefox on a Windows
PC.

\- it works with both an old generation of Firefox (3.x) and a post 3.x
version.

\- it correctly opens with the code, in both FF versions as well as IE.

\- NoScript also correctly blocks and unblocks the code depending on what
access you give to github.com and tumblr.com.

So, not sure what you are seeing, but I am unable to replicate it. As you can
probably guess, I am just as sensitive to these saving glitches when they
occur as you are too... :)

~~~
crb3
Firefox 3.5.6 on Mepis Linux 8.0 here; NoScript permissions open to github and
tumbler otherwise I wouldn't see code onscreen fwiw, tnx for feedback

~~~
chc
Firefox 3.5.6? Holy cow. It probably doesn't run well in NCSA Mosaic either.
Is there no newer version you can run? I find at least 4 is necessary to avoid
huge with the modern Web most of the time.

------
zvrba
Sigh, if the program is relying on undefined behavior [like this one], it's
not C. Plus, gcc has __builtin_apply as extension.

~~~
daeken
This is _not_ undefined behavior. It's implementation specific behavior --
there's a huge difference. If you specify the calling convention for your
functions, this is 100% deterministic and predictable. Nothing at all wrong
with this approach.

~~~
zvrba
Not undefined behavior, really? Run it on a 64-bit SPARC system and see if it
works. Not to mention that this implementation is dog-slow (two rather slow
syscalls [mmap/munmap] for each apply-invocation).

~~~
daeken
Again, you're conflating undefined behavior with implementation-specific
behavior. By your definition, anything that does implementation-specific
typedefs, uses inline assembly, or even makes syscalls is taking advantage of
undefined behavior. That's silly and ignores the actual definition.

~~~
zvrba
> Again, you're conflating undefined behavior with implementation-specific
> behavior.

No, I'm not. Casting between data and function pointers (or even a function
pointer to a function pointer of different signature!) is undefined behavior.
Look up the C spec.

~~~
to3m
6.3.2.3.8 - "A pointer to a function of one type may be converted to a pointer
to a function of another type and back again; the result shall compare equal
to the original pointer."

6.5.9.6 confirms that two function pointers that compare equal point to the
same function.

From my careful study of the code, I have a nagging sensation that this might
not quite be the end of the story as far as portability goes...

~~~
zvrba
6.3.2.3.7 allows for conversion between pointers to data ("objects") and
6.3.2.3.8 allows for conversion between pointers to functions; there is no
provision for casting between data and function pointers.

6.3.2.3.5 and .6 allow for abritrary (implementation-defined) conversions
between integer types and pointer types, so any conversion between function
and data pointers must go through a suitably large integer type if such exists
(converting a function pointer to a too small integer type is UB).

~~~
to3m
Yes - my response is to the suggestion that casting between pointers to
functions of different signature is undefined. I took this to be a suggestion
that the part of the approach that involves having a general-purpose function
that can accept a pointer to any kind of function, and reliably use that
pointer (after it's been casted, if necessary) to get a pointer to the
function in question, is fundamentally non-portable.

I remain unconvinced that this is a serious issue, all things considered. The
odd feeling that there could be other roadblocks to a fully portable
implementation still remains. Nevertheless, it appears that this particular
part could actually be done portably.

------
comex
Incidentally, if you want "real" higher-order programming in C, take a look at
blocks (Apple).

~~~
dchest
...if you want "real" higher-order programming in "C"...

~~~
SoftwareMaven
You're just not looking at the problem objectively...

Seriously, though, is there a reason the Objective C blocks wouldn't work in
C, just substituting functions for methods?

~~~
chc
The blocks are already supported in plain C mode as long as you enable the
Clang extension. You just use, for example, block_Copy(aBlock) instead of
[aBlock copy]. Apple framed it as a C feature in its proposal for
standardization (because Objective-C doesn't have a real standard AFAIK).

------
maybird
This makes me wish I could use "PROT_WRITE | PROT_EXEC" with mmap() on iOS.

~~~
buzzert
You actually can by exploiting a bug in the code signing check in the kernel!
[http://books.google.com/books?id=1kDcjKcz9GwC&lpg=PT92&#...</a>

------
aklofas
There is a macro way to do this. It involves lots of magic and beating the
preprocessor with a two-by-four:
[https://groups.google.com/forum/?fromgroups#!topic/comp.lang...](https://groups.google.com/forum/?fromgroups#!topic/comp.lang.c/9G50dW26u_E)

Go down this hole with caution.

~~~
bowyakka
I often want to beat the pre-processor with a two-by-four

