

Ask HN: How does a JIT-compiled function communicate with a bytecode function? - tomdtpink

Consider two functions that are both compiled into bytecode first. If the first function is then JIT-compiled, how does the second function call the x86 code of the first function? How does it extract the results?

Basically, how does the leap into "x86 land", and from there back into "bytecode land", occur? Also, how is the "x86 land" stored and accessed?
======
mahmud
By using function pointers!

The concept of "function" or "procedure" doesn't really exist concretely.
Sure, some processors provide opcodes to view the accessible memory as a
"stack", to accommodate some specific, if popular, block structure
implementations, namely C and Fortran's.

The most critical part is mapping the language's concept of "function", an
executable body of code where all the referenced variables have specific
instances/allocation, reachable at the time of its invocation, _with_ the
processor's, or most likely, C's view of how the processor underneath
haphazardly implements block structure.

If your JIT compiler can seal a body of code and guarantee to implement the
environment (the mapping from address/name to value) in a way accessible to the
dumb processor, then you will win.

This requires that the compiler hacker implement code to convert values from
HLL types to lower-level ones, and to convert the implementation of closures
and environments into something more palatable to the lower-level
implementations.
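As a minimal sketch of that conversion step, assume the high-level language boxes every value in a tagged struct while native code only understands raw C ints. The `hll_value` type and all helper names here are hypothetical, chosen for illustration only.

```c
#include <stdbool.h>

/* Hypothetical boxed value, as an interpreter for a dynamic language might use. */
enum hll_tag { HLL_INT, HLL_NIL };

struct hll_value {
  enum hll_tag tag;
  int as_int;              /* meaningful only when tag == HLL_INT */
};

/* Native code only understands raw machine integers. */
typedef int (*native_binop) (int, int);

/* Box a raw int into a high-level value. */
struct hll_value hll_box_int (int n) {
  struct hll_value v = { HLL_INT, n };
  return v;
}

/* Unbox both arguments, run the native code, and box the result.
   Returns false (and leaves *out untouched) on a type mismatch. */
bool call_native_binop (native_binop fn,
                        struct hll_value a, struct hll_value b,
                        struct hll_value *out) {
  if (a.tag != HLL_INT || b.tag != HLL_INT)
    return false;
  *out = hll_box_int (fn (a.as_int, b.as_int));
  return true;
}

/* Stand-in for a natively compiled function. */
int native_plus (int x, int y) {
  return x + y;
}
```

The type check sits on the boundary, so the native body itself never has to know about tags.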

Below is a simple "JIT" interpreter. It uses pre-compiled C code in a shared
library, along with its own "natively" compiled operations. This one finds its
functions by compile-time name lookup; a real JIT compiler following this
protocol would just need to find them by some other means. It can maintain its
own explicit symbol table, hashing a function name along with its type
signature, and dispatch on it at run time.
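A minimal sketch of such an explicit symbol table, assuming a small fixed-size open-addressing hash table keyed on a "name:signature" string. The key format, the table size, and the `symtab_*` names are illustrative assumptions, not part of the interpreter in this comment.

```c
#include <stddef.h>
#include <string.h>

typedef int (*binop_fn) (int, int);

#define TABLE_SIZE 64   /* assumed large enough for the sketch */

struct symbol {
  const char *key;      /* e.g. "add:(int,int)->int"; NULL marks an empty slot */
  binop_fn    fn;       /* address of the code to run */
};

static struct symbol table[TABLE_SIZE];

/* djb2-style string hash */
static unsigned long hash_key (const char *s) {
  unsigned long h = 5381;
  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h;
}

/* Define or redefine a function. The key string must outlive the table.
   Returns 0 on success, -1 if the table is full. */
int symtab_define (const char *key, binop_fn fn) {
  unsigned long i, start = hash_key (key) % TABLE_SIZE;
  for (i = 0; i < TABLE_SIZE; ++i) {
    struct symbol *s = &table[(start + i) % TABLE_SIZE];
    if (s->key == NULL || strcmp (s->key, key) == 0) {
      s->key = key;
      s->fn  = fn;
      return 0;
    }
  }
  return -1;
}

/* Look a function up at run time; returns NULL if it is not defined. */
binop_fn symtab_lookup (const char *key) {
  unsigned long i, start = hash_key (key) % TABLE_SIZE;
  for (i = 0; i < TABLE_SIZE; ++i) {
    struct symbol *s = &table[(start + i) % TABLE_SIZE];
    if (s->key == NULL)
      return NULL;
    if (strcmp (s->key, key) == 0)
      return s->fn;
  }
  return NULL;
}

/* Stand-ins for natively compiled functions. */
int add (int x, int y) { return x + y; }
int sub (int x, int y) { return x - y; }
```

Because `symtab_define` overwrites an existing entry, a JIT could rebind a name to freshly generated code and every later lookup would pick it up.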

In the code below, ADD and SUB are the usual addition and subtraction
operations, precompiled "for speed", while DIV and MUL are "interpreted"
bytecodes. In the current design, both types of functions (native and
bytecode) are mapped into a single namespace; typically, there are predicates
to distinguish between the two. In most Common Lisp implementations, the two
coexist side by side, and you can call an interpreted function just as you can
call a native one.
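One way to picture a single namespace with a distinguishing predicate is a tagged function object, sketched here under stated assumptions: the `struct function` layout, the `function_is_native` predicate, and the one-opcode "bytecode" are all hypothetical simplifications.

```c
#include <stdbool.h>

enum fn_kind   { FN_NATIVE, FN_BYTECODE };
enum fn_opcode { OP_MUL, OP_DIV };

/* One representation for both kinds of function. */
struct function {
  enum fn_kind kind;
  union {
    int (*native) (int, int);   /* address of machine code */
    enum fn_opcode bytecode;    /* a single opcode, to be interpreted */
  } body;
};

/* The predicate that tells the two kinds apart. */
bool function_is_native (const struct function *f) {
  return f->kind == FN_NATIVE;
}

/* Tiny interpreter for the bytecode case. */
int interpret (enum fn_opcode op, int x, int y) {
  switch (op) {
  case OP_MUL: return x * y;
  case OP_DIV: return y != 0 ? x / y : 0;   /* sketch: division by zero yields 0 */
  }
  return 0;
}

/* Callers use this for everything and never care which kind they got. */
int call_function (const struct function *f, int x, int y) {
  if (function_is_native (f))
    return f->body.native (x, y);
  return interpret (f->body.bytecode, x, y);
}

/* Stand-in for a natively compiled function. */
int machine_add (int x, int y) {
  return x + y;
}
```

Promoting a function from bytecode to native then amounts to rewriting `kind` and `body` in place; call sites stay unchanged.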

This is just a quick 1-hour hack, let me know if it's not clear.

Here is the runtime library.

    
    
      /* --- compiled-functions.c   */
      
      /* Compile with:  gcc -fPIC -g -c compiled-functions.c  */
      
      
      extern int add (int x, int y);
      extern int sub (int x, int y);
      
      int add (int x, int y) {
        return x + y;
      }
      
      int sub (int x, int y) {
        return x - y;
      }
      
    

Here is the JIT compiler:

    
    
      /*  jit-test.c   main driver */
      /* compile as gcc jit-test.c compiled-functions.o */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      /* library functions */
      extern int add (int x, int y);
      extern int sub (int x, int y);
      
      /* compiled opcode values */
      #define ADD 0
      #define SUB 1
      #define MUL 2
      #define DIV 3
      
      /* string opcode names */
      char *opcode_names[] = {"ADD", "SUB", "MUL", "DIV"};
      
      #define OPCODE_COUNT 4
      
      /* the values returned by the parser */
      struct input {
        int op, arg1, arg2;
      };
      
      
      /* rudimentary parser, converts strings to internal structure.
         returns NULL on an unknown opcode or a failed allocation;
         the caller frees the result. */
      struct input *parse_input (char *op_name, char *arg1, char *arg2) {
        int i, opcode = -1;
        struct input *parsed_input;
      
        /* convert opcode name to opcode value */
        for (i = 0; i < OPCODE_COUNT; ++i) {
          if (strcmp (op_name, opcode_names[i]) == 0)
            opcode = i;
        }
      
        /* unknown opcode: nothing sensible to return */
        if (opcode < 0)
          return NULL;
      
        parsed_input = malloc (sizeof (struct input));
        if (parsed_input) {
          parsed_input->op   = opcode;
          parsed_input->arg1 = atoi(arg1);
          parsed_input->arg2 = atoi(arg2);
        }
        return parsed_input;
      }
      
      int main (int argc, char **argv) {
        int result = 0;
      
        /* this points to an unknown jit function linked against us */
        int (* jit_fn) (int, int);    /* <--------- Function pointer! */
      
        struct input  *input = parse_input ("DIV", "4", "3");
      
        /* bail out instead of dereferencing a NULL pointer */
        if (input == NULL) {
          fprintf (stderr, "could not parse input\n");
          return 1;
        }
      
        printf ("%s %d %d ==> ", opcode_names[input->op], input->arg1, input->arg2);
      
        switch (input->op) {
        case ADD:                     /* native: call through the pointer */
          jit_fn = &add;
          result = jit_fn (input->arg1, input->arg2);
          break;
        case SUB:
          jit_fn = &sub;
          result = jit_fn (input->arg1, input->arg2);
          break;
        case MUL:                     /* "interpreted" in place */
          result = (input->arg1 * input->arg2);
          break;
        case DIV:
          if (input->arg2 == 0) {
            fprintf (stderr, "division by zero\n");
            free (input);
            return 1;
          }
          result = (input->arg1 / input->arg2);
          break;
        }
      
        printf ("%d\n", result);
      
        free (input);
        return 0;
      }

~~~
mahmud
Also, you might benefit from reading this paper:

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.2...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.238)

The original block-structure implementation in Algol is still accessible in
Pascal and Oberon literature.

------
_delirium
Do you have a specific virtual machine in mind, or are you asking for general
approaches to the problem? A simple approach is to only JIT function bodies,
and deal with all function calls at the VM level. Then all function calls
within JITted functions call back into the VM, e.g. by being compiled to some
variety of _vm_do_function_call(function, params). Actual VMs include a bunch
of other optimizations, of course; which ones are possible depends partly on
the language's semantics (e.g. can we assume functions are statically defined,
or can they be redefined at runtime?).
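A rough sketch of that simple approach, with plain C functions standing in for JITted bodies. The function table, the integer function ids, and the parameter-array calling convention are assumptions made for illustration; `vm_do_function_call` is modeled on the call named above.

```c
#include <stddef.h>

#define MAX_FUNCTIONS 16

/* Every function body, JITted or not, takes its parameters as an array. */
typedef int (*vm_fn) (const int *params, size_t count);

/* The VM's function table; a slot can be rebound at run time. */
static vm_fn vm_functions[MAX_FUNCTIONS];

enum { FN_SQUARE = 0, FN_SUM_OF_SQUARES = 1 };

/* Bind (or rebind) a function id to a body. */
void vm_define (int function, vm_fn body) {
  if (function >= 0 && function < MAX_FUNCTIONS)
    vm_functions[function] = body;
}

/* All calls go through the VM, so a redefinition is picked up automatically. */
int vm_do_function_call (int function, const int *params, size_t count) {
  if (function < 0 || function >= MAX_FUNCTIONS || vm_functions[function] == NULL)
    return 0;   /* sketch: a real VM would signal an error here */
  return vm_functions[function] (params, count);
}

/* Stand-in for the JITted body of square(x). */
int jitted_square (const int *params, size_t count) {
  return count == 1 ? params[0] * params[0] : 0;
}

/* Stand-in for the JITted body of sum_of_squares(a, b): the addition is
   native, but both calls to square go back through the VM. */
int jitted_sum_of_squares (const int *params, size_t count) {
  if (count != 2)
    return 0;
  return vm_do_function_call (FN_SQUARE, &params[0], 1)
       + vm_do_function_call (FN_SQUARE, &params[1], 1);
}
```

The cost is an extra indirection on every call, which is the kind of overhead the further optimizations mentioned above try to remove when the language's semantics allow it.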

------
nwmcsween
There is no leap back into an AST. A JIT works on hot paths and applies
optimizations while translating an AST (bytecode) directly to machine code;
there is no reinterpretation of machine code. 'Communication', if you will, is
simple: everything has an address.
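To make "everything has an address" concrete, here is a hedged sketch that copies four bytes of hand-assembled machine code into fresh memory and calls it through a function pointer. It assumes x86-64 with the System V calling convention and a POSIX system that permits `mmap` followed by `mprotect` with `PROT_EXEC`; the `jit_install` and `jit_compile_add` names are hypothetical.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*jit_fn) (int, int);

/* x86-64 System V machine code for: int add(int x, int y) { return x + y; }
     8d 04 37    lea eax, [rdi + rsi]
     c3          ret                                                        */
const unsigned char add_code[] = { 0x8d, 0x04, 0x37, 0xc3 };

/* Copy machine code into fresh memory, then flip it from writable to
   executable. Returns NULL on failure. */
jit_fn jit_install (const unsigned char *code, size_t len) {
  void *mem = mmap (NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;
  memcpy (mem, code, len);
  if (mprotect (mem, len, PROT_READ | PROT_EXEC) != 0) {
    munmap (mem, len);
    return NULL;
  }
  /* data-to-function pointer cast: works on POSIX systems, not strict ISO C */
  return (jit_fn) mem;
}

/* "Compile" add: only meaningful on x86-64, so report failure elsewhere. */
jit_fn jit_compile_add (void) {
#if defined(__x86_64__)
  return jit_install (add_code, sizeof add_code);
#else
  return NULL;
#endif
}
```

When it succeeds, the pointer returned by `jit_compile_add` is called exactly like any C function, e.g. `fn (4, 3)`; the arguments go in and the result comes back according to the platform calling convention, which is all the "leap" amounts to.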

------
thepumpkin1979
Dependent method calls are compiled into the single image version of x86 code;
these dependencies are resolved and compiled the first time the call is found
in the bytecode.
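One possible reading of this, sketched in C: each method starts without native code, and the first call resolves and compiles it, after which callers go straight to the native entry point. The `struct method` layout and the `compile_method` stand-in (which just picks a pre-built C function) are assumptions for illustration, not a description of any particular VM.

```c
#include <stddef.h>

typedef int (*compiled_fn) (int, int);

enum { OP_ADD, OP_SUB };

struct method {
  compiled_fn entry;          /* NULL until the method has been compiled */
  int         opcode;         /* the method's "bytecode" */
  int         compile_count;  /* how many times the compiler ran */
};

/* Stand-ins for the machine code the compiler would emit. */
int native_add (int x, int y) { return x + y; }
int native_sub (int x, int y) { return x - y; }

/* Stand-in for the compiler: picks pre-built native code for the opcode. */
compiled_fn compile_method (struct method *m) {
  m->compile_count++;
  return m->opcode == OP_ADD ? native_add : native_sub;
}

/* First call: resolve and compile. Later calls: jump to native code. */
int call_method (struct method *m, int x, int y) {
  if (m->entry == NULL)
    m->entry = compile_method (m);
  return m->entry (x, y);
}
```

The compile step runs once per method no matter how many call sites reach it, since the result is cached in `entry`.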

