
Show HN: A minimal stack based VM in C - codr7
https://github.com/codr7/liblg
======
matheusmoreira
The design section of the README is greatly appreciated!

> The core loop uses computed goto, which means that new instructions must be
> added in identical order

> Values are represented as tagged unions.

> Fundamental types are global (as in not tied to a specific VM instance)

What exactly is a global fundamental type? Is there a local counterpart?

~~~
codr7
You may choose whatever storage duration you feel like for your own types, but
built-in types are globally declared and therefore shared between VM
instances.

~~~
matheusmoreira
I see, so it refers to the C storage class. There can be several virtual
machine instances referencing the same built-in integer type instance. This
imposes on users the need to call lg_init() and lg_deinit() before and after
using the library. I feel like this could have been avoided by statically
enumerating the built-in types instead of initializing a data structure
dynamically.

The data type structure contains pointers to functions that implement
operations such as addition, subtraction, copying and cloning. An integer type
instance is initialized with addition and subtraction functions. The integer
value representation is part of the value data structure though. So how could
a user of the library define new data types? It seems like it would be
necessary to modify the library's source code in order to add new members to
the tagged union.

Also, perhaps the virtual machine could be optimized further by using tagged
pointers to make integer values immediate, avoiding the need to dereference
the pointer.
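
A minimal sketch of the tagged-pointer idea, not liblg's actual representation. It assumes heap pointers are at least 2-byte aligned, so the low bit is free to mark an immediate integer. All names (`val_t`, `val_from_int`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value word: low bit 1 = immediate integer, low bit 0 = pointer. */
typedef uintptr_t val_t;

static inline val_t val_from_int(intptr_t i) {
  /* Shift as unsigned to avoid undefined behavior on negative values.
     One bit of integer range is lost to the tag. */
  return ((uintptr_t)i << 1) | 1u;
}

static inline val_t val_from_ptr(void *p) {
  /* Assumption: the pointer is at least 2-byte aligned. */
  assert(((uintptr_t)p & 1u) == 0);
  return (uintptr_t)p;
}

static inline bool val_is_int(val_t v) { return (v & 1u) != 0; }

static inline intptr_t val_to_int(val_t v) {
  assert(val_is_int(v));
  /* Assumption: >> on a negative signed value is an arithmetic shift.
     That is implementation-defined in C, but it is what GCC and Clang do. */
  return (intptr_t)v >> 1;
}

static inline void *val_to_ptr(val_t v) {
  assert(!val_is_int(v));
  return (void *)v;
}

/* Adds two tagged integers without untagging:
   (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1. Overflow is not checked. */
static inline val_t val_add_int(val_t a, val_t b) {
  assert(val_is_int(a) && val_is_int(b));
  return a + b - 1;
}
```

The integer lives in the word itself, so an add needs no memory load; the cost is one bit of range and a tag check wherever the type is not already known.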

~~~
codr7
You would either have to reuse one of the existing representations or modify
the union atm.

I'll add a void *as_any to the union eventually which means you're just one
level of indirection from supporting any representation without touching the
union.
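
A rough sketch of how a `void *as_any` member could carry a user-defined representation. This is not liblg's code: `struct my_type`, `struct my_val`, and the `point` example are hypothetical names, and only the `as_any` member name comes from the comment above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct my_val;

/* Hypothetical type descriptor holding per-type operations. */
struct my_type {
  const char *id;
  void (*add)(struct my_val *x, const struct my_val *y); /* x += y */
  void (*deinit)(struct my_val *v);                      /* frees owned storage */
};

/* Hypothetical tagged value: the type pointer is the tag. */
struct my_val {
  const struct my_type *type;
  union {
    int64_t as_int; /* built-in representation, stored inline */
    void *as_any;   /* escape hatch: any user-owned representation */
  };
};

/* A user-defined type that never touches the union definition. */
struct point { double x, y; };

static void point_add(struct my_val *x, const struct my_val *y) {
  struct point *a = x->as_any;
  const struct point *b = y->as_any;
  a->x += b->x;
  a->y += b->y;
}

static void point_deinit(struct my_val *v) {
  free(v->as_any);
  v->as_any = NULL;
}

static const struct my_type point_type = {"Point", point_add, point_deinit};

/* Allocates a point and wraps it in a value; the caller must call deinit. */
static struct my_val point_val(double x, double y) {
  struct point *p = malloc(sizeof *p);
  assert(p != NULL);
  p->x = x;
  p->y = y;
  return (struct my_val){.type = &point_type, .as_any = p};
}
```

The trade-off is the extra allocation and pointer dereference for every value of a user type, while built-in integers stay inline.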

------
tarruda
For those interested in the subject, another great read is Lua's register-based
VM: https://webserver2.tecgraf.puc-rio.br/lua/local/source/5.1/lvm.c.html

------
gergo_barany
> ideas on how to improve its performance further without making a mess are
> most welcome.

One approach would be designing the input language for performance, in
particular by having statically typed operations. A specialized iadd instruction
for values that you are sure you want to treat as integers would save you a
lot of indirection through function pointers. A disadvantage of static typing
is that you need to implement type checks if you want to guarantee
well-typedness.
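
To make the difference concrete, here is a small sketch of a generic add that dispatches through the value's type next to a specialized iadd. The structs, opcodes, and `run` function are hypothetical and much simpler than the real VM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct val;

/* Hypothetical type descriptor with a single operation. */
struct type {
  void (*add)(struct val *x, const struct val *y);
};

struct val {
  const struct type *type;
  int64_t as_int;
};

static void int_add(struct val *x, const struct val *y) {
  x->as_int += y->as_int;
}

static const struct type int_type = {int_add};

enum op { OP_PUSH_INT, OP_ADD, OP_IADD, OP_STOP };

struct instr {
  enum op op;
  int64_t arg;
};

/* Runs code on a small fixed stack and returns the integer on top. */
static int64_t run(const struct instr *pc) {
  struct val stack[16];
  size_t sp = 0;
  for (;; pc++) {
    switch (pc->op) {
    case OP_PUSH_INT:
      assert(sp < 16);
      stack[sp++] = (struct val){&int_type, pc->arg};
      break;
    case OP_ADD: /* generic: load the type, then call through a pointer */
      assert(sp >= 2);
      sp--;
      stack[sp - 1].type->add(&stack[sp - 1], &stack[sp]);
      break;
    case OP_IADD: /* specialized: trusts that both operands are integers */
      assert(sp >= 2);
      sp--;
      stack[sp - 1].as_int += stack[sp].as_int;
      break;
    case OP_STOP:
      assert(sp >= 1);
      return stack[sp - 1].as_int;
    }
  }
}
```

OP_IADD never looks at the type field, which is exactly why something (a compiler or a bytecode verifier) has to guarantee the operands really are integers before it runs.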

Another (orthogonal) approach would be to consider a JIT backend. Not one you
write yourself, that could definitely be considered "making a mess". But in
the past I've had success using LibJIT (https://www.gnu.org/software/libjit/)
for speeding up a stack interpreter. In that case, it was a subset of Python
bytecode (see https://github.com/gergo-/pylibjit; the code has very probably
bitrotted).

------
jiive
This piqued my interest. I’m a hobbyist C programmer so forgive me if this is a
rudimentary question. What is the rationale/convention for naming a variable
“_”? I’ve never seen this before.

For example:

    
    
      struct lg_buf *lg_buf_init(struct lg_buf *_) {
        _->data = NULL;
        _->len = _->cap = 0;
        return _;
      }

~~~
mostlylurks
An underscore is primarily used as an identifier to denote a
variable/parameter whose name does not really matter. It's more commonly used
when you're not planning on using the aforementioned variable, but it's also
sometimes used when the identifier is used but its name doesn't matter.

One such instance is the abbreviated Scala lambda syntax, where (x:Int) => x +
2 can be abbreviated to _ + 2 in the same way that Kotlin would allow
abbreviating it as { it + 2 } with its equivalent default lambda parameter
name, "it".

In the example you quoted, the identifier denotes the sole parameter, so in a
sense its name does not matter, and as such people from certain programming
circles might be inclined to use an underscore instead of taking the time to
come up with an appropriate name. It's not like a more descriptive name would
help in that example, the type name and function name already give sufficient
context for it to be perfectly clear what the parameter is for, and it's not
like parameter names have any semantic significance in C.

~~~
jiive
> It's not like a more descriptive name would help in that example, the type
> name and function name already give sufficient context for it to be
> perfectly clear what the parameter is for, and it's not like parameter names
> have any semantic significance in C.

I agree, and I’ve been thinking about this today. There is also a minimalistic
quality to this style, the _ pointer is more prominent simply by not having a
real name. I like it!

------
fwsgonzo
Funny, I actually have some fib benchmarks for my RISC-V emulator! It uses
fib(40), but I added one for 20. Interestingly, that is the one benchmark that
LuaJIT crushes my emulator in, so I'm still working on beating it, but I don't
really have any plan. :)

    
    
        libriscv: fib(20) median 317ns    lowest: 310ns      highest: 356ns
        luajit: fib(20)   median 146ns    lowest: 145ns      highest: 170ns
        lua5.4: fib(20)   median 631ns    lowest: 598ns      highest: 694ns
    

Running your emulator: $ ./fibrec 567us

Modern compilers used with emulated machines can beat even v8 at times. Cool
project! I divided your number by 100, is that correct? Is there some
additional overhead in setting up / tearing down something?

~~~
codr7
Correct, all benchmarks run 100 repetitions of fib(20).

There is no additional overhead that I'm aware of except calling a function in
a loop, which is intentional.

------
lebuffon
You might want to peruse the C sources for GForth which has been under
continuous development for 20 years or so. It introduced a concept called
super-instructions that speeds things up quite a bit. I am not an expert on
the internals, just a casual user.
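
As far as I understand the general idea (this is a toy sketch, not how GForth implements it), a super-instruction replaces a common sequence of instructions with one fused instruction, so the interpreter dispatches once instead of twice. The opcodes and the `fuse` pass below are hypothetical, and the sketch assumes code without jumps, since a real pass must not fuse across jump targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum op { OP_PUSH, OP_ADD, OP_PUSH_ADD, OP_STOP };

struct instr {
  enum op op;
  int64_t arg;
};

/* Rewrites every "PUSH n; ADD" pair into a single "PUSH_ADD n", in place.
   Returns the new instruction count. Assumes the code contains no jumps. */
static size_t fuse(struct instr *code, size_t n) {
  size_t out = 0;
  for (size_t i = 0; i < n; i++) {
    if (code[i].op == OP_PUSH && i + 1 < n && code[i + 1].op == OP_ADD) {
      code[out++] = (struct instr){OP_PUSH_ADD, code[i].arg};
      i++; /* skip the ADD that was folded in */
    } else {
      code[out++] = code[i];
    }
  }
  return out;
}

/* Runs code on a small fixed stack and returns the integer on top. */
static int64_t run(const struct instr *pc) {
  int64_t stack[16];
  size_t sp = 0;
  for (;; pc++) {
    switch (pc->op) {
    case OP_PUSH:
      assert(sp < 16);
      stack[sp++] = pc->arg;
      break;
    case OP_ADD:
      assert(sp >= 2);
      sp--;
      stack[sp - 1] += stack[sp];
      break;
    case OP_PUSH_ADD: /* one dispatch, no stack push or pop */
      assert(sp >= 1);
      stack[sp - 1] += pc->arg;
      break;
    case OP_STOP:
      assert(sp >= 1);
      return stack[sp - 1];
    }
  }
}
```

For example, PUSH 1; PUSH 2; ADD; PUSH 3; ADD; STOP fuses down to PUSH 1; PUSH_ADD 2; PUSH_ADD 3; STOP and computes the same result.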

~~~
addaon
Thanks for the introduction to this. See paper at
http://www.euroforth.org/ef03/ertl-gregg03.pdf.

