Seems like [1] could benefit from use of the X macro [2]; it should make adding new instructions much easier, and you avoid the hassle of having to keep two separate tables in sync. There are probably quite a few places where the code could be made clearer by using it. Also, in the implementation of your functions there's a hell of a lot of repetition across the binary operations; another macro, which you pass the operation and the function name, would make life easier:
    #define binop(NAME, OP) definstr (NAME) {      \
        long long b, a;                            \
        if (carp_stack_pop(&m->stack, &b) == -1)   \
            carp_vm_err(m, CARP_STACK_EMPTY);      \
        if (carp_stack_pop(&m->stack, &a) == -1)   \
            carp_vm_err(m, CARP_STACK_EMPTY);      \
        carp_stack_push(&m->stack, a OP b);        \
    }

    binop(ADD, +)
    binop(MUL, *)
    ...
The repetition of

    if (carp_stack_pop(&m->stack, &a) == -1)
        carp_vm_err(m, CARP_STACK_EMPTY);

seems like a good place to use a function to encapsulate all the error checking and handling.
This is a technique I first came across in the code for lcc, as described in Christopher Fraser and David Hanson's book "A Retargetable C Compiler: Design and Implementation". The code in that book feels pretty dated in a lot of places, but this was a neat trick: every piece of information related to a token is defined in one place.
The binop stuff you can certainly use freely, the X macro stuff is quite well known so I guess that it's also fine to use. I certainly won't be claiming any copyright on the code I shared =)
My personal preference for licensing is one of the BSD or MIT licenses; I'd rather see things of mine used to make the world a better place than force people to share what they've done with it.
My first reaction was to say "it should be based on the number of CPU/GPU/*PU instructions". But then, there are things like this: http://www.norvig.com/spell-correct.html. I wonder how much actual new-ness you can pack into, say, 100 instructions on a modern CPU.
That would be a silly rule; for better or worse, 10 lines of Haskell might perform the computation of 500 lines of Java (even if they took you the same amount of time to write... heh).
The implementation is really clean! The mixing of general-purpose registers, special-purpose registers and the stack makes the instruction set a bit weird, though.
For example, instead of using registers, why not have OR pick values off the stack like ADD, or vice versa? Why use EAX for the conditional jump instructions when you could look at the top of the stack? Why have REM when you can just MOV from ERX?
It does make it more complex, since you now have another layer to deal with. I wouldn't roll my own JIT compiler though, otherwise it would definitely be a crap ton harder. libjit and LLVM's JIT compiler are two good options.
It's nice having a normal, interpreted VM then building a JIT compiler later on for extra speed. A recent Rust project called `js.rs`[0] started out with a simple interpreter and swapped it for a JIT compiler using libjit.
The biggest difference being you now have to interface with the actual CPU and such.
if you're already writing in C, no. Just write some functions to convert those instructions into asm, and write an assembler (or use one of the many out there) to output them to bytes. Then use mprotect(), tadah!
Very concise implementation. I keep thinking there's missing files or at least I'm missing something in my understanding. :)
I wrote a 0-operand VM in C a few years ago that used a lot of the same concepts but considerably more code, like 10x at least. I will learn a lot from this.
Unfortunately, with the ever-increasing processor-memory gap (the growing discrepancy between processor and memory speeds, also called the Memory Wall, if I'm not mistaken), 0-operand (stack-based) VMs are seeing ever-increasing performance penalties.
Perhaps, if you're interested in learning to write well-written C code, you might be interested in the book "C Programming: A Modern Approach"[0] by King. Coding style is an important topic in his book.
Thank you, seems very interesting. Usually, I only write pure C code when I really have to. Otherwise, I see no reason why someone should restrict themselves from using C++.
As inspiration, I would suggest to look at the virtual machine (byte code interpreter) of the Lua language. You can find several papers describing the design at http://www.lua.org/docs.html. The code base is also very small and clean.
A year (or two) ago, I wrote a simple virtual machine [1] of my own. Fun time, interesting exercise. I wish I'd had the courage to post it here and get feedback from HN. Anyway, yours is much cleaner and more concise code. Kudos.
Posted my reply on wrong comment (Got confused between akkartik and tekknolagi) :). Anyways, thanks for sharing.
Are you planning to add memory management opcodes (allocate memory, free memory)? I see the DBS and DBG instructions, but they're more like a key-value store (since they use carp_ht internally).
Nope, rebuilding didn't help.. never mind, I'll look into it later tonight.
Edit: I just noticed you changed the target (why?) and are also immediately deleting carp.out. So I was indeed using the stale version in spite of using 'make clean' :)
Still, I'm concerned that you might be triggering some sort of undefined behavior, which means it might work for you but not on a slightly different machine or compiler version. I tried printing out token lexemes in carp_run_program, right after the call to tokenize(), and some of the tokens printed binary garbage, suggesting that they might be missing a terminating null, or worse. Does this look right?
>The goal is to try and build a small (and decently reliable) VM from the ground up, learning more and more C as I go.
The author explicitly mentions that he is in the process of learning C while he's building something with it.
This is interesting because the folks at [osdev.org](http://wiki.osdev.org) keep stressing that you must be a god-level expert in C, before you even think of getting into systems programming.
I think the folks at osdev mean you need to be good at C before doing systems programming at any kind of professional level. If you don't intend for your code to actually do anything (I mean, if this is just a side project or something, not for somebody to use in production), then you can really do whatever you want.
This is neat. For others interested in this sort of thing, https://challenge.synacor.com/ specifies a virtual machine, and comes with a binary that performs a self-test -- it is perhaps a neat way to get started.
Can anyone link me to some literature on VMs? Sure, I can analyse the code, but from a design perspective, I would love to learn more about how VMs work, where they can be used, etc etc.
"and took a look at the MSP430's instruction set."
I wonder in a general sense if there are many VMs that explicitly copy good older CPUs. I've always thought the 6809 would make an awesome VM. After all, it was pretty awesome to code on in the real world. Or a VM based on the classic 1802, 6502, or Z80 with some minor mods.
If you really want to warp people's minds, give them a VM based on IBM HAL assembler; basically, turn Hercules and VM into a hypervisor rather than an application.
CISC would be better than RISC, because one should keep in mind that for many operations, most of the execution time is VM dispatch overhead.
But if you want performance, you'd rather tailor your instruction set to the language it will execute (assuming you're not making a general-purpose VM). CPU makers like Intel actually do keep a close eye on what programs are doing in order to evolve their instruction sets.
[1] https://github.com/tekknolagi/carp/blob/master/src/carp_inst...
[2] http://www.embedded.com/design/programming-languages-and-too... and the following parts