
TinyVM: A lightweight, fast virtual machine in < 500 lines of ANSI C - coderdude
https://github.com/GenTiradentes/tinyvm
======
anon_comenter9
Unless this is relying on the compiler to optimize it, the dispatch doesn't
look good:

    if(vm->pProgram->instr[instr_idx] == MOV) *arg0 = *arg1;
    else if(vm->pProgram->instr[instr_idx] == PUSH) stack_push(vm->pStack, arg0);
    else if(vm->pProgram->instr[instr_idx] == POP) stack_pop(vm->pStack, arg0);
    else if(vm->pProgram->instr[instr_idx] == INC) ++(*arg0);
    else if(vm->pProgram->instr[instr_idx] == DEC) --(*arg0);
    else if(vm->pProgram->instr[instr_idx] == ADD) *arg0 += *arg1;
    else if(vm->pProgram->instr[instr_idx] == SUB) *arg0 -= *arg1;
    else if(vm->pProgram->instr[instr_idx] == MUL) *arg0 *= *arg1;
    else if(vm->pProgram->instr[instr_idx] == DIV) *arg0 /= *arg1;
    else if(vm->pProgram->instr[instr_idx] == MOD) vm->pMemory->remainder = *arg0 % *arg1;
    else if(vm->pProgram->instr[instr_idx] == REM) *arg0 = vm->pMemory->remainder;
    else if(vm->pProgram->instr[instr_idx] == NOT) *arg0 = ~(*arg0);
    else if(vm->pProgram->instr[instr_idx] == XOR) *arg0 ^= *arg1;
    else if(vm->pProgram->instr[instr_idx] == OR)  *arg0 |= *arg1;
    else if(vm->pProgram->instr[instr_idx] == AND) *arg0 &= *arg1;
    else if(vm->pProgram->instr[instr_idx] == SHL) *arg0 <<= *arg1;
    else if(vm->pProgram->instr[instr_idx] == SHR) *arg0 >>= *arg1;
    else if(vm->pProgram->instr[instr_idx] == CMP) vm->pMemory->FLAGS = ((*arg0 == *arg1) | (*arg0 > *arg1) << 1);
    else if(vm->pProgram->instr[instr_idx] == JMP) instr_idx = *arg0 - 1;
    else if(vm->pProgram->instr[instr_idx] == JE  &&  (vm->pMemory->FLAGS & 0x1)) instr_idx = *arg0 - 1;
    else if(vm->pProgram->instr[instr_idx] == JNE && !(vm->pMemory->FLAGS & 0x1)) instr_idx = *arg0 - 1;
    else if(vm->pProgram->instr[instr_idx] == JG  &&  (vm->pMemory->FLAGS & 0x2)) instr_idx = *arg0 - 1;
    else if(vm->pProgram->instr[instr_idx] == JGE &&  (vm->pMemory->FLAGS & 0x3)) instr_idx = *arg0 - 1;
    else if(vm->pProgram->instr[instr_idx] == JL  && !(vm->pMemory->FLAGS & 0x3)) instr_idx = *arg0 - 1;
    else if(vm->pProgram->instr[instr_idx] == JLE && !(vm->pMemory->FLAGS & 0x2)) instr_idx = *arg0 - 1;

~~~
jws
Update: I changed it to a switch statement. The euler1 program went from
0.081s to 0.091s with gcc on a core i3.

The cascading _if_ statements are indeed faster than a switch.

EOU - original comment follows…

Without the benefit of profiling, I can suggest it may not be as bad as it
looks.

The relative frequency of the opcodes could make this faster than the switch.
See MOV, PUSH, POP in the front? These likely have tiny, inline
implementations and are also probably a large percentage of the opcodes
executed. They may all fit in the same cache line and give the pipelines on
deep-pipeline, branch-predicting machines lots of good stuff to work on.

Likewise, the infrequently occurring opcode comparisons in the back part of
the statement, by definition, are almost always false, which should tell the
branch predictors which way to go to again keep the pipelines full.

A switch on the other hand is pretty much a guaranteed pipeline flush.

But… if you asked me to code this control structure without being able to
profile and to get it fast the first time… I'd use a _switch_. (well, maybe
with a couple _if_ statements out front if I knew I had a heavily lopsided
distribution.)

Then I'd go reread that article that came by HN a couple weeks ago, turn the
control structure inside out, and see how much better it was.

~~~
palish
You're very wise. Thank you for sharing.

It's satisfying that you were able to say "There's a good chance you're wrong,
here's why" and then be confirmed by testing. You obviously have a great deal
of experience with low-level programming; relatively rare, nowadays. Mine
comes from graphics programming.

Also, which article are you referring to? Any keywords I can search for?

~~~
jws
I think I am remembering this one:
<http://news.ycombinator.com/item?id=2593095> _The Common CPU Interpreter Loop
Revisited_

But after looking at tinyvm it may be a special animal. The native
implementation of its virtual opcodes are only one or two instructions.
Locality, caching, and pipelining are working at their finest here and it
might be best to just let them do their thing.

------
sehugg
This made me think back to a tiny virtual machine that took less than 300
lines -- in fact it took less than 500 bytes of code (thanks to the Woz):
<http://en.wikipedia.org/wiki/SWEET16>

Aw, heck, let's get to the meat of it:
<http://www.6502.org/source/interpreters/sweet16.htm>

------
GrooveStomp
The code actually doesn't need documentation. It's written in a very clear,
straight-forward manner that's easy to mentally parse. Very nice!

~~~
AndresNavarro
As a C programmer I find it quite odd/unnerving, actually. It may be simple to
understand its structure and it may look clean, but that isn't saying much for
a project this size, especially if you know what it does. I found the file
division & include pattern quite strange, and don't get me started on the
cascading "if/else if" for the instruction dispatch: not only is it almost
impossible for the compiler to optimize, it's also more difficult to read than
the obvious switch. Also, if the instructions were enums instead of defines,
the compiler could even generate warnings when there are unimplemented
opcodes. I don't use github but I'm tempted to make a branch and a pull
request because it's driving me crazy!

EDIT: I was about to fork it when I noticed that in the other two branches
(fast & simple, default is master) the dispatch is implemented with a switch.
Please somebody explain to me why this wasn't moved to the default branch...

~~~
ehsanu1
Please see <http://hackerne.ws/item?id=2722896>

------
jrockway
No error checking around mallocs?

~~~
Locke1689
jrockway's comment should really be addressed -- this can cause some really
nasty crashes.

~~~
palish
Really? How would dereferencing a NULL pointer cause "some really nasty
crashes"?

Yes, it will crash. That is a Good Thing(tm).

1) It is extremely unlikely that malloc() will ever return NULL (on a PC).
This is due to virtual memory. If you manage to allocate more than 1.5GB of
memory, then yes. Otherwise no.

2) Even if malloc() does return NULL, then NULL checking it is pointless
because program operation cannot realistically continue. You _need_ that
memory. The best and cleanest solution is to crash. You will get a call trace
leading to the exact point of failure (which you won't get if you do NULL
checks + continue program execution). Also, as far as I know, it's almost
impossible for a dereferenced NULL pointer to be a security vulnerability,
unlike e.g. a buffer overrun.

In summary, just let it crash. There isn't any reason not to, _unless_ you're
developing a "third-party library" (e.g. you're a Lua developer, etc) and you
want to leave the decision of whether to crash to the programmers using your
code.

~~~
jrockway
Are you kidding? Most real software does something like allocating some
emergency memory at startup time, and then handles a failing malloc call by
using that reserve pool to save work, perhaps do a GC, and print a decent
error message before quitting.

Losing all of the user's work and printing "Segmentation fault (core dumped)"
is never the right answer.

~~~
palish
The point is, the crash you describe will only occur on 32-bit operating
systems, and only after you've allocated more than 1.5GB of memory. For most
projects, that is extremely unlikely.

Your solution would work. But the extra code complexity, development time, and
maintenance isn't worth it except in very, very specific scenarios, like if
you're writing a third-party library such as Lua or FMOD or FreeType or...
etc.

Here's an alternative solution that doesn't require NULL checking malloc, and
provides all of the same benefits:

1) create a function (I call mine "Sys_Exit()") which allows you to cleanly
exit your application from anywhere in your codebase. For example in my game
engine, my main() function looks like:

    
    
      int main()
      {
        App_Startup();
        while ( App_Frame() ) {}
        App_Shutdown();
        return 0;
      }
    

and Sys_Exit() looks like:

    
    
      void Sys_Exit( int code )
      {
        App_Shutdown();
        exit( code );
      }
    

2) write a "my_malloc" function which forwards to malloc. Use it for all
memory allocation. When it detects that more than 1.2GB of memory has been
allocated, then it prints an error message and calls Sys_Exit().

This allows you to save the user's work, etc, and your shutdown sequence has a
fairly large amount of remaining memory to work with (so that you don't truly
run out of memory after you've fake-run-out-of-memory).

This is a very simple solution if you _really_ care about the extremely
unlikely out-of-memory crash.

----

tl;dr: It is almost always a bad idea to NULL check malloc(). It tends to
destroy readability, is error prone, and is hard to maintain, for very little
practical benefit.

~~~
daeken
Even ignoring the possibility of out-of-memory crashes, null pointer bugs can
lead to memory corruption and code execution and have been popular targets in
recent years:
<http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=null+pointer+dereference+vulnerability>

Edit: <http://flashmypassion.blogspot.com/2008/04/this-new-vulnerability-dowds-inhuman.html>
is a copy of a blog post (which seems to be
missing from the Matasano chargen; odd) about Mark Dowd's crazy Flash null-
pointer vuln. The first comment from Dino Dai Zovi (another crazy good
security researcher) is very relevant:

> Oh, the _sweet_ _sweet_ vindication. I remember an argument that I have had
> twice in the last several years about whether one should check the return
> value of malloc. I argued that it should always be done for two reasons:
> Reading address zero might crash, but not necessarily zero-plus-offset and
> because a compare register to zero is so free performance-wise that it isn’t
> even funny.

> Guess who was arguing the contrary

Edit #2: tptacek's responses on the thread are quite interesting, and I agree.
These are your friends:

    
    
        void *safe_malloc(size_t size)
        {
            void *ptr = malloc(size);
            if(ptr == NULL) abort();
            return ptr;
        }
    
        #define safe_free(ptr) do { if((ptr) != NULL) { free(ptr); (ptr) = NULL; } } while(0)

------
kragen
A friend asked me if this was a good example to learn C from. I had to say no;
in half an hour of looking at the code, I found seven bugs in a single 60-line
routine. Details at:

<https://github.com/GenTiradentes/tinyvm/commit/e78c192313c358959f1aa322b4097f0f3aeb056c#commitcomment-459534>

------
saintfiends
I wish I could read C code like this. I really want to, but where do I begin?
If someone could point that out, it would be very helpful.

~~~
leon_
find main() and follow the parts that interest you - that's how I start with
reading a new codebase

~~~
saintfiends
first thing I did. Opened main.c, tried to follow the include files. That's
where the problem starts; as it branches out I start to lose track.

Whenever I start reading code, my confidence hits rock bottom. Seriously, I
just feel really bad about myself and just walk away.

~~~
rat
Try opening it in an editor/IDE with ctrl+click to step into a function (and
hopefully a way to jump back), like gvim or eclipse. eclipse also has a nice
expand-macro feature which lets you look at macros fully expanded and at
different expansion stages.

~~~
saintfiends
you are a life saver. Kind of embarrassed that I didn't do a Google search on
that one.

I just found ctags. :D !

~~~
rat
darn, forgot to mention ctags. If you are up for it, supposedly scope is
better but I couldn't get it set up. I don't know of any vim plugin for
eclipse-like macro expansion, but if you see hairy macros you can always do
gcc -E | vim - (gcc -E does full includes, so macro expansion + importing of
all headers) and search for your code or jump to the bottom.

~~~
kragen
> supposedly scope is better

cscope?

------
monopede
Looking at the instruction set it doesn't even support subroutines. It would
only need a way to get and set the current program counter (like ARM) and you
could implement subroutine calls using push, pop, and jmp.

~~~
jws
I think subroutines would go something like:

    
    
       PUSH the_address_you_want_to_come_back_to
       JMP  some_other_spot
     the_address_you_want_to_come_back_to:
       BLAH
       BLAH
       ...
     some_other_spot:
       BLAH
       BLAH
       POP some_scratch_address
       JMP some_scratch_address
     

So yes, JSR is two opcodes and RTS is also two opcodes, but it works.

~~~
kragen
In the first half of your snippet, you seem to be assuming that JMP is direct,
and in the second half, that it's indirect. It appears to be direct:
<https://github.com/GenTiradentes/tinyvm/blob/master/tvm.c#L59>

You could, of course, do it with self-modifying code if that's possible. (I'm
not familiar enough with the VM to know if it is.)

------
udoprog
This is an epic introduction into virtual machines for the brave C coder. Very
neat and clean, thank you.

------
voidmain
It appears to have no indirect load or store instruction. I don't think it's
Turing complete!

------
zwieback
Made me think of Sweet16, the 16 bit virtual machine Woz put into the Apple ][
ROM.

~~~
sehugg
Creepy, man. Are you my clone or something? :)

~~~
gcb
he's running you in a vm

------
phektus
check the commit message:
<https://github.com/GenTiradentes/tinyvm/commit/35c2fdbdabc63e660798deed7ca01191f00f64d9>

------
leon_
Ah, I love virtual machines/CPUs :] Some weeks ago I found out that you can
create pointers to goto labels in C (gcc and clang, that is - it's a
non-standard extension afaik).

I then implemented a super small subset of the 6502 to test how messy the code
would look:

<https://github.com/jsz/6502/blob/master/vm/cpu.c>

I don't know if I should like it or not :)

~~~
lobster_johnson
That's neat. I like how your code is so concise and readable. Now all you need
to do is to emulate a display, a sound chip and a tape cassette drive. :-)

~~~
RodgerTheGreat
I have a project that's kinda along those lines, but it's not based on the
6502: <https://github.com/JohnEarnest/Mako>

------
Locke1689
While academically interesting, I'll wager a guess that QEMU wins in every
performance comparison, right? While writing a TCG-style thunking generator is
not "simple", it is extremely fast.

I'm also not really sure what instruction set you're targeting here. This is a
very small subset of IA-32(e) if I'm not mistaken.

~~~
lobster_johnson
This is, as I understand it, not an emulator but a VM core for running
bytecode. A better comparison would be the JVM, or LLVM, or any of the VMs
used by popular languages such as Ruby and Python.

~~~
erikb
Can you explain the difference? From my point of view, both are in principle
loops that execute a set of Assembly-like instructions.

~~~
lobster_johnson
Emulators like QEMU are designed to emulate an existing instruction set such
as x86, and they also need to emulate a specific memory model, interrupt
model, I/O subsystem etc. VMs such as in this post are designed without such
constraints.

Both are designed around the core principle of a "virtual machine" — decode
some kind of stream of machine code instructions, execute each instruction,
and maintain the stack and memory state along the way — but they target
different use cases with different nuances in the implementation.

