TinyVM – A small and easy to understand virtual machine in C (github.com)
241 points by jxub 10 months ago | 30 comments

After seeing C4[1], everything else doesn't seem tiny at all... and maybe it's just me, but this is another one of those projects where I found the directory layout rather confusing, especially for something that claims to be "small and easy to understand." bin/ is empty, src/ holds only a single nearly empty file, lib/ is also empty, include/tvm/ is two levels deep although the outer level contains nothing else, and all the interesting stuff actually appears to be in libtvm/.

More importantly, even after going through all the files in libtvm/, I still haven't managed to find the main instruction execution loop or the decoding switch. Sorry, but I don't think "small and easy to understand" applies, and I've been working with VMs, interpreters and the like for many years. Compare with this, for example:


A rule of thumb when investigating the source code of a project for the first time: if I have to go more than two directories deep to get to the "meat" of the code, my desire to explore further drops significantly.

[1] https://news.ycombinator.com/item?id=8558822

C library public headers are commonly placed two levels deep so that projects using them can add the "include" directory to their header search path and write #include "<libname>/header.h" in their code. This helps avoid filename clashes.

I dunno, I find it pretty easy to understand... seems like the main instruction loop is in tvm.c, in the function tvm_vm_run(), which is exactly the first place I looked and seems totally sensible to me.

  for (; vm->prog->instr[*instr_idx] != -0x1; ++(*instr_idx))
    tvm_step(vm, instr_idx);

Compare that to my 6809 emulator [1]. I have a mc6809_run() function that is pretty much the same, but mc6809_step() is in the same file and not hidden in a header file. There also seems to be a lack of memory operations (like reading or writing) other than PUSH/POP.

[1] https://github.com/spc476/mc6809 [2]

[2] Yes, it lacks a README. I know.
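The pattern both comments describe, a run loop that repeatedly calls a step function whose body is a switch on the opcode, fits in a few dozen lines when everything lives in one file. The following is a minimal sketch under stated assumptions: the opcode names, their values and the inline-operand encoding are hypothetical, not TinyVM's; only the -1 sentinel check mirrors the quoted loop.

```c
#include <assert.h>

/* Hypothetical opcodes for illustration only; not TinyVM's encoding. */
enum { OP_NOP = 0, OP_PUSH = 1, OP_ADD = 2, OP_END = -1 };

#define STACK_SIZE 64

struct vm {
    const int *prog;        /* opcodes with inline operands, ending in OP_END */
    int stack[STACK_SIZE];
    int sp;                 /* number of values currently on the stack */
};

/* Decode and execute the instruction at *ip. An instruction that takes an
   inline operand advances *ip past it; the run loop then moves to the next. */
static void vm_step(struct vm *vm, int *ip)
{
    switch (vm->prog[*ip]) {
    case OP_NOP:
        break;
    case OP_PUSH:
        assert(vm->sp < STACK_SIZE);
        vm->stack[vm->sp++] = vm->prog[++(*ip)];
        break;
    case OP_ADD:
        assert(vm->sp >= 2);
        vm->sp--;
        vm->stack[vm->sp - 1] += vm->stack[vm->sp];
        break;
    default:
        assert(!"unknown opcode");
        break;
    }
}

/* Same shape as the quoted tvm_vm_run() loop: step until the -1 sentinel. */
static void vm_run(struct vm *vm)
{
    for (int ip = 0; vm->prog[ip] != OP_END; ++ip)
        vm_step(vm, &ip);
}
```

Running { OP_PUSH, 2, OP_NOP, OP_PUSH, 40, OP_ADD, OP_END } leaves 42 as the only value on the stack. Keeping vm_step() next to vm_run() in one .c file, as in mc6809, makes the dispatch easy to find; putting the step function in a header as static inline is another common choice when the author wants the compiler to inline it into the loop.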

>bin/ is empty, there's only a single nearly-empty file in src/

Because that's where the generated binary will go.

>lib/ is also empty

Presumably the same for any lib files?

>include/tvm is two levels but one is empty

That's as intended too, poor man's namespacing.

> Because that's where the generated binary will go. > Presumably the same for any lib files?

Why is it even tracked, then? Surely putting it in .gitignore is a better way to go?

I find leaving them in the repo makes it pretty clear how you have to configure your build process. For GCC, I instantly know I need `-I ./tinyvm/include/` and can use `#include <tvm/tvm.h>` etc.

I’ve found that switch fast, in a few clicks, reading this on my phone:


I’ve used the info you provided as a starting point, of course.

In a .h file?

Have to agree with userbinator.

Isn't it odd to have:

/* nop */ case 0x0

Rather than #define NOP 0x0? Namespacing? How about vm_NOP?
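To make the suggestion concrete, here is a sketch of prefixed opcode names in an enum so that each case label documents itself. The names (VM_NOP and so on), the values and the operand counts are hypothetical, chosen for illustration rather than copied from TinyVM.

```c
#include <assert.h>

/* Hypothetical opcodes in the prefixed-name style suggested above;
   values and operand counts are illustrative, not TinyVM's. */
enum vm_opcode {
    VM_NOP  = 0x0,
    VM_PUSH = 0x1,
    VM_POP  = 0x2,
    VM_ADD  = 0x3,
    VM_OPCODE_COUNT           /* keep last: number of real opcodes */
};

/* Fails to compile if an opcode is added without revisiting this file (C11). */
static_assert(VM_OPCODE_COUNT == 4, "update vm_operand_count() for new opcodes");

/* The case labels name themselves, so no separate comment per opcode. */
static int vm_operand_count(enum vm_opcode op)
{
    switch (op) {
    case VM_NOP:  return 0;   /* replaces the bare case 0x0 */
    case VM_PUSH: return 1;
    case VM_POP:  return 1;
    case VM_ADD:  return 2;
    default:      return -1;  /* unknown opcode or the COUNT sentinel */
    }
}
```

Removing the default label lets GCC's -Wswitch (enabled by -Wall) flag enumerators the switch does not handle, though the VM_OPCODE_COUNT sentinel would then need a case of its own.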

"easy to understand" was added to the title by the poster; the repo doesn't make this claim.

With zero explanation, just code, it's a small experiment but not really educational.

By C4, did you mean "C in 4 functions", https://github.com/rswier/c4?

Why does he/she have .gitignore files in subdirectories? That seems to make those subdirectories linger for no reason.

Usually because a build script depends on their existence or something.

Then have the scripts create them?

>Why does he/she have .gitignore files in subdirectories? That seems to make those subdirectories linger for no reason.

He has it precisely to make them linger. And what do you mean, "no reason"? It's so subsequent scripts can put results there.

Better form to just have those scripts create the directories.

Sure, but it's not like why having them persist is a mystery, or hurts understanding the structure of the repo, as the parent claimed.

That's not even what was claimed. http://www.dictionary.com/browse/rhetorical-question

Another good simple one is NekoVM (though I'm not sure how they compare size-wise), which is one of the target platforms that Haxe compiles to:


Since I'm interested in writing my own languages (though I never find the time, given my other pet projects), one of my side goals has always been to write a language that could ultimately target NekoVM. Neko also has a module for Apache.

I like an elegant VM, but the ones I find the most interesting are the ones you can actually employ as a compilation target.

For example, there's a C-like language for SUBLEQ machines: http://mazonka.com/subleq/hsq.html
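SUBLEQ is about as small as a VM gets: a single instruction, "subtract and branch if less than or equal to zero". Each instruction is three words a, b, c meaning mem[b] -= mem[a], then jump to c if the result is <= 0. A minimal interpreter sketch follows; treating a negative branch target as halt is a common convention assumed here, and the return codes, step limit and demo program are this sketch's own additions, not taken from the linked page.

```c
#include <assert.h>
#include <stddef.h>

/* Run a SUBLEQ program in place. Returns 0 on halt (negative branch target),
   -1 on an out-of-range address, 1 if max_steps is exhausted. */
static int subleq_run(long *mem, size_t len, size_t max_steps)
{
    assert(mem != NULL || len == 0);
    size_t pc = 0;
    for (; max_steps > 0; --max_steps) {
        if (pc + 2 >= len)
            return -1;                  /* instruction runs past memory */
        long a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
        if (a < 0 || b < 0 || (size_t)a >= len || (size_t)b >= len)
            return -1;                  /* operand address out of range */
        mem[b] -= mem[a];
        if (mem[b] <= 0) {
            if (c < 0)
                return 0;               /* negative target: halt */
            pc = (size_t)c;
        } else {
            pc += 3;                    /* fall through to next instruction */
        }
    }
    return 1;                           /* step budget exhausted */
}

/* Usage: add the word at address 9 into the word at address 10 with the
   classic three-instruction idiom, using address 11 as scratch cell Z. */
static long subleq_add_demo(void)
{
    long mem[] = {
        9, 11, 3,    /* Z -= A, so Z = -A (continues at 3 either way) */
        11, 10, 6,   /* B -= Z, so B = B + A                          */
        11, 11, -1,  /* Z -= Z gives 0, so branch to -1: halt         */
        7, 35, 0     /* data: A = 7, B = 35, Z = 0                    */
    };
    int status = subleq_run(mem, sizeof mem / sizeof mem[0], 100);
    return status == 0 ? mem[10] : -1;
}
```

Everything else (copying, jumps, comparison) is built from the same instruction, which is why a compiler such as the one linked above has to do most of the work.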

Here's the switch statement, if you're looking for it: https://github.com/jakogut/tinyvm/blob/master/include/tvm/tv...

Recently I remembered [0], which I then rebuilt in C using the same memory layout, trying to approximate the functionality of the original [1]...

[0] https://www.randelshofer.ch/fhw/gri/holzi.html

[1] https://gist.github.com/mar77i/46bd25504dd9e81d0ca7778efcee4...


Scanning the syntax, is there an operation for addressing memory? I see an operation for moving one value to another and that's it. I don't see any method of addressing a variable position in memory (or a variable position in an array if one wants to be more managed about it).

I suppose you could handle all operations from the stack.

But I think a lot of things require memory reads and writes, at least to do efficiently.

I was also looking for it and couldn't find it. You can't do much with the stack if you can't load/store.
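Load and store are what let a stack machine reach memory at a computed position, which is what the comments above are asking for. A sketch, assuming a hypothetical stack VM with a separate data memory; the opcodes, their encoding and the choice to take addresses from the stack rather than as inline operands are all illustrative, not TinyVM's.

```c
#include <assert.h>

/* Hypothetical opcodes; not TinyVM's instruction set or encoding. */
enum { OP_HALT, OP_PUSH, OP_LOAD, OP_STORE, OP_ADD };

#define MEM_WORDS   16
#define STACK_WORDS 16

struct vm {
    int mem[MEM_WORDS];       /* data memory reached via LOAD/STORE */
    int stack[STACK_WORDS];
    int sp;                   /* number of values on the stack */
};

/* LOAD:  pop an address, push mem[address].
   STORE: pop an address, then pop a value and write it to mem[address].
   Because the address comes from the stack, it can be computed at run
   time (base + index), which is what array access needs.
   Returns 0 at OP_HALT, -1 on a bad address or unknown opcode. */
static int vm_exec(struct vm *vm, const int *code)
{
    for (int ip = 0;; ++ip) {
        int addr;
        switch (code[ip]) {
        case OP_HALT:
            return 0;
        case OP_PUSH:
            assert(vm->sp < STACK_WORDS);
            vm->stack[vm->sp++] = code[++ip];
            break;
        case OP_LOAD:
            assert(vm->sp >= 1);
            addr = vm->stack[vm->sp - 1];
            if (addr < 0 || addr >= MEM_WORDS)
                return -1;
            vm->stack[vm->sp - 1] = vm->mem[addr];   /* replace address with value */
            break;
        case OP_STORE:
            assert(vm->sp >= 2);
            addr = vm->stack[--vm->sp];
            if (addr < 0 || addr >= MEM_WORDS)
                return -1;
            vm->mem[addr] = vm->stack[--vm->sp];
            break;
        case OP_ADD:
            assert(vm->sp >= 2);
            vm->sp--;
            vm->stack[vm->sp - 1] += vm->stack[vm->sp];
            break;
        default:
            return -1;
        }
    }
}

/* Usage: mem[7] = mem[1 + 2] + 10, with the address 1 + 2 computed on the stack. */
static int load_store_demo(void)
{
    struct vm vm = { .sp = 0 };
    vm.mem[3] = 5;
    const int code[] = {
        OP_PUSH, 1, OP_PUSH, 2, OP_ADD,  /* address 3   */
        OP_LOAD,                         /* value 5     */
        OP_PUSH, 10, OP_ADD,             /* 15          */
        OP_PUSH, 7, OP_STORE,            /* mem[7] = 15 */
        OP_HALT
    };
    return vm_exec(&vm, code) == 0 ? vm.mem[7] : -1;
}
```

Bounds-checking the address at every LOAD/STORE costs a comparison per access but turns a bad program into an error code instead of a stray memory write.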

I'm kind of worried about the usage of strcmp here:


It's also very easy to crash the thing, either with a malformed input file or afl-fuzz. Are you sure C was the right choice here?

For the actual emulation, I would say C is okay. For the assembler? I can think of half a dozen other languages better suited for that.
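The linked line is gone, so the exact strcmp concern can't be recovered from the thread. One common way to harden an assembler's mnemonic matching against malformed input is sketched below: tokens are passed as pointer plus length, compared only when the lengths match, and unknown mnemonics produce an error code rather than an out-of-range table index. The mnemonic list and opcode numbering are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcode numbering and mnemonics; not TinyVM's. */
enum { MN_NOP, MN_MOV, MN_PUSH, MN_POP, MN_ADD, MN_JMP, MN_COUNT };

static const char *const mnemonics[] = { "nop", "mov", "push", "pop", "add", "jmp" };

/* Catch the table and the enum drifting apart at compile time (C11). */
static_assert(sizeof mnemonics / sizeof mnemonics[0] == MN_COUNT,
              "mnemonic table out of sync with opcode enum");

/* Map the token tok[0..len) to an opcode, or return -1 if it is not a
   known mnemonic. The token need not be NUL-terminated (it can be a slice
   of a source line): lengths are compared first, so strncmp reads at most
   len bytes of tok. A NULL or empty token is rejected, not dereferenced. */
static int lookup_opcode(const char *tok, size_t len)
{
    if (tok == NULL || len == 0)
        return -1;
    for (size_t i = 0; i < MN_COUNT; i++) {
        if (strlen(mnemonics[i]) == len && strncmp(mnemonics[i], tok, len) == 0)
            return (int)i;
    }
    return -1;
}
```

For example, lookup_opcode("push r0", 4) matches "push" without copying or NUL-terminating the token, while "pushx" or "pus" return -1 for the caller to report as a syntax error.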

There is also the Java-based "PC emulator":


However, development seems to have slowed down drastically.

Great work. After scanning the source tree with my eyes, I have yet to find the implementation of the instruction set. Therefore, I wouldn't say this is small.

Because it has been hidden in a .h file: https://news.ycombinator.com/item?id=16608174
