Hacker News
A JVM in Rust part 5 – Executing instructions (andreabergia.com)
98 points by andreabergia on Aug 28, 2023 | 33 comments



Rust noob here so please be gentle - can someone explain why in `Vm<'a>` the lifetime is needed? I see that in tests, `Vm<'static>` gets created, presumably because there's nowhere to grab a better lifetime from? But if that's the case, would it be possible to express this in a different way? ISTM that Vm owns everything it needs, so is there a way to avoid bubbling the lifetime up?

Similar question applies to ClassManager - it already owns all the classes (they are in an arena), so it looks like a lifetime is needed because there's no way to say "things live as long as ClassManager lives" w/o introducing the lifetime at `ClassManager<'a>` level...


The problem arises from the fact that `Class` needs to refer to other classes:

    pub struct Class<'a> {
        pub superclass: Option<ClassRef<'a>>,
    }
    pub type ClassRef<'a> = &'a Class<'a>;
Given that classes are managed by the arena, I know the reference will not be dangling thanks to the 'a lifetime.
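To make the lifetime propagation concrete, here is a minimal sketch (not rjvm's code; the `name` field and the `ClassTable` type are hypothetical). Once `Class<'a>` holds a `ClassRef<'a>`, every container that stores those references must also be generic over `'a`, and the classes must be declared before, and therefore outlive, the table:

```rust
// Sketch only: `name` and `ClassTable` are hypothetical additions that show
// how the `'a` on `Class` spreads to every type storing a `ClassRef`.
pub struct Class<'a> {
    pub name: String,
    pub superclass: Option<ClassRef<'a>>,
}

pub type ClassRef<'a> = &'a Class<'a>;

// Any container of `ClassRef`s must itself be generic over `'a`...
pub struct ClassTable<'a> {
    classes: Vec<ClassRef<'a>>,
}

impl<'a> ClassTable<'a> {
    pub fn new() -> Self {
        ClassTable { classes: Vec::new() }
    }

    pub fn register(&mut self, class: ClassRef<'a>) {
        self.classes.push(class);
    }

    // ...and so must anything that hands those references back out.
    pub fn find(&self, name: &str) -> Option<ClassRef<'a>> {
        self.classes.iter().copied().find(|c| c.name == name)
    }
}

fn main() {
    // The classes must outlive the table, so they are declared first.
    let object = Class { name: "java/lang/Object".to_string(), superclass: None };
    let string = Class { name: "java/lang/String".to_string(), superclass: Some(&object) };

    let mut table = ClassTable::new();
    table.register(&object);
    table.register(&string);

    let found = table.find("java/lang/String").unwrap();
    println!("{}", found.superclass.unwrap().name);
}
```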

Initially I had implemented this with raw pointers, without lifetimes, but then I switched to references because they felt more "idiomatic", and I had to put lifetimes just about everywhere to make the compiler happy.

If there are better ways to do this, I would be really happy to learn, though!


For things like classes that aren't going to be created and destroyed quickly, it's likely best to just use an Rc. Then you know it will be available and don't need to bother with lifetimes.
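A minimal sketch of the `Rc` approach, assuming classes are shared by reference counting rather than allocated in an arena (the `name` field and the `define` and `find` methods are hypothetical). Because each `Rc` keeps its class alive, neither `Class` nor `ClassManager` needs a lifetime parameter:

```rust
use std::rc::Rc;

// Sketch: classes are shared by reference counting; `name`, `define` and
// `find` are hypothetical names used for illustration.
pub struct Class {
    pub name: String,
    pub superclass: Option<Rc<Class>>,
}

// No lifetime parameter: whoever holds an Rc keeps the class alive.
pub struct ClassManager {
    classes: Vec<Rc<Class>>,
}

impl ClassManager {
    pub fn new() -> Self {
        ClassManager { classes: Vec::new() }
    }

    pub fn define(&mut self, name: &str, superclass: Option<Rc<Class>>) -> Rc<Class> {
        let class = Rc::new(Class { name: name.to_string(), superclass });
        self.classes.push(Rc::clone(&class));
        class
    }

    pub fn find(&self, name: &str) -> Option<Rc<Class>> {
        self.classes.iter().find(|c| c.name == name).cloned()
    }
}

fn main() {
    let mut manager = ClassManager::new();
    let object = manager.define("java/lang/Object", None);
    let string = manager.define("java/lang/String", Some(Rc::clone(&object)));
    assert!(manager.find("java/lang/String").is_some());
    drop(manager); // `object` and `string` stay valid: they hold their own Rcs
    println!("{}", string.superclass.as_ref().unwrap().name);
}
```

Since a superclass pointer only ever points up the hierarchy, this cannot form an `Rc` cycle; the trade-offs are a reference-count update on every clone and the fact that `Rc` is not `Send`.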

Of course there is a good chance that you will eventually want a VM-scoped lifetime for something else so maybe it is best to just start now.


Yeah, I think I understand why you had to write it this way, but the fact that this lifetime needs to be bubbled so far up is really non-intuitive to me.

> If there are better ways to do this, I would be really happy to learn, though!

Me, too. :) I find myself limited by my way of thinking here, coming from C++ ("I _know_ it lives long enough, why don't you let me express this?").


Agreed, it _is_ annoying. It bubbles up everywhere and it feels like "something to silence the compiler" more than "something to express the safety of the code", as other people have pointed out.


The better way to do this is to use an indexing arena such as `slotmap` or `generational-arena`, so `ClassRef` is not actually a memory pointer to anything but can still be looked up in the context of the class collection. Or, if you never need to remove any values, you can use Vec as your arena and usize as your key.
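A sketch of the simplest variant mentioned above, a `Vec` as the arena with an index as the key, assuming classes are never removed. `ClassId` and `ClassArena` are hypothetical names; wrapping the `usize` in a newtype at least keeps class keys from being confused with other integers:

```rust
// Sketch of "Vec as arena, index as key", assuming classes are never removed.
// `ClassId` and `ClassArena` are hypothetical names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ClassId(usize);

pub struct Class {
    pub name: String,
    pub superclass: Option<ClassId>, // an index, not a reference: no lifetime needed
}

#[derive(Default)]
pub struct ClassArena {
    classes: Vec<Class>,
}

impl ClassArena {
    pub fn alloc(&mut self, name: &str, superclass: Option<ClassId>) -> ClassId {
        self.classes.push(Class { name: name.to_string(), superclass });
        ClassId(self.classes.len() - 1)
    }

    pub fn get(&self, id: ClassId) -> &Class {
        &self.classes[id.0] // bounds-checked on every access
    }
}

fn main() {
    let mut arena = ClassArena::default();
    let object = arena.alloc("java/lang/Object", None);
    let string = arena.alloc("java/lang/String", Some(object));
    let parent = arena.get(string).superclass.unwrap();
    println!("{}", arena.get(parent).name);
}
```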


IMHO, using a vector index as a pointer should be a last resort (as an alternative to unsafe). Although it's memory safe, you still have many of the traditional issues like stale pointers / use-after-free, plus you also have the cost of a bounds check on every access and of making sure you use your stored indexes with the correct container instance. If you can solve your problem by using safe code with lifetimes, that should be preferred.


What is presented here is a self-referential type; Rust does not allow you to declare self-referential types, and an indexed arena is a solution to that. You are correct that you still have all the issues you described, but that doesn't make it a 'last resort' for replacing an incorrect system of lifetimes, because there is no 'safe code with lifetimes' that encodes a self-referential type. The type-safety issue is solved by using slotmap or typed-generational-arena, which use unique key types.
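To illustrate how generational keys address the stale-pointer problem, here is a minimal hand-rolled sketch. It is not the API of `slotmap` or `generational-arena`, only the core idea those crates build on: each slot carries a generation counter that is bumped on removal, so an old key stops matching once its slot is reused:

```rust
// Hand-rolled sketch of a generational arena; NOT the slotmap or
// generational-arena API. A key is an index plus the generation it was
// issued for, so keys to removed values are detected.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

pub struct GenArena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>, // indices of empty slots available for reuse
}

impl<T> GenArena<T> {
    pub fn new() -> Self {
        GenArena { slots: Vec::new(), free: Vec::new() }
    }

    pub fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Key { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Key { index: self.slots.len() - 1, generation: 0 }
        }
    }

    pub fn remove(&mut self, key: Key) -> Option<T> {
        let slot = self.slots.get_mut(key.index)?;
        if slot.generation != key.generation {
            return None; // stale key
        }
        let value = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1); // invalidate old keys
        self.free.push(key.index);
        Some(value)
    }

    pub fn get(&self, key: Key) -> Option<&T> {
        let slot = self.slots.get(key.index)?;
        if slot.generation == key.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }
}

fn main() {
    let mut arena = GenArena::new();
    let a = arena.insert("first");
    arena.remove(a);
    let b = arena.insert("second"); // reuses slot 0, now at generation 1
    println!("{:?} {:?}", arena.get(a), arena.get(b));
}
```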


Same way you write self-referential structs: use index types into whatever arena you are using. Indexes are usually 32-bit, so they are a bit faster than pointers.

If you are building one-off trees, such as for parsing and AST transforms, bumpalo is your friend.

In your case, you can look into generational arenas and slabs which are useful for graphs.


You're right there – it looks like the author is trying to express "this lifetime lasts as long as the VM exists", which isn't possible to express in Rust. For VM::new, the callsite just picks a lifetime that lasts long enough – even if that outlasts the VM, making the lifetime useless. This is a bug and unsound.


My understanding is that if Rust let you compile it, then there is no bug with a lifetime (the code doesn't use `unsafe`). There may be something magical happening under the hood that I do not fully understand, but in Rust I trust. ;)


Your understanding of Rust is correct, but the code uses unsafe: https://github.com/andreabergia/rjvm/blob/93e7e48db085e780b0...

As a result, when the VM gets dropped, the memory gets deallocated even if there are still objects that reference it.

Somewhat related, there also is the issue that when you create multiple VMs, you can use classes/objects/… created by one VM with another.
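One way to catch that cross-VM mix-up at runtime is to stamp each arena with a unique id and store that id in every key it hands out. The sketch below assumes that design (`OwnedArena` and `OwnedKey` are hypothetical names, not part of rjvm); a compile-time guarantee would need something heavier, such as branded lifetimes:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sketch: every arena gets a process-unique id, and every key remembers the
// arena that issued it. `OwnedArena` and `OwnedKey` are hypothetical names.
static NEXT_ARENA_ID: AtomicU32 = AtomicU32::new(0);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OwnedKey {
    arena_id: u32,
    index: usize,
}

pub struct OwnedArena<T> {
    id: u32,
    items: Vec<T>,
}

impl<T> OwnedArena<T> {
    pub fn new() -> Self {
        let id = NEXT_ARENA_ID.fetch_add(1, Ordering::Relaxed);
        OwnedArena { id, items: Vec::new() }
    }

    pub fn insert(&mut self, item: T) -> OwnedKey {
        self.items.push(item);
        OwnedKey { arena_id: self.id, index: self.items.len() - 1 }
    }

    // Returns None, rather than another VM's object, for a foreign key.
    pub fn get(&self, key: OwnedKey) -> Option<&T> {
        if key.arena_id == self.id {
            self.items.get(key.index)
        } else {
            None
        }
    }
}

fn main() {
    let mut vm1_heap = OwnedArena::new();
    let mut vm2_heap = OwnedArena::new();
    let k1 = vm1_heap.insert("object from VM 1");
    vm2_heap.insert("object from VM 2");
    // Same index (0) in both heaps, but only VM 1 accepts k1.
    println!("{:?} {:?}", vm1_heap.get(k1), vm2_heap.get(k1));
}
```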


IIUC this should be fine - MemoryChunk doesn't seem to deallocate the memory (so we leak), unless there's a dealloc somewhere (sorry, not logged in to GH so cannot search). Which is not the worst outcome here, because as you said, we don't want to deallocate this memory since there may be places that still refer to it...

As for where the static lifetime gets into play, VM creation doesn't use unsafe & compiler is happy about it, even though ISTM that there could be a tighter bound on the lifetime? https://github.com/andreabergia/rjvm/blob/93e7e48db085e780b0...


VM creation is using unsafe (the line I pointed out above). You're right, I can't find any deallocation either – but weirdly it /does/ segfault (or sometimes have other memory-related errors) if I drop the VM while its heap is still referenced.

The thing with a tighter bound on the lifetime is that VM carrying a lifetime for itself is fundamentally flawed. A solution would be to remove said lifetime, and have all methods on VM that return lifetimed objects take said lifetime from the reference to the VM.
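A minimal sketch of that suggestion, assuming the VM owns its classes in a plain `Vec` (`Vm`, `load_class` and `get_class` are illustrative names, not rjvm's API). The struct itself has no lifetime parameter; instead, the reference returned by `get_class` is tied to the `&self` borrow, so the compiler rejects any use of it after the VM is dropped:

```rust
// Sketch: no lifetime on the VM itself; returned references borrow from
// `&self`. `Vm`, `Class`, `load_class` and `get_class` are illustrative.
pub struct Class {
    pub name: String,
}

pub struct Vm {
    classes: Vec<Class>,
}

impl Vm {
    pub fn new() -> Self {
        Vm { classes: Vec::new() }
    }

    pub fn load_class(&mut self, name: &str) -> usize {
        self.classes.push(Class { name: name.to_string() });
        self.classes.len() - 1
    }

    // The elided lifetime ties the result to `&self`: while it is alive, the
    // VM cannot be dropped, moved, or mutably borrowed.
    pub fn get_class(&self, index: usize) -> Option<&Class> {
        self.classes.get(index)
    }
}

fn main() {
    let mut vm = Vm::new();
    let id = vm.load_class("java/lang/Object");
    let class = vm.get_class(id).unwrap();
    println!("{}", class.name);
    // drop(vm); println!("{}", class.name); // would not compile: `vm` is still borrowed
}
```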


That sounds reasonable (and way better than what I have implemented).

I am not sure I am going to try it though - it probably is a bit too much code to change for something that I consider "done" and I am not working on. :)


Nice and detailed, thanks! As a struggling Rust n00b, I find walk-throughs like this instructive.

Found a typo: "[...] and the program counter will be implemented." The last word quoted should be "incremented", right?


Thanks, fixed!


Pretty interesting to see the progress on this. Wonder if any of the JVM devs are looking at this, and curious how far it will progress.


Maybe they bookmarked it to see if fun stuff appears later; it's still pretty much in the gruntwork phase of adding definitions/loading/basic interpretation in Rust. (I'm not working on any official projects myself, but the experiments I did were in C or in Java itself, and with the latter you get tools like ASM to do the gruntwork quickly.)

One funny/annoying part of implementing a Java runtime is that you want mostly coherent handling of basic classes like Object, Class, String, etc. (you don't want tons of special cases in a JIT, etc.), but the reflection data that defines Object and Class contains Strings (and String inherits from Object, for a fun circular dependency).

For the most complete runtime I did (it supported basic reflection, etc.), I went with faux objects, strings, etc., created at startup from C definitions that mirrored the real object shapes. Then, once far enough into loading, I walked the heap and modified the faux objects so their vtables pointed to the actually loaded types for those classes.
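A rough sketch of that two-phase bootstrap, under heavy simplifying assumptions: each object records its class as an index into a class table (standing in for the vtable pointer), and only `String` is faked. All names are hypothetical; the runtime described above was written in C:

```rust
// Hypothetical sketch: class "pointers" are indices into `classes`,
// standing in for the vtable pointers the original C runtime patched.
const FAUX_STRING: usize = 0;

struct ClassInfo {
    name: String,
}

struct Object {
    class: usize,    // which class this object is an instance of
    payload: String, // e.g. the text of a string object
}

struct Runtime {
    classes: Vec<ClassInfo>,
    heap: Vec<Object>,
}

impl Runtime {
    // Phase 1: before String is loaded, reflection data for Object and Class
    // still needs strings, so they are created against a faux String class.
    fn bootstrap() -> Self {
        let classes = vec![ClassInfo { name: "faux String".to_string() }];
        let heap = vec![
            Object { class: FAUX_STRING, payload: "java/lang/Object".to_string() },
            Object { class: FAUX_STRING, payload: "java/lang/Class".to_string() },
        ];
        Runtime { classes, heap }
    }

    // Phase 2: once the real String class is loaded, walk the heap and
    // repoint every object that still refers to the faux class.
    fn patch_string_class(&mut self, real_name: &str) -> usize {
        self.classes.push(ClassInfo { name: real_name.to_string() });
        let real = self.classes.len() - 1;
        for obj in self.heap.iter_mut() {
            if obj.class == FAUX_STRING {
                obj.class = real;
            }
        }
        real
    }

    fn class_name_of(&self, object: usize) -> &str {
        &self.classes[self.heap[object].class].name
    }
}

fn main() {
    let mut runtime = Runtime::bootstrap();
    runtime.patch_string_class("java/lang/String");
    for (i, obj) in runtime.heap.iter().enumerate() {
        println!("{:?} is a {}", obj.payload, runtime.class_name_of(i));
    }
}
```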


Why would they? The direction for safer JVM research is to implement JVMs in Java, which has the advantage of being both memory safe outside a few tiny core areas (same as Rust) and having cleaner code than the Rust impl (none of these problems with lifetimes).

See here: https://github.com/oracle/graal/tree/master/substratevm/src


Many of the safety benefits of Rust disappear once you start JITing machine code and jumping to it...


When storing an object in an array, the value is checked to be a Value::Object (which can't be null), but you also have Value::Null, so storing null in an array fails (I think). Storing null in local variables, a static field or an instance field works fine.
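A hypothetical reduction of the bug (not rjvm's actual code): the array-store path matches only `Value::Object`, while the fix also accepts `Value::Null`. `VmError` and the `aastore` function names are assumptions, and a real JVM also performs an assignability check (throwing `ArrayStoreException`) that is omitted here:

```rust
// Hypothetical reduction: `Value` mirrors the variants named in the comment;
// `VmError` and the function names are assumptions.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i32),
    Object(usize), // index of a heap object; never null
    Null,
}

#[derive(Debug, PartialEq)]
pub enum VmError {
    ValidationException,
    ArrayIndexOutOfBounds,
}

// Buggy version: only Value::Object is accepted, so storing null fails.
pub fn aastore_buggy(array: &mut [Value], index: usize, value: Value) -> Result<(), VmError> {
    match value {
        Value::Object(_) => {}
        _ => return Err(VmError::ValidationException),
    }
    let slot = array.get_mut(index).ok_or(VmError::ArrayIndexOutOfBounds)?;
    *slot = value;
    Ok(())
}

// Fixed version: null is a valid element of any reference array.
pub fn aastore(array: &mut [Value], index: usize, value: Value) -> Result<(), VmError> {
    if !matches!(value, Value::Object(_) | Value::Null) {
        return Err(VmError::ValidationException);
    }
    let slot = array.get_mut(index).ok_or(VmError::ArrayIndexOutOfBounds)?;
    *slot = value;
    Ok(())
}

fn main() {
    let mut array = vec![Value::Null; 2];
    println!("{:?}", aastore_buggy(&mut array, 0, Value::Null));
    println!("{:?}", aastore(&mut array, 0, Value::Null));
    println!("{:?}", aastore(&mut array, 1, Value::Int(7)));
}
```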


Good point. Someone (probably you) opened a GitHub issue about this too.

I might actually fix this one :-)


From the first post in the series:

> I am very happy with what I have learned, about Rust and about how to implement a virtual machine. In particular, I am super happy about having implemented a real, working, garbage collector. It’s quite mediocre, but it’s mine and I love it. Given that I have achieved what I set out to do originally, I have decided to stop the project here. I know there are bugs, but I do not plan to fix them.

Toy projects are fine, but I think it should be pointed out clearly that this is not a serious, ongoing project.


Well, I think it is written rather clearly in the first blog post of the series and in the github readme, whose second line is:

> Important note: this is a hobby project, built for fun and for learning purposes.

I am not going to repeat that in every part of the series. :-)


I never said you should, but I probably would because another top level comment on this post currently says that they’re “curious how far it will progress”. Clearly it’s not obvious.


You can reply to toplevel comments.


So can we have a JVM written in Rust, running on WASM, running some Java, running MineCraft?



only if that MineCraft runs Terraria running Pong


Take it full circle. Minecraft running Terraria emulating a 32-bit RISC-V CPU running Pong written in Rust - someone already did the latter part I believe, https://youtu.be/zXPiqk0-zDY?si=M0IHSzRLkaddwwKC


Thank you. I had not seen that. It was incredible.


haha yeah, I saw that last week! so incredibly cool



