I have a few questions about the garbage collection. One of the hard parts of implementing a garbage collector is making sure everything is properly rooted (especially with a moving collector). You have the `do_garbage_collection` method marked unsafe[1], but don't explain what the calling code needs to do to ensure it is safe to call. How do you ensure all references to the heap are rooted? This is not a trivial problem[2][3][4].
Also note that I cloned the repo and tried to run `cargo test`; every test fails with 'should be able to add entries to the classpath: InvalidEntry(".../vm/rt.jar")' at vm/tests/integration/real_code_tests.rs:15:10
It's pretty straightforward. Their VM maintains its own notion of a callstack instead of using the native callstack. That lets them iterate over it and find all of the parameters and locals on the VM's callstack and use them as roots.
There is a performance cost for a VM having its own virtual callstacks like this, but it makes GC tracing much simpler. (It also makes implementing interesting concurrency and control flow primitives like coroutines or continuations much easier too.)
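In sketch form, scanning such a VM-owned stack for roots looks something like this (all the types here are made up for illustration, not rjvm's actual ones):

// Hypothetical types, for illustration only.
struct GcRef(usize); // handle into the managed heap

enum Value {
    Int(i32),
    Object(GcRef),
}

struct Frame {
    locals: Vec<Value>,
    operand_stack: Vec<Value>,
}

struct CallStack {
    frames: Vec<Frame>,
}

impl CallStack {
    // Because the VM owns every frame, it can enumerate all live
    // references without ever touching the native stack.
    fn roots(&self) -> impl Iterator<Item = &GcRef> {
        self.frames.iter().flat_map(|frame| {
            frame
                .locals
                .iter()
                .chain(frame.operand_stack.iter())
                .filter_map(|value| match value {
                    Value::Object(r) => Some(r),
                    _ => None,
                })
        })
    }
}

fn main() {
    let stack = CallStack {
        frames: vec![Frame {
            locals: vec![Value::Int(42), Value::Object(GcRef(0))],
            operand_stack: vec![Value::Object(GcRef(1))],
        }],
    };
    // The collector would treat each of these as a root.
    assert_eq!(stack.roots().count(), 2);
}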
Seems like that would take care of roots for the bytecode itself, but not for "native" functions[1]. Allocating a new object could trigger gc[2], and native functions use the native callstack. It seems like it would be easy to allocate in a native function and have any unrooted references invalidated. In fact I see a case like that here[3]. That method creates a reference with `expect_concrete_object_at` and then calls `new_java_lang_class_object`, which can gc. It avoids UB by not using `arg` after the call that gc's, but there is nothing stopping you from using `arg` again (and having an invalid reference).
Indeed you are right, this is definitely a bug and could cause errors.
I guess the solution would be to add an explicit API to create a GC root, invoked by native methods (which is a bit complicated by the fact that I use a moving collector).
Many years ago I was using SpiderMonkey in a C++ project, and I seem to remember there were some APIs for native callbacks to invoke that rooted values. Same problem and similar solution. :-)
> I guess the solution would be to add an explicit API to create a GC root, invoked by native methods (which is a bit complicated by the fact that I use a moving collector).
This is what I do in the Wren VM. Any time a native C function has the only reference to a GC-managed object and it's possible for a collection to occur, it calls a function to temporarily add the object to a list of known roots.
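In Rust, a minimal sketch of that idea might look like this (all names hypothetical; this is neither Wren's nor rjvm's actual API):

use std::cell::RefCell;

#[derive(Clone, Copy)]
struct GcRef(usize); // handle into the managed heap

#[derive(Default)]
struct Heap {
    temp_roots: RefCell<Vec<GcRef>>,
    // ... the actual heap storage
}

// RAII guard: while it is alive, the object is treated as a root, so a
// collection triggered by a later allocation cannot free it (or, with a
// moving collector, relocate it without the root list knowing).
struct TempRoot<'h> {
    heap: &'h Heap,
}

impl Heap {
    fn push_temp_root(&self, obj: GcRef) -> TempRoot<'_> {
        self.temp_roots.borrow_mut().push(obj);
        TempRoot { heap: self }
    }
}

impl Drop for TempRoot<'_> {
    fn drop(&mut self) {
        self.heap.temp_roots.borrow_mut().pop();
    }
}

fn main() {
    let heap = Heap::default();
    let obj = GcRef(7);
    let _guard = heap.push_temp_root(obj);
    // ... allocate freely here; `obj` stays rooted until `_guard` drops.
    assert_eq!(heap.temp_roots.borrow().len(), 1);
}

With a moving collector there's the extra wrinkle that the entries in `temp_roots` must be updated when objects move, so callers would re-read the current address through the guard rather than hold on to a raw copy.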
What kind of support is there for generics in the JVM? Maybe I'm too naive in assuming that, due to type erasure at the bytecode level, everything is just an Object, i.e. a reference type? Or do you mean the class definition parser - but then, you don't really have any checks in place to see if the class file is valid (other than the basic syntax)?
About the generics - some people have pointed out the same on Reddit, and yeah, you are correct. The only thing that needs to be done is to read the Signature attribute that encodes the generic information about classes, methods, and fields (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...)
As a matter of fact, I just did a test and the following code works! :-)
import java.util.ArrayList;
import java.util.List;

public class Generic {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>(10);
        strings.add("hey");
        strings.add("hackernews");
        for (String s : strings) {
            tempPrint(s);
        }
    }

    private static native void tempPrint(String value);
}
Pretty much this - generics have (rare) implications for reflection (but that's unsupported as well), but overall they are replaced with the nearest class/interface when compiled: e.g. `List<String>` erases to plain `List`, and a type parameter `T extends Number` erases to `Number`.
OTOH the lack of string interning is super strange [it's trivial to implement], and w/o it the JVM is not a thing. String being equal by reference is important, and part of the JLS.
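By trivial I mean little more than a hash map owned by the VM. A rough Rust sketch (made-up types like `GcRef` and `StringPool`, not rjvm's actual code):

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct GcRef(usize); // handle to a heap-allocated java.lang.String

#[derive(Default)]
struct StringPool {
    interned: HashMap<String, GcRef>,
}

impl StringPool {
    // Returns the canonical java.lang.String for `text`, calling `alloc`
    // only the first time a given value is seen. Repeated literals thus
    // compare equal by reference, as the JLS requires.
    fn intern(&mut self, text: &str, alloc: impl FnOnce(&str) -> GcRef) -> GcRef {
        if let Some(&existing) = self.interned.get(text) {
            return existing;
        }
        let obj = alloc(text);
        self.interned.insert(text.to_string(), obj);
        obj
    }
}

fn main() {
    let mut pool = StringPool::default();
    let mut next = 0;
    let mut alloc = |_: &str| { next += 1; GcRef(next) };
    let a = pool.intern("hey", &mut alloc);
    let b = pool.intern("hey", &mut alloc);
    assert_eq!(a, b); // same reference, as required for literals
}

(With a moving collector the pooled handles themselves have to be treated as GC roots and updated when objects move.)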
Lack of threads makes the entire endeavor a toy project.
Not entirely correct. Last I checked, string interning was ONLY guaranteed for those strings defined in source and read in during class loading. Strings created via the String constructor (f.ex. via StringBuilder) CAN duplicate the strings you hardcoded in your sources; to get the "canonical" string in those cases you have to invoke String.intern(), if memory serves me correctly.
Also, interning strings so that equality checks can be done by pointer comparison is dangerous for external inputs, since IIRC at some point interned strings could be stored permanently (unless implemented with a weak set), and attackers could fill up your heap (or cause other GC issues, since the entire interning functionality is a cache) by filling your intern lists with junk.
> String being equal by reference is important, and part of the JLS.
I never said Strings must be equal by reference when their content is. However, string literals must be equal by reference. I thought mentioning the JLS would make it obvious, esp. having 'intern' in the context.
There's also a newer JVM option, whose name eludes me at the moment, which sweeps the strings that are promoted to the older generation and interns them.
I'm not certain whether `String.intern`ed strings are stored permanently; I rather suspect that it sweeps the existing strings, since IIRC the Java string has a hash associated with it anyway.
The rest of the stuff, incl. I/O, is actually on the trivial side - threads do require planning. This is what I meant by it being a 'toy' project: threads (and the JMM) would be impossible to bolt on later.
The reason you're being downvoted is you keep dismissing this as a "toy" project, and pointing out that it would be hard to make a real project.
But, as the previous commenter attempted to point out to you, this project is a *self-described* toy project.
On the very page that is linked, the author of the JVM specifically says:
"I want to stress that this is a toy JVM, built for learning purposes and not a serious implementation."
Thus, absolutely no one disagrees that it's a toy JVM. They just want you to stop being dismissive of someone's toy project by repeatedly pointing out that it's a toy project and "not a thing".
Right, just because something is a "toy" doesn't mean it's not still impressive. If someone implemented a "toy" database that could parse and execute SQL queries, distribute data across nodes, etc., you would probably not want to use that in production, but it's still a very impressive project for a single person to pull off, even if it's riddled with bugs. Getting a very complex system to "just barely functional" is still a huge achievement and very cool!
I've pointed out that the only part that makes it a toy project is the lack of threading support; the rest is not hard to add. So the items in the list of missing things after the 'toy' disclaimer should have totally different weights (with threads being moved to the end).
You're still missing the point—it was always intended to be a toy project, and the author has explicitly declared that they are completely done with it and won't be doing any more work. What does it matter how they sort the list of missing items? It's not a todo list in need of prioritization, it's just an "FYI, these are some of the things I never got to".
Java strings are compared by reference first; if the references do not match, they're compared by value. There's no guarantee every single string value has a single instance. That would hurt performance.
I think OP meant "string literals". For those the spec seems to require interning:
> Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.
And later:
> Literal strings within different classes in different packages likewise represent references to the same String object.
Thanks, makes sense, yes. Still, if the JVM's comparison in all cases falls back to value after a reference mismatch, it should work identically, no? Even if interning is mandatory per the spec, I'm not sure how it'd change the outcome of evaluation.
Sure, but also consider that this JVM (intentionally) lacks support for other things that all but the most trivial programs would use. I don't think it's expected by the author that you can throw any random program at it. It's really there just to run your own programs that you've written specifically for it in order to play around with things. And since you know you're writing for this particular JVM, you should know not to do anything that depends on string interning, among other things.
>I don't think comparing everything by value makes or breaks the implementation.
Nothing much to think -- distinct objects must have distinct references [e.g. new String("a") != new String("a")], and literals must have the same references for the same values [e.g. "a" == "a"].
Nope - it'd be plain wrong. Literals must be equal by reference; comparing them by value would just break the JLS, as they would then also be "equal by reference" to any other composed string with the same content.
> Lack of threads makes the entire endeavor a toy project.
yeah, as stated by the author in the line that says "I want to stress that this is a toy JVM, built for learning purposes and not a serious implementation."
Yeah, that's the only part that makes it a toy project - the rest can be added w/o too much effort.
That is pretty awesome! When I joined the Java effort in '92 (it was called Oak at the time), the group I was with was looking at writing a full OS in Java. The idea was that by getting down to just the minimal set of things needed as "machine code" (aka native methods), you could reduce the attack surface of an embedded OS. (Originally Java was targeted to run in things like TVs and other appliances.) We were, of course, working in C rather than Rust for the native methods. A JVM in Rust, though, adds a solid level of memory safety to the entire process.
IMAO, Android kind of achieves that... kind of. They write a lot of the OS logic in Java (or Kotlin) but mix in lots of system services written in native code at the same time, interconnected by the famous (or infamous?) Binder IPC.
Compiling to native isn't exactly black magic and works just as well.
The JVM wastes cycles on things like classes, which are not necessary at all. Going forward, Rust has already proven that you can do things at compile time to guarantee properties like memory safety.
Compiling to native on whatever random chip is always a recipe for hassles and random incompatibilities which need work to unravel/fix. Having a spec that chip manufacturers can implement (and that can be trivially tested) and/or a JVM that can be exercised and ported to a chip once and validated removes the vast majority of the integration work.
Does it always make sense? No. But clearly a bunch of folks have found it valuable in many niches.
You can justify every project that has succeeded against any odds by saying "but it did succeed". It's true, but imo not a very good way of judging when a technology was a good fit or not.
Is it also too much to ask that someone explain how they think a technology which has been chosen by market participants and is used widely and (apparently) successfully by them does not actually match the real constraints of the market or participants?
And perhaps proposes a concrete alternative that matches those constraints better?
Don't forget to include things like long term support, developer time, interoperability, etc.
> Is it also too much to ask that someone explain how they think a technology which has been chosen by market participants and is used widely and (apparently) successfully by them does not actually match the real constraints of the market or participants?
Kind of, yes. Since instead of evaluating based on an understanding of the technology, you're saying "In this universe it was chosen for a project, and the project was successful; show me the universes where the alternative decision was made".
I'm saying that if you think something is crap and doesn't meet customer needs, at least propose a concrete alternative you believe is better so someone can respond meaningfully! Or state concretely which needs are not being met!
Currently, we have one example of something that all evidence leads us to believe fits the universe as it exists, at least in that specific niche.
If you think it doesn't, how doesn't it? Or if you're saying there is something better, are you saying that is hand written assembly? Or TurboPascal? Or ADA? Or some as yet not designed system?
I'm not asking for an alternative universe. I'm asking you to support your statement with enough details it can be assessed in the current universe.
Them: People are using Java for this because there aren't alternatives
You: If it is the convenient alternative then it is a good choice
Me: That is a bad justification for something being a good choice
You: What criteria then?
Me: An understanding of the engineering principles / domain
Am I accurately summarizing this conversation so far? This isn't about alternatives, or what else they should have done. Something can be a bad fit and the right choice.
As an example, I could say "Java dominates that section due to historical artifacts of business, not technology. Java is a bad fit for this type of work otherwise because of the complexity involved in implementing a Java VM in hardware". I can then also say "Java is the only real choice because of those historical artifacts so I have to recommend that you use it unless you're willing to build your own hardware from scratch".
I actually don't have to propose any alternatives at all, hopefully you can see that - we can just evaluate Java as a language (complex VM, assumes a heavy runtime) against the constraints (custom hardware, low energy) and see that the fit is weak. Obviously people overcame that and made it work, and because of that Java is the obvious choice for this technology.
From an engineering perspective, I don't think it's fair to say something CAN be a bad fit and yet a good choice? At least not without acknowledging it was the best choice available, and therefore not likely actually a bad fit?
In that scenario, it's literally the best possible fit.
That you don't have any viable better alternative at hand may be further evidence of that? (and I don't mean from a standards basis 'well, it's locked into financial rules now, so gov't intervention'). I mean, what else was going to work considering all the factors involved? What else could work better, considering the factors involved?
JavaCard is in fact so widely used and implemented (SIM cards, bank cards, health cards, passports, etc.) that it probably has literally tens of billions of devices manufactured using it (3.5bln claimed as of 2010 - https://www.oracle.com/technical-resources/articles/javase/j...), in essentially every high-value, target-rich environment niche you can think of, and at extremely low costs. Literally sub-cent per item.
And with very high environmental stresses (like debit cards getting sat on, left in hot cars, run over, dropped in puddles, jammed into random dirty readers over and over again, etc.), those devices keep working.
And everyone from random countries gov'ts to random financial firms to telcos have managed to implement what they need in it without too much difficulty, and a minimum number of security issues. Which is frankly astonishing if you've ever dealt with folks like that.
So love or hate Java, or JavaCard from a stylistic perspective - any perceived complexity for implementing a Java VM in hardware has had no practical economic effect, or slowed down implementation meaningfully.
It's fit for purpose.
Probably also ugly and feels gross using them sometimes, but a lot of fit for purpose stuff is until you've experienced the alternatives. Hopefully you never have to fix a sewage lift station pump, or clear a clogged sewer line, or clean out a transmission after it's burned out.
Each of these has literally hundreds of years of specialized knowledge and expertise behind their often boring looking facades. They're all amazingly complex if you learn about them. And they're all better than throwing sewage in the street, or carrying everything on horseback. And they're beautiful in their own way when you appreciate why they are how they are.
Even if they're not shiny and flashy, there is beauty in them, because they work well.
And they're still amazing engineering marvels, necessary for our lives as we know them and based on the actual engineering principles involved and the problem domain.
> I don't think it's fair to say something CAN be a bad fit and yet a good choice?
So we fundamentally disagree.
> has had no practical economic effect, or slowed down implementation meaningfully.
This goes back to my "to prove me wrong you have to show me alternate universes where other options had that investment made under the same circumstances".
Hardly. That would require an assertion like 'any perceived complexity for implementing a Java VM in hardware did not exist because it was rolled out in the most economic way possible given non-technical constraints, and it also did not slow down the implementation beyond the minimum absolutely necessary for any technical solution possible.'.
Notice the difference?
To propose alternative solutions based on practical economic effect just requires a reasonable degree of comparison to projects of similar scope at other times, reports of difficulty from various vendors, and comparisons of the end price of this solution and of other solutions of similar scope against the overall scope of the solution and the value it brings. None of which requires perfect alternative-universe A/B testing to come to some reasonable analysis.
If another solution could be done for half the price (say $0.005 per unit, instead of $0.01 per unit) but the perceived value for vendors is $1/unit - then it's hard to say there is any practical economic effect going either way. Neither solution would block profitability or value. That said, they could easily be compared and better/worse solutions could also be determined or tradeoffs analyzed based on that data, also without perfect alternate universe A/B testing to come to some reasonable analysis. Industry does this all the time at scale, including projected costs of implementation of various solutions.
If the solution was rolled out within a timeframe considered useful/expected for this kind of solution, then it also didn't slow down implementation meaningfully - as in it didn't block it, or add serious delay. If there is another solution which could have been done in half the time, that's cool. But it wasn't required. Identifying such an alternative, if one exists, could be done if you have any data, without having to do an alternative universe A/B test. Though since the proof is in the pudding, to REALLY be sure maybe it would. But that's hardly what I've been referring to or asking for, clearly.
That doesn't mean there wouldn't have been better solutions, and proposing them as alternatives can easily be done without parallel universes! In fact, chances are they have already been implemented somewhere in another niche, so there is adequate data to do so.
> If the solution was rolled out within a timeframe considered useful/expected for this kind of solution, then it also didn't slow down implementation meaningfully - as in it didn't block it, or add serious delay.
It's actually very well suited to low-level, extremely low-power embedded systems. The toolkit and dev experience targeting these platforms are actually pretty good.
A 32-bit, 4 MHz processor with ~64 KB of NVRAM, all running off of an induction charge!
Also every SIM card, which is potentially quite a bit less benign - the baseband chip is a completely separate processor which the SoC can't see into, and the SIM card can snoop messages, send commands, and generally act like a secure enclave. It is (was?) used for a couple of banking systems for feature phones, like M-PESA, where the app runs as a feature-phone app using the menu toolkits the phone provides.
I think the eSIM idea is probably a net security benefit imo; Apple has leveraged the carriers out of a fairly dangerous tool.
Recently I was thinking about what a program written to take advantage of Optane's persistent-memory model would have looked like; if you use it like RAM, it's forever gonna be slow, shitty RAM. JavaCard seems to be the closest hit for that, in some ways. Maybe some of the higher-tier JavaCards have GC.
I guess at that point it's basically a JVM application state snapshot, which is the same thing, so maybe not any better.
Java Card is a very different language that's only superficially similar to Java. It prohibits OOP, only allows static methods, and makes you store everything in byte arrays and use Util.arrayCopy. It's basically a less convenient C, with its complete lack of structs.
After reading your comment I was surprised to find out that credit card chips have any processing capabilities whatsoever, which they apparently do, though at least according to GPT-4 they are far too basic to run Java / a JVM?
Google appears to be significantly more useful than GPT-4 here. [1] is the third result for me for the query "credit card jvm". [2] is the second result and gives a direct (and more importantly, actually correct) answer. That post links to the Oracle documentation for Java Cards [3] which is the fourth result.
All of this is just as easy as, if not easier than, using ChatGPT. It's unclear that such a tool even serves this purpose (retrieval of basic facts) adequately, so it should probably be avoided in the future.
Fair enough, there is a "Java Card". I'm not convinced that Java is running on any of my or your credit cards in your wallet today, though I'm not willing to bet on it.
It's running on many e-passports and e-ID cards. I can't find the documentation for my e-ID card, which runs on Java, but the chips are quite common.
Like in:
Visa became the first large payment company to license JavaCard. Visa mandated JavaCard for all of Visa’s smartcard payment cards. Later, MasterCard acquired Mondex, and Peter Hill joined as their CTO, licensed JavaCard, and ported the Mondex payment platform to JavaCard.
That's pretty neat. It sounds like Java Card (or at least some of these cards) actually "boot up" by way of inductive coupling, i.e., via the "contactless" card readers where you just hold your card in proximity to the reader. Did not know that; I assumed it was just reading a key via NFC or something.
The whole reason there’s a thing called a chip in the card is that it actually does computing (indeed they’re called smart cards) and that it does the sort of computing (cryptographic challenge-response) that makes these cards much more secure than oldschool magnetic stripe cards.
Even fully passive NFC tags contain logic that needs power to talk NFC back to the reader, there’s no such thing as just reading data via NFC.
This is a classic DEF CON talk about it. They developed their own cell network with their own SIM cards for an event and even built custom JavaCard applications their users could use. They have released all the information and tools used to build and compile Java Card software.
A lot of people assume this, but with contactless EMV there is a whole transaction flow between the card and the reader going on.
You know those 4 lights on a contactless card reader? They indicate different transaction stages between the card and the terminal. I don’t find them that useful because it’s so fast they all appear to light at the same time, but that’s what they are!
If they were just passive tags, they wouldn't be very secure. They have cryptographic processors onboard with private keys that can sign stuff for the terminal and your bank. The specs for the interaction are all public if you're interested (I wouldn't be!); look up contactless EMV.
How about respecting other people instead of rushing to condescending judgment? The stakes are incredibly low here; I asked ChatGPT for fun, like tens of millions of others are doing every day.
For some reason you announced that you were subjecting us to a low-quality information-retrieval method, and after 6 months of this, people are irritable. The social norm is to do that sort of thing in private; doing it in public, and seemingly proudly, came across as coarse and impolite.
It didn't help any that it was clear from the initial post you were questioning someone with domain knowledge, which was later gently indicated to you.
It's not asking ChatGPT that is antisocial but posting it, diluting the entropy of the conversation. Thousands of brains read every posted word before they can discard redundant information. Together we can keep a high-quality shared medium, which benefits everyone!
Oh yeah, they do. They often have multiple 'applications' on them, for different purposes (for example, withdrawing from an ATM with a card is a different application than making a purchase, which is different from the ill-conceived idea of using your card and PIN as a two-factor authentication token).
Android isn't conventional Java. For starters, its runtime uses its own bytecode (dex) that's based on registers instead of a stack. But then, also, many things that aren't related to GUI are C++ with a thin Java wrapper on top.
When I think about a "Java OS", I imagine a JVM running in kernel mode, providing minimal OS functionality (scheduler, access to hardware I/O ports) and there not being any kind of userspace.
Nope. For a more realistic example of such a JVM, look at SubstrateVM (written in "SystemJava" and compiled to native code ahead of time along with the app it runs), and "Java on Truffle" (a.k.a. Espresso), which is a JVM written in Java designed to be compiled to run on top of SubstrateVM. Both projects are a part of Graal.
The reason to do this, beyond the inherently neat Inception factor, is that JVMs are a PITA to work on because they're normally written in languages like C++ or Rust which optimize for performance and manual control over productivity. That makes it hard to experiment with new JVM features or changed semantics. If you could write a JVM in a high level very productive language like Java (or Kotlin or Scala) then the productivity of people writing and experimenting with JVMs would go up. It would also make it feasible for "ordinary" Java devs to actually fork the JVM and modify it to better suit their app, at least in some cases.
There's also something conceptually cleaner about having a language and its runtime implemented purely in itself. As long as you don't mind the circularity, that is.
Espresso for example has hot-swap features HotSpot doesn't have, so you can modify your program as it's running in more flexible ways than what regular Java allows.
I find that Rust is like maybe 1.5-2x more productive to code in than say C or C++. Part of that is the tooling has so much less arcane baggage, part of that is that I need to reach less for external tools for metaprogramming, part of that is fewer crazy macro/template compiler errors, and part of that is less time spent debugging.
I've heard very inconsistent things about this, which is interesting. Often people say Rust is less productive, as there's often "makework" involved with satisfying the borrow checker. I suspect a lot of it revolves around how you perceive that sort of thing: it can be cast as both productivity (satisfying it can potentially rule out bugs) or a loss of productivity (you were already satisfied the code was correct).
But I don't have enough experience with Rust to really have formed an opinion on that yet.
I'm about 2 years into primarily writing Rust (after a long time in the .NET world).
The borrow checker slowed me down a lot at the start but these days it's not a big deal. Eventually you internalize the easiest ways to make it happy (clone and pass references as needed) and can reserve tricky optimizations for hot paths where they actually matter.
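For example, a small clone is often the cheapest way out of a borrow-checker fight. A trivial made-up snippet, just to illustrate the pattern:

// Cloning a small value ends the immutable borrow immediately,
// sidestepping a restructuring of the surrounding code.
#[derive(Clone)]
struct Config {
    name: String,
}

fn process(cfg: &Config) -> String {
    format!("processing {}", cfg.name)
}

fn main() {
    let mut configs = vec![Config { name: "gc".to_string() }];

    // Clone the entry we need so the borrow of `configs` ends here,
    // letting us mutate `configs` right after.
    let first = configs[0].clone();
    configs.push(Config { name: first.name.clone() });

    println!("{}", process(&first));
}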
I would say that today, most of the times I sit down to write some Rust I'm nearly as productive as in higher-level languages... but then ~20% of the time I get bogged down in complicated type signature stuff (especially with async code, ugh) and it's a time suck.
Right, it's maybe just a little less productive than something like Java because you do have to work out types a bit more and think about memory a bit more. The end benefit tends to be, though, that... your code compiles, and it just works. So maybe a little more time up front for, potentially, less time down the road debugging.
There is definitely a hump to get over (maybe you are never fully as "productive" as languages that just let you shoot yourself in the foot, I say as a full-time Go developer for a few years now). Simple, straightforward code is often harder to write. Don't even get me started on stuff that uses async.
But IMO it's very worth it. What do you get out of wrestling with the unholy, fractured C/C++ ecosystem? The privilege of being able to _use other people's code_. You get that without much hassle in Rust.
"The goal of Metascala is to create a platform to experiment with the JVM: a 3000 line JVM written in Scala is probably much more approachable than the 1,000,000 lines of C/C++ "
He's not saying that. What he is saying is that a simple, non-production-quality implementation in Scala is much more amenable to experimentation than a sophisticated, production-quality implementation in C++ that weighs in at 300x the LOC.
But a simple non-production quality implementation in C++ would also be amenable to experimentation and not have the bootstrapping issues as well as provide an easier starting point to incorporate more of the existing optimizations as desired.
That is true, but the author of Metascala wanted to write it in Scala. Other people are free to write a simple C++ implementation of the JVM themselves.
I'm sure it's not complete, but I also wouldn't be surprised if 99% of what's in HotSpot is optimization tweaks and performance boosts which aren't essential to the JLS.
It's also unclear to me if the architectural assumptions made in GC & VM engineering 20 years ago still apply today. I'm not an expert in this field, so take with a grain of salt... but I do wonder if a VM engineered from the ground-up today targeting recent ISAs would look substantially different.
That is a very interesting name for a programming project lol. The Jacobins were a revolutionary political club during the French Revolution in the 1790s. It's also the name of a magazine at https://jacobin.com
When I try to add a lifetime to the `Err` variant of a `Result` and that lifetime is invariant (which it is due to `vm` and `call_stack`) it usually means that I can't use the question mark operator or have early returns in the code[1]. This makes error handling more verbose and less readable. Is that your experience as well?
EDIT: Looks like this is not an issue because the invariant lifetime 'a is not used for the mutable reference of vm or call_stack. So it's not the invariance that is the problem, but rather how Rust reasons about the lifetime of mutable references, which this avoids.
In that case I don't understand what the point of 'a is on VM and CallStack. You can create[1][2] those with any unbounded lifetime (including 'static[3]), which means it is not constraining anything. What is the lifetime 'a doing here? Why not remove it?
I wanted to express the fact that everything that gets allocated (call stack, frames, classes, and objects) is alive and valid until the "root" VM is, thus I used 'a more or less everywhere.
I also struggled with the borrow checker initially and got a ton of errors. I fixed many of those with a lot of explicit lifetimes, but it's not impossible that in some places they are unnecessary.
> I wanted to express the fact that everything that gets allocated (call stack, frames, classes, and objects) is alive and valid until the "root" VM is, thus I used 'a more or less everywhere.
That's not being expressed in the type system. The lifetime 'a is unbounded (meaning you can make it anything you want, including 'static), so anything that shares 'a can outlive the vm without Rust complaining. It would be no different than if you removed 'a completely. If you wanted to ensure things couldn't outlive the vm, you could tie the lifetime to a reference to the vm, but then the vm couldn't hold those values (it would be a self-referential lifetime).
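A contrived sketch of the problem (illustrative types, not rjvm's actual definitions):

struct Object<'a> {
    name: &'a str,
}

struct Vm<'a> {
    objects: Vec<Object<'a>>,
}

impl<'a> Vm<'a> {
    fn new() -> Vm<'a> {
        // Nothing here ties 'a to the Vm value itself, so the caller
        // can pick any lifetime it likes...
        Vm { objects: Vec::new() }
    }
}

fn main() {
    // ...including 'static: this compiles fine, so the annotation is
    // not actually preventing anything from outliving the Vm.
    let vm: Vm<'static> = Vm::new();
    drop(vm);
}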
Funnily enough, I did the same in an early WIP version of my toy JVM. Ended up using unsafe to use 'static references internally, but only handing out wrappers that include a reference to the JVM. This also ensures that objects/classes/… from one JVM can't be used in another one.
Great learning project, I'm glad the author is having fun. Implementing a VM from scratch is a blast, and I have learned so much in the past doing that kind of thing.
If they're interested in bolting on a GC, it couldn't hurt to look at MMTk (https://www.mmtk.io/): some high-quality collection algorithms, written to be pluggable into various VMs, and written in Rust.
That's a bummer -- I guess I never noticed that, when I played with it before it was on an M1 Mac, but compiled into an x86 Julia executable & running through Rosetta (which surprisingly did not suck).
Haven't read it, but I bet it's likely related to expectations around the x86_64 memory model & atomics. In the long run I see no reason why it couldn't be made portable, but I imagine the author's efforts are elsewhere for now.
I only became aware of it because a former employer (RelationalAI) was heavily interested in replacing Julia's GC with it (for some workloads): https://pretalx.com/juliacon2023/talk/BMBEGY/
When I see such cool projects, I feel very overwhelmed. How do you get started with Rust and master basics to even attempt doing such a thing? Can OP explain?
Likewise. Not to go onto too much of tangent, but on a more personal note I've been generally struggling with this feeling a lot lately.
I've been a professional software developer for almost 10 years, and I _know_ I'm competent (and not an impostor) as demonstrated by my current position and ability to ship things.
However, lately after viewing developer blogs I become overwhelmed that I actually don't know enough and am not a "real" developer. I seem to have formed a notion of an ideal developer in my head and I compare myself against this imagined construct which leads to these feelings. I admire how these people have so much deep knowledge and can express themselves so clearly and concisely, then wonder why I am not like that.
I barely have the energy after work after taking care of my family to do anything further, and I know programming isn't everything but I do have a desire to learn more and improve myself.
I recognize this isn't healthy nor is it rational, but it's just a feeling I can't shake lately.
What you're describing is very common amongst developers. So common in fact, that I've written a post about this https://alic.dev/blog/comparisons
In short: recognizing your insecurities is the first step. The next step is figuring out what's important to you, shedding impossible to achieve and irrational ambitions, prioritizing your goals in life, and articulating concrete steps to further them.
Well, you're probably comparing yourself against the top 1% of developers. It's okay to not be the very best, being in the top 30% of this field already is very rewarding.
I happen to personally know the author, and I'm not really surprised he pulled this off. Using him as a baseline of who is a real developer is extremely unhealthy. Please don't :)
Well, _I_ feel impostor syndrome half the times I open HN honestly!
I did have a bit of experience with VMs before: many years ago I wrote a short series of posts about them on my blog, and at my previous job I dabbled a bit in JVM bytecode to solve one very unusual problem we had for a customer. I also read the _amazing_ https://craftinginterpreters.com/ years ago and that gave me some ideas.
But this project was definitely big and complex. It took me a lot of time, and it got abandoned a couple of times, like many of my side projects. But I'm happy I finished it. :-)
Not OP nor am I a Rust expert. I can speak regarding another technology: sockets.
I've been deep-diving into sockets recently. Two weeks ago I had only a high-level understanding of sockets (learned from casually reading manpages, docs, blog posts, etc.). I decided to read as much as possible because I wanted to understand networking fundamentals, and after a week I had learned enough to write some sockets code in Python and C. I know Python quite well, so reviewing the `socket` library made more sense after my deep dive.
If you want to get better at technology A using language X, I suggest reading/watching as much as you can about tech A and building stuff with it in a language Y you already know. Then you can circle back to learning language X, having already mastered many of the concepts around technology A.
Break things down. A simple language VM is going to have a way to represent objects in memory, a bytecode interpreter, a simple garbage collector, and a way to load things.
A bytecode interpreter is a stack, some way to represent functions on that stack, and then a loop to interpret each bytecode and move the program counter.
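In miniature (toy opcodes, not actual JVM ones), that loop is just:

// A value stack, a program counter, and a dispatch loop.
enum Op {
    Push(i64),
    Add,
    Print,
    Halt,
}

fn run(program: &[Op]) {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0; // program counter

    loop {
        match &program[pc] {
            Op::Push(v) => stack.push(*v),
            Op::Add => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a + b);
            }
            Op::Print => println!("{}", stack.last().expect("empty stack")),
            Op::Halt => break,
        }
        pc += 1; // advance to the next instruction
    }
}

fn main() {
    // Computes and prints 1 + 2.
    run(&[Op::Push(1), Op::Push(2), Op::Add, Op::Print, Op::Halt]);
}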
How much do you code in your free time? Like average hours per week?
If it's zero (and no judgement from me if it is; plenty of other things to focus on), then it shouldn't be surprising that someone for whom that number is (speculatively) 10-20 hours per week on average for years has impressive side projects.
Do some embedded work: implement a bare-metal program on an ARM microcontroller in C or Rust. Make an LED blink. Then make the same LED blink in pure ASM. The RP2040 is easy to bring up.
osdev.org, sandpile.org, RBIL, and freevga. The biggest PITA is hardware support. There are many good vintage hardcopy books with recipes for things like reliable port IO and undocumented hardware tricks.
- Microsoft MS-DOS Programmer's Reference (also includes real-mode BIOS calls)
- PC Interrupts
- Undocumented PC
- PC Intern
- Programmer's Guide To The EGA, VGA, And Super VGA Cards
- Graphics Programming Black Book Special Edition
Also, it's worth toying with advances in OS dev past the era of monolithic, microkernel, and hybrid.
1. Capability-based like seL4. It has a number of inherent performance and security advantages including capabilities and excellent IPC.
2. POSIX compatibility layer. Even embedded OSes without the concept of threads or processes can implement POSIX.
3. Hypervisor. They're much easier to add with Intel's VT-[xd]. Failing that, fall back to emulation. Translational emulation is very performant.
4. Get good at generalizing interrupt handlers, making them fast, avoiding race conditions, and using lock-free patterns.
Also:
5. Rewriting or trapping unsupported instructions including x87 and MMX.
6. The failure of pure microkernels was the added complexity and management of sequencing multiple resources in a transactional manner. There are great theoretical security and operational advantages in microkernel architectures but they never caught on widely in a pure form.
Sorry, but I followed most of this tutorial, just to be greeted with "and luckily enough we don't have to write this because there's a crate for it" every time a new concept is introduced. Interrupts? The x86_64 crate does all that. Keyboard handling, etc.? There's a crate for that.
Of course there's a crate for all that, but that's not the point of making an OS.
> 2. POSIX compatibility layer. Even embedded OSes without the concept of threads or processes can implement POSIX.
Do you have resources about this? I can't quite fathom how it would work, but then again I have no expertise.
> 3. Hypervisor. They're much easier to add with Intel's VT-[xd]. Failing that, fall back to emulation. Translational emulation is very performant.
Speaking of which, QubesOS was on HN recently. The essence is having many VMs to minimize cross-app attack surface and privilege escalation.
> 6. The failure of pure microkernels was the added complexity and management of sequencing multiple resources in a transactional manner. There are great theoretical security and operational advantages in microkernel architectures but they never caught on widely in a pure form.
I recently saw an interesting brief article[0] about how memory-safe languages can supersede the compartmentalization that microkernels provide. I'm reminded of Theseus[1], written in rUsT, which happens to reflect the sentiment. I've actually been putting off rereading the Theseus USENIX paper. Again, I'm not nearly qualified to say whether its security is comparable to the best microkernels, barring formal verification. Still, I think it should be explored more.
[0] https://catern.com/microkernels.html (Write modules, not microkernels)
[1] https://www.usenix.org/conference/osdi20/presentation/boos
Just for kicks, has anybody tried to Rube-Goldberg it and see how many VMs can stack on top of each other? Like a Java app running on a JVM written in Rust, running on WASM, running on a JVM, etc., etc.
Started working on something very similar a few years back and gave up pretty soon for some stupid reason. Maybe I should try again; I'm getting better at getting stuff done.
You should. I've been doing this kind of stuff on the side for 25+ years, and nothing ever gets "done." But the educational value is much higher than you think. And the skills learned can eventually propagate out into interesting paid work.
I think of them as retirement projects before I retire. When I actually retire I'll maybe finish them.
(That said, I have in the past tried taking jobs that were adjacent to my "research" interests, and found the joy of building these things from scratch is much better than fiddling with the levers on the side of someone else's thing they built from scratch years ago. I like working on and improving production systems, but if they intersect too closely to my personal interests, it can be demoralizing.)
I know the feeling - this project, like most of my other side projects, got abandoned a couple of times. But I was really curious about implementing a GC and, for once, I managed to finish something. I'm glad I did! :-)
Getting the attention needed for the front page is largely chance. I've seen great stuff get posted 5 or more times before it makes it out of obscurity. HN is highly non-deterministic in these things.
[1] https://github.com/andreabergia/rjvm/blob/be9c54066c64a82879...
[2] https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe...
[3] https://without.boats/blog/shifgrethor-iii/
[4] https://coredumped.dev/2022/04/11/implementing-a-safe-garbag...