Bytecode VMs in Surprising Places (dubroy.com)
112 points by todsacerdoti on April 30, 2024 | hide | past | favorite | 49 comments



The SIM card in your phone is running a Java VM.

> Java Card bytecode run by the Java Card Virtual Machine is a functional subset of Java 2 bytecode run by a standard Java Virtual Machine but with a different encoding to optimize for size.

> Did you know, for example, that a SIM can run apps and communicate over the mobile network entirely independently of the host phone? Or that the SIM is arguably more in control of the phone baseband than the phone’s “main” operating system? Or that SIMs can support TCP/IP and run a web server?

https://en.wikipedia.org/wiki/Java_Card

https://mobiforge.com/news-comment/the-sim-the-tiny-computer...


Contactless and EMV Chip cards are the same.

During provisioning the server sends a signed Java Applet to the Secure Element which validates the chain of trust then installs the applet. The EMV protocol involves "application selection" which lets the payment terminal tell the Secure Element OS which applet it wants to communicate with.

ARQC (Authorization ReQuest Cryptogram) is where the applet on the card takes the payment terminal's payment request and identity info, signs it with the PK keypair unique to that provisioned card, and sends it back to the payment terminal which then hands it to the card issuer for validation and approval.

That's a lot of Java bytecode: on the chip/SE, in the payment terminal, and likely also on the server side.


The really cool part is that they are powered wirelessly, boot up cold, do the signing with their key and complete the transaction in a tap.


So much can be done with so few resources when you don't need to load 4177 files from a node_modules folder.


Not all contactless payment cards are EMV cards (or even smartcards!), and not all EMV cards are Java cards, though.

Early contactless payment protocols also started out as something adjacent to, but distinct from, EMV; they eventually got included under the EMV umbrella, but unlike for contact payments, the protocols (called "kernels") vary considerably between e.g. Visa and Mastercard.

There's still a wide range of non-EMV contactless payment cards in use today, mostly for stored-value transit use cases (EMV works best when there's a network connection and doesn't handle long-term two-sided offline cases very well). NXP is an important player there (with their Mifare series of cards/chips); Felica is another (mostly in Japan).


Your credit and debit cards are too, even when using tap and pay (they're powered wirelessly through the field)!

I think the fact that it's running loadable code at all is what's unexpected here, though, not really the fact that it does so using bytecode; that's mostly for compatibility across card OS vendors.

Java/the JVM was what was cool at the time these high-level programmable smartcards started becoming popular. Introduced today, they'd probably be running WASM.


The Java installer's installation text "_ billion devices run Java" makes more sense after this. That's pretty amazing; thanks for the link.


While this is true in the general case, some number of smart cards and SIMs run Java Card directly on the hardware, using the Jazelle instruction set from ARM. So a Java Actual Machine, rather than a Java Virtual Machine.

https://www.design-reuse.com/news/4228/arm-securcore-support...


I wonder how many FLOPS one can get from it. Has someone ported Doom to a SIM card already?


Historically, smartcard chips were very underpowered 16- or even 8-bit MCUs that couldn't even perform Java bytecode validation on-chip (which caused some security vulnerabilities in the past; there's interesting academic literature on this if you're curious, for just one example see [1]).

These days, these are sometimes ARM M0 cores which are significantly more powerful. I think Doom over SAT (SIM application toolkit) is within reach :)

[1] https://xavierleroy.org/publi/oncard-verifier-spe.pdf




Apple's reference: https://developer.apple.com/fonts/TrueType-Reference-Manual/...

(I spent a lot of time with both of these when hand-hinting my programming font.)


Python pickles are bytecode programs, but different bytecode than the Python bytecode compiled from .py files. https://docs.python.org/3/library/pickletools.html#pickletoo...
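
To see this for yourself, here is a minimal sketch using only the standard library's `pickle` and `pickletools` modules. The sample dictionary and the choice of protocol 2 are arbitrary assumptions made for illustration; other protocols emit different opcodes.

```python
import pickle
import pickletools

# Serialize a small object. Protocol 2 is an arbitrary choice for this sketch.
payload = pickle.dumps({"answer": 42}, protocol=2)

# genops() yields (opcode, argument, position) for each pickle instruction.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcode_names)

# dis() prints a human-readable disassembly of the same instruction stream.
pickletools.dis(payload)
```

Unpickling amounts to running that instruction stream on a small stack machine, which is also why loading pickles from untrusted sources is unsafe.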


VMs are a very interesting way to decompose a problem - you can implement the fundamental operations, test them thoroughly, and then generate bytecode and debug the bytecode implementation.

A couple years ago I thought about giving a presentation on a Golang VM I wrote to solve a simple problem, but it was seen as kind of a heterodox approach and I didn't want to deal with a million "umm acksully" questions about whether it was a real VM.
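
A rough sketch of that decomposition, in Python rather than Go: the opcode names, the tuple-based expression format, and both function names are made up for illustration and are not taken from the VM described above. The fundamental operations live in `run` and can be tested on hand-written bytecode, while `compile_expr` generates bytecode that can be inspected separately.

```python
PUSH, ADD, MUL, HALT = range(4)  # hypothetical opcode numbering

def run(program):
    """Execute a flat list of opcodes and operands; return the final stack."""
    stack = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(program[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack

def compile_expr(expr):
    """Compile nested tuples such as ("+", 2, ("*", 3, 4)) to bytecode."""
    if isinstance(expr, int):
        return [PUSH, expr]
    operator, left, right = expr
    opcode = {"+": ADD, "*": MUL}[operator]
    return compile_expr(left) + compile_expr(right) + [opcode]

program = compile_expr(("+", 2, ("*", 3, 4))) + [HALT]
print(program)
print(run(program))  # prints [14]
```

When something goes wrong, you can tell which layer is at fault: either `run` mishandles a hand-written program, or `compile_expr` emits the wrong instruction list.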


Interpretation is the most fundamental pattern in computer science, and I'd argue the most powerful. It tries to decouple semantics from implementation. It recurs at all levels of the stack.

I'd also argue it's the most beautiful.


I love how POCSD[0] generalizes the interpreter concept to include everything from the CPU up the stack. (Similar to how they generalize naming to include everything from physical memory addresses and registers up the stack.)

[0] https://ocw.mit.edu/courses/res-6-004-principles-of-computer...


Agreed. And one of the most difficult ones to get right in security-sensitive contexts.


There's also DTrace, which IMO clearly influenced Linux tracing (with the decidedly un-DTrace-like and disastrous SystemTap being replaced by eBPF which... is a lot more similar to DTrace than it's not).

Bytecode VMs are all over the place.

TFA added PostScript and TrueType, and comments here mention others. There's also Java, which is oddly not listed. And Lua. And...


Here’s a fun one: A proprietary bytecode VM allows body makers to execute custom software built with Diamond Logic Builder on Navistar trucks and buses. Instead of cutting and splicing wiring harnesses, builders can add various devices and logic to operate their systems (tow winch, water pump, lift bucket, etc.) with DLB.



And there are the examples from host firmware, from FCode in OpenFirmware https://github.com/openbios?language=forth, which booted IBM, Sun, and PowerPC Mac machines, to the UEFI world's EBC https://github.com/pbatard/fasmg-ebc


A few more: GNU bc, see execute.c; GNU awk, see eval.c. I've also been told Valgrind has one but I'm not sure where.

And a few more databases [0]: Mongo, Postgres, and SingleStore.

[0] https://notes.eatonphil.com/2023-09-21-how-do-databases-exec...


Valgrind also has a JIT, iirc!


But where? I only hear the rumors; I haven't found the code. :)


The IR is here: https://github.com/rantoniello/valgrind/blob/master/VEX/pub/...

The rest can probably be found by searching around but I'm on my phone right now



The first software written for the first microprocessor - the Intel 4004 - powering the Busicom calculator, included a Bytecode VM.

https://thechipletter.substack.com/p/bytecode-and-the-busico...


Miri [0] is an interpreter for the mid-level intermediate representation (MIR) generated by the Rust compiler. MIR is normally the input to further processing steps in the compiler, but miri also runs MIR directly, which makes miri a VM. Of course it's not a bytecode VM, because MIR is not a bytecode AFAIK. I still think that miri is an interesting example.

And why does miri exist?

It is a lot slower than running compiled code, but it can check for some undefined behavior.

[0]: https://github.com/rust-lang/miri


If I remember correctly, jq (https://github.com/jqlang/jq) also uses a bytecode interpreter under the hood



There are other implementations that might do other things.



That's a VM running in their users' browsers, not their backend, no?


* EDIT: the below statement isn't true: Integer BASIC wasn't implemented with SWEET16 – it was simply included in the ROM

Apple II Integer BASIC was implemented using a 16-bit bytecode on a VM called SWEET16 to do 16-bit arithmetic, optimizing for space over speed. [1]

[1] https://en.wikipedia.org/wiki/SWEET16


My 25 line obfuscated BLC interpreter [1] uses a bytecode VM, with the 9th and 10th lines containing a little bytecode Basic I/O System (each byte xor-ed with 46 to make it printable).

[1] https://www.ioccc.org/2012/tromp/tromp.c
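
The XOR trick can be illustrated with a short Python sketch. The byte values below are placeholders, not the actual bytecode from tromp.c, and the assumption that the raw bytes fall in the control-character range 0..31 is mine. XOR-ing with 46 maps every such byte into the printable range 32..63, and applying the same XOR again restores the original.

```python
KEY = 46  # the XOR constant mentioned in the comment above

def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the key; applying it twice is the identity."""
    return bytes(b ^ key for b in data)

# Hypothetical raw bytecode: all 32 unprintable control characters.
raw = bytes(range(32))

encoded = xor_bytes(raw)      # safe to embed in a C string literal
decoded = xor_bytes(encoded)  # what the interpreter would do at startup

print(encoded.decode("ascii"))
```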


Multiple times in my game development career I have implemented or worked with proprietary bytecode in a project. I've done it on PCs, embedded devices, consoles, etc. The first game I worked on used MDL from the old Infocom days.


I'd add regular expression engines and the Apollo lunar lander. Its computers were extremely underpowered by today's standards (a standard Mac charger and most electric toothbrushes include better computers), but they still contained a bytecode interpreter.


The UCSD P-System is the ancestor of many later byte-code VMs: https://en.m.wikipedia.org/wiki/UCSD_Pascal



Another nice one: Uxn

https://100r.co/site/uxn.html


The TrueType engine used to render the fonts on your screen uses bytecode.


Cool — didn't know about that one! Thanks, I added it as an addendum to the OP.


SQLite, Bitcoin Script, Ethereum’s EVM
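
For SQLite, the bytecode is easy to look at: prefixing a statement with `EXPLAIN` returns the virtual machine program instead of running the query. A minimal sketch using Python's built-in `sqlite3` module follows; the table `t` and the query are made up for illustration, and the exact opcodes listed vary between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table for the demo

# EXPLAIN returns one row per VM instruction instead of executing the query.
rows = conn.execute("EXPLAIN SELECT x + 1 FROM t WHERE x > 10").fetchall()

for row in rows:
    address, opcode = row[0], row[1]  # remaining columns hold the operands
    print(address, opcode, row[2:5])

opcodes = [row[1] for row in rows]
conn.close()
```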


Not sure if EVM is surprising though considering it means Ethereum Virtual Machine


Same with bitcoin script. Programming platforms are the least surprising places to find bytecode VMs


Maybe after the fact.

Running arbitrary code inside a financial transactions system isn’t the most obvious decision.


Also: Delphine Software's Another World.


Such an underrated masterpiece, even with all of the praise it has gotten. Reading through the internals of the engine makes for a great evening.

I highly recommend Fabien Sanglard's series on Another World: https://fabiensanglard.net/another_world_polygons/index.html



