Jazelle DBX: Allow ARM processors to execute Java bytecode in hardware (wikipedia.org)
111 points by vincent_s 5 months ago | 75 comments



The gains seem not to have been high enough to sustain that project. Nowadays CPUs plan, fuse and reorder so much at the micro-op level that lower-level languages can sort of be considered virtual as well.

But Java and similar languages hand more freedom of operation from the programmer to the runtime: no memory-address shenanigans, richer types, and to some extent immutability and sealed chunks of code. All of these could be picked up and turned into more performance by the hardware, with some help from the compiler. Sort of like SQL being a 4th-gen language, letting the runtime collect statistics and choose the best course of execution (if you squint at it in the dark with colored glasses).

More recent work in this direction is to be found in the RISC-V J extension [1], still to be formalized and picked up by the industry. Three features could help dynamic languages:

* Pointer masking: you can fit a lot in the unused higher bits of an address. Some GCs use them to annotate memory (referred-to/visited/unvisited/etc.), but you have to mask them. A hardware-assisted mask could help a lot (a sketch follows this list).

* Memory tagging: Helps with security, helps with bounds-checking

* More control over instruction caches
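
To make the pointer-masking point concrete, here is a toy Java sketch (hypothetical constants, with a long standing in for a raw address): the collector stashes a mark in an unused high bit, and every access has to mask it off, which is exactly the AND that hardware assistance would make free.

    // Toy sketch: GC mark bits in the unused high bits of a 48-bit address space.
    class HighBitTags {
        static final int  ADDR_BITS = 48;                     // assumed usable address width
        static final long ADDR_MASK = (1L << ADDR_BITS) - 1;  // keep only the low 48 bits
        static final long VISITED   = 1L << 48;               // example GC mark in a spare high bit

        static long markVisited(long ptr)   { return ptr | VISITED; }
        static boolean isVisited(long ptr)  { return (ptr & VISITED) != 0; }
        static long toAddress(long ptr)     { return ptr & ADDR_MASK; }  // the mask hardware could apply for free

        public static void main(String[] args) {
            long p = 0x0000_7f00_0000_1000L;                  // hypothetical aligned address
            long marked = markVisited(p);
            System.out.printf("visited=%b addr=0x%x%n", isVisited(marked), toAddress(marked));
        }
    }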

It is sort of stale at the moment, and if you track down the people who were working on it, they've been reassigned to the AI-accelerator craze. But it's going to come back, as Moore's law continues to end and Java's TCO will again be at the top of the bean counters' stack.

[1] https://github.com/riscv/riscv-j-extension


The Java ecosystem initially started with optimizing Java compilers. That setup could benefit from direct hardware support for Java bytecode. Later, it was discovered that it is more beneficial to remove the optimizations from javac in order to provide more context to the JIT compiler, which enables better optimizations from JIT compilers. By directly running Java bytecode, you would lose so many optimizations done by HotSpot that it is hard to get on par just by interpreting bytecode in hardware. The story may be different for restricted JVMs that don't have a sophisticated JIT.


The current (largest) end-user Java ecosystem is in practice Android and its ahead-of-time-compiling ART.

Java itself got very good. Though Oracle was blocked from leeching money, or from getting a return on its investment, depending on your viewpoint.


I don’t really get your last point - Java’s improvements are due to Oracle, not despite it. They have a terrible name, but they have been excellent stewards of the platform.


Android's ART would be unlikely to exist if Oracle had been able to enforce licensing requirements as they wished.

ART runs on devices for 1B+ users and is more relevant to the world population than Oracle is. Although we can speculate that Android would likely have switched to something else if Oracle had won in court.

Ironically, Android came closer to realising Java’s original light-client vision of “write once, run everywhere”, if you consider “everywhere” to mean all around the world, by every human, across various device architectures.


I wouldn’t be so quick to dismiss the huge number of internet services running OpenJDK. Like AWS itself, Apple’s backends, a huge part of Google’s infrastructure, the whole of Alibaba that is responsible for some crazy amount of transactions, just to mention a few.


Java is a light client programming language that found a small niche in server programming.


It doesn’t really make sense to attribute the initial domain of a language to the whole platform, especially when Java has so many faces and its number-one use case is backend applications. Android is a comparatively small market.


> Java ... found a small niche in server programming.

Lol, virtually all business server applications are running on the JVM in the corner of the world that I see.


Dalvik ran like shit and was only a basic interpreter at the beginning, and it was Sun that was ripped off before Oracle took over.

Google could have acquired Java, after screwing Sun, and decided to take a bet on not doing it.


As free beer AOT compilers for Java are commonly available, and as shown on Android since version 5, I doubt special opcodes will matter again.

Ironically, when one dives into computer archaeology, old Assembly languages are occasionally referred to as bytecodes, the reason being that in CISC designs with microcoded CPUs they were already seen that way by the hardware teams.


I'm still not decided on AOT vs JIT being the endgame.

In theory JIT should be higher performance, because it benefits from statistics taken at actual runtime, given a smart enough compiler. But as a piece of code matures and gets more stable, the envelope of executions is better known and programmers can encode that at compile time. That's the tradeoff taken by Rust: ask for more proofs from the programmer, and Rust is continuing to pick up speed.

That's also what the Leyden project / condensers [1] are about, if I understand correctly: pick up proofs and guarantees as early as possible and transform the program, for example by constant-propagating a configuration file read at build time.
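
A toy sketch of that idea in Java (not Leyden's actual API, and pool_size.txt is a made-up file): the same configuration value, looked up at run time versus condensed into a constant at build time that later compilation phases can fold.

    import java.nio.file.Files;
    import java.nio.file.Path;

    class CondenserSketch {
        // Before condensation: the value is only known once the program starts.
        static int poolSizeAtRuntime() throws Exception {
            return Integer.parseInt(Files.readString(Path.of("pool_size.txt")).trim());
        }

        // What a build-time condensation step could emit after observing that
        // pool_size.txt contained "8": a constant the JIT/AOT compiler can fold
        // into every use site.
        static final int POOL_SIZE_CONDENSED = 8;
    }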

Something I've pondered over the years: a programmer's job is not to produce code. It is to produce proofs and guarantees. (Yet another digression/rant: generating code was never the problem. Before LLMs we could copy-paste code from StackOverflow just fine.)

In the end it's only about marginal improvements, though. These could be superseded by changes of paradigm like RAM getting some compute capabilities, or programs being split across a myriad of specialized units: filters, rules and parsing going into the network card; SQL projections and filters going into the SSD controller; or matrix multiplication going into integrated GPUs/TPUs/etc., just like now.

[1] https://openjdk.org/projects/leyden/notes/03-toward-condense...


The best solution isn't AOT vs JIT, but rather JIT and AOT, having both available as a standard part of the tooling.

Android has learnt to have both, and thanks to PGO data being shared across devices via the Play Store, the AOT/JIT outcome approaches the ideal optimum for a specific application.

Azul and IBM have similar approaches on their JVMs with a cluster based JIT, and JIT caches as AOT alternative.

Also stuff like GPGPU is a mix of AOT and JIT, and is doing quite alright.

I am not so confident about LLMs: when they get good enough, programmers will be left out of the loop and will have to settle for roles similar to doing no-code SaaS configs, or some form of architect.

A few programmers will remain as the LLMs high priests.


> A few programmers will remain as the LLMs high priests.

That's interesting.

It's controversial to say that in 2024, but not all opinions have the same value. Some are great, but some are plain dumb. The current corporate party line is to praise LLMs as the end-all be-all. I've been asked to advise a private banking family office wanting to get into LLMs, for advising their clients' financial decisions. I politely declined. Can there be a worse use case? LLMs are parrots with a brain the size of the internet, with thoughts of random origin mixed together randomly. They produce wonderful form, but abysmal analysis.

IMHO, as LLMs begin to be indistinguishable from real users (and internet dogs), there's going to be a resurgent need to trace origin back to a human, and maybe to rank their opinions as well. My money is on some form of distributed social proof designating the high priests.


I see the current state of LLMs as similar to when we read about Assembly programmers being suspicious that something like high-level languages would ever take off.

When we read about history of Fortran, there are several remarks on the amount of work put into place to win over those developers, as otherwise Fortran would have been yet another failed attempt.

LLMs seem to be at a similar stage; maybe their Fortran moment isn't here yet (parrots, as you say), but it will come.


I do think that, in the general case, a JIT compiler is required: you can’t make every program fast without the ability to synthesize new code based on information only available at runtime. There are many programs where AOT is more than enough, but not all of them are like that. Note that this doesn’t preclude AOT/hybrid models, as pjmlp correctly says.

One stereotypical (but not the best) example would be regexes: you basically want to compile some AST into a mini-program. This can also be done with a tiny interpreter without JIT, which will be quite competitive in speed (I believe that’s what Rust has, and it’s indeed one of the fastest - the advantage of the problem/domain here is that you really can have tiny interpreters that use the caches efficiently and have very little overhead on today’s CPUs), but I am quite sure that a “JITted Rust” with all the other optimizations/memory layouts could potentially fare better - though of course it’s not a trivial amount of additional complexity.
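
To make the interpreter-vs-compiled distinction concrete, here's a toy sketch in Java (made up, literals and '.' only): the same pattern run through a tiny interpreter, and "compiled" once into a chain of per-position checks that a JIT could then fold into straight-line code.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiPredicate;

    class TinyRegex {
        // Interpreter: walk pattern and input together on every match.
        static boolean interpret(String pattern, String input) {
            if (pattern.length() != input.length()) return false;
            for (int i = 0; i < pattern.length(); i++) {
                char p = pattern.charAt(i);
                if (p != '.' && p != input.charAt(i)) return false;
            }
            return true;
        }

        // "Compilation": turn the pattern into a reusable mini-program up front.
        static List<BiPredicate<String, Integer>> compile(String pattern) {
            List<BiPredicate<String, Integer>> prog = new ArrayList<>();
            for (int i = 0; i < pattern.length(); i++) {
                final char p = pattern.charAt(i);
                if (p == '.') {
                    prog.add((s, j) -> true);               // wildcard: always matches
                } else {
                    prog.add((s, j) -> s.charAt(j) == p);   // literal character check
                }
            }
            return prog;
        }

        static boolean run(List<BiPredicate<String, Integer>> prog, String input) {
            if (input.length() != prog.size()) return false;
            for (int i = 0; i < prog.size(); i++) {
                if (!prog.get(i).test(input, i)) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            System.out.println(interpret("h.llo", "hello"));     // true
            System.out.println(run(compile("h.llo"), "hallo"));  // true
        }
    }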


I am sure there will never be a conclusion that AOT is always better or that JIT is always better. E.g. JIT has a large advantage in long-running applications that run at different customers with very different configurations. The scenarios where AOT has an advantage are well known, so I won't enumerate them.


> Pointer masking: you can fit a lot in the unused higher bits of an address. Some GCs use them to annotate memory (referred-to/visited/unvisited/etc.), but you have to mask them. A hardware-assisted mask could help a lot.

If you're building hardware masking, it should be viable for the low bits too. If you define all your objects to be n-byte aligned, that frees up low bits for things as well, and it might not be much of an imposition - things like to be aligned anyway.
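
A minimal sketch of that low-bit variant (hypothetical, with a long standing in for an address): under 8-byte alignment the bottom three bits of every object address are zero, so a runtime can stash a small tag there and mask it off before use.

    class LowBitTags {
        static final long TAG_MASK  = 0b111;      // 3 free bits under 8-byte alignment
        static final long ADDR_MASK = ~TAG_MASK;

        static long tag(long addr, int tag) {
            assert (addr & TAG_MASK) == 0 : "address must be 8-byte aligned";
            return addr | (tag & TAG_MASK);       // stash the tag in the alignment bits
        }

        static int tagOf(long tagged)   { return (int) (tagged & TAG_MASK); }
        static long addrOf(long tagged) { return tagged & ADDR_MASK; }  // the mask hardware could apply
    }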


The SPARC ISA had tagged arithmetic instructions so that you could tag integers using the LSBs and ignore them.


Remember when Azul, for a while, tried to sell custom CPUs to support features in their JVM (e.g. some garbage-collector features that required hardware interrupts, and some other extra instructions)? They dropped it pretty quickly, though, in favor of just working on software.

https://www.cpushack.com/2016/05/21/azul-systems-vega-3-54-c...


IBM's z14 (and later, I assume) supported the Guarded Storage Facility for 'pauseless Java garbage collection.'


One of the few elements left like this is the ARM Javascript instruction: https://news.ycombinator.com/item?id=24808207


> as Moore's law continues to end

more like Wirth's law proving itself, still


My brain fog claims some blurry memories of this ... for one, documentation was lacking so much that an open-source JVM using Jazelle never happened; if you wanted to develop a JVM on top of it, you'd pay ARM for docs, professional services, and unit licenses. And second, once things got to the ARM11-series cores, software JITs beat the cr* out of Jazelle. I don't remember any early Android device ever using it.

ARM is quite capable of vapourware generation. 64-bit ARM was press-released (https://www.zdnet.com/article/arm-to-unleash-64-bit-jaguar-f...) a decade before ARMv8 / AArch64 became a thing.

(I'd love to learn more)


Sun wanted to do the same thing in the late 90s - picoJava (embedded), microJava and UltraJava (VLIW workstations).

Relegated to the dustbin of history.


Java Card still survives, though.

I find Java Card pretty puzzling. You go from high-level interpreted languages on powerful servers, to Java and C++ on less powerful devices (like old phones, for example), to almost exclusively C on microcontrollers, and then back to Java again on cards. If it makes sense to write Java code for a device small enough to draw power from radio waves, why aren't we doing that on microcontrollers?


The Java Card environment is quite limited, though, due to resource limitations.

There have been several more-or-less successful attempts at running higher-level languages on microcontrollers, e.g. .Net Micro Framework and CircuitPython. In all of these cases, though, you tend to struggle with all the native device behavior being described/intended by the vendor for use with C or C++ and the BSP for the higher level environment being an afterthought.


JavaCard is used in smartcards e.g. for banking cards. There you want to have more language guarantees to avoid losing money.


FYI, UltraJava was renamed to MAJC [0], which IIRC was only used in Sun's XVR graphics cards.

More from Ars (1999) https://archive.arstechnica.com/cpu/4q99/majc/majc-1.html

[0] https://en.wikipedia.org/wiki/MAJC


There is a bit of info including example code on https://hackspire.org/index.php/Jazelle


> I don't remember any early Android device ever used it.

It couldn't have, as Dalvik VM is distinct from JVM.


It executes Java bytecode. Whether the Dalvik VM was/is a "Java" VM is hardly relevant there (not least because "Java" is so much more than Java bytecode, and Jazelle does nothing to help with anything on top of the latter).


Apparently it's not even Java bytecode. Would make sense, after all Dalvik is register-based. https://stackoverflow.com/a/36335740


Java bytecode is transpiled to Dalvik's own bytecode as a build step; Dalvik itself doesn't run Java bytecode. This is one of the reasons why Oracle sued Google: clearly Google was trying to appropriate Java with some clever IP-law dodges in this Dalvik business.


Huh, I always took their rationale for having the .class -> .dex compilation step at face value (i.e. space efficiency due to shared string literals etc.), but this actually makes more sense, given that J2ME seemingly was fine without it on much more lightweight hardware.



The performance of Dalvik was far below J2ME on Nokia and Sony Ericsson feature phones for a very long time, and Android relied on pushing a lot to C libraries to compensate.


As a Nokia alumnus, it was incredible how much of the Google fanbase believed in Dalvik's performance fairy tale.

ART is another matter, though.


Sure, but J2ME can't seek backwards in open files. (That was added in Java 1.4, and J2ME is 1.3)


IIRC Jazelle left it to the implementers which bytecodes to handle in hardware and which to trap to software. Since software JITs beat the Jazelle implementations by ARM11 times, the CPU implementers would just leave everything to the software traps... So while the original Raspberry Pi used an ARM1176 core whose J stood for Jazelle support, it was all already hollowed out.


There was a successor ThumbEE ("Execution Environment") that was comprehensively documented. But it didn't get much attention either and later chips removed it.


The trend of posting a page here based on some detail from a comment posted in another thread ("What's that touchscreen in my room?" in this case) a few days earlier has become quite frequent and a bit annoying.

To everyone who wants to write "but I didn't read that thread and I find this quite interesting": you are free to find it interesting, but I did read about it two days ago, and to me this looks like karma farming.


Why you gotta yuck other people's yum, man?

To me it seems like Hackernews, as a whole, goes off on the same kinds of thought-tangents as I do, and that makes the site more interesting. And I was one of the commenters about Jazelle on the thread you mentioned.


Because this is a community and I care about what goes on in it. I think this is precisely not a thought-tangent, it is taking the tangent that occurred elsewhere and posting an article about it to score internet points. If we keep doing that, this becomes even more of an echo chamber and a very boring place.


It's exactly the opposite to me: I might not necessarily be interested in a box on someone's wall (even though I did enjoy that particular article) but at the same time be extremely interested in efficient mobile Java implementations with or without hardware support.

Reposting that tangent as a separate submission increases the chance of me finding the conversation.


Then it would be great if the existing conversation could be posted instead, similar to what 'icegreentea2 suggested.


You might be the only one concerned with internet points.

Even so, why does it affect you so much to see other people getting these internet points? Or that they were somehow unearned?

Who cares.


I think I've adequately explained what it is that bothers me about it, and it's not them getting points.


Not everyone spends so much time on this site that they can easily spot a post as an extension of a related discussion elsewhere on the site. Someone posted this page, it got upvoted by others who found it interesting, and now it's on the front page. What's wrong with that?


Popular content is not necessarily good content (very boring to say this, but just look at reddit). And posting articles to get upvotes, which I'm not saying this post is necessarily doing but at least _some_ are doing, leads to lower quality. HN barely has any methods for maintaining overall quality of the website and it will automatically degrade as it gets larger.

To simply allow these posts and having them hit the front page when they get upvotes is a valid position. But I think it contributes to a website that is less interesting.

I don't think these posts should be removed, but they should at least be frowned upon, and/or linked to the original comment thread.


Regarding the quality of posts in general, the problem is not what gets posted; there is a ton of junk in the "new" queue and most of it never makes it to the front page. It's what gets upvoted.


I think attribution would be nice, both from an honesty standpoint and because it's generally useful.


In a similar spirit, Apple seems to have made sure some critical OSX idioms were fast on the M1, perhaps even influencing their instruction set.

Retaining and releasing an NSObject took ~6.5 nanoseconds on the M1 when it came out, compared with ~30 nanoseconds on the equivalent-generation Intel.

In fact, the M1 _emulated_ an Intel retaining and releasing an NSObject faster than an Intel could!

One source: https://daringfireball.net/2020/11/the_m1_macs


The M1 emulation with Rosetta is actually dynamic recompilation, so if you're measuring only that specific small section, it's not surprising that Rosetta could have emitted optimal code for that instruction sequence.


That seems very similar to ARM introducing an instruction set that can be used to efficiently implement common Java idioms, just that the x86 "bytecode" wasn't explicitly conceived as a VM target.


Running bytecode instructions in hardware essentially means a hardware-based interpreter. It likely would have been the best performing interpreter for the hardware, but JIT-compilation to native code still would run circles around it.

During the years when this instruction set was relevant (though apparently unutilized), Oracle still had very limited ARM support for Java SE, so having a fast interpreter could have been desirable -- but it makes no sense on the beefier ARM systems that can run the decent JIT or AOT compilers available nowadays.


I remember reading about Jazelle many years ago - before the release of the iPhone and suchlike. This was the age when people were coming up with things like 'Java Card' - smartcards programmed directly in Java.

I never heard of anyone actually using Jazelle, though - I assume JIT ended up working better.


I'm a little in the realm of speculation here. Part of the issue with Java for embedded devices was that it was a bad fit. What made Java thrive in the server or even applet spaces wasn't the instruction set but the rich ecosystem around Java. Yet threading - as "inherent" to Java as it is - is provided by the OS and "only" used/wrapped by the JVM. All the libraries ... megabytes of (useful) software, yet not implemented (nor even helped) by hardware acceleration. The "equivalent" of static linking to minimize the footprint never quite existed.

So, on a smartcard ... write software in an uncommon form of low-level instruction set (compared with ARM, which is a very "rich" assembly language), and pay both Sun and ARM top $ for the privilege - never mind the likely "runtime" footprint far exceeding the 256 kB of RAM you planned for that $5 card - why? Writing small single-threaded software in anything that compiles down to a static ARM binary has been easy and quick enough that going off the ARM instruction set looked pointless for most. And learning which parts of "Java" actually worked in such an environment was hard, even (or especially?) for developers who knew (the strengths of) Java well. Because developers and specifiers expected "rich Java", and couldn't care less about the bytecode. JITs later only hoovered up the ashes.



IIRC people didn't "really believe" that Java could actually be performant because they assumed that since it has a JIT layer, it would never even get close to native code.

But the reality was that JIT allows code to get faster over time, as the JIT improves.

Things like Jazelle let chip manufacturers paper over a paper objection.


> But the reality was that JIT allows code to get faster over time, as the JIT improves.

Ehh .. PGO is only somewhat better for JIT than for AOT. More often, for purely numerical code, the win is because the AOT build doesn't do per-machine `-march=native`. It's the memory model that kills JVM performance for any nontrivial app, though.


Well, code size is another interesting aspect here. A JIT compiler can effectively create any number of versions of a hot method, based on even very aggressive assumptions (an easy one would be that a given object is non-null, or that an interface only has a single implementation loaded). The checks for these are cheap (e.g. a null check can be encoded as a trap on an invalid page address), and the cost of invalidation is amortized.

Contrast this with the problem of specialization in AOT languages, which can easily result in bloated binaries (PGO does help here quite a lot, that much is true). For example, generics might output a completely new function for every type they get instantiated with - if the function is not that hot, it actually makes sense to handle more cases with the same code instead.
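
To illustrate, a hand-written sketch of the guard-plus-specialize trick a JIT applies automatically (the types here are hypothetical): assume the single loaded implementation, inline its body behind a cheap check, and fall back to the generic path if the assumption breaks (a real JIT would deoptimize and recompile instead).

    class GuardedSpecialization {
        interface Shape { int area(int side); }
        static final class Square implements Shape { public int area(int s) { return s * s; } }

        // Generic path: a virtual dispatch on every call.
        static int generic(Shape s, int side) { return s.area(side); }

        // Specialized path: the guard is cheap and well predicted, the body is inlined.
        static int specialized(Shape s, int side) {
            if (s instanceof Square) {   // guard for the assumed-monomorphic case
                return side * side;      // inlined Square.area
            }
            return s.area(side);         // fallback, analogous to deoptimization
        }
    }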


Specialized hardware has been losing out against general for years.

There were those "LISP machines" in the early 1980s but when Common Lisp was designed they made sure it could be implemented efficiently on emerging 32-bit machines.


Part of the reason is that any time specialized hardware that works is found, the generalized hardware steals the feature that makes it faster - basically the story of all the extensions to x86, like SSE, etc.


Java Card is still around! There's a very high chance it's running on more than one chip inside your phone right now, and at least on one card in your wallet every time you use it.


I think many smartcards, including SIMs, are still programmed in Java. It seems to be the only standard for programming smartcards that was ever developed.


Fun fact: both the Wii's secondary ARM chip used for security tasks and the iPhone 2G's processor had Jazelle but never used it.


It was on every Arm926, 1136 and 1176. Lots of devices of a certain era had it but didn’t use it.



A little related: back in the day Sun Microsystems came up with picoJava, https://en.wikipedia.org/wiki/PicoJava, a full microprocessor specification dedicated to native execution of Java bytecode. It never really went anywhere, other than a few engineering experiments, as far as I remember.

For a while Linus Torvalds, of Linux kernel fame, worked for a company called Transmeta, https://en.wikipedia.org/wiki/Transmeta, who were doing some really interesting things. They were aiming to make a highly efficient processor that could handle x86 through a special software translation layer. One of the instruction sets they could support was picoJava. IIRC, the processor was never designed to run operating systems etc. natively; the intent was always to have it work through the translation layer, something that could easily be patched and updated to add support for any x86 extensions that Intel or AMD might introduce.


If you want a Java processor, there’s a cheap one for FPGAs here:

https://www.jopdesign.com/

It’s more for embedded use. Might give someone ideas, though.


Jazelle and its replacement, ThumbEE, have been deprecated and removed in later architectures.

On modern Cortex-A systems, there are enough resources to make JIT feasible. On smaller systems, AOT is a reasonable alternative.


I was going to say that history repeats itself. This is incorrect. This is actually just history.

This is so old that its replacement, ThumbEE, has already been deprecated as well.


I hope people get it: an arm, and then thumb.


I was never clear on how Java bytecodes would be implemented on a register-based CPU. Efficiently, that is.

The JVM is stack-based, right? So it'd be an interpreter (in microcode)? Unless there's some kind of "virtual" stack, as spec'd for picoJava.
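
The closest I can sketch myself is the classic software trick of keeping the top of the operand stack in a local variable that the compiler can pin to a register (toy opcodes below, not real JVM ones); presumably a hardware implementation would do something similar with its register file.

    class TinyStackVM {
        static final int PUSH = 0, ADD = 1, PRINT = 2;

        static void run(int[] code) {
            int[] stack = new int[16];  // everything below the top of stack
            int n = 0;                  // current stack depth
            int tos = 0;                // top of stack, cached "in a register"
            int pc = 0;
            while (pc < code.length) {
                switch (code[pc++]) {
                    case PUSH  -> { if (n > 0) stack[n - 1] = tos; tos = code[pc++]; n++; }
                    case ADD   -> { tos = stack[n - 2] + tos; n--; }
                    case PRINT -> System.out.println(tos);
                }
            }
        }

        public static void main(String[] args) {
            run(new int[] { PUSH, 2, PUSH, 3, ADD, PRINT });  // prints 5
        }
    }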

I'm less clear on how Jazelle would implement only a subset of the bytecodes.

Am noob. And a quick scholar search says the relevant papers are paywalled. Oh well; now it's just a curiosity.

Stack-based CPUs are cool, right? For embedded. Super efficient and cheap, enough power for IoT or secure enclaves or whatever.

But it seems that window of opportunity closed. Indeed, if it was ever open.



