
Wow, you read a way more argumentative tone into my comment than I intended. It's all good, I'm trying to expand my understanding. :) Though I'm not sure I agree with your characterization of my points as "nonsense."

> [Inline-threaded dispatch] gives a 20% speed boost, for a few dozen lines of code. Hardly "limited usefulness".

Unless I'm missing something, I think you would be very hard-pressed to generate basic blocks of native code in only a few dozen lines of platform-dependent code. I'm not seeing how you could do it without implementing your own code generator (which would take far more than a dozen lines). Is there some clever way of leveraging the C compiler's output that I'm missing?

Also, once you are generating basic blocks of native code, I would consider it a JIT at that point, even if it is not as sophisticated as a JIT that's doing register allocation and optimization. Writing this sort of JIT that uses a static register allocation and a fixed instruction sequence for each bytecode op doesn't seem that much more work than writing an interpreter in assembly language (especially if you use Mike Pall's excellent framework DynASM: http://luajit.org/dynasm.html).
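To make "fixed instruction sequence per op" concrete, here's a minimal sketch of the idea in C (not DynASM, and not from upb — the bytecode is made up): each one-byte opcode maps to a canned x86-64 fragment with eax statically allocated as the accumulator, and "compilation" is just concatenating fragments into an executable buffer. Linux/x86-64 only, with error handling and W^X hygiene omitted.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    enum { OP_PUSH1, OP_ADD1, OP_RET };  /* made-up bytecode */

    int main(void) {
        /* One canned fragment per opcode; eax is the statically
           allocated accumulator register. */
        static const uint8_t frag_push1[] = { 0xb8, 1, 0, 0, 0 };  /* mov eax, 1 */
        static const uint8_t frag_add1[]  = { 0x83, 0xc0, 1 };     /* add eax, 1 */
        static const uint8_t frag_ret[]   = { 0xc3 };              /* ret */

        const uint8_t bytecode[] = { OP_PUSH1, OP_ADD1, OP_ADD1, OP_RET };

        uint8_t *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        uint8_t *p = buf;
        for (size_t i = 0; i < sizeof bytecode; i++) {
            /* Translation is a table lookup plus a memcpy. */
            switch (bytecode[i]) {
            case OP_PUSH1: memcpy(p, frag_push1, sizeof frag_push1); p += sizeof frag_push1; break;
            case OP_ADD1:  memcpy(p, frag_add1,  sizeof frag_add1);  p += sizeof frag_add1;  break;
            case OP_RET:   memcpy(p, frag_ret,   sizeof frag_ret);   p += sizeof frag_ret;   break;
            }
        }
        int (*fn)(void) = (int (*)(void))buf;
        printf("%d\n", fn());  /* prints 3 */
        return 0;
    }

A real translator would patch operands into the fragments instead of hard-coding them; DynASM's main contribution is letting you write those fragments as actual assembly rather than hand-encoded bytes.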

> It's certainly common to use a single indirect branch. However, it is suboptimal.

I don't believe that a single indirect branch is a priori slower; while replicated dispatch makes better use of the branch predictor, it also results in less optimal i-cache usage (the LuaJIT code notes this trade-off, but observes that replicated dispatch is 20-30% faster in practice: http://repo.or.cz/w/luajit-2.0.git/blob/2ad9834df6fe47c7150d...).

I'm not saying switch is better; I'm saying that you can't reliably replicate dispatch in C (at least according to Mike's message that I cited earlier), which makes switch() a totally reasonable choice for interpreters written in C.
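To make the trade-off concrete, here's a toy interpreter in both styles (bytecode format made up). The replicated version relies on GCC/Clang's labels-as-values extension, which is exactly the portability problem: ISO C gives you no way to express it, and per Mike's message the compiler may merge the replicated branches back into one anyway.

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_ADD, OP_RET };  /* made-up bytecode: ADD imm / RET */

    /* Style 1: switch dispatch -- portable, but every opcode funnels
       through the single indirect branch the compiler emits. */
    int run_switch(const uint8_t *pc) {
        int acc = 0;
        for (;;) {
            switch (*pc++) {
            case OP_ADD: acc += *pc++; break;
            case OP_RET: return acc;
            }
        }
    }

    /* Style 2: replicated dispatch -- each handler ends in its own
       indirect branch, so the predictor sees per-opcode history. */
    int run_threaded(const uint8_t *pc) {
        static void *handlers[] = { &&do_add, &&do_ret };
        int acc = 0;
    #define DISPATCH() goto *handlers[*pc++]
        DISPATCH();
    do_add: acc += *pc++; DISPATCH();
    do_ret: return acc;
    #undef DISPATCH
    }

    int main(void) {
        const uint8_t prog[] = { OP_ADD, 2, OP_ADD, 3, OP_RET };
        printf("%d %d\n", run_switch(prog), run_threaded(prog));  /* 5 5 */
        return 0;
    }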

By the way, I'm not just making stuff up, I have written a JIT of my own: https://github.com/haberman/upb/blob/master/upb/pb/decoder_x...




> Wow, you read a way more argumentative tone into my comment than I intended. It's all good, I'm trying to expand my understanding. :) Though I'm not sure I agree with your characterization of my points as "nonsense."

Indeed, that was much too harsh and I retracted it. I thought I might have retracted it before anyone noticed, but I guess not. My bad, sorry about that.

> Unless I'm missing something, I think you would be very hard-pressed to generate basic blocks of native code in only a few dozen lines of platform-dependent code.

Whoops, I thought we were still talking about token threading. I believe people implement token threading by fiddling with the compiler output. John Aycock had a paper about doing this for Perl or Python, as I recall (also, since you're into JITs, you might enjoy his "A Brief History of Just-in-Time").

> Also, once you are generating basic blocks of native code, I would consider it a JIT at that point,

Yes, people often consider this to be a JIT. However, it's much, much easier to write than a "real" JIT; the complexity of writing a method JIT or trace JIT is very high. A basic-block translator like this could probably be done in an afternoon by a great hacker.

Don't underestimate the work of writing an assembly-language interpreter the way Mike Pall did. Most of the incredibly hardcore compiler people with whom I've discussed it were simply blown away by it. It might be one of the most hardcore feats in VM implementation history. He manually register-allocated across dynamic control-flow boundaries, for God's sake!

> reasonable choice for interpreters written in C

But not a fast one.


> also, since you're into JITs, you might enjoy his "a brief history of Just-in-time"

Indeed, I've bookmarked it, thanks! I couldn't find the other paper you mentioned about fiddling with compiler output; I'd be very interested in seeing that.

LuaJIT2 was the first interpreter and JIT that I ever studied deeply, so while I am very impressed by it, I don't have a great understanding of how it differs from previous work. How is its interpreter substantially different or more impressive than previous interpreters written in assembly language? Don't all assembly language interpreters need a fixed register allocation?

I've been a big fan of Mike's work for a long time and helped him get some Google funding: http://google-opensource.blogspot.com/2010/01/love-for-luaji...

By the way, when I was searching for the John Aycock paper I came across your dissertation which I've also bookmarked. I see that you went to Trinity College Dublin -- I visited Dublin a few years ago and what a beautiful university that is! Looks like there's a lot of interesting JIT-related research coming from there lately.


> How is its interpreter substantially different or more impressive than previous interpreters written in assembly language?

Traditionally, there were two implementation strategies: compilers and interpreters. JITs didn't become mainstream until Sun bought HotSpot and put it into the second version of Java.

So you had to choose between compilers and interpreters. Compilers gave you optimization and fast object code, but were complex to implement (especially across platforms). Interpreters were simple to implement and very portable, but were very, very slow.

Obviously there is more to both, but until recently this is basically how people thought about language implementation. Considerations like a REPL, slow compilation speed, ease of prototyping, etc., were a complete sideshow (perhaps people cared, but you'd rarely see it discussed).

When all of the dynamic languages were implemented, they all used interpreters written in C. They all used a simple dispatch loop over the opcode implementations and let the C compiler do the heavy lifting. All the research into fast interpreters (the David Gregg/Anton Ertl work, for example) looked at instruction set (register vs. stack) and dispatch type. So when making interpreters fast, there were only four strategies:

- make a compiler,

- use better dispatch,

- rewrite using a register-based instruction set (see the sketch after this list),

- make a JIT.
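To illustrate the register-based option (toy encodings, not any real VM's): here is r2 = r0 + r1 executed both ways. The register design pays with wider instructions but goes through the dispatch loop once instead of four times.

    #include <stdio.h>

    int main(void) {
        int regs[3] = { 2, 3, 0 };

        /* Stack design: 4 instructions, 4 trips through dispatch. */
        {
            enum { PUSH, ADD, STORE, HALT };
            const int code[] = { PUSH, 0, PUSH, 1, ADD, STORE, 2, HALT };
            int stack[8], *sp = stack;
            for (const int *pc = code;;) {
                switch (*pc++) {
                case PUSH:  *sp++ = regs[*pc++];   break;
                case ADD:   sp--; sp[-1] += sp[0]; break;
                case STORE: regs[*pc++] = *--sp;   break;
                case HALT:  goto stack_done;
                }
            }
            stack_done:;
        }

        /* Register design: one 3-operand instruction, 1 dispatch. */
        {
            enum { ADD3, HALT2 };
            const int code[] = { ADD3, 2, 0, 1, HALT2 };
            for (const int *pc = code;;) {
                switch (*pc++) {
                case ADD3: { int d = *pc++, a = *pc++, b = *pc++;
                             regs[d] = regs[a] + regs[b]; } break;
                case HALT2: goto reg_done;
                }
            }
            reg_done:;
        }

        printf("%d\n", regs[2]);  /* prints 5 both times: 2 + 3 */
        return 0;
    }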

Making a JIT is lunacy, of course: JITs are ridiculously hard, and they're not portable. So the fact that Pall was building a one-man JIT (LuaJIT 1) was incredible.

But that he made an interpreter that was as fast as a JIT was even more insane. In Trinity, all of us language/JIT/scripting-language people were in one room, and when we heard about this we were just amazed. Nobody had even thought about this stuff - it was all brand new in a field that had barely seen anything novel in decades! Until that point, basically all interpreters were one big while loop.

> How is its interpreter substantially different or more impressive than previous interpreters written in assembly language?

I wouldn't know, since I've not heard of any mainstream interpreters in assembly. I can only imagine that they were exactly the same as C interpreters: essentially a while loop with a switch statement, just written in assembly.

I find it amusing that you started at LuaJIT 2. I would liken it to studying modern warfare, then wondering "why didn't they just use drone strikes at the Somme?" :) Looking back from LuaJIT 2, interpreters must seem really, really primitive.


I don't think interpreters written in assembly were that bad. LuaJIT 2 uses direct threading (not new at all), register-based bytecode (relatively new), and manually optimized register assignment (perhaps new). AFAICT, the first key innovation is that he didn't reuse Lua 5.1's register-based bytecode format as-is, but simplified it even further so it can be decoded efficiently on x86. The second is that he pre-decodes the following instruction in order to take advantage of out-of-order execution; this technique also required fixing some registers' roles.
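For anyone curious, here's a rough C rendering of the pre-decoding idea (the real thing is hand-written x86 assembly, and this bytecode is made up): instruction i is already decoded when its handler starts running, and the handler's dispatch sequence fetches instruction i+1 so those loads overlap with useful work. Requires GCC/Clang computed goto.

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_ADD, OP_RET };  /* made-up 2-byte format: op, operand */

    int run(const uint8_t *pc) {
        static void *handlers[] = { &&do_add, &&do_ret };
        int acc = 0, cur_arg;

        /* 'op'/'arg' always hold the already-decoded *next* instruction. */
        uint8_t op = pc[0], arg = pc[1]; pc += 2;

        /* Commit the decoded instruction, then fetch the one after it
           before branching, so the loads run in the handler's shadow. */
    #define DISPATCH() do { void *h = handlers[op]; cur_arg = arg; \
                            op = pc[0]; arg = pc[1]; pc += 2;      \
                            goto *h; } while (0)
        DISPATCH();
    do_add: acc += cur_arg; DISPATCH();
    do_ret: return acc;
    #undef DISPATCH
    }

    int main(void) {
        /* Trailing padding: the pre-decoder reads one instruction
           past OP_RET before the RET handler runs. */
        const uint8_t prog[] = { OP_ADD, 4, OP_ADD, 5, OP_RET, 0, 0, 0 };
        printf("%d\n", run(prog));  /* prints 9 */
        return 0;
    }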

Don't get me wrong, I think LuaJIT2's interpreter is great, but interpreters before LuaJIT2 weren't complete crap, either. Many emulators, for example, have very good interpreters written in assembly (some aim to be cycle-accurate).


I was trying to describe how it looked from an academic standpoint. Direct threading and register bytecode were well known (the register stuff is actually very old, but the jury was out until about 2003), but everything else Pall did was basically new to programming-language researchers and implementers.


The Aycock paper is available here: http://pages.cpsc.ucalgary.ca/~aycock/

Title: "Converting Python Virtual Machine Code to C."



