
JITs are un-ergonomic - awinter-py
https://abe-winter.github.io/2020/03/28/jitu-brutus.html
======
lpghatguy
An anecdote from a non-JS JIT, but similar: I once spent a summer working on a
game engine with a couple others where the host language was LuaJIT.

It started out great. The iteration cycles were incredibly short, performance
was competitive, the code was portable, and we could do wild metaprogramming
nonsense that was still fast. If you haven't worked with LuaJIT, its C FFI is
also incredible!

As we started scaling though, the wheels fell off the wagon. We'd add a new
feature and suddenly the game wouldn't run at interactive framerates. One
time, it was the 'unpack' function, which would trigger a JIT trace abort. We
would drop from 12ms frames to 100ms frames. I wrote a length-specialized
version that didn't abort and moved on.

Another time, it was calling Lua's 'pairs' method (iterator over a map). Okay,
so we can't do that, or a few other things that made Lua productive before.

The other problem we hit was that GC predictability was impossible. We tried
to mitigate it by using native data structures through the C FFI, taking
control of the GC cycle to run it once or twice per frame, etc. In the end, as
with the JIT problem, we weren't writing Lua anymore; we were writing...
something else. It wasn't maintainable.

That summer ruined dynamic languages for me. I didn't really want to be
writing C or C++ at the time. I ended up picking up Rust, which was
predictable and still felt high-level, and the Lua experience ended up getting
me my current job.

~~~
beetwenty
I'm working with Lua right now (gopherlua) as a scripting option for real-time
gaming. I've done similar things to your story in the past, trying to make Lua
the host for everything, and I'm well aware of the downsides. But I have a
requirement of maintaining readable, compatible source (as in PICO-8's model) -
and Lua is excellent at that, as are other dynamic languages, to the point
where it's hard to consider anything else unless I build and maintain the
entire implementation. So my mitigation strategy is to do everything possible
to keep the Lua code in the glue code space, which means that I have to add a
lot of libraries.

I'm also planning to add support for tl, which should make things easier on
the in-the-large engineering side of things - something dynamic languages are
also pretty awful at.

~~~
yumaikas
You might still run into GC problems, but none of the Go-based Luas (built on
Go, rather than binding to another Lua library) I am aware of have a JIT built
in.

------
pron
The OpenJDK JVM (aka Hotspot) addresses both issues: control [1] and
monitoring [2] (there are built-in compilation and deoptimization events
emitted to the event stream). You can also compile methods in advance [3], and
inspect the generated machine code when benchmarking [4]. You can even compile
an entire application ahead-of-time [5] to produce a native binary.

[1]: [https://docs.oracle.com/en/java/javase/14/vm/compiler-control1.html](https://docs.oracle.com/en/java/javase/14/vm/compiler-control1.html)

[2]: [https://docs.oracle.com/en/java/javase/14/jfapi/why-use-jfr-api.html](https://docs.oracle.com/en/java/javase/14/jfapi/why-use-jfr-api.html)

[3]: [https://docs.oracle.com/en/java/javase/14/docs/specs/man/jaotc.html](https://docs.oracle.com/en/java/javase/14/docs/specs/man/jaotc.html)

[4]: [http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html](http://psy-lob-saw.blogspot.com/2015/07/jmh-perfasm.html)

[5]: [https://www.graalvm.org/docs/reference-manual/native-image/](https://www.graalvm.org/docs/reference-manual/native-image/)

~~~
amelius
The GC is still nondeterministic though.

~~~
tomp
So is malloc

~~~
pkolaczk
It is much easier to avoid malloc when writing C than to avoid new in Java.

~~~
tomp
Nothing to do with GC though. It's also fairly easy to avoid `new` in Go and
C#.

~~~
pkolaczk
To some degree, probably.

Structs have many limitations though, and two different argument-passing
semantics in the same language mean more complexity. Also, you may be lucky
enough to eliminate allocations in your own code, but what about libraries?
The existence of a GC shepherds programmers and library creators into heap
allocation. You don't need to write new explicitly to allocate on the heap.

Also high level languages with no GC make manual memory management much more
usable and provide way better ergonomics in this area, just because they have
to. E.g. automated reference counting, ownership/lifetime control, move
semantics etc.

~~~
pjmlp
Automated reference counting is a garbage collection algorithm, and it is
possible to have ownership/lifetime control without giving up on GC; Chapel,
ParaSail, Haskell, OCaml, Swift, and D are following exactly this path.

~~~
pkolaczk
How do you avoid heap allocations in Haskell?

Automated reference counting is technically a GC, but a kind of GC that can be
enabled for only a subset of objects. In languages which force GC on
everything (even reference-counted GC) the incentive to provide abstractions
that work without GC is much weaker. It is just hard to opt out of GC once
you have it and once all your stdlib relies on it. I think D learned this the
hard way.

~~~
pjmlp
Stack and register allocation in Haskell is under compiler control via escape
analysis, however you can allocate native heap outside GC via
mallocBytes/allocaBytes/free, and if they get integrated into mainstream GHC,
linear types.

Besides, plenty of GC enabled languages offer the option to stack allocate,
static global allocations, or native heap.

Some examples, including languages that for whatever reason failed in the
mainstream market:

D, C#, Swift, Oberon, Oberon-2, Active Oberon, Component Pascal, Mesa/Cedar,
Sing#, System C#, Nim, Modula-2+, Modula-3, VB, Xojo, C++ (via C++/CLI, C++/CX
and Unreal C++), Common Lisp.

------
banachtarski
This article has a number of issues. JS with a JIT is waaay faster than
Python, not “between Python and Java” as purported. Second, generalizing JITs
as “un-ergonomic” seems silly given that what’s being specifically looked at
is benchmarking. But what makes this claim ridiculous is that _nothing_ is
easy to benchmark. Even native code is hard to profile, and this is literally
my day job. If the JIT makes your code _that_ much faster, this strikes me as
a pretty suspect complaint.

~~~
pizlonator
I think that by “between python and java” they meant “faster than python and
slower than java”. I think Java still beats JS unless you get lucky.

You’re totally right that benchmarking and profiling is hard even for native
code. I think this post fetishizes whether or not a piece of code got JITed a
little too much. Maybe the author had a bad time with microbenchmarks. There’s
this anti-pattern in the JS world of extracting a small code sample into a loop
and seeing how fast it goes - something that C perf hackers usually know not to
do. That tactic proves especially misleading in a JIT since JITs salivate at
the sight of loops.
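To make that concrete, the anti-pattern looks something like this (an
illustrative sketch, not anyone's real benchmark; the function and counts are
made up):

```javascript
// Anti-pattern: extract a snippet into a tight loop and time it.
// A JIT will aggressively optimize a hot loop like this (and may hoist
// or eliminate parts of the work entirely), so the number it produces
// rarely reflects how the same code performs in a real application.
function snippet(x) {
  return Math.sqrt(x * x + 1);
}

const start = Date.now();
let sink = 0; // accumulate the result so the work isn't trivially dead code
for (let i = 0; i < 1e6; i++) {
  sink += snippet(i);
}
const elapsed = Date.now() - start;
console.log(`1e6 iterations in ${elapsed}ms`); // misleadingly fast
```

The loop gives the JIT exactly the stable, monomorphic hot path it loves, which
is precisely why the measurement doesn't transfer to production code.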

~~~
shawnz
That still doesn't make sense to me... how can JITed JS in general be slower
than Java when Java is also JITed?

~~~
seanmcdirmid
The Java JIT has static type information to work with, the JS JIT can only
infer type information via heroic efforts. Static types do mean something,
especially when working with primitives and other unboxed data types.

~~~
pcr910303
Well, you're right that JS JITs can only infer type information via heroic
efforts, but AFAIK the Java compiler throws away any type information from the
source code, which means that the JVM JIT needs to infer type information
again from the bytecode.

Still, the JVM JIT is faster than JS due to reasons explained in the sibling
comment of the parent one.

~~~
pdpi
> but AFAIK the Java compiler throws away any type information from the source
> code

You're thinking about generics. .class files preserve a whole bunch of type
information (I'm building a .class decompiler in my free time, and I'm looking
at that very same data in my debugger ATM).

~~~
The_Colonel
You probably know this, but it's not obvious to the random reader.

Even generic types are available in the Java .class file and are accessible
from the reflection API. Spring for example uses this quite heavily.

~~~
cogman10
It depends on where you are at.

Type information is present in fields and class inheritance.

For example, a class like this:

`class Foo implements Bar<String>`

retains the fact that the generic type is a String.

That information is completely lost at method invocation. So a method that
takes a `Bar<String>` ultimately compiles to a method that takes a `Bar` and
knows nothing of the String.

To get that generic information down you have to engage in some fun tricks
using either the class or field method I mentioned earlier. (Usually you do
this with a second type parameter where it matters).

------
connor4312
As someone who's been writing a lot of JavaScript, Go, and a handful of other
languages for a while, I feel this. In Go, I can basically know what's going
to happen when I write a function. This operation will read from the stack,
these instructions will be run, and I can take a peek at the assembly if I'm
not sure (though I've developed a pretty good feel for what Go will do without
needing that). I can benchmark it and know that the performance I see on my
machine will be the performance when I ship this bit of functionality into
production, barring hardware differences.

In JavaScript, it's a black box. I know some constructs might deoptimize
functions when run on Wednesdays because I read them on a blog published in
2018 that's _probably_ still accurate. In my benchmark running on Node 12.14.1
on Windows this seems to be true. But then who knows if it'll be the same
thing in production, and it might 'silently' change later on.

JavaScript in V8 is incredibly fast these days, but I find it much easier to
write optimal code in Go.

~~~
_bxg1
It's really no different from native compiler optimizations, which are also
mysterious and always changing, except for one key aspect: when recording
timings to compare against other timings, you can control whether
optimizations are turned on or off, to remove that variable from the
comparison.

~~~
lpghatguy
Thankfully, with a native compiler you don't have to deal with your JIT
warming up, or some other code causing your function to get deoptimized, or
subtle JIT trace aborts. :(

~~~
fctorial
> or some other code causing your function to get deoptimized

Why doesn't the optimizer generate different compiled implementations of a
function for different pieces of code?

~~~
notamy
I would imagine this to be due to the overhead of then having to track where
EVERY variant of a JITted function can then be called from, when to
deoptimize, etc.

------
_bxg1
> if your economics are such that servers are a bigger cost than payroll

Sorry, and I may be oversimplifying the author's situation, but this really
sounds like a case where you need to not be using JS for your server. On the
client you don't have much choice, but on the client the pure-JS performance
rarely gets tight enough to warrant this degree of micro-optimization work.

The author makes some good points - it would be great if the JIT were more
profiler-friendly - but I have to question a little bit how important it
actually is, given the way the use-cases line up.

~~~
erik_seaberg
When someone realized you pay a lot for cycles at the server, but you pay
nothing for cycles at the client, that was the moment the world-wide web began
to die.

~~~
_bxg1
This is baseless mudslinging. The bottleneck on the client is virtually never
the cycles consumed by the actual running of the actual app's JS code. It's
usually, in order starting with the most common:

- Piles of ads/analytics scripts which have no motivation not to slow down
the page

- Reflow; i.e. needlessly many elements on the page causing the browser to do
extra work calculating layout

- The initial loading and JIT-ing time of a needlessly heavy JS bundle

------
lispm
> Interpreters, which read the program line by line as it runs

Byte code interpreters do that? That would be surprising. Programs are
represented with 'lines' in byte code?

For the things he wants, let's look at SBCL, a Common Lisp implementation (see
[http://sbcl.org](http://sbcl.org)):

> compile short bits to native code quickly at runtime -> that's done by
> default

> identify whether a section is getting optimized -> we tell the compiler and
> the compiler gives us feedback on the optimizations performed or missed

> know anything about the native code that’s being run in a benchmark -> we
> can disassemble it, some information can be asked

> statically require that a given section can & does get optimized -> done via
> declarations and compilation qualities

> compile likely sections in advance to skip warmup -> by default

> bonus: ship binaries with fully-compiled programs -> dump the data (which
> includes the compiled code) to an executable

~~~
eatonphil
Yes, I thought of SBCL when I was reading this too.

A JS frontend for SBCL could be nice...

~~~
pjmlp
Or at very least improve browser developer tools to give us what Lisp
compilers have been doing the last half century.

Why should we need to hand-compile special versions of JavaScript VMs, just to
get _(decompile ....)_?

------
dahart
A JIT is just another cache, like memory. Yes, it’s hard to predict, but not
fundamentally that different from caching in any language. It does mean perf
tests have to be end-to-end and match real-world loads, but it doesn’t mean
it’s “impossible” at all, it means you need to measure.

Is this a real problem? I’ve been profiling my JS for years and never actually
run into a mysterious problem where some important code I profiled was way way
slower in prod than when I was profiling. Has that happened for you? How often
does this happen? I take it as an assumption that profiling is something you
mostly do on inner loops & hot paths in the first place. I mean, I profile
everything to look for bottlenecks, but I don’t spend much time optimizing the
cold paths.

> Get notified about deopts in hot paths

Studying the reasons for de-opts helps you know in advance when they might
happen. If you avoid those things, de-opts won’t happen, and you don’t need
notifications.

For example, ensure you don’t modify/add/delete keys in any objects, make sure
your objects are all the same shape in your hot path, don’t change the type of
any properties, and you’re like 90% there, right?
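In code, that advice looks something like this (a minimal sketch; the object
and property names are made up):

```javascript
// Monomorphic-friendly: every point has the same shape, created with
// the same properties, in the same order, with the same types.
function makePoint(x, y) {
  return { x: x, y: y }; // every instance shares one hidden class/shape
}

function lengthSquared(p) {
  return p.x * p.x + p.y * p.y; // this call site only ever sees one shape
}

// What to avoid on a hot path:
// const p = makePoint(1, 2);
// p.z = 3;      // adds a key -> new hidden class
// p.x = "one";  // changes a property's type
// delete p.y;   // can force the object into slow "dictionary mode"

const pts = [makePoint(3, 4), makePoint(5, 12)];
const sums = pts.map(lengthSquared);
```

Keeping constructors and property types stable like this is most of what "same
shape in your hot path" means in practice.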

> statically require that a given section can & does get optimized [...]
> compile likely sections in advance to skip warmup

While these don’t exist in V8, it’s maybe worth mentioning that the Google
Closure compiler does help a little bit: it ensures class properties are
present and initialized, which can help avoid de-opts.

------
inglor
Hey, Node/bluebird person here: you want to run Node with --trace-opt and
--trace-deopt and --allow-natives-syntax with %OptimizeFunctionOnNextCall
before benchmarking.
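A rough harness might look like the following (a sketch;
%OptimizeFunctionOnNextCall is V8 natives syntax and only parses behind
--allow-natives-syntax, so it's guarded here to degrade gracefully without
the flag):

```javascript
// Run as: node --trace-opt --trace-deopt --allow-natives-syntax bench.js
function hot(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;
  return sum;
}

hot(1000); // warm up: give the profiler some type feedback first
try {
  // Natives syntax is a parse error without the flag, so it's wrapped
  // in eval + try/catch; with the flag, this forces optimization.
  eval('%OptimizeFunctionOnNextCall(hot)');
} catch (e) {
  // running without --allow-natives-syntax; plain warmup will have to do
}
hot(1000); // this call runs (or triggers) the optimized code

console.time('hot');
const result = hot(1e6);
console.timeEnd('hot');
```

With --trace-opt/--trace-deopt, V8 then logs when `hot` gets optimized or
deoptimized around the timed section.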

------
cjfd
The high-level point of great importance is that if one wants a computer
program to function in a reliable way, one needs simple and understandable
algorithms/components. Nowadays even the processor is no longer simple,
assembly has become a high-level language, and it got us 'nice' things like
Spectre.

For the practical day-to-day work, the KISS principle that has been the
cornerstone of effective programming for half a century is now more important
than ever. Yes, there is a nice library available to do such and such, and
maybe you need it, but are you also thinking about the danger that any
additional moving part may increase unpredictability?

Let me give one very stupid example that I ran into this very week. The
agreed-upon answer according to
[https://unix.stackexchange.com/questions/29608/why-is-it-better-to-use-usr-bin-env-name-instead-of-path-to-name-as-my](https://unix.stackexchange.com/questions/29608/why-is-it-better-to-use-usr-bin-env-name-instead-of-path-to-name-as-my)
is that it is better to do #!/usr/bin/env bash than #!/usr/bin/bash. I say:
absolutely not! You have just increased the number of moving parts involved by
one, for an immeasurably small benefit. And if you are worrying about bash
versions, then just stop using any bash features that are less than a decade
old.

I also say that unless it is necessary for the core problem you are trying to
solve, stay away from any features of anything that are less than 5 years old.
And if you actually need the new and fancy stuff, and maybe you do, expect to
pay a hefty price for it. Every new tool that you introduce has its own
peculiarities that you will spend hours of debug time on, and one should err
on the side of just saying no.

------
samatman
LuaJIT, true to form, has a sort of solution for this, in the form of a
profiler cheap enough to run in production.

You do have to change how you think about performance analysis, but in return,
you get to actually answer the question you're trying to reason about, namely,
how does this run in production.

Pacifying the JIT is a bit of a dark art, but the whole thing is pretty
transparent with good tooling. I've yet to regret building on LuaJIT.

~~~
thu2111
Worth noting that the JVM has this feature as well, under the name of "flight
recorder".

------
BorisTheBrave
This article contains a logical error. The premise is JITs are hard to
benchmark and keep good performance on. That's true.

But the alternative is bad performance all the time (the JITs fall back to
interpretation, after all).

What's the value in having clearly understood bad performance? If you care
enough about performance that you need to understand it, surely you care about
the absolute level of performance.

~~~
patrec
False dichotomy. No one should be using shitty byte code interpreters like
(c)python for real production work. Or at least stop whining about global
warming. The real alternative is AOT.

Also, in many cases it's much better to trade off average performance for
lower variance.

~~~
jashmatthews
The vast majority of the time it doesn't matter. Most of us don't write
performance critical code. Even games use bytecode VMs for scripting. You'd be
better off spending the money saved on planting trees.

If you write a web app in Ruby using a similarly lightweight framework to what
you would use in Go, you'll be lucky to get 3x the throughput out of the Go
app.

There are also ways to utilize hardware more efficiently by using bytecode
than by distributing binaries. Using bytecode verification rather than an
opaque binary, you can pack thousands of different web applications into a
single process.

This is an application of webassembly which IMHO is much more promising than
in-browser use.

~~~
patrec
> The vast majority of the time it doesn't matter.

I keep hearing this all the time and keep seeing all the time that it isn't
true. Do you know of a single comparable Go web app that's not 10-100x less
wasteful than goddamn gitlab? Most of the pathological sources of CO2 cloud
emissions are a combination of terrible architecture and slow-as-molasses
scripting languages like Python or Ruby. But these are _not_ orthogonal. In
theory gitlab could probably be 100x less wasteful even when written in Ruby.
In practice many bad architectural decisions are far more likely to occur with
Ruby and Python than with a statically typed language (although Java may serve
as an interesting counter-example).

> Using bytecode verification rather than an opaque binary, you can pack
> thousands of different web applications into a single process

Can you expand on what problem this solves?

Anyway, the problem isn't the bytecode, it's the shitty interpreter. There are
plenty of bytecode based languages that are within < 10x of C.

~~~
jashmatthews
Yeah, absolutely. I've worked on some incredibly poorly performing Go
services. The idea that a team struggling to deliver on a roadmap and keep up
application performance while using a high level language like Ruby will
magically be able to do it while using a lower level language is just a
complete fallacy. It never happens!

People get bogged down. They don't write low allocation code. Nobody has any
time to turn on the compiler warnings for escape analysis. Your benchmarks end
up a year out of date and don't even run anymore. Architecture becomes an
afterthought. Function/method calls end up becoming even slower gRPC calls as
the team struggles to box up complexity and extracts more services.

WRT bytecode verification, it's better just to read this:
[https://www.fastly.com/blog/announcing-lucet-fastly-native-webassembly-compiler-runtime](https://www.fastly.com/blog/announcing-lucet-fastly-native-webassembly-compiler-runtime)

There's nothing shitty about the CRuby bytecode VM. It's all a question of
resourcing. You're doing a huge disservice to all the people who worked very
hard to make CRuby 2.7 5-15x faster than 1.8 and it detracts from a valid
discussion about reducing the CO2 footprint of datacenters around the world.

------
pizlonator
This post is extremely V8-centric. For example it uses terminology like
“deopts” which means nothing in JavaScriptCore (we distinguish between exits,
invalidations, jettisons, and recompiles). The post also assumes that there is
only one JIT (JSC has multiple).

And that’s where you’ve lost me. Not sure how you expose anything about how
the JIT is operating without introducing a compat shitshow since JIT means
different things in different implementations.

If you really want to know that something gets compiled with the types you
want, use a statically typed and ahead of time compiled language.

If you have to use a JIT but you find that it doesn’t do what you like then
remember that it’s meant to be stochastic. The VM is just trying to win in the
average. Which functions get compiled and with what type information can vary
from run to run.

Probably the best thing that could happen is that developer tools tell you
more about what the JIT is doing. But even that’s hard.

There are some specifics that I disagree with:

- I don’t think all JIT architects for JS claim that the perf is about
competing with C for numerical code. I don’t explain it that way. I would say:
JITs are about doing the best job you can do under the circumstances. They can
make JS run 4x faster than an interpreter if things really go well. “Between
Python and Java” is a good way to put it and that’s exactly what I would
expect. So if that’s your experience then great! The JIT worked as expected.

- It’s usually foolish to want your code compiled sooner. Compilation delay
is about establishing confidence in profiling. I’m pretty sure we’d JIT much
sooner if it wasn’t for the fact that it would make the EV of our speculation
go negative.

TL;DR. the JIT can’t unfuck up JavaScript.

~~~
bjourne
> Probably the best thing that could happen is that developer tools tell you
> more about what the JIT is doing. But even that’s hard.

Why? You have the exact same problem when writing SQL code but there you have
lots of powerful introspection tools to make it easier to control performance.
You can also use indices and hints to nudge the RDBMS into executing queries
in the most optimal way. PyPy for example has a lot of support for
introspection.

~~~
pizlonator
Because JS JITs aren’t deterministic. Just starting the profiling or
introspection tool could change internal behavior.

The nondeterminism can come from lots of places. In JSC it’s that the JIT
polls the heap for some of its profiling and it profiles concurrently. So OS
scheduling decisions affect what types the JIT sees.

~~~
bjourne
Performance in general is non-deterministic in a multi-tasking OS because you
have multiple processes competing for CPU time. My point is that there is
nothing inherent to JavaScript VMs that makes them harder to control or
analyze. Runtimes for other dynamic languages, and database engines, show that
proper tooling makes it much easier for developers to optimize for
performance.

~~~
pizlonator
On some level of course that’s true but surely you’re not suggesting that the
nondeterminism of JS is no worse than the nondeterminism of C. C at least runs
with the same types every time.

~~~
dragonwriter
C is weakly typed, so the fact that it runs with the same static types means
very little.

~~~
pizlonator
C has strong types for the purpose of optimization. A C compiler never wonders
whether + means int addition or string concatenation for example. In JS the
optimizing compiler doesn’t get such type information except by profiling and
profiling is nondeterministic.
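A toy example of the difference (illustrative only):

```javascript
// In C, the compiler knows statically that a + b is integer addition.
// In JS, the same source handles numbers and strings alike, so the
// optimizing JIT must profile each call site to guess which operation
// it is - and throw away the compiled code if the guess turns out wrong.
function add(a, b) {
  return a + b;
}

const n = add(1, 2);         // type feedback so far says "numbers"
const s = add('foo', 'bar'); // now the call site is polymorphic
```

After the second call, a JIT that speculated on numeric addition for `add`
would have to either generalize the compiled code or bail back to slower code.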

------
peterkelly
If you were to get the information you were after it would be specific to a
particular implementation, and a particular _version_ of that implementation.
The nature of how JIT is done for JavaScript is not defined in the spec and
varies a lot (starting from totally nonexistent many years ago).

To have useful performance measurement tools that are going to help your code
run across multiple implementations would require a) all of those
implementations to work the same, forever, and b) the semantics of this to be
specified in the standard.

------
claytongulick
The thing is, javascript performance is mostly "good enough", and honestly
that's what really matters.

When you have workloads that are mostly io bound, having syntactic sugar like
js async/await to avoid blocking is really a huge strength.

When we write systems, we operate under constraints, and frequently the
largest constraint is time to market rather than pure performance.

Dynamic typing can be a huge strength for TTM.

If performance was the only concern, we'd all be in straight C with inline
asm.

~~~
saagarjha
What’s TTM?

~~~
gok
Time to market I'd guess?

------
gentleman11
As a nodejs developer: what is the best compiled server language to learn
right now? Is it Java still, or is it better to look at go or rust?

~~~
csande17
Part of me wonders if ASP.NET will make a comeback. A lot of the historical
reasons not to use it don't exist anymore (there's an officially-supported
Linux port, for example), and a lot of the cool new JavaScript language
features like async/await and arrow functions came from C#.

Besides, most JavaScript programmers are already dependent on Microsoft
products like TypeScript, NPM, and Visual Studio Code. What's one more?

~~~
gentleman11
Is it open source?

~~~
Nullabillity
Kind of? Vital parts are still closed (like the debugger), and the tooling
still assumes a closed-source world (for example, with OmniSharp, jump-to-
definition to dependencies just gives you a synthetic source file that only
contains method signatures).

It's also not exactly run like a community project. The issue tracker feels
more like a customer support system: "can you try updating to the newest
release now?" abound, with no mention of what the actual problem or fix was.

------
gridlockd
I completely agree, it's possible to do some tracing of what the V8 JIT does,
but the workflow is awful.

Microbenchmarks don't represent the real world. Instead of running a single
microbenchmark, I suggest running a host of them, but all of them "at the same
time" (not serially). They'll get in each other's way and total performance
will be far worse.

This happens with AOT code as well, of course, because caches are being
trampled no matter what, but JIT code only exacerbates the issue because it is
larger.

------
remexre
Lancet [1] seems like it fixes at least some of the unpredictability. (Haven't
used it though, only read about it on another site.)

[1]:
[https://github.com/TiarkRompf/lancet](https://github.com/TiarkRompf/lancet)

------
arkanciscan
Not a big deal if you're mainly writing client-side JS. You're never gonna
know how fast something will run in a browser anyway.

------
anthk
I'd love a JIT for DOSBox-X or QEMU. I know, KVM, but think about emulating
non-native archs, or systems without KVM support.

~~~
remexre
Doesn't QEMU already JIT? I guess it depends on how much [1] counts.

[1]:
[https://wiki.qemu.org/Documentation/TCG](https://wiki.qemu.org/Documentation/TCG)

~~~
anthk
It's not marvellously fast; x86 especially is a crawl with all these CPU
context switches. MIPS emulation seems faster, much faster.

------
gok
Dead on. JITs are high-interest credit-card technical debt when it comes to
performance. It says a lot that all the performance-sensitive parts of widely
deployed JITs are themselves implemented with AOT compilation.

~~~
saagarjha
> It says a lot that all the performance-sensitive parts of widely deployed
> JITs are themselves implemented with AOT compilation.

…not really? Managed languages with a substantial runtime tend to make for
poor ergonomics when writing JITs.

