
I really don't understand this article, and its claims rub me the wrong way.

The main point it makes is, once again: "He perfectly demonstrates one of the points my “Oxidizing” article was making: with Rust and WebAssembly we have reliable performance without the wizard-level shenanigans that are required to get the same performance in JavaScript."

This doesn't make a lot of sense as a claim.

Why? Because underneath all that Rust ... is an optimizing compiler, and it happens that the author has decided to stay on its happy path. There is also an unhappy path there. Is that happy path wider? Maybe. It's a significantly longer and more complex optimization pipeline just to get to wasm output, let alone the interpretation of that output. I have doubts it's as "reliable" as the author claims (among other things, WebAssembly is still an experimental target for LLVM). Adding the adjective "reliable" repeatedly does not make it so.

Let's ignore this though, because there are easier claims to pick a bone with.

It also tries to differentiate optimizations between the two in ways that don't make sense to me: "In some cases, JITs can optimize away such allocations, but (once again) that depends on unreliable heuristics, and JIT engines vary in their effectiveness at removing the allocations."

I don't see a guarantee in the Rust language spec that these allocations will be optimized away. Maybe I missed it. Pointers welcome.

Instead, I have watched plenty of patches to LLVM go by that try to improve its heuristics (oh god, there's that evil word they used above!) for removing allocations for Rust. They are all heuristic based; they deliberately do not guarantee attempting to remove every allocation (for a variety of reasons). In general, it can be proven that this is a statically undecidable problem for a language like Rust (and most languages), so I doubt rustc has it down either (though I'm sure it does a great job in general!)

The author also writes the following: "WebAssembly is designed to perform well without relying on heuristic-based optimizations, avoiding the performance cliffs that come if code doesn’t meet those heuristics. It is expected that the compiler emitting the WebAssembly (in this case rustc and LLVM) already has sophisticated optimization infrastructure,"

These two sentences literally do not make sense together. The "sophisticated optimization infrastructure" is also using heuristics, pretty much all over the place, to avoid expensive compilation times. LLVM included. Even basic analyses still depend on quadratic algorithms with hard cutoffs.

If you have a block with 99 stores, and ask LLVM's memory dependence analysis about the dependency between the first and the last, you will get a real answer. If you have 100 stores, it will tell you it has no idea.

What happened to reliable?

Why does this matter? For example: every time Rust emits a memcpy (which is not infrequent), if there are more than 99 instructions in between the copies in the same block, LLVM will not eliminate it, even if it could. Whoops. That's a random example. These things are endless. Because compilers make tradeoffs (and because LLVM has some infrastructure that badly needs rewriting/reworking).
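
To make that shape concrete, here is a contrived Rust sketch (mine, not the article's; whether the two copies actually get merged depends on the LLVM version, flags, and how the loop gets unrolled):

    pub struct Big(pub [u64; 64]);

    // rustc lowers each whole-struct copy below to a memcpy. Merging the
    // two copies requires LLVM's memory dependence analysis to scan past
    // every store in between, and it gives up beyond a fixed instruction cap.
    pub fn round_trip(src: &Big, scratch: &mut [u64; 128]) -> Big {
        let tmp = Big(src.0); // copy #1: *src -> tmp
        // Many unrelated stores between the two copies; if the loop gets
        // unrolled, they all land in the same basic block as the copies.
        for (i, slot) in scratch.iter_mut().enumerate() {
            *slot = i as u64;
        }
        tmp // copy #2: tmp -> the return slot
    }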

These "sophisticated optimization infrastructures" are not different than JITs in their use of heuristics. They often use the same algorithms. The only difference is the time budget allocated to them and how expensive the heuristics let things get.

There may be good reasons to want to write code in rust and good reasons to believe it will perform better, but they certainly are not the things mentioned above.

Maybe what the author really wants to say is "we expect the ahead-of-time compiler we use is better and more mature than most JITs and can spend more time optimizing". But they don't.

Maybe it would also surprise the author to learn that there are JITs that beat the pants off LLVM AOT for dynamic languages like JavaScript (they just don't happen to be integrated into web browsers).

But instead, they make ridiculous claims about heuristics and JITs. Pretending the compiler they use doesn't also depend, all over the place, on heuristics and other such things is just flat-out wrong. At least to me (and I don't really give a crap about what programming language people use), it makes it come off as rampant fanboyism. (Which is sad, because I suspect, had it been written less so, it might actually be convincing.)



> Why? Because underneath all that Rust ... is an optimizing compiler, and it happens that the author has decided to stay on its happy path.

There are two big differences here: 1) You're comparing "staying on the happy path" of a JIT compiler in the JS case vs. an optimizing compiler in the Rust case. With the latter you can just compile your code and see what comes out, and it tends to be fairly predictable. With the former, I'm not even sure there are tools to inspect the generated JIT code, and you're constantly walking the line of JS engines changing their heuristics and throwing you off the fast path. This was one of the primary motivations for the asm.js/WebAssembly work: the ability to get predictable performance.

2) Many of the optimizations mraleph performed were tricks to avoid allocation (which is normal optimization stuff, but more of a pain in GCed languages). In JS he winds up having to effectively write C-in-JS, which looks pretty hairy. In Rust, controlling allocation is a built-in language feature, so you can write very idiomatic code without heap allocation.
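
As a hedged sketch of what I mean (a generic example, not code from the article): the following is ordinary, idiomatic Rust, and the types alone guarantee that nothing touches the heap, with no escape-analysis heroics required.

    // Idiomatic Rust with zero heap allocations: the fixed-size array
    // lives in the caller's stack frame, and the iterator chain compiles
    // down to a loop over it. No GC, no allocation to optimize away.
    fn mean_of_squares(samples: &[f32; 8]) -> f32 {
        samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32
    }

    fn main() {
        let samples = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
        println!("{}", mean_of_squares(&samples));
    }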


> what comes out, and it tends to be fairly predictable

Predictable as long as you stay on the same version of the compiler (yes, I know that there are crater runs to prevent regressions). Also, how much can/does the output for different target architectures differ in performance? Couldn't that be likened to trying to optimize for multiple JS engines?


To address your question about allocations, in Rust, you always know when you are allocating on the heap vs on the stack. The way you write your code guarantees it. Stack allocations are “basically free” compared to the heap because the memory management overhead is negligible or nonexistent. There are also guarantees about when objects you allocate on the heap are freed; there is no garbage collection.

This is in contrast to JavaScript where objects are always allocated on the heap, and are garbage collected.

So Rust gives you a lot more control and flexibility about how memory is managed. This might or might not matter for your use case, of course. There are compiler optimizations and heuristics -on top of that-, to be sure, but you end up with a lot more guarantees about how your code executes, because that’s what the language is designed for.
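
Here's a minimal sketch of that (a generic example of mine, not from the article): the free happens at a point the language defines, not whenever a collector gets around to it.

    fn main() {
        let x = 5u32; // stack: just part of the frame, no allocation
        {
            let buf = Box::new([0u8; 1024]); // heap, allocated here, explicitly
            println!("buf has {} bytes", buf.len());
        } // `buf` goes out of scope: its heap memory is freed right here
        println!("x = {}", x);
    } // `x` disappears with the stack frame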

EDIT: If you want to learn more about memory management in Rust, see https://doc.rust-lang.org/book/first-edition/the-stack-and-t...


"here are also guarantees about when objects you allocate on the heap are freed; there is no garbage collection. This is in contrast to JavaScript where objects are always allocated on the heap, and are garbage collected."

Surely you realize that this, and what the author wrote, are basically the same ever-rehashed GC vs non-GC language discussion. The performance characteristics of each are nowhere near as simple as any of these claims make out, so I'm going to leave this alone.

"EDIT: If you want to learn more about memory management in Rust, see https://doc.rust-lang.org/book/first-edition/the-stack-and-t...

Again, I see no guarantees here.

I see it saying things like "This program has one variable binding, x. This memory needs to be allocated from somewhere. Rust ‘stack allocates’ by default, which means that basic values ‘go on the stack’."

Can someone point me to an actual language-level guarantee in a spec? I looked, and I don't see it.

I can't see anything that makes it non-conformant to build a Rust compiler that dynamically allocates and places all of these on the heap, and is very non-constant time for local variable allocation.

But again, I didn't spend more than a few minutes browsing, so I may have missed it. (For example, https://doc.rust-lang.org/reference/memory-allocation-and-lifetime.html says nothing here; the requirements I can find in this entire chapter can easily be met by using dynamic allocation. There are no guarantees on space or time usage that would require a real stack :P.) In fact, the vast majority of requirements here look like they could be met by a garbage-collected heap/stack. It would be horribly inefficient, but ...

I'm totally willing to believe nobody has written down a good enough spec yet, just saying I don't see it ATM :)

I want to strongly differentiate between what "one implementation of Rust does" and what "the language guarantees". Because if you are going to claim it's Rust that makes the guarantee, as the author did, you should be able to back the claim up.


> I want to strongly differentiate between what "one implementation of Rust does" and what "the language guarantees". Because if you are going to claim it's Rust that makes the guarantee, as the author did, you should be able to back the claim up.

It's perfectly reasonable to use "Rust" as a proxy for the only implementation that can be realistically deployed to the Web (rustc). Everyone knows what the author meant.


> I can't see anything that makes it non-conformant to build a Rust compiler that dynamically allocates and places all of these on the heap, and is very non-constant time for local variable allocation.

This is such a ... pointless nitpick? https://www.xkcd.com/115/

I mean sure, the language doesn't require that compilers not emit dumb code; it is just designed so that it's easy to emit good code. Something with fewer static guarantees, like JS, makes it much harder for compilers to emit that good code. I don't think there's any ground to contest that.

In any case, if you just replace "stack allocates" etc. with "semantically stack allocates", you get the language's guaranteed behaviour (although Rust has no ISO spec or anything, so you're probably going to say that even that isn't truly guaranteed).

I mean, it's fair that the language used is quite strong, but still, spelling literally everything out with every possible caveat is a great way to end up with bad pedagogy.


He is a lawyer; what would you expect? Being pedantic is his job. Not that I agree with him, btw. I think he just does not have enough real practical engineering experience to understand it well.


Daniel Berlin has way more practical engineering experience than you or I. He's a longtime GCC contributor.

(I just think in this specific instance he's wrong.)


I think the author's primary point was that the work done by mraleph required _deep_ knowledge of the V8 JIT internals and low-level profiling to get those "3x speedup" results, plus the algorithmic improvements. Meanwhile, the original Rust implementation got the same "3x" results without having to do deep analysis of how the compiler was behaving. It also seems (based on the commentary) that the way JS/JIT engines treat WASM bytecode is likely to require fewer special cases or heuristics than plain JS.

Sure, the Rust compiler and the LLVM infrastructure are doing a lot of complicated work internally... but the end users of the compiler aren't having to spend time digging through the guts of it to guess what kind of magic sequences are needed to get fairly good performance.


"but the end users of the compiler aren't having to spend time digging through the guts of it to guess what kind of magic sequences are needed to get fairly good performance."

That's precisely my point: the claims this article makes that purport to prove this are demonstrably false. So it may even be true, but it's definitely not anything in this article that shows it.

If the author had just said "hey, when I wrote this version, it performed better and was easier", that'd be great, and awesome.

Instead, it seems they wanted to make more general claims, and to be honest, they don't seem to have much idea what they are talking about on that front, and to anyone who does have experience in this area, like I said, it comes off very badly.

I'll also point out that if you expect to never need profiling and algorithmic improvements to get significant speedups, that's similarly silly. It's just not realistic for any language in the real world. The only question is "for which of the code you write will this be true", not whether it will be true.


The article clearly admits that the LLVM compiler uses heuristics. You even quoted that part of the article.

>WebAssembly is designed to perform well without relying on heuristic-based optimizations, avoiding the performance cliffs that come if code doesn’t meet those heuristics. It is expected that the compiler emitting the WebAssembly (in this case rustc and LLVM) already has sophisticated optimization infrastructure, that the engine is receiving WebAssembly code that has already had optimization passes applied, and that the WebAssembly is close to its final form.

But really the point of the article is the last part.

>that the engine is receiving WebAssembly code that has already had optimization passes applied, and that the WebAssembly is close to its final form.

The compiler applies heuristics once, during the compilation step, and then never again. Compare this to JITs, which are constantly changing and can pull the rug out from under you.

>Maybe it would also surprise the author to learn that there are JITs that beat the pants off LLVM AOT for dynamic languages like JavaScript (they just don't happen to be integrated into web browsers).

Yes, but it gets even better! If you limit yourself to a formalised subset of JavaScript called asm.js, which gives you fine control over memory layout and allocation, you can even reach the performance of C! Have you heard of its successor? I think its name was WebAssembly, and every major browser has integrated it. It's a really cool technology that shows how sandboxed JITs can have the same performance characteristics as AOT compilers.


Allocations in Rust are opt-in: you allocate when using the `Box` type, when using reference-counted pointers, or when using a heap-allocated data structure (vector, hashmap, etc.). If you do none of those, you have zero heap allocations, 100% of the time; there is no heuristic involved here.
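
A quick sketch of those opt-in points, using only standard library items:

    use std::rc::Rc;

    struct Point { x: f64, y: f64 }

    fn main() {
        // No opt-in, no heap: a plain value in the current stack frame.
        let p = Point { x: 1.0, y: 2.0 };

        // Each of these is an explicit request for heap memory:
        let boxed = Box::new(Point { x: 0.0, y: 0.0 }); // owned heap box
        let shared = Rc::new(42u32);                    // reference-counted
        let growable = vec![1, 2, 3];                   // heap-backed Vec

        println!("{} {} {} {}", p.x + p.y, boxed.x, shared, growable.len());
    }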

For optimizations, you're right: LLVM uses a lot of heuristics. But this happens at compile time, when you generate your WebAssembly blob. Once you have it, it will run at a consistent speed across browsers and between several generations of the same browser. None of this is achievable in plain JavaScript, where JS optimized for old V8 (Crankshaft) won't be optimal for SpiderMonkey or newer versions of V8 (TurboFan).


What you're completely missing here is that a naive Rust compiler would be reasonably close in speed to LLVM's non-naive compilation. Sure, some inlining helps a bit, and LLVM does some really fancy things that help a bit, but even without those Rust would still be a reasonably fast language. As such, you can just write normal Rust, and your worst case, where the compiler completely fails to help, is still reasonably fast.

A naive JavaScript interpreter is really slow; SpiderMonkey and V8 do a ton of work to make it reasonably fast. If there's an important piece of code where the compiler doesn't even completely fail to help, but just helps less than normal, the code is no longer reasonably fast.


> Why? Because underneath all that Rust ... is an optimizing compiler, and it happens that the author has decided to stay on its happy path. There is also an unhappy path there. Is that happy path wider? Maybe

I think you've got a fair point, that all optimizing compilers and JITs have happy paths and unhappy ones, and falling off the happy path for either will result in slower code. But the happy path being wider kinda seems like the key thing here?

Additionally, AOT has consistency, in that it doesn't depend on runtime data. This can obviously leave some performance on the table if a JIT ends up optimizing for the data that happens to be used in a given session, but also means performance usually doesn't (accidentally) depend on global state.

> Instead, I have watched plenty of patches to LLVM go by that try to improve its heuristics (oh god, there's that evil word they used above!) for removing allocations for Rust

I recall some that just told LLVM the names of the symbols Rust uses instead of malloc/free, so that it could do its standard elimination of completely pointless allocations (which is rarely useful for Rust, IME), but didn't actually touch the rest of LLVM.

In any case, rustc (and the current understanding of Rust as a language) doesn't insert allocations itself, because there's no reason to. This is unlike JS, where allocations are essentially semantically required. That is to say, JavaScript's language model is so flexible that many things have to allocate by default (it's the only way to get the required shared-mutation behaviour), whereas in Rust, heap allocation is implemented as a standard library construct (it's not necessary for the language: e.g. libcore is a bare-metal subset of the standard library that works with no dependencies), and people won't manually put things on the heap unless they have to.
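
As a small illustration of the libcore point (my own sketch, not the parent's code): the crate below opts out of the standard library entirely, so there is no allocator in sight, and it still compiles as a library (a binary would additionally need a panic handler).

    #![no_std]

    // Only libcore is available here: no Box, no Vec, no String,
    // because heap allocation is a library feature, not a language one.
    pub fn checksum(data: &[u8]) -> u32 {
        data.iter()
            .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
    }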

> Maybe it would also surprise the author to learn that their are JITs that beat the pants off LLVM AOT for dynamic languages like javascript (they just don't happen to be integrated into web browsers).

Are you saying that a JIT for JS beats LLVM for... JS? I don't think that's particularly surprising (and probably especially not for someone who seems to have a lot of history with writing JS engines/SpiderMonkey, like the author), given things like WebKit's experience: https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...



