Yup, more generally, I'd say that statically typed languages that compile to native code (Rust, Go, C++, etc.) are necessary to achieve good performance when writing linters / formatters, type checkers, compilers, and interpreters (AST- and graph-based workloads).
You could frame it as a "failure of JITs in the 2010's". JavaScript isn't a good language to write the TypeScript compiler or a linter, because v8 isn't fast enough.
The semantics of JavaScript do not allow v8 to be fast enough.
IIRC this was precisely why the Dart project was started more than 10 years ago by the original v8 authors. They were spending a lot of time looking into why real world web page performance was falling off various cliffs in v8. They realized they needed to change the LANGUAGE in order to be able to write fast programs. A major use case was developing programs like Google Docs and GMail in the browser, which had to compete with native programs written in C++.
JITs are fast in common cases, but not only do they have big costs in memory (code storage) and startup time, they're also hard to ENGINEER with!
Similar story with Python tooling -- the linters, formatters, and type checkers are quite slow due to being written in Python. mypyc gives a bit of speedup, but it still uses the Python runtime.
Related story from yesterday: "Even the pylint codebase uses Ruff" (linter in Rust)
https://news.ycombinator.com/item?id=35035618
---
Emery Berger has a memorable framing of this -- "Perl, Python, PHP, and JS are the irrational exuberance" languages.
https://www.sigarch.org/from-heavy-metal-to-irrational-exube...
That is, he says that in the 90's and 2000's, we thought that clock speeds would continually increase, and JITs would get better, and so we could design language semantics without regard to performance -- languages that almost REQUIRE slow implementations.
(I think his take is about 50% true. The other 50% is that dynamic languages simply allowed people to produce popular and useful software at a greater rate, especially for the web, so we ended up with a lot of software written in dynamic languages! Doing web apps in Java vs. Ruby/Python/JS is a huge difference in productivity, and I'd say you often end up with a BETTER result, due to increased iteration / fast feedback.)
---
This also tracks with my experience with https://www.oilshell.org, where we reverse-engineered the shell in an experimental fashion with Python, and then evolved that implementation into a statically typed language that generates C++ (using MyPy, ASDL, and algebraic data types).
Oil Is Being Implemented "Middle Out": https://www.oilshell.org/blog/2022/03/middle-out.html (and many other blog posts)
The core of the program is the elaborate and strongly typed "lossless syntax tree", which is basically what's used in linters and formatters.
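To give a flavor of "lossless" (a minimal sketch in Rust with hypothetical names -- Oil's actual tree is specified with ASDL in typed Python): every token keeps its exact source text plus the surrounding trivia, so the original file can be reproduced byte-for-byte, which is exactly what formatters need:

    enum TokenKind { Word, Op, Newline }

    struct Token {
        kind: TokenKind,
        text: String,           // exact source bytes for this token
        leading_trivia: String, // whitespace and comments before it
    }

    // Concatenating trivia + text over all tokens reproduces the input
    // exactly -- nothing is thrown away, unlike a typical AST.
    fn reproduce_source(tokens: &[Token]) -> String {
        let mut out = String::new();
        for t in tokens {
            out.push_str(&t.leading_trivia);
            out.push_str(&t.text);
        }
        out
    }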
Even though I've been using both C++ and Python for >20 years, I was a little shocked how much worse Python is for AST- and graph-based workloads.
I'm looking for references/measurements specifically on these types of workloads. I think a lot of papers about JITs are misleading with respect to them, or at least you have to read between the lines.
I'd say that Python and JS are 10x as slow as native code for "business" and "web app" workloads, and I've never had a problem with them in those settings. Quite the contrary, I've actually sped up poorly written code in static languages by rewriting it in Python. If you're within 10x of the hardware's performance, you're doing VERY WELL compared to "typical software", which has layers of gunk and can be 100x to 1000x too slow.
But bare Python and JS (no libraries) are closer to 100x too slow for ASTs and graphs. This is because of all the allocation and GC overhead -- in both time and space -- in addition to dynamic dispatch, etc.
(The funny thing is that Oil is now the most statically-typed shell implementation, even though it's nominally written in Python :) It uses fine-grained static types, whereas shells written in C use a homogeneous "WORD*" representation, and strings with control codes embedded in them for "structure" and "types". I should probably write a blog post about that ...)
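To make that concrete, here's a sketch of the contrast (in Rust for brevity, with hypothetical names -- Oil's real types are in typed Python, and the shells I'm describing are in C):

    // Fine-grained static types: each kind of word part is its own case,
    // and the compiler checks that every consumer handles every case.
    enum WordPart {
        Literal(String),              // foo
        DoubleQuoted(Vec<WordPart>),  // "foo $x"
        VarSub(String),               // $x
        CommandSub(String),           // $(hostname)
    }

    // The C-style alternative: one flat string, with "types" encoded as
    // magic control bytes (e.g. a byte that opens a var sub). Nothing
    // stops a consumer from misinterpreting or forgetting a case.
    type CStyleWord = String; // e.g. "foo \x01x\x02 bar"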
---
To shine some light on the other side, I'm still a bit skeptical of Rust specifically, because:
- Memory management is littered all over the codebase.
- Borrow checking seems to work better for stateless/batch programs (it thinks about parameters and return values), but linters and type checkers for language servers are STATEFUL: https://news.ycombinator.com/item?id=34410187
- Many ASTs are actually graphs (see the arena sketch at the end of this comment).
- Pattern matching apparently can't see through boxing? (sketch below)
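On that last point, here's what I mean, as far as I understand it (a sketch; the box_patterns feature that would let a pattern reach through a Box is still unstable):

    // Recursive ADTs in Rust need explicit boxing for indirection.
    enum Expr {
        Lit(i64),
        Add(Box<Expr>, Box<Expr>),
    }

    fn eval(e: &Expr) -> i64 {
        match e {
            Expr::Lit(n) => *n,
            // You can't write the nested pattern directly on stable Rust:
            //   Expr::Add(box Expr::Lit(a), box Expr::Lit(b)) => a + b,
            // Instead you deref and match a second time:
            Expr::Add(a, b) => match (&**a, &**b) {
                (Expr::Lit(x), Expr::Lit(y)) => x + y,
                _ => eval(a) + eval(b),
            },
        }
    }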
Also, the author of esbuild tried BOTH Rust and Go, and ended up with Go. IIRC it performed better because it didn't have deterministic destruction on the stack -- GC was more efficient?
It seems like a language with both garbage collection and algebraic data types would be nicer, but neither Go nor Rust fits that description!
Rust makes a lot of sense for kernels and so forth, but for language processors -- especially stateful ones (which includes the Unix shell!) -- I think GC is still a big help. And we already know how to make GC fast for that use case, i.e. you don't need a GC that scales to 1 TB of memory on 128 cores.
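For completeness, the standard Rust workaround for graph-shaped ASTs is an index-based arena: all nodes live in one Vec, and edges (including back-edges like parent pointers) become integer handles. A sketch with hypothetical names:

    // The borrow checker is happy with integer indices, but every
    // access now goes through the arena instead of a direct reference.
    #[derive(Clone, Copy)]
    struct NodeId(usize);

    struct Node {
        parent: Option<NodeId>, // back-edge: trivial with GC, awkward with ownership
        children: Vec<NodeId>,
    }

    struct Arena {
        nodes: Vec<Node>,
    }

    impl Arena {
        fn add(&mut self, parent: Option<NodeId>) -> NodeId {
            let id = NodeId(self.nodes.len());
            self.nodes.push(Node { parent, children: Vec::new() });
            if let Some(p) = parent {
                self.nodes[p.0].children.push(id);
            }
            id
        }
    }

In a GC'd language, parent would just be a reference, and the collector would handle any cycles for you.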
If only folks didn't ignore the JVM, which is still wicked fast and getting faster all the time. JITs didn't fail; you are just using the ones that suck at what you are trying to do (v8), or ones that don't exist at all (Python).
.NET and Java have convincingly proved that JIT'ed bytecode VMs are plenty fast enough for almost all general purpose computing.
I wouldn't write a browser/game/kernel/hard-real-time application in one but they are awesome for data manipulation, databases, web servers, general desktop apps, etc.
My point is that you don't need "shitty scripting language" + "fast AOT compiled language" when you can literally just choose "fast managed bytecode language" for 99.99% of use cases (that aren't running in a browser, because of the JS monopoly there, of course).
AoT with periodic profile-based re-optimization (like the latest Dalvik / Android Runtime provides), perhaps also with dynamic re-optimization/on-stack replacement, seems ideal. As long as you're ditching Dalvik's earlier use case for hybrid interpretation/JIT, a compact SSA-based format (like Michael Franz et al.'s SafeTSA) seems better suited as an intermediate format.
In any case, pre-generating/caching the native code with profile-guided optimization seems ideal, giving you more time to perform expensive optimizations and also avoid repetitive re-compilation when nothing has changed about your usage patterns.
Platform-independent binary distribution formats with profile-guided optimization seem like clear wins for most applications that aren't currently using hand-written assembly. Re-compiling every time the binary is launched seems wasteful. In some domains, there's also a compelling case to be made for making the garbage collector optional.
The JVM is better for these workloads than JITted runtimes for dynamic languages, but I'd still say Go, Rust, and C++ are better -- both in a practical sense of what tool to use, and in theory (AoT compilation, static types, and language control over memory layout).
It's not like people have been ignoring them on purpose. Plenty of code has been written in those languages, but the point of my post is that these pointer-rich workloads are even harder for them.
Java seems to lack (user-defined) value types on the stack, which results in a lot of extra garbage for language processors (something I learned the hard way).
(IIRC Guy Steele's famous "growing a language" talk over 20 years ago specifically advocated for value types in Java.)
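To show what's missing, here's the kind of thing value types buy you in Rust/C++/Go/C# (a sketch; in Java, barring escape analysis, each Span would instead be a separate heap object with its own header):

    // A small value type: embedded inline in its container,
    // with no per-instance heap allocation or object header.
    #[derive(Clone, Copy)]
    struct Span {
        start: u32,
        end: u32,
    }

    struct Node {
        kind: u8,
        span: Span, // stored inline, not behind a pointer
    }

    // An AST with millions of spans allocates zero extra objects for them.
    fn width(n: &Node) -> u32 {
        n.span.end - n.span.start
    }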
Speed isn't the only important dimension -- they often trade speed for memory usage and startup time, and those become issues. As well as the weight/size/complexity/deployment of the runtime (i.e. Go is managed, but there's no separate runtime to deploy or configure). Also IIRC Java bytecode is untyped, so the JIT has to do a lot of work to recover the types again, which is weird.
I think the JVM / CLR make a lot of sense for many server-side workloads. They haven't caught on much for tools deployed to people's desktops, I think for good reasons.
I agree for short-lived executions like tools, etc.
That said, value types are coming to Java as part of Project Valhalla (and, tangentially related, Project Panama), and AoT will be shipping as part of OpenJDK proper, which would likely allow Java to be suitable for some of those cases.
I would say Rust has a distinct advantage for language processing tasks, however -- not just because of its small runtime, compact memory layout, etc., but also because the existing body of work w.r.t. language projects is very rich and well developed.
JVM and .NET have had AOT compilation for 20 years now, even if not always available as free beer.
> I think the JVM / CLR make a lot of sense for many server-side workloads. They haven't caught on much for tools deployed to people's desktops, I think for good reasons.
How does Go fit in with the other two? I hate that it's somehow considered low-level, when it is closer to JS in execution semantics/performance/everything.
With respect to performance and semantics, it's not closer to JS -- it's a typed language that uses types for AoT compilation, like C++ and Rust.
The GC is a big difference, but it's also a pretty good one from what I know.
The author of esbuild understands software performance deeply (e.g. he architected Figma in the browser in C++). He tried esbuild in both Rust and Go, and preferred Go for performance.
Go barely does any optimizations, and its GC is quite subpar if anything (it optimizes for latency, while most others optimize for throughput, for what it's worth). It does have value types and pointers (as have C# and D for quite some time), but with a naive compiler it will generally sit at around 2x (or more) of C's runtime, a niche which... performant JITted languages also occupy, including JS.
OK, but what's your point? That Go isn't good for writing language processors?
For that problem, Go can be put in the family of Rust and C++, and IMO the GC is actually an advantage over them. As mentioned, ASTs are often graphs.
Putting Go closer to JS is just wrong, in an absolute sense, and relative to this problem. esbuild proves that Go is good for these workloads (again see those links). I think you're over-generalizing language performance without paying respect to the workload -- performance is very multi-dimensional.
On microbenchmarks, Go is probably 2x slower than C, but (1) that's VERY good, and (2) those benchmarks aren't representative of the pointer-rich AST workloads we're talking about here.
I have some beefs with Go myself, but it sounds like you just have some beefs with it that aren't all that relevant to the problem being discussed.
Go exists in a weird space. It's not the top performer but is respectable (as long as you don't pressure the GC too much) however it's also not very expressive.
This leads to it not being favoured as a high-level language, because it lacks the primitives to write very concise code. On the other hand, it has a big runtime and a GC, so it's too high-level for many true systems programming tasks. This lack of high-level language features coupled with its higher-level runtime means that it occupies a space above systems languages like Rust and C++ but "lower" than Python/JS/Ruby/Kotlin/Swift/etc.
Essentially it ends up competing with Java on the server and supplanting C/C++ for systemsy tools in fields it got to before Rust arrived on the scene.
That’s a good summary. I just see it way too often bundled together with low-level languages and it benefits no one to prefer/not prefer a language based on false knowledge.