The parser written in Go was both faster to compile and faster to execute than the parser in Rust. The Go version compiled something like 100x faster than Rust and ran something like 10% faster (I forget the exact numbers, sorry). Based on a profile, it looked like the Go version was faster because GC happened on another thread while Rust had to run destructors on the same thread.
The Rust version probably could be made to work at an equivalent speed with enough effort. But at a high-level, Go was much more enjoyable to work with. This is a side project and it has to be fun for me to work on it. The Rust version was actively un-fun for me, both because of all of the workarounds that got in the way and because of the extremely slow compile times. Obviously you can tell from the nature of this project that I value fast build times :)
Can you work around this by using separate functions for the branches? This will have other benefits for compile time as well. Generally, small functions are better for compile time, because some compiler passes are not O(n), including very basic ones like register allocation.
For large switch statements I do this for readability reasons, because I try to keep my functions small.
> Based on a profile, it looked like the Go version was faster because GC happened on another thread while Rust had to run destructors on the same thread.
Have you tried using jemalloc? It can help a lot for situations like this.
Huh, I thought Rust already used jemalloc by default. I looked it up and it looks like it was removed relatively recently. When I was doing this experiment, I was using a version of Rust that included jemalloc by default so that 10% number already uses jemalloc. I remember this because I also thought of trying to speed up the allocator.
Like I said above, I profiled both Go and Rust and the 10% slowdown with Rust appeared to be running destructors for the AST. I think the appropriate solution to this would be some form of arena allocator instead of changing the system allocator. But that gets even more complicated with lifetimes and stuff.
> Can you work around this by using separate functions for the branches?
Yeah, I could have tried restructuring my code to try to avoid compiler issues. But this would have been even more time spent working around issues with Rust. Go was better than Rust by pretty much every metric that mattered for me, so I went with Go instead.
It's too bad because I was initially super excited about the promise of Rust. Being able to avoid the overhead of GC while keeping memory safety and performance is really appealing. But Rust turned out to be not a productive enough language for me.
If you're doing arena allocation in a compiler you might as well just leak all your allocations (which you could get with bumpalo with a 'static lifetime); then you won't have to deal with lifetimes at all.
> Yeah, I could have tried restructuring my code to try to avoid compiler issues.
Well, my point is that it would be good for readability to restructure your code in that way even in Go. 500-line functions are hard to read.
> But this would have been even more time spent working around issues with Rust. Go was better than Rust by pretty much every metric that mattered for me, so I went with Go instead.
I find the opposite to be true, especially for compilers. It's hard for me to go back to a language without pattern matching and enums (much less generics, iterators, a package ecosystem, etc.). The gain of productivity from GC and compile times is not worth Go's loss in productivity in other areas for me. But reasonable people can disagree here.
I considered this but it's a very limiting hack. Ideally esbuild could be run in watch mode to do incremental builds where only the changed files are rebuilt and most of the previous compilation is reused. Memory leaks aren't an acceptable workaround to memory allocation issues in that case. While I don't have a watch mode yet, all of esbuild was architected with incremental builds in mind and the fact that Go has a GC makes this very easy.
> 500-line functions are hard to read.
I totally recognize that this is completely subjective, but I've written a lot of compiler code and I actually find that co-locating related branches together is easier for me to work with than separating the contents of a branch far away from the branch itself, at least in the AST pattern-matching context.
> It's hard for me to go back to a language without pattern matching and enums
I also really like these features of Rust, and miss them when I'm using other languages without them. However, if you look at the way I've used Go in esbuild, interfaces and switch statements give you a way to implement enums and pattern matching that has been surprisingly ergonomic for me.
The biggest thing I miss in Go from Rust is actually the lack of immutability in the type system. To support incremental builds, each build must not mutate the data structures that live across builds. There's currently no way to have the Go compiler enforce this. I just have to be careful. In my case I think it's not enough of a problem to offset the other benefits of Go, but it's definitely not a trade-off everyone would be comfortable making.
With no exhaustiveness checking (also no destructuring, etc.)
I should also note that you can change the default stack size in Rust to avoid overflows, though there should be a bug filed to get LLVM upstream stack coloring working. It's also possibly worth rerunning the benchmark again, as Rust has upgraded to newer versions of LLVM in the meantime.
I don’t see the value in this level of thinking here. Why should any developer go through this much hassle when they have a perfectly good solution? I’m not seeing any discussion here that actually highlights issues with using Go for this sort of thing.
I don’t think it’s that simple. I understand Go uses garbage collection but that doesn’t automatically mean Go doesn’t do compile time optimizations or is poor at CPU bound work.
While I understand GCs add overhead I don’t think that in and of itself means much here
This seems like yak shaving to me
A recently published article by a guy working at Discord set off a real flamewar, because he argued the opposite: that Rust is a better and faster language than Go. But he based his conclusions on a very old version of Go (1.9), whose GC was much slower than in current versions.
I’m not sure I really see the benefit of working around it when Go fit the use case as is
That would help immensely.
Was the Rust parser written by hand or did you use one of the parser frameworks (e.g. nom or pest) out there? nom, for instance, goes to great lengths to be zero-copy which would probably be a big benefit here.
In the past I tried using WTF-8 encoding (https://simonsapin.github.io/wtf-8/) for string contents, since that can both represent slices of the input file while also handling unpaired surrogates, but I ended up removing it because it complicated certain optimizations. I think the main issue was having to reason through weird edge cases such as constant folding of string addition when two unpaired surrogates are joined together. I think it's still possible to do this but I'm not sure how much of a win it is.
Sure, but different approaches are going to be more optimal for different languages.
I assume by zero-copy you mean that identifiers in the AST are slices of the input file instead of copies?
Yes. From the README:
> zero-copy: if a parser returns a subset of its input data, it will return a slice of that input, without copying
Geal also claims that nom is faster than hand-written C parsers.
Nom comes with 'escaped' and 'escaped_transform' combinators. In theory it should be possible, with relative ease, to return a slice if there are no escape characters and an allocated string if expansion is required. Presumably you'd have to use a Cow<str> though.
Of course it is. My opinion (which is worth what you've paid for it) is that I'd just go for UTF-8 support. I can't remember the last time I've seen UTF-16 in the wild (thankfully).
Performance-wise, the other thing I'd keep in mind with Rust is that in debug mode, string handling is painfully slow.
Edit: here's the URL for nom: https://github.com/Geal/nom
- Go is more fun if you are trying to be productive and push stuff out. The programming experience feels fluid and there’s not much agonizing over small details; the language is simple and you don’t need to think as much about things.
- Rust is more fun if you have a focus on perfection. It offers a lot of tools for abstraction and meta programming. These tools can be challenging at times. I do think even with NLL you will find yourself fighting the compiler, trying for example to resolve how you can avoid overlapping a borrow with a mutable borrow in some complicated bit of code, but you definitely get a lot of nice guarantees in exchange. I also do find it frustrating when something as simple as passing the result up can end up being really tricky.
I also think it depends heavily on the type of program you are writing and how. I’ve certainly hit cases where I still don’t know the optimal way of structuring things. In other cases people have managed to help me figure out what I need to do.
Concurrent memory safety is a huge plus, without a doubt, though there are applications where it's not enough and applications where it's too much. I think that puts Rust in a spot where it has use cases where it is clearly the best option but many use cases where it is overkill. As an example, Go shines particularly well for servers thanks to Goroutines and the fact that many servers have a shared-nothing architecture these days.
> Rust in a spot where it has use cases where it is clearly the best option but many use cases where it is overkill.
Indeed. It would make no sense to switch to Rust if Python is good enough for the task. I was just arguing that once you've learned Rust, you can do pretty much anything you want with it without friction, and I personally wouldn't bother writing anything in Python nowadays, because I can write Rust as fast and get the static typing guarantees and sane error handling that comes with it.
> Go shines particularly well for servers thanks to Goroutines and the fact that many servers have a shared-nothing architecture these days.
Having done quite a bit of Go, I don't agree with you. It's way too easy to accidentally share data between goroutines and then cause data races or deadlocks. The day they introduced the race detector, we found 6 data races in our code (a few thousand LOC) and a few more in our dependencies, and in the next year we found two not caught by the race detector (because it's a dynamic tool, it can't catch all races). More than generics (which are being introduced, if they don't change their mind like they did for error handling), Go really needs something akin to Send and Sync in Rust, or maybe something like Pony's capabilities system; Go definitely needs improvement on that front. Multithreading is a hard thing, and Go makes it too easy to use without a safety net for people without the necessary experience (because Go is so easy, it attracts a lot of junior or self-taught devs), and this generally doesn't end well.
- Sorry your Go experience was bad. I can only say my anecdotal experience was the opposite. Mostly for shared nothing architectures, but I also worked on an MQ-style software in Go and had a relatively good time. I think things that are well-suited to CSP concurrency fare pretty OK. Rust could’ve prevented things like accidental concurrent map accesses, but it still can’t guarantee you are implementing concurrency correctly on the application level (from perspective of say, coherency or atomicity.) So for many apps I’ve written, even somewhat complicated ones, I don’t feel like Rust would always be the best option. To me Rust makes most sense when you really can’t ever afford a memory error. Web browsers seem like an obvious winning case.
I find TypeScript's interface flexibility to be pretty clutch when working with high-level code that deals with complex data-structure inputs: configuration files, REST APIs, user/developer inputs. Having worked with many async paradigms, I also favor async/await for developing asynchronous business logic workflows. If you imagine your call graph for a certain workflow, and everything is written to use "async", you can imagine just drawing a circle around a portion of it and then easily "stamping out" more of those to occur in parallel. The way you can collect results and handle errors with the async/await paradigm is a bit nicer than working with channels and goroutines IMHO.
I like Golang for lower-level tooling and network services, of course. It also has seamless support for in-process parallelism. The OP's project is something I would certainly look to Golang for.
Now on a tangent: C# has async/await and most of the speed, but doesn't quite have the flexibility of TypeScript's interfaces, and of course doesn't have the compile speed of Golang. I would honestly use C#/F# a lot more for stuff if I thought it would fly in my work environment. Would love to work on some open source projects in C# to get it more exposure :)