How to rewrite it in Rust (michaelfbryan.com)
238 points by FBT 5 days ago | 39 comments

We did a similar thing with a Scala -> Rust rewrite for the http://prisma.io query engine.

By rewriting small components and integrating them into the existing project using Java's native interface, our small team of 5 developers was able to pull off this massive rewrite in just under a year. The resulting code base is rearchitected in a few very important ways, but mostly follows the same structure.

And because we kept and evolved our old Scala-based test suite, we have very high confidence in the rewrite.

When async/.await finally landed, we could switch over very quickly, and it has been a joy to focus on benchmarks and performance over the last month. Spoiler: Rust is faster than Scala :-D

I promise that this is asked genuinely and isn't some sort of veiled "gotcha!" (it's tough to tell on the internet sometimes); what was the reason for a change from Scala to Rust?

I ask because Scala already has a good type system and the JVM typically has good performance nowadays, particularly with something like GraalVM, so I am actually really curious as to why you felt a Rust rewrite was a good idea.

Just some reasons I might make a switch from Java/C# to Rust:

* you can keep memory use quite a bit lower
* you can still sometimes get large constant factors of performance improvements over the JVM in some kinds of problem domains. If this means you can run on 1 server instead of say, 5, you have a much simpler infrastructure.
* startup time, especially if you are doing 'serverless' or similar
* tail latency - even a good GC language will have occasional long pauses.
* data race protection at compile time
* easier deployment - no need to install a jvm and keep it updated

Beyond this, you generally get a smaller payload and/or container size for the application. With dependencies it can get pretty big, and this can slow down deployments (of course, Rust's build time is quite a bit longer at times, which offsets this).

On the long pauses: I've built multi-user simulation server systems in .NET, and the stop-the-world GC pausing for several seconds now and then was very painful, practically speaking.

I agree with all of your points except the "startup time" and "easier deployment"; GraalVM is pretty sweet and produces nice, self-contained executables.

Prisma engineer here who's been part of the rewrite since the beginning. We tried GraalVM, but the binary size was huge, and we needed to write parts such as the JDBC drivers in Rust, C or C++ anyway, because GraalVM could not compile certain JVM code to a native binary.

We distribute the binary with our NodeJS package and hundreds of megabytes of binary size will not work that well for our users.

Oh! That's a valid reason; I'm actually surprised that GraalVM has issues not supporting JVM features, since I've been using it for Clojure code locally (though admittedly just for fun, nothing serious). Definitely makes sense if you're stuck rewriting things anyway, might as well just do it in a more modern language.

Thanks for the insight!

Yep. I remember that at least the JDBC and JWT libs needed to be rewritten in a native language (which we did until some point!). And GraalVM has a weird API that is not JNA/JNI when you want to use native-image. It is not a very robust system for our needs, but it was a good learning experience.

What we did in the end is we rewrote parts of the system in Rust, plugged them into the JVM package, and had all our tests ready to use. First came the database connectors, while at the same time the other part of the team was writing the GraphQL parsing in Rust. We could all use our Scala integration tests, which was crucial for our success.

And btw. we still have our tests in JVM, although the rest of the stack is Rust now.

That's interesting, would love to read more (blog post maybe?) on issues with GraalVM.

I should add that the Rust community has been extraordinarily welcoming, and our existing Scala engineers were able to relatively quickly become proficient in Rust.

Huge shoutout to everyone working on Rust async, especially the Berlin crew, who have been very helpful.

It'd be great to hear about your experience using Java's native interface. AFAIK, native calls can have a performance impact: https://en.wikipedia.org/wiki/Java_Native_Interface#Performa...

I guess the plan for what to make native is pretty important. Maybe you want to disclose more details about that?

Do you have any write-ups on this? I might be looking to do this for a large java codebase soon.

How many lines of code? (old/new)

The Rust code is here: https://github.com/prisma/prisma-engine/

And the old Scala codebase here: https://github.com/prisma/prisma/

The old codebase has parts in Rust, so counting the lines is not that straightforward.

This ability to incrementally add Rust to a C codebase is very useful for adopting Rust in established projects.

You don't actually have to rewrite everything on day one. You can stop writing new C code right now, and then gradually replace old code only when you need to fix it or refactor it.

Twice I've been part of a move from JavaScript to TypeScript that worked much the same. Both projects were large applications with several developers working on them. Both had been ongoing for a few years before the port started. In both projects we decided to write all new code in TypeScript and convert JS to TS whenever we made any largish change to an existing JS file. In both cases it took around a year for us to reach > 90% of all code converted this way. At that point we created issues in our issue tracker for porting the rest, and had it converted within a couple of months.

The big difference, however, is that JS and TS can live side by side on a file-by-file basis out of the box with the TypeScript compiler, which makes it super easy to convert. You don't have this luxury with C and Rust out of the box, but serious kudos to the author for finding a way to do something very similar.

When converting C to Rust you usually have to do things on a module by module or compile artifact by compile artifact basis, which makes it much more challenging. You can however employ some sort of strangler pattern: https://docs.microsoft.com/en-us/azure/architecture/patterns...

> When converting C to Rust you have to do things on a module by module or compile artifact by compile artifact basis, which makes it much more challenging.

OP is essentially about proving the opposite. It does take a bit of setup to get there, but you can ultimately translate C to Rust on a function-by-function basis, and Rustify interfaces, data structures, etc. only gradually after nothing on the C side is relying on the older defs.
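
To make the function-by-function idea concrete, here is a minimal sketch, not code from the article. It assumes a hypothetical C function `size_t demo_count_lines(const unsigned char *buf, size_t len)` that is being ported; the name and signature are made up for illustration. The Rust version keeps the same symbol and C ABI so existing C callers keep linking against it, while the safe inner function is what new Rust code would call.

```rust
use std::slice;

/// Rust replacement for a hypothetical C function (name and signature are
/// assumptions for this sketch):
///     size_t demo_count_lines(const unsigned char *buf, size_t len);
/// `#[no_mangle]` keeps the symbol name, `extern "C"` keeps the calling
/// convention. (Edition 2024 spells the attribute `#[unsafe(no_mangle)]`.)
///
/// # Safety
/// `buf` must be null or point to `len` readable bytes.
#[no_mangle]
pub unsafe extern "C" fn demo_count_lines(buf: *const u8, len: usize) -> usize {
    if buf.is_null() {
        return 0;
    }
    // Borrow the C buffer as a slice for the duration of the call.
    let bytes = unsafe { slice::from_raw_parts(buf, len) };
    count_lines(bytes)
}

/// Safe core that other Rust code can call directly, without raw pointers.
pub fn count_lines(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| b == b'\n').count()
}
```

Once nothing on the C side calls `demo_count_lines` any more, the `extern "C"` shim can be deleted and only `count_lines` remains.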

C++ would be more of a challenge - you need to forgo quite a few C++-exclusive features to end up with interfaces that Rust can work with. That's where an "artifact by artifact" approach might work better. Other languages would be roughly similar, with their heavyweight C FFIs.

> you need to forgo quite a few C++-exclusive features to end up with interfaces that Rust can work with

Luckily, many C++ projects do this already so they can be called from C.

Sorry for the ninja edit. I've updated my comment. I meant to discuss how it works out of the box with Typescript, but takes more work with Rust. Seriously impressive that it can be done though.

I did something similar for a large-ish C# project and F#. The company I worked for was mostly an F# place, but we had a fairly large legacy codebase written in C#; typically we had the pattern of "if you have downtime or need to fix a bug in the C#, just rewrite the C# code in F#".

Annoyingly, at least with the typical msbuild pattern, you have to be using the same language at the project level, but you can have as many projects as you want per solution (it's weird). So it's not as seamless as the JS/TS system, but overall it's not too bad, since you still can mix and match somewhat.

MSBuild assumes you want a 1-to-1 relation between projects and assemblies, so assemblies and projects are interchangeable. There is no hybrid C#/F# compiler that can produce one assembly out of source code from both languages, so you’ll always need at least 2 compilers to be involved, hence the need to have a csproj and an fsproj side by side in your code base.

Regardless, major kudos for your rewrite!

Yeah, I knew that actually; it's still annoying :), but after a while you kind of get used to figuring out how to split up the stuff you're going to rewrite.

It wasn't just me; it was everyone on my team, and probably everyone in the company; I'm pretty glad they did that though; F# ain't perfect, but I like it a lot better than C#.

Is there any way to do this the other way around and integrate new Rust code into a non-trivial C project with a non-trivial build system?

Yep, it sort of looks the same. This is how Rust entered Firefox, for example. The most straightforward way is to call `cargo build` from within whatever build system you're using for C.

> However, Rust has a killer feature when it comes to this sort of thing. It can call into C code with no overhead (i.e. the runtime doesn’t need to inject automatic marshalling like C#’s P/Invoke) and it can expose functions which can be consumed by C just like any other C function.

As we see below, you may still need to write some code to convert from C types to something that is more ergonomic to use in Rust. But the marshaling ABI-wise is minimal.
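
A rough sketch of both halves of that point, using only the standard library. `strlen` is the real libc function; the wrapper names `c_strlen` and `name_from_c` are invented for this example. The call into C is a plain function call, and the extra code is only there to turn raw pointers into types that are pleasant to use from Rust.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

extern "C" {
    // Declared in <string.h>. The C library is already linked into
    // ordinary Rust programs, so no extra build setup is assumed here.
    // (Edition 2024 requires writing this block as `unsafe extern "C"`.)
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: the call itself is a direct C call with no marshalling.
pub fn c_strlen(s: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid, NUL-terminated string.
    unsafe { strlen(s.as_ptr()) }
}

/// The "ergonomic" half: turn a borrowed C string into a `&str`.
/// Returns `None` for a null pointer or for bytes that are not UTF-8.
///
/// # Safety
/// `ptr` must be null or point to a NUL-terminated string that stays
/// alive and unmodified for the lifetime `'a`.
pub unsafe fn name_from_c<'a>(ptr: *const c_char) -> Option<&'a str> {
    if ptr.is_null() {
        return None;
    }
    unsafe { CStr::from_ptr(ptr) }.to_str().ok()
}
```

From Rust, `c_strlen(&CString::new("tinyvm").unwrap())` goes through the C function, while `name_from_c` is the kind of small conversion layer you end up writing by hand on top of generated bindings.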

> Turns out the original tinyvm will crash for some reason when you have multiple layers of includes

It's actually crashing because there are no lines of code in the file, so certain data structures never get initialized.

> As we see below, you may still need to write some code to convert from C types to something that is more ergonomic to use in Rust. But the marshaling ABI-wise is minimal.

Exactly. As much as I love writing Rust code, there is nothing more frustrating than maintaining bindings from C to Rust. Then you have to create another idiomatic binding on top of them. Sure, bindgen is great, but Zig's cImport and Swift's ClangImporter take this further.

They both use clang modules to tackle the first part. An autogenerated idiomatic solution would be great for Rust, but sadly it doesn't exist.

At some point, I figure you will always need human intervention to get actually idiomatic interop, whether it's C marshalling, JSON serialization, database persistence, etc.

> This is where build scripts come in. Our strategy will be for the Rust crate to use a build.rs build script and the cc crate to invoke the equivalent commands to our make invocation

Yikes - port the entire build system to cargo before you write a line of Rust. Now draw the rest of the owl!

Surely there's an incremental path for the build as well? Perhaps if you're using CMake?

You can build Rust code as a static library and make the C build system consume that instead. This is the approach that librsvg took.

Moving C build to build.rs is not necessary. It's usually done only because people used to Cargo don't like bringing CMake along. If you were to publish this as a Rust crate, it'd be slightly easier for downstream Rust users to have one less external tool to install.

There's a `cmake` crate that you could use from `build.rs` the same way, or you could go the other direction and have your existing build system invoke Cargo or rustc.
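
One more incremental variant, sketched under assumptions: instead of porting the Makefile to the `cc` crate, a `build.rs` can shell out to the existing `make` and then tell Cargo where the resulting static library lives. The `cargo:rustc-link-search` and `cargo:rustc-link-lib` directives are real Cargo build-script output; the helper names, the directory `vendor/tinyvm` and the library name `tvm` are hypothetical, and the Makefile is assumed to leave `libtvm.a` in that directory.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Run the project's existing Makefile in `c_dir`.
/// Returns whether `make` exited successfully.
pub fn run_make(c_dir: &Path) -> io::Result<bool> {
    let status = Command::new("make").current_dir(c_dir).status()?;
    Ok(status.success())
}

/// Lines a build script prints so Cargo links the static library
/// that `make` produced in `lib_dir`.
pub fn link_directives(lib_dir: &Path, lib_name: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={}", lib_dir.display()),
        format!("cargo:rustc-link-lib=static={}", lib_name),
    ]
}

/// What `fn main()` in build.rs would call, for example
/// `build_with_make(Path::new("vendor/tinyvm"), "tvm")`.
pub fn build_with_make(c_dir: &Path, lib_name: &str) -> io::Result<()> {
    if !run_make(c_dir)? {
        return Err(io::Error::new(io::ErrorKind::Other, "make failed"));
    }
    for line in link_directives(c_dir, lib_name) {
        println!("{}", line);
    }
    Ok(())
}
```

This keeps `make` as the source of truth for the C side, at the cost of requiring it on every machine that builds the crate, which is the trade-off mentioned above for bringing CMake along.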

Great guide, this looked surprisingly straightforward.

I came here to say the same thing. This is a great article on "Rust is safe for our org."

This seems to be exactly what remacs is doing with Emacs.


What language will it be _next_ year?

Seeing how Rust is consistently at the top of Stack Overflow's most loved languages, I'd wager it's gonna be Rust again.

Next year is not literally next year.

Over the last decade and a half it's been: Ruby -> Node.js -> Go -> Rust, and something new will come along. But just like the others on the list, they will all live and evolve together.

Ruby and Node are, for the most part, not playing in the same field as Rust. I have absolutely no problem seeing them living alongside each other for decades.

Longshot bet: It will be something old with a surprising new use case before it's something brand new. Maybe one of the perls.

Hopefully Ada/SPARK.
