
Building Fast Interpreters in Rust - mfrw
https://blog.cloudflare.com/building-fast-interpreters-in-rust/
======
buro9
It should be noted that this implementation has been in production since last
summer and is powering the Firewall Rules ([https://blog.cloudflare.com/how-
we-made-firewall-rules/](https://blog.cloudflare.com/how-we-made-firewall-
rules/)) engine today.

This was one of the first major components to use Rust at our edge and we've
been really happy with it.

The flexibility this approach (to matching traffic) has given us, whilst we
also get the speed and memory safety, is just great. The speed at which we're
iterating on the firewall and other systems that use this is a joy to behold.
A lot of that speed derives from the confidence we have in this component.

We're also really pleased that whilst we have several proposals for
optimisations, none have yet needed to be prioritised as the performance is
great.

~~~
jimmy1
> The speed at which we're iterating on the firewall and other systems that
> use this is a joy to behold.

How much of this do you attribute directly to Rust versus other factors like
seniority of engineering talent, clear communicated requirements,
organizational empowerment, etc? If you swapped out Rust for C, what would be
the impact on velocity? Is it a 1-4x multiplier, or larger?

~~~
buro9
Engineering resource is finite: the less time spent working on the engine,
the more time we can spend working on new features and other systems.

Before making this in Rust we experimented with a Go and also a Lua
implementation as we investigated the approach. The Rust code took the same
time to produce initially, was production-ready very early, and has been rock
solid, requiring virtually no maintenance since it was put into production.

That frees up the engineer to work on other things, while also reducing the
ongoing maintenance we had anticipated, so those engineers are working on
other stuff too.

I'd attribute a chunk of that to Rust... though it helps if you have great
engineers who are familiar with the concepts of parsers, etc working on these
things. The same may not be true if someone new to Rust was also new to the
concepts needed.

~~~
ryanworl
What were the differences between the Go and Rust versions that led you to
choose the Rust version in the end?

~~~
buro9
Given that I wrote the Go version and RReverser wrote the Rust version, I'm
going to go with: A better engineer wrote the Rust version.

But actually I was excited by Rust too. The memory safety, the performance
(the Rust was faster), the degree of control over how we could present the
FFI to the languages we needed to integrate with, how readable the Rust was
by comparison to the Go code (readable, but so much of it), and avoiding the
GC... the Rust implementation was a convincing winner.

~~~
int_19h
In terms of readability, how much is it due to Rust having ADTs and pattern
matching over them?

~~~
gavia1
You can kind of "fake" ADTs in Go with interface{} and type switches, but it
isn't all that nice.

Rust, on the other hand, with first-class ADTs and pattern matching, makes it
a pleasure! You also get the compiler checking that all branches have been
covered, which catches a lot of bugs at compile time.

Go has its place, and works great in those places, but I don't think this is
one of them. I'm pleased we chose Rust, and this feels like a great use for it!
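For illustration, here's a minimal sketch of what that buys you for a
filter-style expression tree. The types here are hypothetical, not the actual
wirefilter AST:

```rust
// Hypothetical filter AST: a small expression tree with an exhaustive
// evaluator. Not the actual wirefilter types.
enum Expr {
    Eq(String, String), // field == literal
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

fn eval(expr: &Expr, get: &dyn Fn(&str) -> String) -> bool {
    // The compiler rejects this match if a variant is ever added
    // without a corresponding arm.
    match expr {
        Expr::Eq(field, lit) => get(field) == *lit,
        Expr::And(a, b) => eval(a, get) && eval(b, get),
        Expr::Or(a, b) => eval(a, get) || eval(b, get),
        Expr::Not(inner) => !eval(inner, get),
    }
}

fn main() {
    // http.method == "GET" && !(ip.src == "10.0.0.1")
    let expr = Expr::And(
        Box::new(Expr::Eq("http.method".into(), "GET".into())),
        Box::new(Expr::Not(Box::new(Expr::Eq(
            "ip.src".into(),
            "10.0.0.1".into(),
        )))),
    );
    let get = |field: &str| match field {
        "http.method" => "GET".to_string(),
        "ip.src" => "192.0.2.7".to_string(),
        _ => String::new(),
    };
    println!("{}", eval(&expr, &get)); // true
}
```

In the Go version you'd need a type switch plus a runtime panic (or silent
fallthrough) for unhandled cases; here a missing arm is a compile error.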

------
kibwen
This discussion of interpreters dovetails rather nicely with Catherine West
(kyren)'s new experimental Lua VM in Rust:
[https://www.reddit.com/r/rust/comments/awx9cy/github_kyrenlu...](https://www.reddit.com/r/rust/comments/awx9cy/github_kyrenluster_an_experimental_lua_vm/)

------
skybrian
In C, a common way to get good performance without JIT is with bytecode and a
giant switch statement. It would be interesting to see if that's viable in
Rust. How does performance compare to an implementation using closures?
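For what it's worth, the C pattern maps fairly directly onto a `match` inside
a loop, which the compiler typically lowers to a jump table much like a
switch. A minimal sketch, with hypothetical opcodes unrelated to the
article's implementation:

```rust
// A tiny stack-based bytecode VM: the `match` on the opcode is the Rust
// analogue of C's giant switch statement.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

fn run(code: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        match code[pc] {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            Op::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
            Op::Halt => return stack.pop().unwrap(),
        }
        pc += 1;
    }
}

fn main() {
    // (2 + 3) * 4
    let code = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt];
    println!("{}", run(&code)); // 20
}
```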

~~~
saagarjha
A _better_ way to get good performance is to thread your switch statement,
which is hard to do explicitly in Rust last time I tried (maybe you could do
this if you mark functions as inlinable?).

~~~
skybrian
What do you mean by "thread your switch statement"?

~~~
ridiculous_fish
The "big switch statement" approach is for each bytecode instruction to
complete by jumping to a centralized dispatch location (i.e. the switch
statement).

The "threaded" approach is for each bytecode instruction to complete by
decoding and jumping to the handler for the next instruction.

Basically instead of "break" you have `goto handlers[nextIp->opcode]`.

The advantages of threading are fewer jumps and better branch prediction
(since branch prediction is tied to the IP). The disadvantages are slightly
larger code, and that compilers struggle to optimize it, since the control
flow is not structured.
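Rust has no computed goto, so true direct threading isn't expressible in safe
code. The closest safe approximation I know of is a trampoline: each handler
does its work, decodes the next opcode itself, and returns the next handler
as a function pointer, so dispatch becomes an indirect call per instruction
rather than a central match. A hypothetical sketch (the trampoline still
returns through a small driver loop, so this only approximates real
threading):

```rust
// Newtype so the recursive fn-pointer type is legal.
#[derive(Clone, Copy)]
struct Handler(fn(&[i64], &mut Vec<i64>, usize) -> Option<(Handler, usize)>);

// Opcodes: 0 = push (next word is a literal), 1 = add, 2 = halt.
const HANDLERS: [Handler; 3] = [Handler(op_push), Handler(op_add), Handler(op_halt)];

fn decode(code: &[i64], pc: usize) -> Option<(Handler, usize)> {
    Some((HANDLERS[code[pc] as usize], pc))
}

fn op_push(code: &[i64], stack: &mut Vec<i64>, pc: usize) -> Option<(Handler, usize)> {
    stack.push(code[pc + 1]);
    decode(code, pc + 2) // decode the next instruction ourselves
}

fn op_add(code: &[i64], stack: &mut Vec<i64>, pc: usize) -> Option<(Handler, usize)> {
    let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
    stack.push(a + b);
    decode(code, pc + 1)
}

fn op_halt(_: &[i64], _: &mut Vec<i64>, _: usize) -> Option<(Handler, usize)> {
    None
}

fn run(code: &[i64]) -> i64 {
    let mut stack = Vec::new();
    let mut next = decode(code, 0);
    // The driver loop never inspects opcodes; it just calls whatever
    // handler the previous one decoded.
    while let Some((Handler(h), pc)) = next {
        next = h(code, &mut stack, pc);
    }
    stack.pop().unwrap()
}

fn main() {
    let code = [0, 2, 0, 3, 1, 2]; // push 2, push 3, add, halt
    println!("{}", run(&code)); // 5
}
```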

~~~
filereaper
This method of design is called a continuation-passing style interpreter. [1]

Here's a production version from OpenJ9's JVM ByteCode Interpreter. [2]

[1] [https://kseo.github.io/posts/2017-01-09-continuation-
passing...](https://kseo.github.io/posts/2017-01-09-continuation-passing-
style-interpreter.html)

[2]
[https://github.com/eclipse/openj9/blob/01be53f659a8190959c16...](https://github.com/eclipse/openj9/blob/01be53f659a8190959c1641eac20b0488739e94c/runtime/vm/BytecodeInterpreter.hpp#L4872)

------
filereaper
I _highly_ recommend folks view this talk by the venerable Cliff Click,
called "Bits of Advice For the VM Writer". [1]

Of particular interest are continuation-passing style interpreters [2] and
how much of a speedup you can get by going to raw assembly.

The talk has lessons Cliff learned from his years on the JVM, but those can
be adapted easily to other runtimes.

[1] [https://youtu.be/Hqw57GJSrac?t=341](https://youtu.be/Hqw57GJSrac?t=341)

[2] [https://kseo.github.io/posts/2017-01-09-continuation-
passing...](https://kseo.github.io/posts/2017-01-09-continuation-passing-
style-interpreter.html)

~~~
mlindner
Rust doesn't use a runtime, so how is this relevant?

------
trace_next
I think C++ does closures a bit better than Rust, particularly when you have
lots of Arcs and Rcs. The clone dance gets old fast. However, Rust does
_everything else_ so much better, especially Cargo (here's hoping Buckaroo,
vcpkg, etc. gain adoption).

~~~
gpm
I'm not very familiar with C++ closures, but I agree that Rust closures have
room for improvement, though for a different reason: lack of fine-grained
control over copy/move.

    
    
        let settings = thing_it_wants_to_take_by_reference;
        let channel = thing_it_needs_to_take_ownership_of;
    
        let closure_that_takes_everything_by_reference =
            || { some_code(settings, channel) };
    
        let closure_that_takes_everything_by_move =
            move || { some_code(settings, channel) };
    
        let closure_that_does_what_I_want = {
           let ref_settings = &settings;
           move || { some_code(ref_settings, channel) }
        };
        
        let closure_with_hypothetical_syntax =
            || [move channel] { some_code(settings, channel) };

~~~
steveklabnik
This was a fairly explicit design decision, because these cases come up so
rarely. It's our opinion that it's not common enough to be worth increasing
the complexity of the language.

You don't even need it in this case; this will compile, for example:

    
    
      fn main() {
          let settings = String::new();
          let channel = String::new();
        
          let closure_with_hypothetical_syntax =
              || { some_code(&settings, channel) };
      }
            
      fn some_code(settings: &String, channel: String) {
        
      }
    

The only difference is the & on settings in the closure. This moves channel
but not settings.

~~~
gpm
Wait... hmm... now I'm going to have to track down some code where I did this
trick and figure out why or if I was just being stupid.

~~~
steveklabnik
It’s cool! In general, the default is “rust tries to capture each thing the
way it needs to be” and you can use move to override that. Sometimes, you _do_
have to do what you wrote, at least in theory. I’ve never actually had to.

~~~
chrismorgan
If your closures need to be 'static, it’s common in my experience to need to
use `move`, and thus to need to clone types like Rc<T>. It depends on whether
you’re dealing with Fn, FnMut or FnOnce, too.
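A common shape of this, sketched with a hypothetical `Button`/`on_click` API
(not any real GUI library): the boxed `'static` closure forces `move`, so
the `Rc` is cloned first.

```rust
use std::rc::Rc;

// A stored callback must be 'static: it may outlive the stack frame
// that created it. (Hypothetical API for illustration.)
struct Button {
    on_click: Box<dyn Fn() + 'static>,
}

fn main() {
    let label = Rc::new(String::from("submit"));

    // The "clone dance": clone the Rc so the closure owns its own
    // handle and can outlive this scope.
    let label_for_closure = Rc::clone(&label);
    let button = Button {
        on_click: Box::new(move || println!("clicked {}", label_for_closure)),
    };

    (button.on_click)(); // prints "clicked submit"
    println!("still usable here: {}", label);
}
```

Without the clone, `move` would take `label` itself and the last `println!`
would fail to compile.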

------
azhenley
Very nice article. It describes writing an interpreter in Rust for a DSL for
Wireshark-like filters.

The source code:
[https://github.com/cloudflare/wirefilter](https://github.com/cloudflare/wirefilter)

------
halayli
I am surprised eBPF wasn't mentioned. It's made for exactly that, and you get
bonus features like a lightweight JIT and safety, given that it supports a
strict subset of instructions that are deemed safe.

The arguments against JIT aren't backed by any prototypes or proofs. It felt
like the author has a bias towards dynamic dispatch and walked backwards
towards it.

There is no mention of how frequently these filters change, nor of the
possible perf advantage you'd get from running compiled expressions, which
would compensate for the accumulated compile time of these filters over time.

Often you need both approaches together, switching from interpretation to JIT
after some condition (in the DB case, after the same query shows up multiple
times, for example). In their case it could be something as simple as: if a
filter hasn't been updated in x time, consider it stable and compile it.

On the other hand, PG's dynamic dispatch is a fun read, just like any other
code in the PG repo.

[https://github.com/postgres/postgres/blob/master/src/backend...](https://github.com/postgres/postgres/blob/master/src/backend/executor/execExprInterp.c)

~~~
buro9
In the other blog post (linked from the top of this blog post) I wrote about
the motivation for using the Wireshark-like syntax and how we also have a few
customers with a lot of rules (tens of thousands and greater).

That said, it seems obvious to us that now that we have a Rust library that
can parse a Wireshark-like syntax into an AST... we don't have to perform the
matching only in Rust. i.e. we can ask the library to produce translations of
the expression as SQL (for our ClickHouse), GraphQL (for our analytics API),
or even eBPF.

We can't run everything in eBPF, but we could check the list of the fields
within an expression to see whether it could be run in eBPF, and then look at
heavy hitter rules and promote the ones doing the most work inside L7 to be
eBPF and to run in XDP.

Even if we don't do this for customer configured rules, this might be
something we do for handling denial of service attacks using the same
Wireshark-like expression syntax throughout our system.

~~~
halayli
Got it. Just to clarify, I didn't mean implementing the filters using eBPF
inside the kernel, but rather using eBPF's engine as a library at the
application layer.

------
reitzensteinm
Instead of a JIT being the next step, did you consider vectorizing the
execution?

Amortizing the cost of dispatch over multiple packets means that under heavy
load, the performance of the system should be fairly close to what you could
get with a JIT, but the system would be much simpler.

Of course, this only helps when a backlog is starting to build, but it at
least reduces the worst case.
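A rough sketch of the batching idea, with hypothetical types (not the
wirefilter API): each node of the filter is evaluated once over a whole
batch, so the per-node dispatch cost is paid per batch rather than per input.

```rust
// Hypothetical batched filter evaluation: dispatch happens once per AST
// node per batch, and the inner loops are plain, vectorizable iteration.
enum Filter {
    PortEq(u16),
    And(Box<Filter>, Box<Filter>),
}

struct Request {
    port: u16,
}

fn eval_batch(f: &Filter, batch: &[Request]) -> Vec<bool> {
    match f {
        Filter::PortEq(p) => batch.iter().map(|r| r.port == *p).collect(),
        Filter::And(a, b) => {
            let (ra, rb) = (eval_batch(a, batch), eval_batch(b, batch));
            ra.iter().zip(&rb).map(|(x, y)| *x && *y).collect()
        }
    }
}

fn main() {
    let f = Filter::And(
        Box::new(Filter::PortEq(443)),
        Box::new(Filter::PortEq(443)),
    );
    let batch = vec![Request { port: 443 }, Request { port: 80 }];
    println!("{:?}", eval_batch(&f, &batch)); // [true, false]
}
```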

~~~
RReverser
This is not executed on packets (like it would be in Wireshark), but rather on
requests defined in the application layer as a set of properties.

~~~
reitzensteinm
Ah, makes sense!

------
omaranto
I bet a JIT in LuaJIT would have been easy and fast: compile to Lua code in
strings, call "eval" (actually load() in Lua, but the name eval is better
known).

------
dgellow
> While (for a good reason) this might sound scary in unsafe languages like C
> / C++, in Rust all strings are bounds checked by default.

What’s the good reason, if I may ask?

~~~
masklinn
I'm guessing because an off-by-one or an extra skip might mean you miss the
end of the string and go off into la-la land feeding whatever garbage happens
to be in memory to your parser? That would mostly be a C issue (as it has no
string abstraction at all).

~~~
htfy96
They can use str.at() in C++ to ensure bounds safety.

~~~
RReverser
Boundary safety is only one of many possible issues. For example, Rust makes
it easier to work with arbitrary slices (not copied substrings) while
statically verifying ownership of the original string.
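As a sketch of what that looks like (a hypothetical tokenizer, not
wirefilter's actual code): a token is just a `&str` slice into the input, and
the borrow checker ties its lifetime to the original string.

```rust
// Zero-copy tokenizing: the returned &str borrows from `input`, so no
// substring is ever copied, and the compiler rejects any use of the
// token after the input is gone.
fn split_ident(input: &str) -> Option<&str> {
    let end = input
        .find(|c: char| !c.is_ascii_alphanumeric() && c != '.')
        .unwrap_or(input.len());
    if end == 0 { None } else { Some(&input[..end]) }
}

fn main() {
    let expr = String::from("http.host == \"example.com\"");
    let field = split_ident(&expr).unwrap();
    println!("{}", field); // http.host
    // drop(expr); // uncommenting this is a compile error:
    //             // `expr` is still borrowed by `field`
    println!("{}", field);
}
```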

~~~
htfy96
With static analysis, most misuse can also be detected:
[https://godbolt.org/z/UE-Mb0](https://godbolt.org/z/UE-Mb0), which provides
more flexibility than Rust, even with NLL.
~~~
roca
I'm not aware of any evidence that C++ Lifetimes will detect "most misuse". It
will detect some common lifetime errors. It will certainly fail to detect
other common lifetime errors. [https://robert.ocallahan.org/2018/09/more-
realistic-goals-fo...](https://robert.ocallahan.org/2018/09/more-realistic-
goals-for-c-lifetimes-10.html)

Since it fails to detect common lifetime errors, it's not surprising it's more
"flexible" than Rust NLL, which prevents all lifetime errors.

~~~
htfy96
The false negative in the post has since been fixed. Rust also doesn't
prevent all dangling references if you depend on an unsafe third-party
library, external library, or libc calls and use them in an improper way.

------
ram_rar
I wonder how this compares to an implementation in BPF. It's great to see
Rust getting traction in the systems/networking world.

------
ori_b
I am surprised so few interpreters are written in garbage collected languages,
which would give the interpreted language a GC for free. The main interpreter
loop (or jitted code, for that matter) shouldn't allocate, so there is no
inherent cost to the approach. And a tuned, mature GC is likely better than
yet another MVP reimplementation.

~~~
chrisseaton
> would give the interpreted language a GC for free

That's great if the language you're using has the same GC semantics as the
language you're implementing, and a pretty major problem if it doesn't.

~~~
ori_b
Yes, it would.

What languages have materially different GC semantics, other than Python on
the cpython vm? (PyPy loses the fast collection at the end of scope for
efficiency reasons)

~~~
__s
Lua's got an annoying case of metatables directing whether a table's
key/values are weak references or not (__mode & __gc)

~~~
lpghatguy
Lua also has the nasty case that __gc can throw!

