
> We don’t have any memory safe techniques for the output of JITs (or compilers by the way)

That's not quite true.

BTW there is a writeup by Exodus Intelligence here:

https://blog.exodusintel.com/2024/01/19/google-chrome-v8-cve...

It's a horrifyingly complicated vulnerability that takes 28 printed pages to explain and one line of code to fix. I don't even want to know what kind of effort it took to discover this one.

The bug is not directly caused by V8 being written in C++. It's a miscompilation, and those can happen whatever language your JIT is written in.

But. But but but. It could be argued that the problem is indirectly related to V8 being written in C++. It's a tricky argument, but it might be worth trying, so I will try to make it. Please be aware, though, that I am commercially biased: I work part time at Oracle Labs on the Graal team, who make a JS engine (open source though!). It should hopefully be obvious that these words and thoughts are my own, not indicative of any kind of party line or anything. So, yeah, take it all with a solid dash of table salt.

One of the reasons a higher level language like Java or even Rust is more productive than C++ is the capacity for building higher level abstractions. Specifically, Java has very good reflection capabilities at both runtime and also (via annotation processors) compile time. High level and safe abstractions are one of the key tools used to make software more robust and secure.

If you wade through Exodus' explanation, the core of the bug is that whilst implementing an optimization in very complex code, someone forgot to update some state and this causes the generated IR to be incorrect. This sort of bug isn't surprising because V8 has several different compilers, and they are all working with complex graph-based IR in a low level and fairly manual way.

In the beginning the GraalJS engine worked the same way. JS was converted into bits of compiler IR by hand. It was laborious and slow, and unlike the Chrome team, the engine was being written by a little research team without much budget. So they looked for ways to raise the level of abstraction.

There are very few fast JS engines. There's V8, JavaScriptCore, whatever Mozilla's is called these days, and GraalJS. The latter is unique because it's not written in C++. It's not even just written in plain Java (if it were, this argument wouldn't work, because the vuln here isn't a memory corruption directly in V8's own code). GraalJS is written in something called the "Truffle DSL", which basically uses Java syntax to describe how to create fast JIT-compiling virtual machines. You don't directly implement the logic of a VM or compiler for your language in this DSL. Instead you express the language semantics by writing an interpreter for it in Java, using annotations, a class library, a code generator that comes with the framework and a set of compiler intrinsics to define how your language works and also (crucially) how to make it fast through tricks like specialization. The code of the interpreter is then repeatedly fused with the data/code of the program being interpreted, partially evaluated and emitted as compiled machine code to execute.
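To make the specialization idea concrete, here is a minimal sketch in plain Java. It is not the Truffle API: as far as I understand it, Truffle generates this kind of state machine from annotated methods, whereas here it is written by hand. The class name `SpecializingAdd`, the `State` enum and the `generic` fallback are all hypothetical, invented for illustration.

```java
// Hypothetical sketch of a self-specializing interpreter node.
// Plain Java only; none of these names come from Truffle or GraalJS.
public class SpecializingAdd {
    /** Which implementation the node has currently settled on. */
    enum State { UNINITIALIZED, INT, GENERIC }

    private State state = State.UNINITIALIZED;

    State state() { return state; }

    /** Executes a JS-like `a + b`, adapting to the operand types it observes. */
    Object execute(Object a, Object b) {
        switch (state) {
            case UNINITIALIZED:
                // First call: pick the narrowest implementation that fits.
                state = (a instanceof Integer && b instanceof Integer)
                        ? State.INT : State.GENERIC;
                return execute(a, b);
            case INT:
                if (a instanceof Integer && b instanceof Integer) {
                    int x = (Integer) a;
                    int y = (Integer) b;
                    try {
                        return Math.addExact(x, y); // fast path, overflow checked
                    } catch (ArithmeticException overflow) {
                        // Guard failed: fall out of the fast path below.
                    }
                }
                // The assumption no longer holds, so widen permanently.
                state = State.GENERIC;
                return execute(a, b);
            default:
                return generic(a, b);
        }
    }

    /** Slow path: string concatenation or double addition. */
    private static Object generic(Object a, Object b) {
        if (a instanceof String || b instanceof String) {
            return String.valueOf(a) + b;
        }
        return ((Number) a).doubleValue() + ((Number) b).doubleValue();
    }

    public static void main(String[] args) {
        SpecializingAdd add = new SpecializingAdd();
        System.out.println(add.execute(1, 2) + " " + add.state());   // 3 INT
        System.out.println(add.execute("x", 1) + " " + add.state()); // x1 GENERIC
    }
}
```

Nothing in this node builds IR or machine code. The author only states the fast path, the fallback and the guard between them, which is the property the argument above relies on.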

This process doesn't guarantee the absence of miscompilations. Obviously, there's still a compiler involved (which is also written in Java and also uses some self-reflection tricks to be written in a high level way). But fundamentally, you aren't manually manipulating low level IR to implement a high level language. Nor are you writing several compilers. There's one compiler that's general enough to compile any language.

There is a bit of a performance sacrifice from doing things this way, in particular in terms of warmup time because partial evaluation isn't free.

But this does let you minimize the amount of code that's doing risky IR transformations! It's about as close as you can get to a "memory safe technique for the output of JIT". A JS engine built using this technique cannot corrupt memory in the JS specific code, because it's not emitting machine code or IR to begin with. That's all being derived from a reflection of the interpreter, which is itself already memory safe.
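A rough analogy for "fusing the interpreter with the program", again in plain Java with hypothetical names (`ClosureSpecializer`, `Expr`, `Bin` and so on are not Graal or Truffle APIs). Real partial evaluation works on compiler IR and emits machine code. This sketch only pre-resolves the dispatch over the program tree into closures, which is a much weaker technique, and `specialize` is written by hand here where a partial evaluator would derive it from `interpret` automatically. It does show the shape of the idea, though: the language-specific code is ordinary memory-safe interpreter logic.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: closure-based specialization of a tiny interpreter.
// An analogy for partial evaluation, not how Graal implements it.
public class ClosureSpecializer {
    /** A tiny expression language over a single integer variable x. */
    interface Expr {}

    static final class Const implements Expr {
        final int value;
        Const(int value) { this.value = value; }
    }

    static final class Var implements Expr {}

    static final class Bin implements Expr {
        final char op; // '+' or '*'
        final Expr left, right;
        Bin(char op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    /** Plain interpreter: re-inspects the program tree on every call. */
    static int interpret(Expr e, int x) {
        if (e instanceof Const) return ((Const) e).value;
        if (e instanceof Var) return x;
        Bin b = (Bin) e;
        int l = interpret(b.left, x), r = interpret(b.right, x);
        return b.op == '+' ? l + r : l * r;
    }

    /** Walks the tree once and returns a function of x alone. */
    static IntUnaryOperator specialize(Expr e) {
        if (e instanceof Const) {
            int v = ((Const) e).value;
            return x -> v;
        }
        if (e instanceof Var) return x -> x;
        Bin b = (Bin) e;
        IntUnaryOperator l = specialize(b.left), r = specialize(b.right);
        if (b.op == '+') {
            return x -> l.applyAsInt(x) + r.applyAsInt(x);
        }
        return x -> l.applyAsInt(x) * r.applyAsInt(x);
    }

    public static void main(String[] args) {
        // The program x * 3 + 1.
        Expr program = new Bin('+', new Bin('*', new Var(), new Const(3)), new Const(1));
        IntUnaryOperator compiled = specialize(program);
        System.out.println(interpret(program, 5));  // 16
        System.out.println(compiled.applyAsInt(5)); // 16
    }
}
```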

Also, obviously, working with complex graph structures in C++ is super risky anyway, which is why Chrome uses a garbage collected heap. V8 is super well fuzzed, but if you don't have Chrome's budget, implementing a JIT in standard C++ would be quite dangerous, just from ordinary bugs in the parser or from UAFs in graph handling.

Like I said, this is kind of a tricky and perhaps weak argument, perhaps analogous to arguing about safe Rust (which still has unsafe sections), because there are still graph based optimizations in Graal, and they can still go wrong. But you can at least shrink those unsafe sections down by a lot.



That could be and that analysis really is quite long. I’ll have to take your word on that as I don’t have the experience you do. I do know that most JITs don’t have to even worry about this as they don’t need to worry about running untrusted code (not sure about the real world deployments for GraalJS). It’s also true that v8 has a lot more usage and vulnerabilities in it like this are a lot more valuable. So it can be hard to compare and contrast how different approaches impact vulnerabilities (+ comparing vulnerabilities is really difficult). But generally it is true that higher level abstractions can reduce certain classes of defects (not sure if this one falls into that but certainly others).

All that being said, the changes you suggest sound more like a design choice than something specific to a language. V8 has a complex build process and an object graph that should let you get the necessary compile time and runtime reflection capabilities.

Anyway, I think we can both agree that compiler research generally assumes trusted inputs, and there's not much research on building robust compilers and JITs that are secure against malicious input. We know generic techniques like fuzzing, but no really robust designs (nothing that gives the level of protection we know how to get for memory safety).


Graal/Truffle languages can support sandboxing and GraalJS does. So it's designed to run untrusted code.

I think in this specific case it's really hard to say. It's sort of on the borderline. But we can imagine many other closely related cases where the higher level abstraction would help.

You could potentially do something like Truffle with C++ and LLVM, but I don't think it'd be easy. The grain of the language works against you. It's interesting if V8 already has added some of the infrastructure necessary.


You'd be surprised how little friction you'd have with C++ reflection. First, since V8 already has a custom build step, you can do a mix of custom code gen and C++ constexpr/consteval for static reflection. Here's a header-only implementation for adding compile time reflection purely within the language [1]. And V8 already does dynamic code gen as part of its build process (to generate the snapshot that speeds up instantiation of the isolate).

Some amount of dynamic reflection is also a must since JS is a dynamic language with reflection support + you need to walk the GC tree.

Of course none of this necessarily exists within the compiler part of the codebase as it's unlikely to be needed there, but clearly doable.

I don't know the specific details of reflection needed for the abstractions you reference and clearly V8 is still doing some amount of manual IR generation, so it's possible it would be a substantial investment to actually retrofit those techniques into v8 (+ the current capabilities may exist in the wrong part of the codebase for applying it to the JIT itself).

One would have to do a careful analysis of historical security exploits and of how well specific techniques would have prevented them to figure out if it's worth adding those abstractions (especially since there is a potential performance tradeoff, as you mention). As I said, I think there's insufficient research in this area to establish a compelling body of best practices (not to take away from the contributions of the GraalJS team to this space).

[1] https://github.com/veselink1/refl-cpp



