Eliminating Data Races in Firefox (hacks.mozilla.org)
470 points by caution 16 days ago | 119 comments

> What is ThreadSanitizer? ThreadSanitizer (TSan) is compile-time instrumentation to detect data races according to the C/C++ memory model on Linux. It is important to note that these data races are considered undefined behavior within the C/C++ specification.

TSan, ASan, UBSan - if you are writing code and your toolchain supports these, you should use them. Over the past 6-7+ years I've used them, I have never seen a false positive (I'm sure some have occasionally existed but they seem rare in practice).
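For anyone who hasn't run these tools: here's a minimal sketch (mine, not from the article) of the kind of bug TSan reports. A plain int incremented from two threads is a data race under the C/C++ memory model; compiling with `-fsanitize=thread` makes TSan flag it at runtime, while the `std::atomic` version stays clean.

```cpp
#include <atomic>
#include <thread>

int racy_counter = 0;                 // plain int: concurrent writes race, TSan reports it
std::atomic<int> safe_counter{0};     // atomic: no race, no report

// Two threads bump both counters. Under -fsanitize=thread the increments of
// racy_counter are flagged as a data race (and are UB per the standard);
// the atomic increments are fine.
int run_counters() {
    racy_counter = 0;
    safe_counter.store(0);
    auto work = [] {
        for (int i = 0; i < 100000; ++i) {
            ++racy_counter;  // unsynchronized read-modify-write
            safe_counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    // racy_counter may have lost updates; the atomic count is always exact.
    return safe_counter.load();
}
```

Build with something like `clang++ -g -fsanitize=thread` and TSan prints a race report pointing at the two unsynchronized increments.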

> However, developers claimed on various occasions that a particular report must be a false positive. In all of these cases, it turned out that TSan was indeed right and the problem was just very subtle and hard to understand.

Yes, I have seen this phenomenon!

Address Sanitizer -- never seen a false positive. Ever.

UBSan -- again never had a false positive.

Thread Sanitizer does have false positives in some circumstances. They're hard to verify as false, and in many ways it's better to refactor to eliminate the complaint, because you'll end up with better, cleaner code. For example, Qt uses the atomic fences that the article describes [0] [1].

[0]: TSan does not produce false positive data race reports when properly deployed, which includes instrumenting all code that is loaded into the process and avoiding primitives that TSan doesn’t understand (such as atomic fences).

[1]: https://lists.qt-project.org/pipermail/interest/2014-March/0...

I have seen false positives in ASAN.

In Firefox, media threads use a reduced stack size (128 kB), due to, historically, 32-bit address-space concerns with lots of media elements on a page. Opus, in its default configuration, requires a fair amount of working stack memory (a few dozen kB in some cases). "A few dozen" is much smaller than 128 kB, but apparently enabling ASAN increases the stack usage quite a bit. So we got crashes under ASAN, from overflowing the stack, that would not have been possible without ASAN enabled.


Address Sanitizer inserts sentinels on the stack to watch for certain kinds of dangerous stack behavior. I would be willing to bet that the stack overflow could occur without ASAN and it's not a false positive.

You are right to be concerned. The primary culprit is the use of C99 variable-length arrays (or alloca when those are not available). I can see why ASAN would want to pad those.
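As a hypothetical illustration of the pattern being discussed (the names and the bound are mine, not Opus's actual code): a stack scratch buffer whose size is capped by the format, so worst-case stack use can be analyzed by hand — but under ASan, redzones are placed around the allocation, so real stack consumption exceeds the hand-computed bound.

```cpp
#include <alloca.h>   // nonstandard, but widely available where this pattern is used
#include <cstring>

// Hypothetical decode helper: n is bounded by the format's limits, so the
// worst-case size of 'scratch' is known statically by hand analysis. With
// ASan enabled, this dynamic stack allocation gets padded with redzones,
// which is exactly the extra usage the parent comment describes.
int sum_frame(const int *samples, int n) {
    int *scratch = static_cast<int *>(alloca(n * sizeof(int)));
    std::memcpy(scratch, samples, n * sizeof(int));
    int s = 0;
    for (int i = 0; i < n; ++i) s += scratch[i];
    return s;
}
```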

Fortunately, the allocation sizes of these arrays are tightly constrained by the limits of the Opus format (this is not an accident: I helped design the format). I performed a careful analysis (by hand) to ensure the maximum possible stack usage (without ASAN) was not anywhere close to the size allocated for the stack. It is of course possible that I made mistakes, but I tried very hard not to.

The broader point is that this is an entire class of memory access error that ASAN cannot properly validate due to its design. Because it changes the system under test in ways that differ from a real deployment, I cannot use it to mechanically validate my hand analysis.

It is worth noting that Valgrind does not have this problem (because it instruments the original binary instead of making compile-time changes), and it is also used by Firefox in testing.

Configure using larger stacks when building with ASAN since this is a known problem?
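A sketch of that suggestion, assuming a pthreads platform. The detection macros are the ones the compilers actually define (Clang: `__has_feature(address_sanitizer)`; GCC: `__SANITIZE_ADDRESS__`), but the stack sizes are illustrative, not Firefox's real configuration.

```cpp
#include <pthread.h>
#include <cstddef>

#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define ASAN_BUILD 1
#  endif
#endif
#if defined(__SANITIZE_ADDRESS__)   // GCC's spelling
#  define ASAN_BUILD 1
#endif
#ifndef ASAN_BUILD
#  define ASAN_BUILD 0
#endif

static void *media_thread(void *) { return nullptr; }

// Returns 0 on success. Under ASan, give the thread extra headroom for
// redzone overhead; otherwise keep the small 128 kB stack.
int spawn_media_thread() {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    const size_t stack_size = ASAN_BUILD ? 512 * 1024 : 128 * 1024;
    pthread_attr_setstacksize(&attr, stack_size);
    pthread_t t;
    int rc = pthread_create(&t, &attr, media_thread, nullptr);
    if (rc == 0) pthread_join(t, nullptr);
    pthread_attr_destroy(&attr);
    return rc;
}
```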

The sanitizers are useful enough that I’ve found adding small targeted changes to support those modes to be valuable vs the increased risk of not testing what you run.

Address Sanitizer isn't a tool that solves every problem. But it solves the problems it's intended to solve, and solves them well.

I'm assuming you're still talking about Firefox running Opus [0]? I'm not familiar with it. But I do think that any platform that can run Firefox and wants to listen to audio should have a lot more than 128k lying around. Otherwise your user isn't using the right tool for the job.

I understand memory constraints can be tight, and it feels like there's 128k of space right there that you can use. And to achieve that you've written it in C and manually calculated the stack space needed. It feels awesome to have that kind of knowledge and experience, with the opportunity to leverage it! It's great that you've helped design a media format and implement something to process it!

But Firefox wastes tons of megabytes just to load, store, and transmit telemetry, preferences, search providers... the list can go on. Do you really want to try to convince me that a 128k slab for audio that the user specifically loaded is too expensive in the face of all of the telemetry loaded by Firefox these days?

Why add all the maintenance headache when you can instead allocate a single slab for your own use? If I were to write it and manually optimize then I'd use an allocated slab. And, as a bonus, the slab's ownership can move to any other thread if needed. And it would come up clean in Address Sanitizer. And, even more importantly, I wouldn't have to worry about what happens if the frames before my entry point change. For example, Firefox's software design could change and add a couple of wrappers around the manually-sized stack. What happens if I want to compile Firefox on a different platform which doesn't permit a manually-sized thread stack?

So it sounds like you've written something that's going to be very difficult to maintain if you were to vanish. And you've written it for a software product where that level of optimization is, perhaps, better done automatically by tools available to you.

If you build a standalone media player specifically for Opus then that'd be pretty cool. But Firefox shouldn't have pieces that are difficult to maintain.

[0]: https://hacks.mozilla.org/2018/10/introducing-opus-1-3/

I've gotten false positives in ASan when trying to instrument Win32 application code compiled with MSVC, using MSVC's then-experimental ASan support with the wrong combination of the maze of ASan libraries (a static library combined with dynamic MFC). I also get startup crashes when running MSVC ASan binaries in Wine. But these are obscure beta use cases, and some may be fixed by now.

It would probably be fairly easy to change Qt's Mutex implementation to be TSan-compatible and only do so for TSan builds (by swapping out the fences for atomics when building with TSan). This is what we did in Firefox.
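A sketch of what such a swap could look like (illustrative names, not Qt's or Firefox's actual code). Both variants are correct release/acquire handoffs, but only the first is visible to TSan, because TSan does not model standalone fences.

```cpp
#include <atomic>
#include <thread>

#if defined(__has_feature)
#  if __has_feature(thread_sanitizer)
#    define USE_TSAN_FRIENDLY 1
#  endif
#endif
#ifndef USE_TSAN_FRIENDLY
#  define USE_TSAN_FRIENDLY 0
#endif

static int payload = 0;
static std::atomic<bool> ready{false};

static void producer() {
    payload = 42;
#if USE_TSAN_FRIENDLY
    // Release operation on the flag itself: TSan understands this.
    ready.store(true, std::memory_order_release);
#else
    // Fence + relaxed store: equally correct per the memory model,
    // but TSan cannot see the fence and reports a false race.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
#endif
}

static int consume() {
#if USE_TSAN_FRIENDLY
    while (!ready.load(std::memory_order_acquire)) { }
#else
    while (!ready.load(std::memory_order_relaxed)) { }
    std::atomic_thread_fence(std::memory_order_acquire);
#endif
    return payload;   // synchronized with the producer's write
}

int run_handoff() {
    payload = 0;
    ready.store(false);
    std::thread t(producer);
    int v = consume();
    t.join();
    return v;
}
```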

UBSan has "false" positives, in that unsigned integer overflow is not C/C++ UB. They've decided to warn about it anyway, and sometimes it could be a bug, so that isn't totally unreasonable.

Only Clang (not GCC) ships UBSan with support for checking unsigned integer overflow, and Clang's UBSan does not enable it by default (it must be explicitly enabled).

As a result, it's not clear that it's a false positive in the technical sense here (as one would need to enable checking for unsigned overflow explicitly).
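A small example of why this sits in a gray area: unsigned wraparound is defined behavior, so Clang's opt-in `-fsanitize=unsigned-integer-overflow` check reports something that is legal per the standard but often still a logic bug.

```cpp
#include <limits>

// Unsigned arithmetic is defined to wrap modulo 2^N in C and C++, so this
// is NOT undefined behavior; it is only flagged by Clang's opt-in
// unsigned-integer-overflow check, not by the default -fsanitize=undefined
// set. (Signed overflow, by contrast, IS UB and default UBSan catches it.)
unsigned int wrap_demo() {
    unsigned int u = std::numeric_limits<unsigned int>::max();
    u += 1;   // defined: wraps to 0
    return u;
}
```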

In my opinion, the main limitation of TSan is that it cannot find all possible races. (An impossible task, so I think TSan is a pretty great tool for getting to the point where this is the main limitation.)

As a dynamic tool, if a race is observed during execution by TSan, the race is very likely to be real. But it is impossible to "observe" execution for every part of a large code base, and so real races are likely still hiding in some edge case that TSan did not execute.

Static Analysis tools are interesting in that they have the opposite problem. Static tools can detect races across the entire code base, but without runtime information it is difficult to know if the races are real. This leads to more real races being detected, but also comes with more false positives.

There is a lot of interesting work being done on data race detection and on how we can bring these two sides closer together to find more real races without overwhelming developers with false positives.

This is absolutely true and hence we combine not only our tests with TSan, but also fuzzing, to explore even more corner cases.

On the static vs. dynamic side, I would always opt for the dynamic when it can guarantee me no false positives, even if the results are incomplete. It is pretty much impossible to deploy a tool that produces lots of false positives because developers usually will reject it at some point and question every result.

> In my opinion, the main limitation of TSan is that it cannot find all possible races.

In my experience TSan finds all possible races if it's built with unit tests and unit tests provide enough coverage. If unit tests don't provide enough coverage then there are likely other bugs lurking beyond just data races.

I've never actually used TSan, so please correct me if I've missed something significant about how it works, but doesn't TSan need to observe the race happening in order to detect it?

The code coverage analysis I've seen in the past only evaluates whether code is run at least once, but how do you get coverage analysis to evaluate what code runs concurrently with what other code, effectively enough to be confident that you'll detect data races?

For example, if you've got a cache that safely handles concurrent use for most operations, but has a data race when two threads both cache miss on the same key within a short window of each other, are there kinds of code-coverage analysis that could theoretically point out that none of your tests involve a cache miss of the same key in two different threads?
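The cache scenario above can be sketched like this (hypothetical code, mine rather than the commenter's): each individual cache operation takes the lock, yet two threads that miss on the same key can both compute and insert. Line coverage can be 100% here without any test ever exercising the two-threads-in-the-window interleaving.

```cpp
#include <map>
#include <mutex>
#include <thread>
#include <atomic>

static std::mutex cache_mutex;
static std::map<int, int> cache;
static std::atomic<int> computations{0};

static int expensive(int k) {          // stand-in for a slow computation
    computations.fetch_add(1);
    return k * 2;
}

static int get(int key) {
    {
        std::lock_guard<std::mutex> g(cache_mutex);
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;
    }
    // Window: a second thread can also miss on the same key right here.
    int v = expensive(key);
    std::lock_guard<std::mutex> g(cache_mutex);
    cache[key] = v;   // both threads may insert; harmless in this sketch,
                      // a real bug if the value owns resources or must be
                      // computed exactly once
    return v;
}

// Two threads race to populate key 42; returns the final cached value.
int run_cache_demo() {
    std::thread t1([] { get(42); });
    std::thread t2([] { get(42); });
    t1.join();
    t2.join();
    return get(42);
}
```

Whether `computations` ends up at 1 or 2 depends entirely on whether a test ever drove both threads into the window — which is what line coverage can't tell you.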

I've seen concurrency permutation testing tools, that can reorder the sequence of operations between different threads searching for a permutation that causes a data race, but that relies on actually executing the potentially-racy behaviour at all during testing, right?

I guess this is also what fuzzing is about, but are there tools or techniques you can use to increase your confidence in how well your tests cover the space of concurrent use of your data structures?

TSan uses run-time instrumentation, yes.

Unit tests rarely cover high-load concurrency scenarios. I used to think unit test coverage was important, but I've found using the type system to make errors impossible is a much more effective way of lowering the defect rate.
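As a tiny example of the type-system approach (illustrative, not from any particular codebase): wrapping raw values in distinct types turns a unit-mixing bug from something a test must catch into something that fails to compile.

```cpp
// Distinct wrapper types for distinct units: mixing them without an
// explicit conversion is a compile error, so no unit test is needed to
// catch that class of defect.
struct Meters { double v; };
struct Feet   { double v; };

Meters operator+(Meters a, Meters b) { return Meters{a.v + b.v}; }
Meters to_meters(Feet f) { return Meters{f.v * 0.3048}; }

// Meters bad = Meters{1.0} + Feet{1.0};   // does not compile: the
//                                         // mistake is impossible
```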

They're not mutually exclusive concepts. :)

Given the project management triangle they kind of are - the goal should be to achieve the required defect rate at minimum maintenance cost. A lot of people talk as though more tests or more verification is always a good thing, but zero defects is usually not a business priority if it means slowing down feature work. So adding more different kinds of verification should be a last resort, only when you actually require a lower defect rate than you can achieve with one method alone - which is pretty rare IME.

> the goal should be to achieve the required defect rate at minimum maintenance cost

I disagree. Maintenance cost doesn't need to be minimized. It needs to be reasonable.

The goal should be to produce high quality software. High quality software is easy to understand and easy to instrument with tools.

Much as I enjoy writing high quality software, it's irresponsible to spend time or money producing something higher quality than is justified by the needs of the business. Generally I find that unit tests for anything other than actual logic (which is a tiny fraction of most codebases) don't offer enough benefit to be worth the costs.

(I'd also argue that unit testing can reduce the quality of the software as it makes the internals of the architecture more costly to change).

> Much as I enjoy writing high quality software, it's irresponsible to spend time or money producing something higher quality than is justified by the needs of the business.

I think it's more irresponsible to produce something that the customer isn't going to be able to use for many years or that needs a patch before it's even released.

The responsible approach is the one that delivers the best cost/benefit for the customer. Yes, it doesn't mean producing something that's completely unusable - but it rarely means producing something completely bug-free either.

Unit tests tend to be written to run without parallelism, or at least with less parallelism than the normal app has.

Not my unit tests. :-)

I agree, good code coverage can probably get most of the way there. But there are some problems with relying on coverage.

Coverage can be misleading. Maybe there is some function being executed in parallel:

  int global;
  void foo() {
    int local;
    if (complex_condition)
      bar(&local);
    else
      bar(&global);
  }

And assume that somewhere way down the call chain, the data passed to bar is modified. The bar function and all of the functions called in bar can have 100% test coverage, but if the else branch is not covered, the race will be missed.

So without true 100% test coverage, it is possible races are being missed by dynamic tools, and true 100% test coverage is probably not possible or feasible in large complex projects.

> but if the else branch is not covered, the race will be missed.


> true 100% test coverage is probably not possible or feasible in large complex projects.

I have yet to encounter a single large complex project where 100% test coverage is not possible. I have encountered such projects where 100% test coverage required massive refactoring of the code to facilitate testing, and doing so resulted in fixing most of the customer complaints received.

So you truly believe that Firefox or Windows or Ubuntu could not only be 100% covered by unit tests, but that it would actually be worth taking the time to do so? (edit: typos)

As someone who has worked on the code of Firefox, I strongly doubt that.

For instance, pretty much every piece of code (whether native or JS) I've seen is programmed defensively, assuming that for some reason invariants can be broken - something that typically cannot happen unless there is a memory corruption involved. Testing all these defensive programming branches is not possible without introducing memory corruptions.

Yes, I firmly believe that Firefox, Windows, and Ubuntu can achieve 100% unit test coverage of software. I truly believe it would be worth its while.

As Yoric mentioned, that would require some custom Qemu patch to corrupt memory, some test boxes with dodgy RAM, and/or other things along those lines.

Removing those "impossible" code branches is a bad idea. "Impossible" things happen millions of times a day to Firefox.

I used to work for LimeWire, which had tens of millions of users. We were sometimes seeing bug reports of "impossible" stack traces. They went down dramatically when we added a splash screen that calculated SHA-1 hashes of all of the DLLs and JARs before using the reflection API to load and start the actual application. Given that the "corrupted" JARs were still presumably passing the JARs' CRC-32 checksums and bytecode verification at class load time, I think we were most often just detecting dodgy RAM that would later have caused "impossible" errors by corrupting bytecode or native code after bytecode verification.

Writing defensively is great! There's nothing wrong with that.

If you're writing code to defend against "dodgy RAM" whose symptom is a corrupt SHA then it's super trivial to test that code: have a valid JAR file and adjust its content so that it will still "load" but isn't the file you expect. Then verify that the program refuses to load the JAR.

> have a valid JAR file and adjust its content so that it will still "load" but isn't the file you expect. Then verify that the program refuses to load the JAR.

Sorry for being a bit slow today. but what's the difference between "load"ing and loading? You clearly understand that you're using two different definitions of load, otherwise you wouldn't have put one in quotes. (Also, it would be self-contradictory if the two were identical.) However, I can't for the life of me figure out two different definitions for "load" which both involve CRC-32 and bytecode verification checks, and yet aren't identical.

If I'm correct, we saw a dramatic reduction in the "impossible" errors not because we were detecting the actual problem that caused the errors (corruption of internal JVM buffers after bytecode validation), but instead because our SHA-1 failures (loading chunks of files into our own buffers and checksumming those) were highly correlated with the "impossible" failures because they had the same underlying cause.

In our case, the JVM loaded the code from the JAR, which, unless I misunderstand Sun/Oracle's classloader, means that both CRC-32 checksums and bytecode verification pass, even for unsigned JARs. The chances of random bit flips passing CRC-32 checks are 4 billion to one. The odds of random bit flips passing both a CRC-32 check and bytecode verification are probably at least quadrillions to one. I'm pretty sure the corruption was happening minutes to hours after the classes actually got loaded by the JVM.

In other words, I think we were getting saved by the SHA-1 summing of a bunch of large files being a poor man's memtest86: writing semi-random-ish patterns to big chunks of memory and verifying you can read back what you wrote. Failing this correlates heavily with seeing "impossible" errors at runtime, but we weren't directly examining the classloader's internal buffers that were the actual cause of the errors.

On a side note, modern OSes should really have a low rate memory checker built into the kernel's memory manager. It's a crutch, but cheap consumer IoT devices and WiFi APs aren't getting ECC any time soon.

The difference is that "load" means that the file is otherwise acceptable (valid header, valid content) but that it's been adjusted so that the checksum isn't what's described in the manifest.

That is:

1) the JAR file successfully loads by the JVM -- as in it will "load"

2) the JAR file doesn't pass the SHA checksum requirement you've added -- as in it isn't the file you expect.

Even disregarding the very good points about code defending against memory errors mentioned by others, I imagine a realistic time frame for such coverage for a project the size of Firefox or Windows to be no less than a decade, with no other development going on. I find it absurd to imagine it would be worth it, even if it reduced the bug count to 0 (which it wouldn't).

> Yes, I have seen this phenomenon!

I've had this happen to me personally. Looking again and again at the TSan error and the code it pointed to, I was absolutely sure it was a false positive. Thankfully I got a friend to look at it, and after a good night's sleep he was able to explain to me how exactly the code was wrong.

Tl;dr: It is extremely hard to write correct threaded code in C/C++, and it is quite hard to recognize errors even when they are pointed out to you.

Two pieces of information that I found interesting:

* Rust had poor support for sanitizers ("Rust, which has much less mature support for sanitizers"). Work was done to improve this, but it's not clear how well it works now - it would be worth a blog post in itself IMO. This was surprising, because I at least assumed that it works flawlessly thanks to clang.

* Rust / C++ bridges are predictably problematic and weaken Rust's guarantees ("races in C++ code being obfuscated by passing through Rust"). This is the problem to solve for most companies working in a traditionally C and C++ oriented domain, as the C* code will dwarf the Rust code. Because of the shared memory space any problems in that code will be just as critical as if the entire code was unsafe. It would be quite awesome if Rust were able to isolate unsafe code not just at compile time, but also at runtime, by e.g. offering the option of executing the unsafe code in a separate address space. This would require some manual work for moving the data back and forth, but it could offer the tools to support such a work mode e.g. unsafe blocks could take parameters.

I wish Mozilla would be more transparent about the absolutely normal and typical difficulties they must be encountering in a mixed code base. I have the feeling that there's plenty of them, but they don't want to emphasise them in order not to discourage Rust adoption.

In terms of Rust guarantees with C* code: the Rust parts of the code will be fine, which means if you have an issue, you can easily isolate it to the C* code (probably doesn't help a lot). You can probably isolate things if you use IPC (I believe there are a few crates for that), but it wouldn't prevent the result from being wrong, only the unsafe behaviour from stomping the address space. You could likely avoid copies if you relied on SHM for larger data patterns.

On the other hand, integrating Rust and C* isn't terribly hard; there are automated tools that generate bindings for C/Rust, so you don't really have to think about it much other than unpacking the unsafe data from C*.

I worked on a mixed C++ and C# codebase where we switched from in-process calls to IPC for that very reason, taking the efficiency hit. Having the .NET runtime running in the same address space confused any sanitizer or leak detector. You rely on those tools to manage a large C++ codebase in practice.

Also, because the .NET GC churns through a lot of memory, statistically almost every problem on the C++ side turned up on the C# side first, sometimes with very confusing effects. For example, .NET has a compacting GC, so an object can get corrupted and then randomly moved somewhere else. We didn't have that kind of problem often, but when we did, it was pure hell to debug.

> Overall Rust appears to be fulfilling one of its original design goals: allowing us to write more concurrent code safely. Both WebRender and Stylo are very large and pervasively multi-threaded, but have had minimal threading issues. What issues we did find were mistakes in the implementations of low-level and explicitly unsafe multithreading abstractions — and those mistakes were simple to fix.

> This is in contrast to many of our C++ races, which often involved things being randomly accessed on different threads with unclear semantics, necessitating non-trivial refactorings of the code.

And when it comes to finding bugs in your unsafe concurrency primitives in Rust programs, there is a tool called Loom that helps with that, as a powerful complement to what ThreadSanitizer provides: https://github.com/tokio-rs/loom . You can think of Loom sort of like QuickCheck, except the permutations are all the possible thread states of your program based on the memory orderings that you have specified.

Here's a video that briefly talks about what Loom can do https://youtu.be/rMGWeSjctlY?t=7519 (it's part of a larger video about memory orderings, and is a great introduction to the topic).

Loom is really awesome, though it is focused on exhaustive testing, so not suitable for code that has a lot of possible interleavings (e.g. due to a ton of threads, or a large body of code).

There is a new project out of AWS called Shuttle [1] which is like Loom, but it does random exploration instead of exhaustive exploration, which enables massively distributed testing of really complicated stuff.

[1] https://github.com/awslabs/shuttle

Excellent, thanks for the tip! I have to ask though, given a tool that does exhaustive testing (like Loom), shouldn't it be trivial to adapt it to do random testing merely by randomizing rather than enumerating the test inputs? Is there something more going on that I'm not seeing?

Is there anything like Loom or Shuttle for C/C++?

Or are these only implemented in rust and they'd still work with C/C++?

I vaguely recall that Loom is an implementation of a research paper that purported to implement this for C/C++, but I don't know if that implementation is available/production-ready. In the worst case it should be relatively straightforward to back-port Loom to C++, since Rust also uses the C memory model.

The paper is here (referenced in the Loom docs):


This page describes how to get the code (linked from the paper):


I found some data races in an ODBC driver using TSan. Running it through CDSChecker is one of my round tuit projects.

Loom's style of exploration (and that of aws shuttle mentioned below) can be quite effective. Coyote is the equivalent library in the .NET world (formerly known as P#) and comes with a bunch of exploration "schedulers" (from random exploration to probabilistic and even an experimental reinforcement learning based one). They released an animation rich video today (see https://news.ycombinator.com/item?id=26718273). Worth watching even if you're using Rust as Loom and aws shuttle are conceptually similar.

Yeah, that was the part that stood out to me as well. I thought it was pretty intuitively obvious that having your data races confined to small segments is better than allowing them to be spread over the entire codebase, but a lot of people act like it negates the advantage of working in a safer language entirely. It's nice to have empirical reports that back up the "obvious" though. Sometimes "obvious" things turn out to be wrong. It's always good to check them.

> but a lot of people act like it negates the advantage of working in a safer language entirely.

I think the underlying assumption is more along the lines of: the easy cases are made easier, and the hard cases are made impossible — or rather the concern being that rust hasn’t made the problem easier, it just moved it around.

Which is often the case with claims of solving long-lasting problems

> the concern being that rust hasn’t made the problem easier, it just moved it around.

This is largely the point of Rust, though. Rust takes problems that used to manifest at runtime and causes them to manifest at compile-time, shortening the feedback loop drastically. It also takes problems that used to manifest all over the codebase and roots their causes in the leaf nodes of the program that are making use of `unsafe` blocks. Moving around the problem turns out to be a valuable way of addressing the problem, without ignoring any of the essential complexity or making anything fundamentally impossible.

> shortening the feedback loop drastically

More importantly, forcing you to catch them. Problems at runtime don't get caught until that condition actually occurs during an actual run of the software.

> Which is often the case with claims of solving long-lasting problems

This is what I really like about programming language theory: problems which are literally undecidable in existing languages, can be made almost trivial by designing a language with that problem in mind. The trick is to have the language keep track of the information needed for the solution, so we're not dealing with "black boxes".

A classic example is annotating every variable with a type, and propagating that information through compound expressions; this lets us spot problems like trying to use a number as a string without worrying about the halting problem (we do this by forbidding some technically-valid programs, like `1 + (true ? 10 : "hello")`).

Of course, this leaves the much harder problems of (a) keeping the burden imposed by this extra information down to a reasonable level, and (b) trying to make the language compelling enough for people to actually use. Rust is one of the few examples of going all-in on a new ecosystem, and it actually being embraced. Often it's easier to "embed" a new language inside an existing language, to piggy-back on the existing tooling (e.g. the variety of auto-differentiating languages built in Python)

Yes, people often forget that TMs really are "pure runtime". Sure, the program itself is static, but nothing else is. There's no imposed semantics on anything beyond the pure mechanism of the interpreter itself. This lessens the 'problem' part of the Halting Problem drastically if you just impose some constraints on the programs, e.g. static types, maybe a borrow-checker, etc.

Not all forms of moving things around are equivalent. Some just keep the problem as diffuse as it was before, others restrict the problem to a smaller area.

Rust does not, by any means, solve this problem. What it does do is help you prove that the problem is contained to a smaller subset of your codebase that you can audit/reason about much more easily.

> I think the underlying assumption is more along the lines of: the easy cases are made easier, and the hard cases are made impossible — or rather the concern being that rust hasn’t made the problem easier, it just moved it around.

Except in this case the article linked demonstrates that they've made things dramatically easier overall. Data races went from being endemic in a C++ codebase to being in a few low-level and easy-to-locate areas.

The easy stuff stayed easy, the bulk of the rest was folded into the easy case, and even the hard cases appear to have been made easier to find, identify, and fix.

Once you pick away enough at the straws, the needle becomes a lot easier to find.

D takes a different approach. D has a type constructor `shared`. Only types marked as `shared` (or `immutable`) can be accessed by multiple threads. If you've got concurrency problems, the scope of the problems are immediately reduced to the subset of the code typed as `shared`.

In addition, `shared` types cannot be accessed directly. They can only be accessed via library routines, such as D's atomic library.

Having an `immutable` data type also helps, as immutable data can be accessed by multiple threads without need for synchronization.

It's not that different. Rust has Sync + Send + &refs and &mut refs.

> D takes a different approach. D has a type constructor `shared`. Only types marked as `shared` (or `immutable`) can be accessed by multiple threads.

This is called Sync in Rust. There is also a Send trait for types which can be moved (but not necessarily shared), as there are data items which cannot leave their origin thread, e.g. thread locals or non-atomically refcounted instances.

That description seems very similar to how Rust's type system enforces thread safety. You have two kinds of references: shared and exclusive. The shared references will typically only provide immutable access, whereas if you have an exclusive reference, you can mutate it. Beyond this, there are some special types that can be mutated through shared references, e.g. the AtomicI32 type. Similarly, the Mutex type lets you lock it given only a shared reference and provides you with an exclusive reference when locked.

Did D ever get around to specifying a formal memory model? Because when I tried to adopt D a couple years ago I felt there was a lot of ambiguity surrounding the semantics of `shared`, particularly when interfacing with C and C++ code.

I ended up just casting to and from `shared`, and that seemed to work, but it was pretty verbose for highly parallel code and I was never quite sure under what circumstances doing that might violate the compiler's invariants.

Also such casting appeared to eliminate most of the benefits, since what appeared to be local data might have been cast and had a pointer sent across to another thread elsewhere. In the end `shared`, @nogc antics (the language internals weren't fully annotated), closures requiring the GC (compare to C++), and the friction of interfacing with highly templated C++ code such as the Eigen library caused me to abandon the attempt. I sure learned a lot in the process though!

> memory model

It's very similar to C++'s memory model, but it isn't necessary to understand it to be effective with D.

> casting

Casting to/from shared is necessary, but is restricted to @system code.

Does this approach have to deal with potential recursive locking?

Not by itself. But D's prototype ownership-borrowing system can help with that.

It's interesting, I read this and had the opposite conclusion -- people are still writing racy code using Rust.

I mean, it's great that you can write threaded code with more confidence, but it seems the conclusion here is that Rust gives you more confidence but not absolute confidence and you still need the normal stable of runtime detection tooling. The second fix[1] in particular resembles the bad pattern of "sprinkle atomics it until it works" you see in non-Rust languages.

(If this comment raises your hackles, please read the "no true Scotsman[2]" fallacy before responding.)

[1]: https://hg.mozilla.org/integration/autoland/rev/044f8b594c71 [2]: https://en.wikipedia.org/wiki/No_true_Scotsman

> What issues we did find were mistakes in the implementations of low-level and explicitly unsafe multithreading abstractions

(emphasis mine)

The important thing is that the races only happened in unsafe { } blocks. It's well-established that these blocks should basically be treated like C++ in terms of scrutiny, but (I believe) you can still have roughly "absolute confidence" in non-unsafe code.

It's true and interesting that unsafe { } blocks deserve sanitization analysis like C++ code does - and maybe this hasn't been discussed enough - but I don't think it's fair to suggest that the enormous language benefits are outweighed by some false sense of confidence. The proportions in that idea just seem way out of whack.

The thing is, the linked data race occurred in safe code! Looking at https://bugzilla.mozilla.org/show_bug.cgi?id=1686158, the patch changes a non-atomic read through a &SwCompositeGraphNode and a subsequent non-atomic write through a &mut SwCompositeGraphNode (both safe code). They should not race, and the only reason they do is because there wasn't proper RwLock-style synchronization when handing out &SwCompositeGraphNode and &mut SwCompositeGraphNode. Also I think merely creating an unsynchronized &mut is undefined behavior in and of itself (in addition to allowing UB in safe code). If so, even the fixed code is still technically wrong (just not miscompiled or misbehaving in practice).

Looking at https://searchfox.org/mozilla-central/rev/ee9dab6aa95f167a34..., SwCompositeGraphNodeRef has "safe" methods (line 286) with unsafe blocks, handing out raw &SwCompositeGraphNode and &mut SwCompositeGraphNode to the interior of an UnsafeCell, without locking or runtime checking. I think SwCompositeGraphNodeRef "caused" the UB because it was invoked by some other caller to create a &mut SwCompositeGraphNode without proper synchronization... And worse yet, SwCompositeGraphNodeRef has Deref and DerefMut implementations that implicitly create & and &mut, unsynchronized, at call sites that merely look like method calls...

My theory as to how this happened:

- C++ atomics are not mutable through a const *, so the authors used &mut to indicate mutability (even though Rust's &mut has stronger guarantees, according to the Stacked Borrows memory model, "all other pointers to this object are invalidated, except the pointers we reborrowed from"... though Stacked Borrows may change to make async fn and intrusive linked lists sound, I suspect it won't make this code sound).

- Maybe they thought "we want to prevent calling &mut SwCompositeGraphNode methods from a &SwCompositeGraphNodeRef". IDK. Semi-related: SwCompositeGraphNodeRef is a #[derive(Clone)] Arc<UnsafeCell<SwCompositeGraphNode>> but !Sync, so only threads that own a SwCompositeGraphNodeRef can create either &SwCompositeGraphNodeRef or &mut SwCompositeGraphNodeRef. And I don't see &SwCompositeGraphNodeRef floating around, so I don't see why access control would matter.

(EDIT: The commit introducing the unsound abstraction is https://hg.mozilla.org/integration/autoland/rev/8d47e408c578, filed under the bug at https://bugzilla.mozilla.org/show_bug.cgi?id=1670328.)
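For anyone who hasn't looked at the source: here's a stripped-down, hypothetical sketch of the shape of the unsound pattern (`RacyRef` is a made-up name, not the actual Firefox type). The `unsafe` is hidden behind Deref/DerefMut, so call sites look like perfectly safe code:

```rust
use std::cell::UnsafeCell;
use std::ops::{Deref, DerefMut};
use std::sync::Arc;

// A cloneable handle that hands out &T and &mut T to an UnsafeCell's
// interior with no synchronization.
struct RacyRef<T>(Arc<UnsafeCell<T>>);

// Arc<UnsafeCell<T>> is neither Send nor Sync, so an abstraction like
// this needs a manual (and load-bearing) unsafe impl to cross threads.
unsafe impl<T: Send> Send for RacyRef<T> {}

impl<T> Clone for RacyRef<T> {
    fn clone(&self) -> Self {
        RacyRef(self.0.clone())
    }
}

impl<T> Deref for RacyRef<T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &*self.0.get() }
    }
}

impl<T> DerefMut for RacyRef<T> {
    // Unsound: nothing stops two clones on two threads from creating
    // overlapping &mut T, which is UB regardless of whether a torn
    // read or write ever actually happens.
    fn deref_mut(&mut self) -> &mut T {
        unsafe { &mut *self.0.get() }
    }
}

fn demo() -> u32 {
    // Single-threaded this is fine; the danger is that call sites that
    // look safe can do the same thing concurrently from two threads.
    let mut a = RacyRef(Arc::new(UnsafeCell::new(0u32)));
    let b = a.clone();
    *a += 1; // unsynchronized &mut u32 via DerefMut
    *b // unsynchronized &u32 via Deref
}

fn main() {
    assert_eq!(demo(), 1);
}
```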

AFAIU one can create races in safe code using atomics with insufficient ordering.

(That notwithstanding, I agree that Rust is a vast improvement over C/++.)

"races" is too general. Safe Rust cannot have data races. It can have race conditions.

I'm not convinced this is a meaningful distinction in this context. Assuming a developer consciously creates a race which turns out to be a bug, how does it matter whether it was a data race or some other kind of race condition?

In general data races can break program invariants, violate memory safety and do other undefined behavior.

In C++, data races can cause jumps to invalid addresses and the compiler will assume data races away and delete checks that are redundant in the single-thread case. The equivalent problem happens in most languages where data races are possible (two Python threads writing at once can put a value into an impossible/illegal state)

An ordinary race condition is just an observable effect of the nondeterministic progress of threads, so it's possible to have a truly benign race condition, so long as the correctness of the whole program doesn't depend on the outcome of the race. I/O causes nondeterminism anyway, so it's not really practical to avoid all ordinary races.

It's possible to have truly benign data races too, e.g. lazily computing a hash of immutable data.
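For illustration, a sketch of that lazy-hash pattern (names are made up, and the hash function is just a stand-in FNV-1a). In C++ this is often written with a plain non-atomic field, which is technically a data race even though every racing writer stores the same value; the defined-behavior Rust version uses a relaxed atomic, so two threads may both compute the hash but the race over who writes first is harmless:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct Cached {
    data: Vec<u8>,
    // 0 means "not yet computed" (we reserve 0 as a sentinel below).
    hash: AtomicU64,
}

impl Cached {
    fn hash(&self) -> u64 {
        let cached = self.hash.load(Ordering::Relaxed);
        if cached != 0 {
            return cached;
        }
        // FNV-1a over the immutable data.
        let mut h: u64 = 0xcbf29ce484222325;
        for &b in &self.data {
            h = (h ^ b as u64).wrapping_mul(0x100000001b3);
        }
        let h = h.max(1); // keep 0 reserved as the sentinel
        // Racing stores all write the same value, so losing the race
        // only costs a redundant computation.
        self.hash.store(h, Ordering::Relaxed);
        h
    }
}

fn main() {
    let c = Cached { data: b"immutable".to_vec(), hash: AtomicU64::new(0) };
    assert_eq!(c.hash(), c.hash()); // second call hits the cache
    assert_ne!(c.hash(), 0);
}
```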

One of the reasons that Rust's ownership model is necessary is that most data races cannot otherwise be statically detected. The compiler will be unable to apply any of its theoretical UB optimizations. It will emit its native code and you get the defined behavior of the underlying hardware. There will be no undefined behavior. If that ever changes, the bigger news, I imagine, will be that we solved the halting problem.

Data races are undefined behavior, and race conditions are a logic error. Yes, both are bugs, and both are important to fix. But they have (at least to me) different severities.

(Worth noting that data races have a clear, unambiguous definition, and therefore are possible to eliminate through tooling. Race conditions rely on program intent or logic, and are therefore... not really.)

Also, I think it matters, generally, that we are truthful about what Rust prevents and does not prevent.

> Worth noting that data races have a clear, unambiguous definition, and therefore are possible to eliminate through tooling. Race conditions rely on program intent or logic, and are therefore... not really.

This is a great point. Usually people only talk about the two categories in terms of vague severity, but this is a hard distinction which has lots of implications for how each can be dealt with.

Any kind of race can lead to data corruption, or, to be drastic, possibly death if the outcome feeds into a computer system with direct influence over human lives, like a factory automation system.

This is why I think Rust discussions about this subject don't emphasize enough that the language only prevents a very small subset of races.

> Data races are undefined behavior, and race conditions are a logic error. Yes, both are bugs, and both are important to fix. But they have (at least to me) different severities.

Some data races really are benign while some other race conditions can lead to data corruption or security vulnerabilities. Why then should data races be strictly more severe than race conditions? What am I missing?

"This context" is the original article, which is discussing data races specifically.

Actually "this context" was my previous reply. Much like I can nest a lexical scope inside a program and have it reach up and past the containing scope, I can create a new context in a discussion. Please at least try to understand my position before you reply.

And deadlocks

You can also create races in safe Rust by using the file system, no concurrency primitives necessary (just create a new file, close it, and re-open it assuming it's still there..).

What safe Rust doesn't have is _data races_.
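The filesystem version is the classic check-then-use (TOCTOU) race, and it compiles as safe Rust without complaint. A hypothetical sketch (`racy_read` is a made-up helper name):

```rust
use std::fs;
use std::path::Path;

fn racy_read(path: &Path) -> Option<String> {
    if path.exists() {
        // Window: another process may unlink `path` right here, between
        // the check and the open. Robust code would skip the exists()
        // check, just attempt the open, and handle the error.
        fs::read_to_string(path).ok()
    } else {
        None
    }
}

fn main() {
    let path = Path::new("toctou_demo.txt");
    fs::write(path, "hello").unwrap();
    assert_eq!(racy_read(path).as_deref(), Some("hello"));
    fs::remove_file(path).unwrap();
    assert_eq!(racy_read(path), None);
}
```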

You are correct. The "no data races" slogan of Rust is accurate but you need to mention the way it needs to be understood. First, data races are only a subset of all race conditions. Second, like the other safety guarantees, the statement only applies to safe Rust. Once you use unsafe Rust or any non-Rust language (and both usually happens in any non trivial program), this guarantee stops being a guarantee. But it doesn't mean it vanishes into nothingness. It instead becomes a quantitative statement: it's much easier to avoid data races in Rust than in unsafe alternatives.

I think when you build Rust projects that require reliability, it's foolish to lean back and believe that marketing slogan and not look for concurrency bugs. However, once you found them, it's way easier to fix them, at least when your unsafety is well encapsulated like how it's recommended.

I think the big challenge for Rust in this context is to improve its tooling so that using it is comparable or even easier than the C++ tooling. This blog post is proof that such tools should be ran otherwise you are missing out on important bugs.

> you still need the normal stable of runtime detection tooling

I hate that you're getting so many downvotes, and I wish people wouldn't do that...but I still want to disagree slightly :) I think it's interesting to note that the meaning of "you" changes between the two cases. If I write safe code on top of parking_lot, you could argue that I don't really need to do runtime race detection. Rather, someone in the ecosystem needs to do it, but once it's been done, the shared code is improved for everyone. If my code is all safe, I can benefit from these global improvements over time, and I can be confident that I haven't introduced any new races myself.

They literally found tons of new race conditions in C++ and two in the Rust code with their tool. And you want to act as if that little factoid makes Rust useless somehow. Anyone who thinks a language is flawless is not thinking correctly, and that includes Rust. However, it does a much better job at this than C/C++, yet somehow you twisted that into declaring Rust useless.

I'm not sure how you got from "Rust gives you more confidence but not absolute confidence and you still need the normal stable of runtime detection tooling" to "useless".

It's probably because "Rust gives you absolute confidence" is a strawman, I don't think anyone was arguing that, and often people's first reaction to a strawman is to construct a contrary strawman.

Another fallacy to keep in mind is that two things not being perfect does not make them equal failures.

"No true Scotsman" is a fallacy because "true Scotsman" doesn't have a clear definition and can mean anything. But "unsafe" has a very precise meaning in Rust. Safe Rust didn't have any data races, which is exactly what we would expect.

Is that "sprinkle atomics" stuff in bindings/interfaces for non-Rust external code? I'm not particularly surprised that you would have to wrap atomics around externalities and not doing that would be a source of errors if the external code has race conditions.

It's not, I linked the patch. (I don't fully understand the patch so I'm happy to be corrected here, but it looks like it was an ordinary "use atomics incorrectly" bug.)

It's unsafe code (see the "unsafe" keyword?). So, Rust relies on the people who wrote it to do so correctly, and in this case they mistakenly assumed they could get away without one variable being atomic. The patch corrects that, making it atomic.

So Rust did exactly what we wanted here, the problem code was narrowed down to the code that was explicitly unsafe and needing caution. Humans remained fallible, and the ThreadSanitizer spotted their mistake.

I think the mistaken idea that Rust allows "bug free" multithreaded programs will lead to a new generation of difficult to maintain multithreaded software... The lesson of multithreading should be that you need to use less of it and only when necessary, not that you should write more. It is similar to what we got when Java was the darling against C++: you now have lots of Java programs that don't have the same problems as the C++ version, but still are difficult to understand, leading to the well known FactoryFactory syndrome.

I honestly don’t see “less concurrency” as a viable route for modern software. We’re adding cores at a much faster rate than we’re improving clock speeds. Anything that makes concurrency easier is a win in my book.

There are several ways to increase concurrency: low level and high level. UNIX, for example, provided high-level tools for concurrency (processes). You can also use message passing if you want. It is possible to write more concurrent software using safer primitives, instead of relying on low-level techniques such as Rust/C++ multithreading.

I don't think it makes sense to call processes either higher or lower level than threads. Splitting your program into multiple processes often comes with many more drawbacks than splitting it into multiple threads. You can even still have data races if you use files or mmap to communicate between processes (you can also get something similar even with pipes/sockets for complex chunked data structures).

Even worse, there are very few, if any, tools that could help you analyze a multiprocess program for any kind of race conditions.

When processes crash or corrupt memory, they don't bring the whole application down with them the way threads do, which is why security-focused architectures have moved back to processes after the thread adoption hype.

hopefully it will lead to a new borrow checker logic

Do they have a roadmap for complete Firefox conversion to Rust?

The current roadmap for that is "never". :) There just isn't enough ROI for rewriting all old code.

Surely at one point the return for getting rid of the rest of the C++ code would be getting rid of the C++ build process?

I'd go further and say that the ROI is not only "insufficient", it is negligible. Unless your legacy code is written in a language/framework where available experts are disappearing (which is not the case here), there is practically zero value on rewriting code that works as intended.

Plenty of legacy code was reimplemented from C++ to Java (a memory-safe language, not unlike Rust) when the latter was released. The ROI is definitely there. But given the scale of that task, it makes sense to pick the lowest-hanging fruit first. AIUI, most of the effort wrt. "oxidation" is now around WebRender, specifically making it the default render backend.

I think you should change your question to "which -parts- of firefox will be redone in rust and is there a road map for that?"

It's been a while, but I don't believe that is a goal, so I don't believe there's a roadmap for it.

Servo [1] was the project to do that. Unfortunately they don't care (enough) about it anymore [2].

[1] https://servo.org/ [2] https://mjtsai.com/blog/2020/08/11/mozilla-layoffs/

Servo was a research project, AIUI it was never meant to completely replace Gecko.

I almost thought it was an April fools joke.


Please stop posting unsubstantive and/or flamebait comments to HN. It looks like you've been doing it a lot, and it's not what this site is for.


Well, it was supposed to be a joke, but I now realize that it was in bad taste, so I apologize.

Who knew using already existing data race analysis tools was much more feasible than shoehorning a huge project into a brand new language. Way to go, Mozilla!

If you believe that this tool has found all the data races in the C++ code, or even a majority of them, I have a bridge to sell you.


Hey, could you please not take HN threads into flamewar, including programming language flamewar? It's tedious, and one reason this site got started was to have a forum that avoids that kind of thing.

Thoughtful critique is welcome, but that would require (a) adding solid information, and (b) taking out the swipes.


> now suddenly going full Rust is the most important thing

As said above, Mozilla does not seem to have ever suggested, in the past or now, that "going full Rust" is a goal, let alone "the most important thing."

Yeah and Webrender, Stylo, and Servo just happen to appear out of thin air.

Webrender and Stylo were big projects, but they’re already in tree, and that means about 10% of Firefox is Rust. That’s hardly “full Rust,” and even less so “at all cost.” You would have to show intent to actually re-write all of Firefox, you’d have to somehow deal with the fact that the number of engineers writing more C++ and continuing to do so is far greater than those writing Rust... there’s a ton of evidence that “full Rust at all cost” is not intended, desired, or being worked on.

It sounds like you’re objecting to any Rust being included at all, if I’m being honest.

EDIT: Like, for example, this article is about committing engineering time to help improve their C++. If the goal was 100% Rust at all costs, that would be a waste. Even this article is an indication that this thesis is incorrect.

Firefox has something like 8 million lines of C++. Stylo replaced about 150,000 lines of that, and WebRender is more of a fast path than a true replacement of anything.

Both of those rewrites came with massive speedups due to the amount of parallelism they were able to use, which was impractical in C++. Other parts of Firefox are not as critical or parallelizable, hence why they aren't being replaced any time soon.

You seem to be quite aggressively misinformed, if you are not deliberately trolling.

> the amount of parallelism they were able to use, which was impractical in C++

You mean after not using OpenMP or basic multithreading core guidelines, and not sanity-checking the refactoring in TLA+, tools that end up on the front page every time people rediscover them.

Do you believe that "just applying TLA+ to the rendering code in Firefox" is a useful thing to recommend to people? TLA+ cannot scale to that size of a codebase without such dramatic simplifications that it's not going to catch any of these kinds of bugs. This is the entire reason why proving safe encapsulation of `unsafe` blocks is possible was so important, it lets you focus the proof engineering effort on a much tinier part of the program. Sure, it can't prove the kind of global invariants TLA+ aims for, but you can also actually apply it to a very large codebase like Firefox (and keep it in sync--another problem you need to deal with when using tools like TLA+).

Out of curiosity, have you used OpenMP or TLA+? I seem to remember that solve problems that are really, really different from the use cases for which Rust is designed.

Also, I'll throw out there that TLA+ is nearly as handy for Rust as it is for C++. Safe Rust will keep you from data races that break its safety guarantees, but there's more to correct use of concurrent algorithms than that. TLA+ is a great tool for modelling those other concerns if you need to apply a little more engineering rigor to some of those problems.
