Firefox switching to clang-cl for Windows builds (groups.google.com)
278 points by sohkamyung 7 days ago | 80 comments





A reply in the thread from David Major about performance:

"At the moment, performance is a mixed bag. Some tests are up and some are down. In particular I believe Speedometer is down a few percent."

"Note however that clang-cl is punching above its weight. These builds currently have neither LTO nor PGO, while our MSVC builds use both of those. Any regressions that we're seeing ought to be short-lived. Once we enable LTO[1] and PGO[2], I expect clang to be a clear performance win."

[1] "LTO (Link Time Optimization) is a method for achieving better runtime performance through whole-program analysis and cross-module optimization." http://blog.llvm.org/2016/06/thinlto-scalable-and-incrementa...

[2] PGO = Profile Guided Optimization


LTO can be particularly handy when doing devirtualization. Our experience ([1]) with Chromium demonstrated that there's a fair number of virtual methods which have exactly two implementations in the code: one for production and another one for tests. While linking production code, it's trivial for this optimization to spot that, replace a virtual call with a regular invocation, and then often inline it and allow for other in-place optimizations. In many parts of the renderer (Blink), that gives a 3%-7% speedup out of nowhere.

1. https://bugs.chromium.org/p/chromium/issues/detail?id=580389...


Out of curiosity, why use virtual calls in those cases? Why not just have two different implementation files -- one for tests, and one for production -- and choose one based on which one you're doing? It seems like a win all around.

There are always performance hacks to be found in big apps, in exchange for implementation complexity and man-hours. And there's always a next bottleneck to be found. Performance bugs are subject to an extended form of survivorship bias, where they become bigger when the preceding perf bottlenecks get fixed.

I've refreshed and seen three major revisions of your comment after writing a full reply to the first one, and I can't say I've been able to clearly follow where you're coming from and where you're going with any of them. I would suggest that if you feel the need for such drastic revisions, it may be worth reconsidering whether whatever you're trying to argue is really compelling.

His advice is sage and you should look past whatever deficiencies you find in his form of communication. Simply put, there are always bigger fish to fry, and virtual methods are a quick and easy way to implement test stubs. Especially when the compiler devirtualizes.

Well I had a response to that originally but his edits kind of changed how much sense my reply made in response. Here's what I had:

Performance is only one aspect of it. It also reduces code bloat, reducing the program's size footprint. Most tests (yes, I know, not all, but most) should not make it into the final binary users are running. I also don't see what's "hacky" about making a foo.test.cc file when I want an alternate implementation for foo.cc. It seems to be quite a positive and clear way to document the fact that an implementation is only needed for testing, and vice-versa. And not only that, but it reduces compile (& link) times, since you only need to compile one of the two implementations for each use case.


Yea all things being equal I'd choose your approach. But in an existing project that did things this way - it would be the last thing I'd touch. This is of course barring extreme circumstances - which profiling and other tools would identify.

Unfortunately a lot of C++ developers don't really understand how the linker works, and because of that the build system is black magic.


Yeah, I wasn't suggesting they immediately dispatch a team for transforming everything into this. But at least from now on they could try to get people to use this pattern, and make the move gradually.

I want to reiterate what I think the broader point is - if this is the most pressing problem, you have a very highly functioning team. There's only so much bandwidth for change, and I frequently see it get spent poorly.

Sorry! That's a risk you have when following comments tightly :)

I'll reconstruct my first take from memory if you have comments on it:

Current CPUs are pretty good at predicting indirect branches, and it's hard to tell beforehand which virtual calls will turn out to be perf problems - so it's wasted effort and a fallible process to attempt their complete elimination beforehand.


Right, and I posted my reply to that here: https://news.ycombinator.com/item?id=17504680

It seems like a win all around.

Depends. Virtual calls can be done purely in code; with different implementation files you need to go through the build system. The former just always works, independent of platform, whereas the latter often means more configuration work. Also, ignoring YAGNI, you could say the virtual one makes it easy to quickly test things out, is easily mockable, ...


From my point of view a benefit of virtual calls is toolability. If you have an abstract interface, it's pretty easy to either directly create mocks for it in the test file, or use something like gmock. If you go the route of linking a completely different implementation, you might need to create and link a different library which contains the faked implementation for several test cases. It also works well with pure header code.

If I understand you correctly aren't you violating the initial assumption that there are only 2 implementations, one for production and one for testing? At which point, you're answering a different question...

>Out of curiosity, why use virtual calls in those cases? Why not just have two different implementation files -- one for tests, and one for production -- and choose one based on which one you're doing? It seems like a win all around.

I don't fully understand the proposal here. Do you have an example of a code in the wild that uses the proposed scheme?


I'm just saying do conditional compilation instead of virtual dispatch, since you generally shouldn't really need both the test and production implementations to run inside the same program. So if you have reason to require

  // widget.h
  class Widget { virtual void throb(); };
  // widget.cc
  class WidgetImpl : public Widget { void throb() { ... } };
  class WidgetTest : public Widget { void throb() { ... } };
then, instead of that, just do

  // widget.h
  class Widget { void throb(); };
  // widget.cc
  void Widget::throb() { ... }
  // widget.test.cc
  void Widget::throb() { ... }
where you only compile widget.cc for the production build, and only compile widget.test.cc for the test build.

Feel free to post a more specific example and I can take a stab at seeing if I can make this transformation to it (or why it might not be possible, if you think it's not).


Thank you for elaborating your proposal. It might work in a theoretical scenario, but it's unlikely to be practical for existing projects with millions of lines of code.

Essentially, it was easier to write a compiler/linker optimization than to change the source code. On the bright side, everyone wins!


> It might work in a theoretical scenario, but it's unlikely to be practical for existing projects with millions of lines of code.

I mean there's no reason you have to change everything by tomorrow. You could introduce it as a policy moving forward, the same way I'm sure you make any other changes to other patterns that need to be followed. I'm sure you have a process for this?

> Essentially, it was easier to write a compiler/linker optimization than to change the source code.

I agree, but see above and below.

> On the bright side, everyone wins!

Well, you waste space, time, and energy making the compiler and linker do work that doesn't inherently need to be done. You also lose parallelizability of the compilation of the two implementations if they currently reside in the same file. Finally the compiler (er, linker) might not be able to actually apply the devirtualizations you expect, whereas in this case it's just a matter of inlining since the target method is already known.


> Well, you waste space, time, and energy making the compiler and linker do work that doesn't inherently need to be done.

It's repetitive, error-prone work, better to have the compiler and linker do it than rely on the programmer getting their use of the preprocessor right.

And even if the programmer gets it right, doing it via macros means every tool that you want to apply to your codebase - IDEs, profilers, coverage, instrumentation - needs to understand your macro. Are you sure they'll all get it right?

Better to write plain old standard code that every tool will work correctly with, and the worst thing that can ever happen is a slight performance penalty.


Are you replying to the correct comment? Not once did I even mention macros or the preprocessor.

I misunderstood, but what you're actually advocating has all the problems I mentioned only even more so. Any tool that you want to apply to your codebase will have to understand not just a macro, but your build configuration.

How would you do it without using the preprocessor? Honest question. You could leverage the build system to do that instead but that seems even worse to me... You are effectively hoisting an implementation detail up into the build system.

Like the other commenter said, I'd rather do simple, virtual calls and then fix it using LTO.


> Well, you waste space, time, and energy making the compiler and linker do work that doesn't inherently need to be done.

You're suggesting that instead of a compiler wasting time, space, and energy, we get actual people to waste their time, space, and energy by manually applying the transformation that the compiler is able to do anyway?


Changing things gradually can leave behind petrified lava flows of old design patterns that nobody understands or removes anymore: https://en.m.wikipedia.org/wiki/Lava_flow_(programming)

> lava flow is a problem in which computer code written under sub-optimal conditions is put into production and added to while still in a developmental state.

I don't know what you're referring to as the "suboptimal code" here. If it's the old pattern using virtual everywhere, then the ship's already sailed... that code is already in production. If you mean the two-file approach I proposed then that makes no sense; if it's worse than the last approach then why would you even use it.


What I meant is that you will end up permanently with multiple solutions for the same problem if you try to change things gradually. This has several drawbacks: Additional complexity having to support both versions, confusion with new developers about which variant to use, a higher likeliness that someone will add a third variant. You have to weigh the benefits of the new alternative against having a more complex code base/doing the grunt work of changing everything at once.

So in this example you have to weigh the benefits of the two file tests against having tests that work uniformly.


In performance-insensitive code the approach you defend could be "good enough", but for code on the "hot path", changes like those proposed by mehrdadn could result in a consistent improvement and are surely the better long-term "win."

"We do it that way here" is not something that has to be always applied.


I have done things like that, but the one definition rule gets in the way.

    void DoSomething(widget* param, OtherClassWithvirtual* param2) {...}
My DoSomething function needs to link to widget, but now I have two different libraries that widget could be in. The linker encodes in my binary which library to load to get widget. All attempts to use the other library instead are undefined behavior.

The only way I know of to work around that is to build DoSomething twice, linking the real widget once and the test widget the other time. Note that DoSomething takes two parameters, and those are classes that may themselves also take other classes with virtual functions. All the possible cases quickly get into a combinatorial explosion, and my build times go from an hour (already way too long) to days.

If I'm wrong please tell me how, I'd love to not have so many interfaces just because someone doesn't want their unit test to become a large scale integration test.


What if I want both? What if in ClassATest, I need ClassBImpl? Do I go around creating more compile cases?

Because that’s what we need, more magic in the build system.


Not sure I see what you're thinking of... could you give me a code example?

A possible scenario here is that TestWidget only defines some stuff of its own but otherwise shares a bunch of code with Widget.

You can still derive classes when you need to, right?

What class are you deriving from though? Can’t be Widget, because in the scenario you described Widget and TestWidget are mutually exclusive (TestWidget just replaces Widget altogether). The alternative is extracting the shared behaviour to a BaseWidget that both classes extend, but that’s just ugly as sin.

idk, would something like this work maybe?

  // widget.h
  class Widget { void throb1(); virtual void throb2(); };
  // widget.cc
  void Widget::throb1() { ... }
  void Widget::throb2() { ... }
  // widget.test.cc
  void Widget::throb1() { ... }
  void Widget::throb2() { ... }
  class TestWidget : public Widget { int x; void throb2(); };
  void TestWidget::throb2() { ... Widget::throb2(); ... }
If not, could you present code to illustrate the problem? I won't try to address more issues without actual code.

But now you've made throb2 virtual again, which is the problem this proposal was trying to avoid in the first place.

Yeah? I didn't get rid of throb2()'s virtual, because it actually had a good reason to be virtual. But I very much got rid of throb1()'s virtual because it didn't need to be virtual, and that was the point. It's to be expected that if you add extra constraints to the problem, you are likely to end up with extra constraints in the solution.

I was just presenting an approach that lets you limit virtuality (word?) to where it's actually needed, meaning that what you gained here was not having to pay costs you clearly didn't need. It's obviously not a magic pill to remove every virtual, which I would have hoped was pretty clear. It's your job to extend the approach to fit the precise constraints you're dealing with, whether it's adding pimpl or keeping a virtual or whatever.

And remember you can always avoid this approach if you think it's too ugly in this scenario or have whatever complaints. You can still use it when you don't have extra constraints. Nobody's forcing you to pick a side. Just use the best approach for the problem at hand.

One reason is that other test might need the production version. Building twice for that case is more of a pain and not really worth it.

I mean, then you can avoid it in that situation. Or actually use the production version like I explained here: https://news.ycombinator.com/item?id=17505358 Or find another way to use it. It's not all-or-nothing.

Yes that works, but relies on devirtualization in the production code, and isn't at all what you were suggesting before about having separate implementation files.

Have you seen this in use in a real project?

Or just use a macro?

Yeah, that would work, but it would also make your code base a mess. Things that you can catch during the build step and move in or out are much better handled at that level rather than leaking them into your code base.

That's really cool. I've primarily seen LTO benefits on the more embedded side of things where it ends up enabling far more aggressive dead code elimination between code and libraries which makes it much easier to fit into small chips. I hadn't thought much about virtual methods with C++ code (uncommon to see those in embedded code to begin with).

Interesting. Do they collect profiles from Nightly users or do they have some internal characteristic workload?

It's based on the perf tests, which I think are all public.

> Watch for more toolchain changes to come. The next steps after this will be to switch to lld-link and enable ThinLTO. That will open the door to a cross-language LTO that could inline calls between Rust and C++. In the longer term we can look into cross-compiling from Linux.

Very cool. The potential to inline Rust and C++ (and C, of course) code together at the LLVM level is one of those things that's been theoretically possible for a long time now but has yet to really mature in practice (at least, I'm not aware of anyone aside from Mozilla attempting it at scale). Really curious to see how this affects Firefox performance once they get it all wired up.


Rust's performance benefits won't be immediate, as single-threaded C++ is already very fast, and the safe-parallelism benefits of Rust would be difficult (impossible?) to leverage across language boundaries.

I'm more interested in seeing whether the maintainability of Firefox will increase over time as the proportion of Rust code increases. That is, will Mozilla be able to accelerate Firefox's development because the source code is in a safer language?


Multithreading isn't Rust's only performance benefit. Another is being able to write single-threaded code in a way that'd be too risky to do without the borrow checker making sure it's OK. (I.e. in C++ one would write something a bit slower but easier to reason about.) A cool upcoming single-threaded performance boost in Rust is std::simd, which provides portable (non-ISA-specific LLVM abstraction) SIMD that you can augment with ISA-specific operations where needed. Firefox doesn't use it yet, but Firefox already uses the older (to be obsoleted by std::simd) simd crate that provides the same thing a little differently.

I'm trying Firefox Nightly these days and I notice my CPU has burning peaks (60 -> 80 C). I wonder if Rust parallelism isn't killing my poor laptop's heatsink/fan.

Huh! I wouldn't expect that at all... here are some easy things to try:

https://support.mozilla.org/en-US/kb/firefox-uses-too-many-c...

If it is still happening after you try those, use https://perf-html.io/ and share a profile.


Known bug, I think, assuming you've enabled webrender. They're still working on correctness. They haven't gotten to optimization yet.

Also nightly contains all sorts of debug stuff


only two webrender about:config entries are true

I'll see how it goes with the normal release


If you want to know for sure if Webrender is enabled you can go to about:config and check the Graphics section. If Compositing is 'Webrender' it's turned on.

all I can see is skia

Are you on a Mac by any chance?

Funny you mention this. While Firefox's interactive performance on Mac feels absolutely fine, its energy consumption is dramatically worse compared to Safari.

Using Firefox over Safari can cost hours of battery life, so it’s very hard to recommend it. Is this just a Mac issue?


I know of one issue with Firefox when you are using a scaled resolution. It absolutely destroys battery and performance for me. I _think_ it has gotten better with the latest versions but the tickets are still open.

https://bugzilla.mozilla.org/show_bug.cgi?format=default&id=...

https://bugzilla.mozilla.org/show_bug.cgi?id=1407536

https://news.ycombinator.com/item?id=17365510


thinkpad, old i5, archlinux

In my comment I actually wasn't talking about parallel Rust providing a performance benefit over single-threaded C++, I was referring to how, by writing their codebase in two different languages, they're missing out on theoretical optimization potential (via inlining) and thereby leaving performance on the table wherever Rust code interfaces with C++ code. By enabling cross-language inlining via LLVM, it should ideally erase any theoretical optimization penalty incurred by writing Firefox in two languages.

Luqman got some neat results getting this to work iirc, way back in 2014.

https://bugzilla.mozilla.org/show_bug.cgi?id=1443590 has info on why :

> We've discussed this on several occasions but apparently nobody ever filed a bug.

> There are a number of reasons using clang-cl for our official builds would be good:

> * The ability to patch the toolchain as we do on other platforms

> * The ability to diagnose and upstream fixes for compiler bugs we encounter

> * It would provide motivation to switch to clang on all platforms, which would make it so we only have to worry about a single compiler for things like performance optimization (I don't assume we would drop support for MSVC or gcc, we just wouldn't have to worry as much about compiler quirks)


Today's Nightly build about:buildconfig

  Compiler 	Version 	Compiler flags
  z:/build/build/src/clang/bin/clang-cl.exe -Xclang -std=gnu99 -fms-compatibility-version=19.13.26128 	19.13.26128 	-Qunused-arguments -nologo -wd4091 -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -Gw -wd4244 -wd4267 -Wno-unknown-pragmas -Wno-ignored-pragmas -Wno-deprecated-declarations -Wno-invalid-noreturn -we4553
  z:/build/build/src/clang/bin/clang-cl.exe -fms-compatibility-version=19.13.26128 	19.13.26128 	-Qunused-arguments -Qunused-arguments -TP -nologo -w15038 -wd5026 -wd5027 -Zc:sizedDealloc- -wd4091 -wd4577 -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -Gw -wd4251 -wd4244 -wd4267 -wd4800 -wd4595 -wd4065 -Wno-inline-new-delete -Wno-invalid-offsetof -Wno-microsoft-enum-value -Wno-microsoft-include -Wno-unknown-pragmas -Wno-ignored-pragmas -Wno-deprecated-declarations -Wno-invalid-noreturn -Wno-inconsistent-missing-override -Wno-implicit-exception-spec-mismatch -Wno-unused-local-typedef -Wno-ignored-attributes -Wno-used-but-marked-unused -we4553 -D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING -GR- -Zi -O2 -Oy-

Note that they're switching from MSVC to clang-cl.

Since the headline doesn't mention MSVC, this could be interpreted as relevant to the old GCC vs. Clang issue, but it really isn't in this case.


The headline does say for Windows builds. Would that imply MSVC being the original compiler being used, since that is the usual default compiler under Windows?

This is in the latest nightly. Open "about:buildconfig" to see if you're using it.

Why do i have to sign into Google in order to view this?

I don't - it displays immediately for me when I load in a Firefox instance without a logged in Google account.

As an alternative here's Mozilla's archive view: https://lists.mozilla.org/pipermail/dev-platform/2018-July/0...

Google tries to make you log in if it knows you have an account. You can erase Google cookies to avoid that.

It's not about being signed into Google, it's about running JS.

Is it? JS is enabled, but I'm still required to log in (both with and without uBlock running). Works in Private Browsing mode though.

It's not the problem you're seeing, but the page does require JS:

To use Google Groups Discussions, please enable JavaScript in your browser settings, and then refresh this page.

NoScript users will be automatically redirected to this version (which doesn't require it):

https://groups.google.com/forum/m/?_escaped_fragment_=topic/...


It would be great if they switched to MinGW as well so both Clang and GCC work. MinGW needs some support as well, and then they'd control the whole toolchain.

Sadly, it probably doesn't make sense for them to allocate resources to that. On the other hand, in my experience at least, GCC dominates in the performance department, so that would help with not worrying about the compiler switch causing a performance regression.


I second that. clang-mingw could be another viable choice.

It would be great because then the builds would be independent of the Microsoft VS infrastructure, which is a total mess on Windows. Both Clang and GCC work with MinGW (I use MinGW64) headers, so you would have two modern open source compilers to cross-reference and an open source toolchain you could fix or improve as needed.

I'm fine with being able to compile with mingw (and we already can on GCC), but we can't ship our release builds with it; it's too far behind the official Windows SDK in terms of supporting new platform features.

Yeah, I realize that supporting MinGW64 is a big task and probably not worth it for a single player, even one as big as Mozilla. Still, I hope one day it gets picked up. The Visual Studio dependency is painful, and I think it will be quite a while till Clang catches up with GCC in the performance department.

Mozilla used to compile Firefox with mingw because Tor uses mingw, but here is the Firefox bug about moving to clang-mingw:

https://bugzilla.mozilla.org/show_bug.cgi?id=1465790


No idea what is going on but interested in the commotion...


