
Making Chrome on Windows Faster with PGO - zmodem
https://blog.chromium.org/2016/10/making-chrome-on-windows-faster-with-pgo.html
======
alfalfasprout
I've been experimenting with profile-guided optimization and link-time
optimization now on a variety of applications I've been developing.

A couple of things I've noticed:

1\. PGO helps most when you have branching logic in 'hot' code (either
inside a tight loop or as part of a dispatch function that's called
repeatedly). If set up correctly, the conditional can be reordered or the code
restructured so that you get very good branch prediction. This often means that
code that wasn't automatically vectorized before can now be reorganized to
employ SIMD instructions (though this usually means a worse delay in case of a
branch miss).

2\. Your dependent functions must be in the same compilation unit if you want
to take advantage of many of the optimizations. Yes, interprocedural
optimization (LTO) is a thing, but it's not perfect. If you have a loop that
calls a function that can be inlined, the compiler does a great job with PGO
ensuring that everything is hot in the instruction cache. If you put those
functions in another compilation unit, not so much.

3\. If you want to use PGO, you'd better use extremely representative inputs.
The performance of a project compiled with PGO will suffer greatly if you use
unrepresentative inputs.
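
To make the three points above concrete, here is a minimal sketch of the kind of code being described: a hot loop that dispatches through a small helper living in the same translation unit, fed by a deliberately skewed input. All names (`Op`, `apply`, `run`, `skewed_ops`) are hypothetical and not taken from any project mentioned in the thread; the sketch only shows the shape of the workload, not what any particular compiler will do with it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode set for a tiny dispatch loop.
enum class Op : std::uint8_t { Add, Mul, Reset };

// Small helper kept in the same translation unit as its caller, so the
// compiler can inline it without relying on LTO (point 2).
static inline std::int64_t apply(Op op, std::int64_t acc, std::int64_t x) {
    switch (op) {
        case Op::Add:   return acc + x;  // dominant case in skewed_ops()
        case Op::Mul:   return acc * x;  // rare
        case Op::Reset: return 0;        // rare
    }
    return acc;  // not reached for valid Op values
}

// Hot loop with one branchy dispatch per element (point 1).
std::int64_t run(const std::vector<Op>& ops,
                 const std::vector<std::int64_t>& xs) {
    assert(ops.size() == xs.size());
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        acc = apply(ops[i], acc, xs[i]);
    }
    return acc;
}

// Builds a skewed training input: every 100th op is Mul, the rest are Add.
// A profile collected on this input records Add as the common path (point 3).
std::vector<Op> skewed_ops(std::size_t n) {
    std::vector<Op> ops(n, Op::Add);
    for (std::size_t i = 99; i < n; i += 100) {
        ops[i] = Op::Mul;
    }
    return ops;
}
```

If the training run uses `skewed_ops` but production traffic is mostly `Mul` or `Reset`, the recorded profile describes the wrong common path, which is the situation point 3 warns about.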

~~~
codys
I presume "the compiler" you're speaking of is some version of MSVC?

I only ask because while this article is about MSVC's pgo, other compilers
also have pgo and lto, and may not have the same issues wrt optimizing across
compilation units with lto enabled.

~~~
wscott
GCC and, I would assume, Clang both support -fprofile-generate and
-fprofile-use.

But these are based on a trace of actual execution, not a function use profile
as implied by the article. But the article may be after an ELI5 filter.
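
For readers who have not used these flags, a rough sketch of the GCC cycle follows. The file names (`hot.cpp`, `driver.cpp`, `representative_input.txt`) and the function `count_uppercase` are hypothetical; only the two flags come from the comment above. The commands are shown as comments so the block stays a single C++ file.

```cpp
// Assumed three-step cycle with GCC (file names are placeholders):
//
//   g++ -O2 -fprofile-generate hot.cpp driver.cpp -o app   // 1. instrumented build
//   ./app < representative_input.txt                        // 2. training run, writes .gcda files
//   g++ -O2 -fprofile-use hot.cpp driver.cpp -o app         // 3. rebuild using the recorded counts
//
// hot.cpp: a function a driver would call on every chunk of input.
#include <cassert>
#include <cstddef>
#include <string>

// Counts ASCII uppercase letters. In ordinary prose the branch below is
// rarely taken, and the training run in step 2 is what records that.
std::size_t count_uppercase(const std::string& text) {
    std::size_t n = 0;
    for (unsigned char c : text) {
        if (c >= 'A' && c <= 'Z') {
            ++n;
        }
    }
    assert(n <= text.size());  // sanity check: cannot exceed the input length
    return n;
}
```

The instrumented binary from step 1 runs slower than a normal build, so it is meant for the training run only and not for shipping.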

~~~
alfalfasprout
I haven't used GCC or Clang's PGO, but Intel's PGO lets you specify the kind
of instrumentation used and generates a detailed profile. Clang also lets you
use instrumented code-paths to generate detailed profiles (that include
function use statistics).

------
magicalist
This Twitter thread makes the post somewhat more interesting:
[https://twitter.com/BruceDawson0xB/status/793177917949739008](https://twitter.com/BruceDawson0xB/status/793177917949739008)

Specifically, (one reason) why this wasn't done years ago[1] and bugs found
when switching to PGO[2]

[1]
[https://connect.microsoft.com/VisualStudio/feedback/details/...](https://connect.microsoft.com/VisualStudio/feedback/details/1064219/ltcg-linking-of-chromes-pdf-dll-spends-60-of-time-in-c2-dll-ssrfree)

[2] [https://randomascii.wordpress.com/2016/03/24/compiler-bugs-f...](https://randomascii.wordpress.com/2016/03/24/compiler-bugs-found-when-porting-chromium-to-vc-2015/)

------
SaveTheRbtz
The article is a bit shallow. It would be nice to see:

1\. What flavours of PGO optimizations were applied? What was the isolated
impact of each one of them on both speed and size of the code?

2\. What tests did they use to "guide" PGO?

3\. How did they analyze PGO results (except for these three tests that were
provided)? I assume they did not blindly trust it, therefore there should be a
way of visualizing differences between two binaries with millions of lines of
code.

4\. How did PGO affect crash statistics?

------
bitmapbrother
Always a nice win when you can use tools to speed up your code. The only
unfortunate part is that it's platform specific.

~~~
puzzle
It's not like Google hasn't been doing the same under Linux for years:

[http://research.google.com/pubs/pub45290.html](http://research.google.com/pubs/pub45290.html)

[https://groups.google.com/forum/m/#!msg/llvm-dev/UOJqp0f9MBY...](https://groups.google.com/forum/m/#!msg/llvm-dev/UOJqp0f9MBY/lN4MgCT6Q_cJ)

------
fowl2
In case anyone was wondering about other browsers: Firefox has been doing
PGO, and dealing with bugs, for quite some time:
[https://developer.mozilla.org/en-US/docs/Mozilla/Developer_g...](https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Building_with_Profile-Guided_Optimization)

------
cm2187
Sort of unrelated, but one more example of responsive design gone wrong. The
chart overflows outside of the page on an iPhone, and because the scaling is
locked, I can't zoom out to view it.

Another example of a page that would be more readable had it stuck to plain
old HTML.

~~~
kevincox
I wish my browser had an option to prevent the scaling lock. I should be able
to decide if I can zoom in or out.

~~~
taspeotis
FWIW: Mobile Safari 10 ignores user-scalable=no [1].

[1]
[http://stackoverflow.com/a/37859168/242520](http://stackoverflow.com/a/37859168/242520)

~~~
kevincox
I just discovered that Firefox has an option for this as well :)

------
polskibus
Has PGO been applied to V8 as part of Chrome?

~~~
mccr8
JS engines spend a lot of their time in JITted code, which isn't helped by PGO
compilation of C++. (I mean, the JIT itself is of course a kind of PGO
compilation...) I believe that in Firefox PGO is, or at least used to be,
disabled because it didn't help much, but would occasionally cause crashes due
to compiler bugs.

~~~
MikeHolman
Running JIT code is only one aspect of the runtime. The parser, interpreter (if
applicable), and GC all benefit. C++ helper functions that get called from JIT
code also benefit.

