
Chrome just got faster with Profile Guided Optimization - twapi
https://blog.chromium.org/2020/08/chrome-just-got-faster-with-profile.html
======
mintplant
Firefox has done this since forever. It's why official builds will generally
be faster than compiling from scratch yourself. I'm surprised Chrome wasn't
already taking advantage of PGO.

~~~
3pt14159
I'm surprised that there isn't a way to do PGO when compiling from scratch. A
way of telling the compiler exactly which parts are hot, but built into the
codebase so that one could get the best of both worlds.

~~~
mintplant
You can pass a PGO flag to the build system [0], but IIRC from when I was
there last, Mozilla puts some serious machine hours into PGO cycles for each
release and that isn't easy to replicate as an individual developer. The
profiling captures things which are very specific to the actual build being
optimized, so inlining that information into the codebase wouldn't be so
helpful. Though there are already static optimization hints spread throughout
the code (off the top of my head, MOZ_LIKELY/MOZ_UNLIKELY for marking
conditionals).

[0] [https://firefox-source-docs.mozilla.org/build/buildsystem/pg...](https://firefox-source-docs.mozilla.org/build/buildsystem/pgo.html)

~~~
loeg
Mozilla could publish their profile data for a given release image.

~~~
glandium
They are public, but you have to know where to look. I do think they should be
published alongside the builds on archive.mozilla.org. (disclaimer: I work for
Mozilla)

------
jawns
10% faster is a pretty nice improvement, especially in 2020.

Here's a little Wikipedia stub about Profile-Guided Optimization, for those
who'd like to read more about the technique.

[https://en.wikipedia.org/wiki/Profile-guided_optimization](https://en.wikipedia.org/wiki/Profile-guided_optimization)

~~~
mhh__
10% is impressive. A web browser isn't quite the same situation, but shaving
even a percent off a program could save millions for a company like Google.

~~~
srean
I do not doubt you, but for my own education could you elaborate a little on
how Google or any other company could save millions?

~~~
jeffbee
If you had a backend workload that draws 20MW and you make it 10% faster, you
saved a million dollars per year in electricity alone.

Math may vary depending on cost of electricity.
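
A back-of-the-envelope sketch of that arithmetic, assuming that "10% faster" translates directly into 10% less energy for the same work and assuming an electricity price of $0.06/kWh. Both are illustrative assumptions, not figures from the thread; only the 20 MW and 10% come from the comment above.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_savings_usd(load_mw: float, fraction_saved: float, usd_per_kwh: float) -> float:
    """Yearly electricity cost avoided when a constant load shrinks by fraction_saved."""
    saved_kw = load_mw * 1000 * fraction_saved
    return saved_kw * HOURS_PER_YEAR * usd_per_kwh


# 20 MW workload, 10% saved, assumed price of $0.06 per kWh.
savings = annual_savings_usd(load_mw=20, fraction_saved=0.10, usd_per_kwh=0.06)
print(f"${savings:,.0f} per year")
```

With those inputs the result lands a little over a million dollars per year; a higher or lower price per kWh scales it linearly.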

~~~
tomtompl
but the executable that is built with these optimizations is running on
people's computers, not as a backend on google servers.. or am I missing
something?

~~~
jeffbee
Perhaps I didn't understand the question. I thought the question was about why
anyone would save a lot of money from software going "only" 10% faster. Most
large datacenter operators will get unreasonably excited about even 1%. But
you're probably right that Google itself will not directly benefit from making
Chrome faster on macOS.

------
londons_explore
Where are the profiles gathered from?

Did they just render the alexa top 1M sites and generate a profile from that?
It would seem some bits of the codebase might only be exercised by real users
(eg. touch input during a mobile game).

Did they instead gather PGO information from real users? If so, how did they
do that without degrading performance too much while profiling, and while
still maintaining user privacy (sequences and addresses of branches in some
cases will reveal user data)?

~~~
adrianmonk
There's a small clue at
[https://blog.chromium.org/2016/10/making-chrome-on-windows-f...](https://blog.chromium.org/2016/10/making-chrome-on-windows-faster-with-pgo.html) :

> _To gather this data, the nightly build process now produces a special
> version of Chrome that tracks how often functions are used._

It doesn't say how Chrome is exercised (what pages are loaded, what user
actions are simulated) during the nightly build. It seems like it must be some
kind of fixed test data.

But that at least eliminates the possibility that they're gathering the
profiling data from production builds used by real users, so the privacy
concerns shouldn't be an issue.

(Also note that info is from 2016, so it could have changed.)

------
jeffbee
FWIW, ChromeOS has been built with PGO for many years. This is catch-up for
the mac toolchain.

------
XCSme
> Tab throttling coming to Beta

This is nice, but I remember the pain of trying to make a multiplayer browser game
run fine, even when you put it in a background tab. The problem is that the
game had matchmaking, so players would change tabs while a match was found and
for the first few seconds of the start of the game. While doing so, many of
the CSS animations, sounds, JavaScript functions would be queued and all
executed at once when you switched back to the tab, leading to a bad
experience, or in some cases the game breaking. I assume this would only get
worse with this tab throttling feature, but there are legitimate use cases
where both developers and users would prefer a tab to still be running at
normal speed even when not focused. We use this every day in desktop
applications, where we start a game or some video encoding task and just let
it run, non-throttled, in the background

~~~
londons_explore
As a developer, you probably ought to just send a message to the server saying
"I am throttled", and then later when the tab is reactivated send another
message saying "I am unthrottled, please send me the current state of the
game".

You shouldn't be expecting to queue up thousands of messages from potentially
hours of the tab being backgrounded and handle them all when the tab is
unbackgrounded. Nor should you process all the state changes as they come in
even though the tab is backgrounded - no user wants to spend
data/power/cpu/battery/heat on something like that.
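
A minimal sketch of the server side of that idea, assuming a server that pushes per-tick updates to each client. The class names, the `"throttled"` / `"unthrottled"` message strings, and the in-memory `outbox` list are all hypothetical stand-ins for a real network layer.

```python
from dataclasses import dataclass, field


@dataclass
class ClientSession:
    throttled: bool = False
    outbox: list = field(default_factory=list)  # messages "sent" to this client


class GameServer:
    def __init__(self):
        self.state = {"tick": 0}
        self.sessions = {}

    def connect(self, client_id):
        self.sessions[client_id] = ClientSession()

    def on_client_message(self, client_id, message):
        session = self.sessions[client_id]
        if message == "throttled":
            session.throttled = True
        elif message == "unthrottled":
            session.throttled = False
            # One snapshot of the current state replaces every missed update.
            session.outbox.append(("snapshot", dict(self.state)))

    def advance(self):
        """Advance the game by one tick and notify clients that are listening."""
        self.state["tick"] += 1
        for session in self.sessions.values():
            if not session.throttled:
                session.outbox.append(("update", self.state["tick"]))
```

A client that was backgrounded for 100 ticks ends up with a single snapshot in its outbox rather than 100 queued updates.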

~~~
XCSme
That's a good solution if your game receives game states, but in our case the
game was deterministic and ran the simulation on the client at 50FPS, only
receiving the inputs from other players. This means that if your game freezes
for 10 seconds (or tab out for 10 seconds), when you go back it would have to
simulate all those 10 seconds before it could continue.

Imagine having GTA 5 multi-player running in a background window, you would
still want to hear the other people, hear the in-game sounds so you can come
back to the game if something important happens, and also not have to simulate
instantly all the events that you missed when you come back.

This also reminded me that we had problems with just playing a sound when a
game started to let the players know they should come back to the tab. Players
complained that sometimes the game does not notify them it has started, but it
was just Chrome simply not executing JavaScript anymore.
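
To make the catch-up cost concrete, here is a toy sketch of a deterministic lockstep client, assuming a fixed 50 ticks per second as described above. The `step` function and `LockstepClient` class are invented for illustration; a real game would run its actual simulation in `step`.

```python
TICK_RATE = 50  # simulation ticks per second


def step(state: int, inputs: tuple) -> int:
    """Stand-in for one deterministic simulation tick (same inputs, same result)."""
    return (state * 31 + sum(inputs) + 1) % 1_000_003


class LockstepClient:
    def __init__(self):
        self.tick = 0
        self.state = 0

    def catch_up(self, inputs_by_tick: dict, target_tick: int) -> int:
        """Replay every missed tick in order; returns how many ticks were simulated."""
        simulated = 0
        while self.tick < target_tick:
            self.state = step(self.state, inputs_by_tick.get(self.tick, ()))
            self.tick += 1
            simulated += 1
        return simulated


# Inputs received from other players while the tab was frozen for 10 seconds.
inputs = {tick: (tick % 3,) for tick in range(10 * TICK_RATE)}
client = LockstepClient()
print(client.catch_up(inputs, target_tick=10 * TICK_RATE), "ticks to replay")
```

Because only inputs are sent, there is no snapshot to jump to: every one of the 500 missed ticks has to be replayed before the client is back in sync.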

~~~
londons_explore
I think the Chrome developers are trying to tell you you shouldn't be using
your users' devices as game servers...

How about when a tab is backgrounded, a server continues running the game and
just streams an mp3 audio stream to the user's device. Most devices can stream
audio with the CPU _actually turned off_.

~~~
XCSme
The devices are used as game clients, not game servers.

Streaming audio sounds like an interesting idea, so instead of the server
sending an event and the client playing a sound when that event is received,
the client would constantly stream a dynamically created audio file. This
solution could work, but feels a bit complex, as you would still have to make
sure the stream catches up if the user has a lag spike, instead of buffering
the audio (timing should still be accurate).

That being said, what if the game is single-player and there is no server?
Then the client itself would have to generate the audio stream and stream it
to itself, which wouldn't work if the background thread is throttled.

------
Andrew_nenakhov
Sigh. I used to love Chromium, when it was the new kid on the block. But now,
because of this browser monoculture, I wouldn't use it even if it was far
better than Firefox.

How times change.

------
londons_explore
What do these profiles consist of? Simply which branches are taken vs not
taken? Or do they include a history (eg. if this branch is taken then that one
will not be)?

Do they depend on the stack? Ie. when called from this function, that branch
is always taken. That could encourage the compiler to inline the function so
it can have a faster path on the branch.

~~~
jeffbee
This all depends on what hardware you use to capture your profile, but on
recent (Skylake and later) Intel CPUs you can use the Last Branch Record (LBR)
to get branch source and target, including the call stack.

This deck can give you some idea of what the modern Intel PMU can do:
[https://protools19.github.io/slides/Eranian_KeynoteSC19.pdf](https://protools19.github.io/slides/Eranian_KeynoteSC19.pdf)

~~~
loeg
I think the GP question is, what data do GCC/Clang actually emit with
-fprofile-generate?
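
As a toy illustration of the simplest kind of profile asked about above (a taken/not-taken counter per branch site, with no history and no call-stack context), here is a hand-instrumented sketch. It is not a description of what GCC or Clang write to disk; `BranchProfile` and the site name are made up for the example.

```python
from collections import Counter


class BranchProfile:
    """Counts how often each named branch site was taken or not taken."""

    def __init__(self):
        self.counts = Counter()

    def record(self, site: str, taken: bool) -> bool:
        self.counts[(site, bool(taken))] += 1
        return taken  # pass the condition through so it can wrap an `if`

    def taken_probability(self, site: str):
        taken = self.counts[(site, True)]
        total = taken + self.counts[(site, False)]
        return taken / total if total else None


profile = BranchProfile()


def classify(n: int) -> str:
    if profile.record("classify:is_negative", n < 0):
        return "negative"
    return "non-negative"


# "Training run": 10 negative inputs out of 100.
for n in range(-10, 90):
    classify(n)

print(profile.taken_probability("classify:is_negative"))
```

An optimizer fed this kind of data could lay out the "non-negative" path as the fall-through case; history-sensitive or stack-sensitive profiles would need a richer key than the site name alone.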

------
rat9988
It seems it was already used for Windows:
[https://blog.chromium.org/2016/10/making-chrome-on-windows-f...](https://blog.chromium.org/2016/10/making-chrome-on-windows-faster-with-pgo.html)

~~~
The_rationalist
Don't be misled, it was enabled for Windows and then got disabled when they
switched to clang. So Windows will get this optimization back, and newer Chrome
releases will get up to 10% more performance on Windows.

------
crazygringo
Sorry for the elementary question...

...but is PGO referring to the compilation of Chrome itself, or the
compilation of JavaScript on sites?

The blog post doesn't actually specify which compilation is being talked
about, and page performance could obviously depend on either.

~~~
chrisseaton
Compilation of Chrome itself. Compilation of JavaScript is almost always
profile-guided.

------
tmsh
[https://github.com/greatsuspender/thegreatsuspender](https://github.com/greatsuspender/thegreatsuspender)
is great on Chrome (up until now).

------
Avamander
One can hope that Linux would see similar optimizations applied to it at some
point.

~~~
flatiron
I’m surprised they aren’t focused more on it. Not for us desktop Linux users,
but for ChromeOS

~~~
jeffbee
Perhaps you will then not be surprised that the ChromeOS Linux kernel has been
built with profile-guided optimizations for a long time.

[https://chromium.googlesource.com/chromiumos/chromite/+/mast...](https://chromium.googlesource.com/chromiumos/chromite/+/master/cbuildbot/afdo.py#94)

------
gnramires
I wonder how far we can take this sort of effort -- collect exact usage
statistics for users and optimize software exactly for known usage patterns.
Of course some statistical technique would be needed to deal with the long
tail of usage patterns -- surely some program branches are used extremely
rarely (or not even seen in collected samples), but you still wouldn't want
them to be extremely slow (slower than necessary) or crash just because
they're rarely used. The cumulative usage of many of those long tail events
can be large.

Either (regularized) statistical inference of branches or something like
assuming a baseline usage for every branch would be necessary.
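
A small sketch of the "baseline usage for every branch" option, using plain additive (Laplace-style) smoothing. The branch names and the choice of a baseline of 1 are assumptions for illustration; nothing here reflects how any real PGO toolchain weights cold code.

```python
def smoothed_branch_weights(observed: dict, all_branches: list, baseline: float = 1.0) -> dict:
    """Give every known branch `baseline` pseudo-hits before normalizing to weights."""
    total = sum(observed.get(b, 0) for b in all_branches) + baseline * len(all_branches)
    return {b: (observed.get(b, 0) + baseline) / total for b in all_branches}


# Counts from collected samples; "never_seen" did not appear in any of them.
observed = {"hot": 998, "warm": 2}
weights = smoothed_branch_weights(observed, ["hot", "warm", "never_seen"])
for name, weight in weights.items():
    print(f"{name}: {weight:.4f}")
```

The unseen branch ends up with a small but non-zero weight, so an optimizer consuming these weights would treat it as rare rather than impossible.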

This is all significantly complex of course, so for anything that is not one
of the consumer megaprojects of software (browsers and operating systems?), I wonder
if those tools could be used as well. Perhaps there could be some automated,
anonymized usage reporting that does all this work?

~~~
hinkley
Ages ago I had a conversation with a coworker about why was it that if
databases collect statistics for optimizations, that data structures in
programming languages don't do the same thing?

This would come up a few more times in my career. The next time was a few
years later when I was having to go through a code base removing premature
optimizations for initial list size. The average length of our
data had grown such that our lists were now bigger than the defaults would
have been. That was added to my growing collection of "making code faster by
removing code" examples.

We talk about using better algorithms instead of tweaking code with micro-
optimizations but then we never talk about tuning the better algorithms. So
they default to the average case or worse.
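
A rough sketch of what "collecting statistics" for a data structure could look like: record the final length of each list built at a given call site and derive a suggested initial capacity from the observed distribution. `SizeStats` is hypothetical, and Python lists do not take a capacity hint, so the number would only be useful in a language or container that accepts one.

```python
import math


class SizeStats:
    """Tracks observed final lengths for one call site and suggests a capacity."""

    def __init__(self, default_capacity: int = 16):
        self.default_capacity = default_capacity
        self.samples = []

    def observe(self, final_length: int) -> None:
        self.samples.append(final_length)

    def suggested_capacity(self, percentile: float = 0.9) -> int:
        if not self.samples:
            return self.default_capacity  # no data yet, fall back to the default
        ordered = sorted(self.samples)
        # Nearest-rank percentile, clamped to valid indices.
        index = math.ceil(percentile * len(ordered)) - 1
        index = max(0, min(len(ordered) - 1, index))
        return ordered[index]


stats = SizeStats()
for length in [10, 200, 220, 250, 300]:
    stats.observe(length)
print(stats.suggested_capacity())
```

This is the same trade a database makes with table statistics: a little bookkeeping in exchange for defaults that follow the data instead of a guess hard-coded years earlier.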

~~~
tjalfi
> Ages ago I had a conversation with a coworker about why was it that if
> databases collect statistics for optimizations, that data structures in
> programming languages don't do the same thing?

Lua tables in LuaJIT, Hack arrays, and JavaScript arrays can use different
representations depending on what is stored in them.

NYU’s SETL project also did some work on automatically identifying what data
structure should be used.

IBM’s Hermes project was also working on this; I’m not sure whether they
shipped it.

------
freediver
Chrome 87 vs Safari 14 on macOS Big Sur benchmark:

[https://imgur.com/a/gUX2CIF](https://imgur.com/a/gUX2CIF)

Safari/Webkit still the "fastest browser possible" on macOS.

------
gniv
This is supposed to roll out today:
[https://www.chromestatus.com/features/schedule](https://www.chromestatus.com/features/schedule)

------
Santosh83
Will this filter down to MS Edge eventually?

~~~
tambre
It seems Google is running their nightly pipelines with additional profile
generation, but sounds like that's restricted to their internal Chrome (not
Chromium) pipelines.

Microsoft would have to do their own setup. Hopefully they do.

~~~
markdog12
I wonder if PGO was at least part of the "toolchain optimizations" mentioned
here:
[https://blogs.windows.com/msedgedev/2020/02/13/performance-o...](https://blogs.windows.com/msedgedev/2020/02/13/performance-optimizations-edge-81/)

------
XCSme
Unrelated: my Chrome (on Windows) randomly freezes for 1-2 seconds whenever I
focus one tab/window after I am away from it for a while. Is there any way to
debug those kinds of freezes?

------
blackflame7000
Chrome used to be a lightweight, fast browser and now it feels the need to use
a GB of RAM to show a single new tab window.

~~~
1f60c
I decided to measure the memory consumption of browsers' new tab pages using a
woefully unscientific method (simply looking at the memory usage tab in
Activity Monitor and tallying everything up):

      Firefox  505 MB
      Safari   480 MB
      Chrome   464 MB

So it seems like they're about equal?

~~~
saagarjha
Are you including helper processes?

~~~
afrcnc
No he's not. He's also not looking at his CPU usage, where Chrome will
literally go towards the high 70s for no fudging reason, while idle

~~~
londons_explore
Clear your browser profile. Chrome profiles get slow and CPU heavy after
multiple years of use.

------
sa46
Does PGO support reproducible builds? If you pass the same profile at compile
time do you get the same binary?
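
One way to check this empirically is to build twice with the same profile and compare digests of the outputs. The helper below is a generic sketch; the file paths are placeholders, and it says nothing about whether any particular compiler's PGO output is actually deterministic.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(path_a: str, path_b: str) -> bool:
    """True if the two build outputs are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Usage would be something like `builds_match("out_a/app", "out_b/app")` after two clean builds that were given the same profile file.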

------
coding123
Except on OnePlus where it just freezes randomly for 10 seconds unless you
switch tasks and switch back.

~~~
londons_explore
Close all the tabs...

Somehow mobile Chrome keeps opening a new tab for everything I do, and I found
I had over 1000. Since the tabs aren't actually loaded unless you switch to
them, I assumed keeping 1000 URLs in memory wouldn't impact performance much,
but when I closed them random freezes when clicking every link went away...

------
The_rationalist
Will it be enabled on Android before Linux?

------
callmeal
So, HotSpot then?

~~~
chrisseaton
No, this is AOT PGO, not online PGO like HotSpot made mainstream. The
JavaScript compilers were already doing online PGO.

------
sadfev
Wonderful!

Chromium is an excellent project but just sometimes slow to adopt certain
features.

I love chrome and chromium browsers, I hardly have any issues with them.

Unlike garbage FF and unreliable Safari

