

Unusual speed boost: size matters - othermaciej
https://www.webkit.org/blog/2826/unusual-speed-boost-size-matters/ 

======
DannyBee
the updateCachedWidth example probably gives the wrong idea to a lot of folks.

You would be just as well off making computeWidth const/pure/readonly/whatever.

The compiler can even detect if it modifies anything and mark it for you. In
fact, better compilers will compute mod/ref information and know that
m_cachedWidth is not touched over that call.

However, LLVM's (which is what at least Apple is using) basic stateless alias
analysis is not good enough to do this in this kind of case (there are some
exceptions where it might be able to, none apply here).

This is actually a great example of how improving pointer and alias analysis
in a compiler buys you gains, and _not_ an example of "how you should modify
your code", since you generally should _not_ modify your code to work around
temporary single-compiler deficiencies without very good reason.

Especially considering how quickly compiler releases are pushed by Apple/et
al.

~~~
Someone
I agree with the 'make things const' part, but I would expect m_cachedWidth to
be mutable. I'm too lazy to check that now, but if so, that would not help
here.

Even if it is not, I still think this is an example of "how you should modify
your code". Reason? Doing

    temp = foo();
    temp *= bar();
    m_member = temp;

keeps your state consistent in case bar() throws. I would even use it if bar()
is known not to throw, because you can't know what the future will bring, and
because what you 'know' sometimes isn't true. Defensive programming, if easy
as in this case, is a net benefit.
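
To make the difference concrete, here is a minimal sketch. The Widget type, the failScale switch and the return values are invented for illustration; only m_cachedWidth, computeWidth() and deviceScaleFactor() echo the names used in the article's example.

```cpp
#include <stdexcept>

// Hypothetical class; failScale exists only so the sketch can force a throw.
struct Widget {
    double m_cachedWidth = 1.0;
    bool failScale = false;

    double computeWidth() const { return 10.0; }

    double deviceScaleFactor() const {
        if (failScale) throw std::runtime_error("no display");
        return 2.0;
    }

    // Writes the member in two steps: a throw between them leaves the
    // unscaled width behind.
    void updateInPlace() {
        m_cachedWidth = computeWidth();
        m_cachedWidth *= deviceScaleFactor();
    }

    // Works on a local and commits once: a throw leaves the old value intact.
    void updateViaTemp() {
        double temp = computeWidth();
        temp *= deviceScaleFactor();
        m_cachedWidth = temp;
    }
};
```

With failScale set, updateInPlace() leaves m_cachedWidth at the unscaled 10.0, while updateViaTemp() leaves the previous value untouched.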

~~~
khuey
Most low-level C++ projects (including WebKit and Gecko) don't use C++
exceptions.

~~~
RogerL
True enough, but we strive for consistent practice - coding in one way for
project X, another for project Y, is error prone and doesn't scale. That handy
dandy defrobolizer you wrote for WebKit? You can't really reuse it in another
project without rewriting it to be exception safe. You should vary from good
practices only when it makes sense, not use them only if you can make a case
for using them in this specific instance. Anyone can come along later, eyeball
your code, and understand your intent and see the potential problems.
Magically code for a very specific environment? Good luck. Also, good luck
when your next project requires exceptions or what have you, because you will
be deeply out of practice coding for them.

Obviously the above can be taken too far, but programming for const
correctness, mutability, and safety in the face of exceptions should be in
everyone's toolkit, and the default way they program in C++ (IMO, naturally).

------
kinlan
I may have missed it, but were there any stats about the actual performance
gains? The article often mentions binary size etc., but says nothing about the
impact that had.

~~~
moconnor
I came here to say this. It's weird to talk about performance without actually
showing any performance figures.

I'm all for removing old code, but if you're going to claim performance gains
then why not measure those?

~~~
astral303
Exactly. You really should avoid doing performance optimizations without
measurements that show improvement as well as provide some coverage against
performance regressions.

Here's an example, Thrift (as used in Hector, a Cassandra client), had someone
make a performance improvement:

[https://issues.apache.org/jira/browse/THRIFT-959](https://issues.apache.org/jira/browse/THRIFT-959)

The discussion has a lot of "shoulds", and one measurement of latency
distributions, but no measurement of typical workloads or bulk inserts. Turns
out, that caused at least a 30% performance regression:

[https://issues.apache.org/jira/browse/THRIFT-1121](https://issues.apache.org/jira/browse/THRIFT-1121)

------
captainmuon
Does anyone else find it crazy that the absolute size of WebCore is 38 MB?
That's larger than the Linux kernel which includes a bunch of drivers.

If I understand WebKit's architecture correctly, that doesn't even include
chrome (the visible UI, not Google's browser), JavaScriptCore, platform
specific glue, and especially no auxiliary files (certificates, icons, the
"broken image" sign, ...).

Sometimes I long for the good old days where a browser used to fit on a floppy
disc (Opera).

I wonder if someone has done an analysis of what features make browsers so
complicated. I could imagine that 20% of the code could handle 80% of the
features (as so often). You could have a 'lite' HTML subset that's targeted at
rich documents, rather than rich client webapps. Something like that would be
great for older computers or mobile computers.

Going a bit further, I know there is a lot of crazy stuff in WebKit... e.g.
neural networks try to predict which links you'll click on, based on previous
behavior, mouse movements, etc, and then the browser prefetches likely pages.
There are runtimes for NaCL, pNaCL, Flash, there's a PDF browser (some of
these are plugins), there is a VNC client, support for a bunch of different
rendering models (layered HTML elements, Canvas, 3D), media support (codecs),
support for webcams and microphones, and peer-to-peer communication, and much
more. _phew_

I guess a large chunk of this stuff should be in the OS, so that other apps
could benefit from it. And another large part of it should be in plugins, so
the browser can benefit from all the codecs on the system, for example.

~~~
dragonwriter
> There are runtimes for NaCL, pNaCL, Flash, there's a PDF browser (some of
> these are plugins)

Really? While Chrome has all of those built in, the other WebKit browsers
don't, so why would they be in the WebKit source tree (especially after the
mutual purges of the Blink/WebKit split)?

Also, all of those (except maybe the PDF viewer) in Chrome are plugins, and
they are PPAPI (which was Chromium specific, not used by other WebKit
browsers) not NPAPI plugins, and Flash and maybe the PDF viewer aren't bundled
with Chromium (just Chrome), so it's _really_ weird that anything related to
them would remain in WebKit.

~~~
captainmuon
You're right, I mixed up WebKit and Chrome there.

------
Qantourisc
Or even better than double:

    inline void updateCachedWidth() { m_cachedWidth = computeWidth() * deviceScaleFactor(); }

Gee, was that line so hard to read? No, it's easier! (Might have just been
an example though.)

~~~
kalleboo
I was also wondering why this wasn't done, it must have been an example that
was reduced in size.

------
cLeEOGPw
It is not mentioned in the article that writing "inline" does not
automatically make the function inline. It only gives the C++ compiler a hint
that it might be worth inlining. The compiler can inline a function even if it
has no inline keyword, and can decline to inline one that has the keyword, if
it decides inlining would be inefficient.
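
If you really need one behaviour or the other, GCC and Clang have attributes that go beyond the hint. A small sketch: the function names are made up, and the gnu:: attributes are compiler extensions rather than standard C++, so other compilers may ignore them.

```cpp
// Plain `inline`: only a hint, the optimizer makes the final call.
inline int hinted(int x) { return x - 1; }

// GCC/Clang extension: inline at call sites wherever that is possible.
[[gnu::always_inline]] inline int forced(int x) { return x + 1; }

// GCC/Clang extension: never inline, keep a single out-of-line copy.
[[gnu::noinline]] int outlined(int x) { return x * 3; }

// All three are called the same way; only the generated code differs.
int combine(int x) { return outlined(forced(hinted(x))); }
```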

~~~
kgabis
Why does that keyword even exist then?

~~~
nknighthb
Because some compilers do make use of that hint in useful ways. They're just
not required to, nor are they required to do so in any specific situation,
much like the register and volatile keywords.

~~~
Someone
volatile is not a hint, it's a requirement. You couldn't control hardware
without it (compiler: "you're only writing this variable, never reading it.
I'll skip those writes to speed things up". Programmer: "why isn't my program
writing anything to this I/O port?")
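
A sketch of what I mean. The "port" here is just a pointer handed in by the caller and the function names are invented; on real hardware the address and access width would come from the device documentation.

```cpp
#include <cstdint>

// Every store through a volatile lvalue counts as observable behaviour,
// so the compiler has to emit both writes, in this order.
void pulse(volatile std::uint32_t* port) {
    *port = 1;
    *port = 0;
}

// Same code without volatile: the first store is dead as far as the
// language is concerned, and an optimizer may drop it.
void pulseMaybeElided(std::uint32_t* port) {
    *port = 1;
    *port = 0;
}
```

Both leave 0 behind in ordinary memory; the difference only shows up when something outside the program is watching the writes.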

~~~
nknighthb
I suggest reading the C standard, where you'll find that, despite the many
words used about volatile, it doesn't really guarantee you anything at all. I
especially love this sentence:

> _"What constitutes an access to an object that has volatile-qualified type
> is implementation-defined."_

The behavior of I/O ports and the meaning of any attempt to access them is
beyond the scope of the standards. The "requirements" that exist for volatile
are simply an absurdity outside the (new in C11/C++11) threading context,
since the implementation is not actually required to do anything useful.

------
unknownian
Without a doubt, WebKit is one of the most interesting parts of Apple. A
community of open source developers that accept contributions (I'm assuming)
with a developer-focused open blog with tips on writing C++ - a language not
even particularly widely used elsewhere in Apple.

~~~
CoolGuySteve
I concur. When I was there, they were pretty clear that everything I made
belonged to Apple (which was total bullshit according to California law), that
nothing we made would ever be open sourced due to patent issues, and that I
was to never say anything about the company in public.

I wonder how their team could swing that culture without tripping over legal
at every turn.

~~~
bzbarsky
> and that I was to never say anything about the company in public.

This remains the case to a large extent. This makes working with Apple and
Microsoft in standards groups a bit challenging at times, since they won't
actually comment on whether they're even thinking about implementing a
standard, or whether they would be willing to implement it as written, until
they suddenly ship it.

And if they're _not_ shipping it you have no way to tell whether that's
because they never will because of some fundamental issue they perceive or
whether they're basically fine with the idea but just haven't gotten around to
finding resources to implement yet.

~~~
othermaciej
In the case of Web standards in particular, you can usually see the checkins
in our public source tree well before we ship it. But per policy we will
rarely publicly commit to shipping something or not ahead of time.

~~~
bzbarsky
Historically, the story with visibility on checkins to iOS Safari was not that
great.

But yes, for desktop Safari usually one can get some idea by scouring checkin
logs.

------
CoolGuySteve
Interesting that this article doesn't mention profile guided optimization. In
my experience, PGO is able to eliminate a lot of the performance problems
associated with unnecessary inlines and rarely called functions eating up
cache space.

The major downsides are that you can only optimize what the profiler can see
and running the thing to make a build takes forever.

~~~
bzbarsky
The major compilation targets for WebKit for Apple are MacOS and iOS, and
compilation is with clang/clang++/llvm.

And clang's PGO support is not very good so far, so there isn't much to talk
about...

------
nnq
looking on example 1, I wonder _why don't languages like C++ or D or Go just
add a "pure" keyword for functions that don't modify the global environment or
their arguments?_ this will help the optimizers a lot I imagine. and yeah, I
get it that there still could be roundabout side-effects, it's not Haskell,
but the compiler could just trust the programmer that he knows what he's doing
when he sticks the "pure" keyword before a function definition.
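
For what it's worth, GCC and Clang already accept something close to this as an extension. A sketch, assuming a made-up Box class: the member names echo the article's example, the values are invented, and whether a given compiler version actually uses the promise to skip the reload is not something this shows.

```cpp
// Hypothetical class for illustration.
struct Box {
    int m_width = 3;
    int m_scale = 2;
    int m_cachedWidth = 0;

    // [[gnu::pure]] promises: no side effects, result depends only on the
    // arguments and on memory the function reads. The compiler trusts this
    // and does not verify it.
    [[gnu::pure]] int computeWidth() const;
    [[gnu::pure]] int deviceScaleFactor() const;

    void updateCachedWidth() {
        m_cachedWidth = computeWidth();
        m_cachedWidth *= deviceScaleFactor();
    }
};

// Defined out of line, as they would be if they lived in another file
// where the optimizer cannot see the bodies.
int Box::computeWidth() const { return m_width; }
int Box::deviceScaleFactor() const { return m_scale; }
```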

~~~
fhd2
> looking on example 1, I wonder why don't languages like C++ or D or Go just
> add a "pure" keyword for functions that don't modify the global environment
> or their arguments?

The function in example 1 is modifying a member variable, and there is indeed
a keyword that requires functions not to modify the class they operate on:
const. It's very powerful, and by a long shot my favourite feature of C++.

That said, the function in question actually has to modify a member variable.

~~~
mjn
The problem here isn't updateCachedWidth(), which indeed modifies a member
variable, but deviceScaleFactor(), which doesn't. Since LLVM doesn't infer
that deviceScaleFactor() leaves m_cachedWidth unmodified, it forces a reload
of m_cachedWidth after the function call in case the value changed.

~~~
RogerL
It was an _example_. Replace deviceScaleFactor with some other function that
modifies, say, m_foo, and you have the same problem. And no, const+mutable is
not an answer(in my example), because the function does in fact modify the
class in a way important to the caller.

------
dm2
While I'm very appreciative of the contributions Apple made to the web with
WebKit, most of the recent innovations and upkeep have been thanks to Google.

What is the future of WebKit now that Blink has been introduced? Will Apple
spend considerable resources keeping an open-source project at the bleeding-
edge considering it doesn't really make them any money? Should Safari just be
scrapped? It only accounts for < 4% of the market share of non-mobile
browsers.

[http://en.wikipedia.org/wiki/WebKit](http://en.wikipedia.org/wiki/WebKit)

[http://en.wikipedia.org/wiki/Blink_(layout_engine)](http://en.wikipedia.org/wiki/Blink_\(layout_engine\))

~~~
Perceval
I think Apple is pretty well set on mobile browsing as the future growth
market. They don't seem to mind cannibalizing desktop sales for iPads,
iPhones, and iPod Touches. In mobile browsing, Safari has ~60% market share.
As long as mobile browsing keeps growing, Safari will remain an important
asset for Apple.

[http://www.netmarketshare.com/browser-market-share.aspx?qpri...](http://www.netmarketshare.com/browser-market-share.aspx?qprid=1&qpcustomb=1)

~~~
dm2
I'm not sure I trust that site, they also claim that IE had 56% of the Desktop
market share during 2013, while all other reports claim to have very different
data: [http://www.netmarketshare.com/browser-market-share.aspx?qpri...](http://www.netmarketshare.com/browser-market-share.aspx?qprid=0&qpcustomd=0&qpsp=2013&qpnp=1&qptimeframe=Y)

[http://en.wikipedia.org/wiki/Usage_share_of_web_browsers](http://en.wikipedia.org/wiki/Usage_share_of_web_browsers)

Here is StatCounter for Mobile for the last 6 months, the iPhone browser with
23%: [http://gs.statcounter.com/#mobile_browser-ww-monthly-201207-...](http://gs.statcounter.com/#mobile_browser-ww-monthly-201207-201307)

~~~
extra88
You just posted the netshare link again instead of a StatCounter one.
StatCounter's Mobile OS shows iOS at about 24% last month.
[http://gs.statcounter.com/#mobile_os-ww-monthly-201207-20130...](http://gs.statcounter.com/#mobile_os-ww-monthly-201207-201307)

But I don't think they count iPads and other tablets as "mobile." Their Mobile
Browser chart listed iPhone and iPod touch separately and didn't list iPads at
all. Mobile Screen Resolutions doesn't include iPad resolutions. Browser
Versions (Partially Combined) has a separate entry just for Safari iPad (which
actually exceeds Safari 6.0 and Safari 5.1, combined).

------
cmbaus
I'm glad this is being discussed, because for many years inlining functions
was considered a performance panacea in C++, with no regard to the size of the
object code that was being generated, or the effects inlining was having on
instruction cache performance.

The C++ community has brought itself all kinds of complexity and long compile
times all in the name of performance which, in my mind, was always pretty
suspect.

------
bhdz
Sorry, but I will recap his thinking here:

    * Try to be explicit { rather than implicit }
    * Carefully consider inlining { large blocks of code }
    * Do not use static initializers { for infrequency or trivialities }
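
On the last point, one common way to avoid a load-time static initializer is a function-local static, which is only constructed the first time it is needed. This is a generic sketch, not necessarily what the article did; the names() function and its contents are invented.

```cpp
#include <string>
#include <vector>

// A namespace-scope object like
//     static const std::vector<std::string> kNames = {...};
// would be constructed at load time, whether or not it is ever used.

// Function-local static: constructed on the first call only (and
// thread-safely since C++11), then reused.
const std::vector<std::string>& names() {
    static const std::vector<std::string> table = {"left", "right", "center"};
    return table;
}
```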

------
zurn
There's a compiler optimization to automatically improve icache utilization by
moving rarely executed code branches far out of line, so that they don't take
space in the loaded cache lines when the straight line code executes. (This
still leaves the wasted disk space and possibly RAM)

GCC docs sound like the trick would be -fprofile-use, -freorder-functions and
-freorder-blocks-and-partition - after a representative profiling run.

A representative profiling run for a shipping binary is a problem of course;
JITs win here. HP had a dynamic binary reoptimization framework in the late
90s called Dynamo that could do it for AOT compiled binaries.
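
Where a profiling run isn't practical, GCC and Clang also let you mark the rare paths by hand, which gives the compiler some of the same information. A sketch with invented function names; [[gnu::cold]], [[gnu::noinline]] and __builtin_expect are compiler extensions, and how far the cold code actually gets moved is up to the compiler and linker.

```cpp
// Counter and last offender, only here so the cold path does something.
static int g_rejected = 0;
static char g_lastRejected = '\0';

// Marked cold: optimized for size and typically grouped away from hot code.
[[gnu::cold]] [[gnu::noinline]] int rejectChar(char c) {
    ++g_rejected;
    g_lastRejected = c;
    return -1;
}

// The hot path: the branch hint says the error case is unlikely, so the
// straight-line code handles the common case.
int parseDigit(char c) {
    if (__builtin_expect(c < '0' || c > '9', 0))
        return rejectChar(c);
    return c - '0';
}
```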

------
masklinn
Others have already covered a lot of things, so I'll just say that I'm sort-of
impressed by:

> The second big drop is the result of removing features and cleaning code
> that came from the Chromium project.

In the graph, the second big drop is ~5% of the initial code size, removing
the chromium code actually reduced binary size more than the inlining fixes
did.

------
MaxGabriel
I liked this article, but the last 3 graphs were really poor. By not listing
the actual binary size in some denomination of bytes for the last 3 graphs
(only using %s), I get less information from the charts. A much worse problem
is the date formatting, which squishes all the numbers together so that they
look like one big string. The first graph was much better.

------
dschiptsov
_Try to be explicit in your coding to help the compiler understand what you
are doing._ - so obvious. Clarity is an evidence.)

------
caycep
Is WebKit written in C++ instead of Objective-C?

~~~
captainmuon
Yes, WebKit is originally based on KDE's KHTML, and thus written in C++. It
was heavily modified by Apple and later by Google. The core is platform
independent C++, and then there are platform specific parts for the various
environments it runs in (C/C++ for Gtk, C++ for Qt, ObjC for OSX, ...).

With the clang or gcc compilers, you can easily link ObjC and C++ together. To
some extent you can even mix them in one file (ObjC++), though I don't have
experience with that.

