
Android Performance Case Study - jkn
http://www.curious-creature.org/docs/android-performance-case-study-1.html
======
napoleoncomplex
This is an avalanche of wisdom. I'm summoning the courage to enable "Show GPU
Overdraw" on my apps and see just how bad it is.

And if anyone else was interested, the developer of Falcon Pro has already
started implementing the fixes. To quote him from Twitter (@joenrv): "Reading
@romainguy analysis of my app makes me feel a bit like a naked dude in a room
full of people staring xD #letsgettowork"

~~~
sciwiz
He has already updated the app:
<https://twitter.com/joenrv/status/275289692998623232>

"#FalconPro v1.0.2 uploaded to @GooglePlay. Get ready for some extra butter
thanks to @romainguy ! Should hit your devices in an hour or so"

------
StavrosK
I have very little interest in Android/mobile development, but this post was
one of the most interesting things I've read in the past few days. Very well
done. I feel like I've learnt a great deal about graphics performance (generally,
not just on mobiles) in five minutes.

------
kevingadd
The OP is slightly overstating the impact of overdraw as a general thing.
While in this particular case it is likely that the overdraw is the cause of
poor performance in this application, it's not always going to be the cause.

Overdraw primarily eats up memory bandwidth. Bandwidth isn't the only resource
you have to worry about on a GPU (though on a mobile, it's certainly
important). Equally important can be the time spent running pixel and/or
vertex shaders when actually drawing onscreen elements - it's quite easy for a
poorly written pixel shader to add multiple milliseconds to the time taken to
render _one_ fullscreen image on an embedded device.

Unfortunately none of the steps he takes in this article, until the OpenGL ES
Trace near the end, appear to give you any of the information you'd need to
figure out whether overdraw is actually your problem. Maybe it's a safe bet
that for most Android apps, overdraw is the issue, because they're using Skia
and thus can't use custom shaders?

On the other hand, that Hierarchy Viewer feature in the debug tools looks
really great. I wish more SDKs offered features that nice.

~~~
corysama
In the case of pixels hidden behind views that do not use blending, overdraw
only eats a very small amount of memory bandwidth checking the depth/stencil
buffer for each covered pixel. This check is highly optimized in the hardware
to reject whole blocks of pixels while reading only a few bits. (Android UI
does use the depth/stencil buffer, right???) However, I don't think that's
what the article is talking about. "You can see that the transparent pixels of
the bitmaps count against your overdraw."

In the case of visible views that do use blending, overdraw multiplies the
time spent running shader computation right alongside multiplying the full
memory bandwidth consumption of the shader (much more than just checking the
depth/stencil buffer). It's true that it's totally possible to write slow
shaders that chug with only 1x overdraw. But, at 3x overdraw it will be 3x as
bad because you are running the whole function 3x per pixel.
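
The scaling described above can be put into a toy linear model. This is only a sketch: the screen size and the per-pixel shader cost below are hypothetical numbers chosen for illustration, and the model ignores everything except fragment shading.

```python
def frame_shader_cost_ms(pixels, cost_ns_per_pixel, overdraw):
    """Estimated shading time for one frame, in milliseconds.

    Assumes every pixel is shaded `overdraw` times at the same cost,
    which is the situation for blended views drawn back to front.
    """
    return pixels * overdraw * cost_ns_per_pixel / 1e6


PIXELS = 1280 * 800   # hypothetical screen size
COST_NS = 2.0         # hypothetical effective shader cost per pixel, in ns

one_x = frame_shader_cost_ms(PIXELS, COST_NS, 1)
three_x = frame_shader_cost_ms(PIXELS, COST_NS, 3)

print(f"1x overdraw: {one_x:.3f} ms, 3x overdraw: {three_x:.3f} ms")
```

Under these assumptions a shader that is merely slow at 1x overdraw takes three times as long at 3x, which is the point being made: overdraw and shader cost multiply rather than compete.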

~~~
RomainGuy
Android does not use the depth buffer. The UI toolkit draws back to front. We
are thinking about ways to improve this but most apps draw blended primitives
back to front. An optimization I want to get in is to resort the render tree
to batch commands by type and state. A side effect of this will be the ability
to cull invisible primitives.

The stencil is not used at the moment (well... that's actually how overdraw
debugging is implemented) because the hardware renderer only supports
rectangular clipping regions and thus relies on the scissor instead. Given how
the original 2D API was designed, using the stencil buffer for clipping could
eat up quite a bit of bandwidth or require a rather complex implementation.

It is planned to start using the stencil buffer to support non-rectangular
clipping regions but this will have a cost.

Remember that the GPU rendering pipeline was written for an API that was never
designed to run on the GPU and some obvious optimizations applied to
traditional rendering engines do not necessarily apply.
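
To illustrate what culling invisible primitives could look like for a back-to-front draw list, here is a minimal sketch. It is not how the Android renderer works or will work; the `Rect` type, the `cull_hidden` function, and the view names are all hypothetical, and the test is deliberately conservative (a primitive is dropped only when a single later opaque rectangle fully contains it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    name: str
    left: int
    top: int
    right: int
    bottom: int
    opaque: bool

    def contains(self, other):
        """True if `other` lies entirely inside this rectangle."""
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def cull_hidden(draw_list):
    """Return the primitives that can still be visible, in draw order.

    `draw_list` is ordered back to front, so anything drawn later sits
    on top. A primitive is culled only if some later opaque primitive
    fully covers it; blended primitives never hide what is beneath them.
    """
    visible = []
    for i, rect in enumerate(draw_list):
        covered = any(later.opaque and later.contains(rect)
                      for later in draw_list[i + 1:])
        if not covered:
            visible.append(rect)
    return visible


# Hypothetical scene: an opaque window background fully hidden by an
# opaque list background, with a blended avatar drawn on top.
window_bg = Rect("window_bg", 0, 0, 100, 100, opaque=True)
list_bg = Rect("list_bg", 0, 0, 100, 100, opaque=True)
avatar = Rect("avatar", 10, 10, 30, 30, opaque=False)

kept = cull_hidden([window_bg, list_bg, avatar])
print([r.name for r in kept])
```

The sketch only handles the culling side effect. Reordering commands by type and state is harder, because blended primitives that overlap must keep their relative order.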

~~~
corysama
That's actually what I expected, but I couldn't find any reference. So, I
defaulted to the optimistic, but probably wrong stance hoping that someone
would correct me. Thanks!

This means that, at least on traditional forward rendering GPUs (Nvidia,
Adreno), overdraw is full cost even for pixels covered by opaque views. Do the
PowerVR chips still get effectively-zero opaque overdraw from their tile-
based-deferred-rendering approach?

Meanwhile, I'm not totally clear how hidden surface removal works on Mali
chips. They use TBDR, but still recommend drawing front-to-back to avoid
overdraw
<http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0555a/CHDIJGGA.html>

------
elteto
Wow. I didn't know that the Android SDK shipped with such advanced profiling
tools. They have really improved since the last time I played with it, circa
1.x. Impressive. Can anyone comment on other similar tools?

~~~
martythemaniak
Another common one is "strict mode", where you tell Android to notify you (via
screen flash or log dump) any time your app makes a potentially blocking call
(network, file system) on the UI thread. Very useful as a first optimization.

~~~
ConstantineXVI
As of (IIRC) Gingerbread, you get thrown a NetworkOnMainThreadException if
you're making network calls in the UI thread and haven't explicitly turned
StrictMode off.

(The WinRT SDK goes a step further: there simply _aren't_ any blocking
network/FS methods available. async/await makes that less painful than it
would seem, though.)

~~~
ConstantineXVI
PS: It's actually Honeycomb and later. This behavior only kicks in if you're
compiling against HC+; anything compiled against GB or lower can freely block
the UI thread with reckless abandon regardless of which Android version you're
running on.

------
shocks
I hosted the trace here, if anyone wants to look:
<http://dev.mindfuzz.net/trace.html>

------
Shank
One side effect of this is that Falcon Pro should now be significantly better
optimized as a result. The "What's New" log on Google Play reveals this[1]:

v1.0.2 * Optimized the app following @romainguy recommendations. Report back
if you feel the butter :)

[1]:
<https://play.google.com/store/apps/details?id=com.jv.falcon.pro>

------
yardie
Wow, I'm amazed! Android performance tools have really advanced since my early
attempts at Android development (2.2-2.3). I'll need to try some of these tools
out.

Have they advanced on the NDK/C++ front as well? I've never been interested in
the Java aspect of Android and working with the NDK was brutal compared to
iOS.

------
campnic
As an Android developer, it's really exciting to see good examples of the tools
in action. The documentation on how to debug problems beyond crashes/ANRs is a
bit thin.

That being said, I do have one gripe. There are some chasms between the
different tools. It is a bit painful to operate all the different performance
tools and deal with switching contexts for the same problem. Systrace ->
traceview and back and forth.

Also, it would be nice if traceview had a text-based API/interface. I know
that the graph visualization must be valuable for something, but I spend the
majority of my time looking for particular methods and signs of excessive
consumption/trouble. Now that I think of it, this sounds like a fun weekend
project :)

~~~
RomainGuy
We are trying to have all the tools available in ADT/monitor. Systrace can
actually be invoked directly from ADT/monitor, there's a button in the toolbar
for this.

------
clak
Quite useful for Android developers.

~~~
maak
Quite useful for most developers. The optimization process described here is
applicable to many flavors of development. Very nice article OP!

------
shocks
Slightly off-topic: I thought "Twitter clients" were frowned upon now, and
massive API restrictions made them virtually impossible?

Very interesting post though. :)

~~~
kristofferR
There's a limit of 100,000 users/tokens for new Twitter clients.
That's still $100,000 for $1 Twitter apps ($70,000 after Apple/Google take
their share), so it's not the end of the world for Twitter apps, but it won't
make any developers rich either.

Getting 100,000 users for a paid app is very rare. Falcon Pro, for example, has
only between 1,000 and 5,000 users so far. It was only recently released,
though. If it starts to become popular "too fast", the price can just be
raised.

~~~
randallu
Maybe if they fix those graphics performance bugs!

------
fulafel
I tried to do the memory bandwidth math but ended up even more puzzled:

Anandtech Nexus 7 review says it has 5.3 GB/s of memory bandwidth. At 60 fps
that would leave 88 MB per frame.

The 1280x800 screen has 3 megabytes worth of pixels at 24 bpp. That's 1/29 of
the 88MB-per-frame. So how come the overdraw related slowdowns started
appearing with just 4x overdraw?
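
The arithmetic above can be checked with a short sketch. The figures are the ones quoted in the comment; decimal units (1 GB = 10^9 bytes) and 3 bytes per pixel are assumptions made here to match the "24 bpp" figure.

```python
BANDWIDTH_BYTES_PER_S = 5.3e9   # figure quoted from the review
FPS = 60
WIDTH, HEIGHT = 1280, 800
BYTES_PER_PIXEL = 3             # 24 bpp, as assumed in the comment

# Memory traffic available for each frame at 60 fps.
budget_bytes = BANDWIDTH_BYTES_PER_S / FPS

# Bytes needed to write every pixel on screen exactly once.
screen_size_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL

# How many full-screen writes fit into one frame's budget.
ratio = budget_bytes / screen_size_bytes

print(f"budget per frame: {budget_bytes / 1e6:.1f} MB")
print(f"one full-screen write: {screen_size_bytes / 1e6:.2f} MB")
print(f"full-screen writes per frame: {ratio:.1f}")
```

This reproduces the roughly 88 MB per frame and the factor of about 29. Note that it counts a single write per pixel only. A blended draw also reads the destination pixel and samples its source texture, and the GPU shares memory bandwidth with the rest of the system, so the sketch gives an upper bound rather than an explanation of the observed slowdown.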

~~~
chmike
Because of Java?

------
realrocker
Brilliant as always.

~~~
ah_muzakkir
Agreed. It's very interesting. And as a junior Android Application Developer,
I've learnt so much in less than 30 minutes.

To Romain Guy, please keep it up. Thanks!

