
Dalvik patch for Facebook for Android - ctice
https://www.facebook.com/notes/facebook-engineering/under-the-hood-dalvik-patch-for-facebook-for-android/10151345597798920
======
no_more_death
Uninstalled. I checked, and Facebook took up more memory than any other
application I have installed, even more than TSF Shell. (By the way, a MUCH
more "feature-rich," useful, and snappy app than facebook!) Facebook isn't
doing anything comparable to what TSF Shell does, not for me anyway. I bet
this is why my phone recently started taking 30 seconds to make a phonecall
(!).

I can't believe an app requires 8M RAM just for method names! That's 3% of my
entire usable memory (what's left after Motorola Blur takes its share) -- just
for the method names for a single app. Surely I must be misunderstanding what
goes into that buffer. It's unimaginable to me that they really need such a
vast amount of space. It's unimaginable that they really expect their app to
work at all on an old phone without choking it to death.

If I grasp this correctly, printing just the method names would create a book
double the size of the King James Version of the Bible. Maybe you are going to
argue having so many methods is good programming practice, not bloat. But
don't argue that you "support" a system when just the method names total
around 3% of usable memory for that system. And there are a lot of devices
with considerably less memory than what I've got.

Why don't you release a stripped down, "lite" version of the app that works on
an Android 2 phone even if it has more than 5 applications installed? Some
phones don't even allow uninstalling facebook. They are simply stuck with the
bloat (I mean, good programming practices :-).

~~~
kristianp
Yes, I'm in the same boat. On my Android 2.2 phone (HTC Legend) it is hard to
fit some current apps and facebook is a system app so I can't uninstall it.
However I can uninstall the updates, so my facebook is the original factory
version. I won't be trying out this latest version.

It would be interesting to know what it is doing to be that complex. I'm
guessing lots of third party libraries for ads, logging, etc?

------
marhumph
The first paragraph is just rampant self praise. Basically these engineers
decided to be really clever, painted themselves into a corner, smashed a hole
in the wall so as to escape from said corner, and then bragged about how great
they are.

~~~
mikeash
Hitting an arbitrary (and undocumented?) platform limit on the number of
methods your app is allowed to have is hardly "painting themselves into a
corner".

~~~
myko
I'm not sure if it is listed in the official documentation but it was talked
about by the Android team on their blog:

<http://android-developers.blogspot.com/2011/07/custom-class-loading-in-dalvik.html>

~~~
mikeash
This appears to discuss the per-dex method limit but makes no mention of the
per-process limit.

------
JulianMorrison
No, that is not cool, that is an admission that

1\. They bloated the app so badly, they would have had to monkey patch the OS
to let it run at all.

2\. They actually did have the stonkingly huge arrogance to monkey patch the
OS.

What they should have done is shattered it down into a constellation of
related apps and background processes. This app is FB news, that app is FB
messaging, and so on. Bonus: people could pick and choose the parts of the
system they cared for.

But that last is what they wanted to avoid, I think. Facebook _à la carte_
means that they can't foist unwanted "services" on you.

And so they hacked.

------
jmillikin
According to <http://techcrunch.com/2013/03/04/facebook-google-dalvik/> and
<http://venturebeat.com/2013/03/04/google-facebook-android/> , the specific
limit being hit was a cap of three million methods per instance of the Dalvik
VM.

Wisdom of a three-million-method app aside, I wonder why they didn't fix this
in the compilation toolchain rather than by poking around in native memory
areas during app startup. Facebook wrote a PHP->C++ compiler; changing the
Dalvik compiler to more aggressively inline small methods seems trivial in
comparison.

~~~
georgemcbay
Three million methods? Having both programmed for Android and having used the
Facebook App for Android, I find it very difficult to believe that app has
anywhere near 3 million methods, even if each one is basically a one-liner. If
the number is correct, that really is some kind of crazytown code.

~~~
Xuzz
I think the actual number is 65536 or similar — the 3 million may have been a
misquote by TechCrunch.
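
For reference, 65536 is 2^16, the number of distinct values a 16-bit index can hold. A minimal Java sketch of that arithmetic, assuming the cap comes from a 16-bit method index (the class and method names below are made up for illustration, not taken from Dalvik):

```java
// Illustrative only: models a method table addressed by a 16-bit index.
final class MethodIndexLimit {
    // Assumption: method references are indexed with 16 bits.
    static final int INDEX_BITS = 16;

    // Number of distinct methods such an index can address.
    static int maxMethods() {
        return 1 << INDEX_BITS;
    }

    // True if a given method count still fits under that cap.
    static boolean fits(int methodCount) {
        return methodCount <= maxMethods();
    }

    public static void main(String[] args) {
        System.out.println("cap: " + maxMethods());
        System.out.println("3 million fits: " + fits(3_000_000));
    }
}
```

Under that assumption a figure of 3 million could not be the cap itself, which is consistent with the misquote theory.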

~~~
georgemcbay
Still seems like a ridiculous number, though of course far less ridiculous
especially for Java code with its legacy of setters and getters for even basic
properties and FactoryOfFactoriesFactory classes.

~~~
cpeterso
Trivial getters and setters shouldn't be a problem because the Facebook blog
said they tried using ProGuard, which can inline Java methods.

------
barbs
I really love reading recounts like this. Seeing other developers try, fail,
and try again to come up with creative solutions for difficult problems, and
then finally arriving at a solution, is so satisfying. It really demonstrates
how creativity is manifested in programming, and is truly inspiring.

A favourite of mine is Jari Komppa's article on his process of porting Death
Rally from DOS to Windows: <http://sol.gfxile.net/dr_gdm.html>.

Does anyone have any other articles along the same lines? I'd love to read
them!

~~~
anyfoo
Thanks, that was indeed a great read. You might also enjoy this, which is
fairly recent: <http://fabiensanglard.net/duke3d/index.php>

And the article links to other promising articles at the top, though I haven't
checked them out.

The general question remains: how would we go about finding similar stuff?

Oh, one more thing, if you don't know yet, you will find great gems in Raymond
Chen's blog, <http://blogs.msdn.com/b/oldnewthing/>

Not everything is archeological (a lot is still interesting though), but the
articles often link to a whole bunch of other articles, which in turn do the
same. I suggest, for example,
<http://blogs.msdn.com/b/oldnewthing/archive/2012/06/29/10325295.aspx>.
I deliberately chose a later article in the series, since it also links to
previous ones.

------
kumarm
Unfortunately TechCrunch reported incorrectly and no one at Facebook cared
enough to correct them (or thought it's cool to boast about 3M methods in an
app).

The limitation in older Dalvik VMs is 64K methods, not 1 million methods,
which is a realistic count for a large app to hit (or even a decent-sized app
with a high number of third-party libraries).

This limitation has been known for a while, and Google engineers have talked
about it multiple times.

------
MichaelGG
Certainly if there's a lot of small methods, they could have an optimization
pass that inlines some of them and bring their overall method count down. I
wouldn't be surprised if that was a small performance gain, too.

~~~
jerf
I thought of that, then the pragmatic engineer in me observed that they only
discovered this _after_ cracking their app into so many small methods. Given
the time choice of 1. uncracking their app into small methods (which
presumably wasn't atomic, so now that's all mixed in with other development,
so it's not like it can just be reverted) or 2. hacking the system to handle
more methods, I am forced to admit that choice 2 is two or possibly three
orders of magnitude _faster_ , in man-hours. (Probably two or three of their
best engineers for a couple of weeks, vs. "the entire team", from dev to QA,
tasked for months, on work that results in no new features). Not better in all
possible ways, but it's hard to justify "do it right" against that sort of
cost/benefit ratio.

Then I was going to say "I hope they move towards a more permanent solution"
when again my inner pragmatist popped up and said "Yes, they did; unsupport
Android 2 in the future." So... yeah.. nasty, _nasty_ hack, probably an
enormous win almost all the way around. So the wheel turns.

~~~
magic_haze
I think GP meant something like a build process that inlines the bytecode, not
the source code itself. I don't see why this is unreasonable, especially
compared to what they actually did.

~~~
jerf
For as nasty as the hack is, it still boils down to twiddling with a few
numbers.

Also, inlining is probably easier said than done. It's not enough to inline
the methods, you need to make them entirely disappear, and ISTR reading within
the last few days somebody else commenting that Java still has to keep the
metadata about the methods around (which is the problematic part, not the
methods themselves, if I'm reading this right) because reflection may demand
them. You'd need something more sophisticated to do it at the source level,
and with an imperative language with unrestricted side effects, while that is
certainly possible, it's also very much easier said than done. That's not a
weekend hack either.

~~~
MichaelGG
I'm not sure inlining is that difficult. Java bytecode is a relatively
straightforward language is it not? Especially if it doesn't have to be for
all cases, which, if you're looking at eliminating small methods, it does not
have to be.

Assuming Java has something like the Cecil library for MSIL, it should be a
fairly straightforward task. Although, as you point out, so is fixing up a
number.

~~~
zurn
In order to elide the out-of-line method the compiler must statically prove
that it is impossible to call it from outside the visibility of its
optimization scope.

I'm not familiar with Java compilers but I'd hazard a guess that to exploit
this optimization (if it were possible at all) they would probably have to
significantly rework their code.

~~~
MichaelGG
For a shipped app it shouldn't be too hard to perform it across all packages
involved, right?

~~~
zurn
The compiler would have to see all code that is linked with the app, including
the Android runtime that calls into the app's code in various ways.

In Java land these kinds of fancy optimizations usually happen in the JVM JIT
at runtime. Would this help the size of those tables in Android's
Dalvik/dexopt setup? In any case you can't ship a custom version of Dalvik
with your app.

~~~
MichaelGG
Maybe I'm not explaining well.

You've got class foo, with methods "DoBar" which calls "DoBazStep1(),
DoBazStep2(), SomeHelper.CalcPosition()".

You use a Java bytecode toolkit and identify leaf functions, small functions,
whatever the criteria needs to be. Then you check every callsite for those
functions, and if found, remove the call, and inline the code, doing fixup on
locals and parameters. Now, "DoBar" has a lot more bytecode in it, as the
bodies of DoBazStep1 and the others are now contained in it. The small
functions are completely removed.

This process can be done totally offline, on the compiled Java code. The
runtime just sees that you wrote one big function, versus a bunch of small
ones.

The only real work is making sure you got everything, and perhaps being clever
with stack local allocations. As the FB team says they moved to a new style
with lots of small functions, I'm assuming they aren't doing all sorts of
metaprogramming and inheritance, but have just split things up at the function
level.
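
To make the before/after concrete, here is a rough Java sketch of the transformation described above. It is written at the source level for readability, whereas the idea is to do this on compiled bytecode. The method names come from the example; the method bodies are invented purely for illustration.

```java
// Hypothetical helper; its body is made up for this sketch.
final class SomeHelper {
    static int calcPosition(int x) {
        return x * 2;
    }
}

// Before: doBar delegates to small methods, each needing its own method entry.
final class Foo {
    static int doBazStep1(int x) {
        return x + 1;
    }

    static int doBazStep2(int x) {
        return x * 3;
    }

    static int doBar(int x) {
        int a = doBazStep1(x);
        int b = doBazStep2(a);
        return SomeHelper.calcPosition(b);
    }
}

// After: what an offline inliner would effectively leave behind.
// One larger method; the small helpers no longer exist in this class.
final class FooInlined {
    static int doBar(int x) {
        int a = x + 1; // former body of doBazStep1
        int b = a * 3; // former body of doBazStep2
        return b * 2;  // former body of SomeHelper.calcPosition
    }
}

final class InlineDemo {
    public static void main(String[] args) {
        System.out.println(Foo.doBar(4) == FooInlined.doBar(4));
    }
}
```

The two versions compute the same result. The sketch deliberately ignores the hard parts raised earlier in the thread, such as callers outside the app's own code and reflection.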

------
cek
It's interesting to see the parallels between Android and Windows in this
regard.

Here's a story from my past, copied from a blog post I wrote last year:

My first job at Microsoft was providing developer support for the early
Windows SDKs. To do my job well, I spent hours studying the Windows SDK
documentation, the Windows source code, and writing sample applications. I
then spent hours poring over customers’ (such as Lotus, WordPerfect, and
Computer Associates) code helping them figure out what was not working.

This gave me a deep appreciation for API design early in my career. I saw
clear examples of elegant APIs as well as horrific monstrosities. I once
created a sample for selecting a font in GDI that I called “ReallyCreateFont”
because both of the built-in Windows APIs (CreateFont() and CreateFontEx())
were basically broken.

The CreateFont case taught me first hand the pain resulting from exposing an
API publicly, having customers use it in ways not expected, and then having to
do unnatural acts to avoid breaking backwards compatibility.

I was supporting WordPerfect’s development of WordPerfect for Windows. This
was sometime in early 1991 if I remember correctly. I had the WordPerfect
engineer on the phone. He had given me access to the WordPerfect source code,
but I couldn’t give him access to the Windows code. So we were each stepping
through WordPerfect in the debugger, him in Utah and me in Bellevue. I could
see what his code was doing and what Windows was doing, but he could only see
disassembly for Windows.

The problem he was seeing had to do with font rendering. In the debugger we
stepped into CreateFontEx(), which calls into CreateFont(), which calls into
some internal GDI function, which calls into the device driver’s Escape()
function (I can’t believe I actually remember all this crap like it was
yesterday). Somewhere in this call stack I came across a block of code in
Windows with comments that read something like

    
    
        // This is a hack to deal with Adobe’s Type Manager.
        // ATM *injects code* into GDI replacing our code with their own.
        // In order to retain backwards compatibility we detect
        // ATM’s attempt to do this, let it do it, and then trick it
        // into thinking it actually hacked us this way.
    

I am not making this up (although that comment is paraphrased from my memory).

It turns out that the way WordPerfect was using CreateFontEx() was reasonable,
but pushing the envelope of what anyone had done before, effectively exposing
a bug caused not by the original API design or implementation, but something
the API provider had to do to ensure app compatibility because of something
another 3rd party had done!

Holy-shit! Let me get this straight:

* A 3rd party app (Adobe Type Manager) used CreateFontEx() in a way the API designer failed to anticipate.

* The API designer needed to retain backwards compatibility so it had to resort to putting 3rd party app specific code in the API implementation.

* Another 3rd party comes along and the complexity of the ‘fix’ caused another bug to surface.

Welcome to the world of a true virtuous platform...

~~~
igravious
Parallels?

> I had the WordPerfect engineer on the phone. He had given me access to the
> WordPerfect source code, but _I couldn’t give him access to the Windows
> code_. So we were each stepping through WordPerfect in the debugger, him in
> Utah and me in Bellevue. I could see what his code was doing and what
> Windows was doing, but _he could only see disassembly for Windows_.

FTA: Instead, we needed to inject our secondary dex files directly into the
system class loader. This isn't normally possible, but _we examined the
Android source code_ and used Java reflection to directly modify some of its
internal structures. We were _certainly glad and grateful that Android is open
source_ —otherwise, this change wouldn’t have been possible.
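
As a rough idea of what "used Java reflection to directly modify some of its internal structures" can look like in general, here is a self-contained Java sketch. It is not Facebook's code and does not touch any Android class: `ToyLoader` and its `searchPath` field are hypothetical stand-ins invented for this example.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a loader with a private search path.
// NOT an Android class; real internals differ between platform versions.
final class ToyLoader {
    private final List<String> searchPath = new ArrayList<>();

    ToyLoader() {
        searchPath.add("classes.dex");
    }

    // Read-only copy of the current search path.
    List<String> visiblePath() {
        return new ArrayList<>(searchPath);
    }
}

final class ReflectionPatchSketch {
    // Appends an entry to ToyLoader's private list via reflection.
    @SuppressWarnings("unchecked")
    static void appendEntry(ToyLoader loader, String entry) {
        try {
            Field field = ToyLoader.class.getDeclaredField("searchPath");
            field.setAccessible(true); // bypass the private modifier
            List<String> path = (List<String>) field.get(loader);
            path.add(entry);
        } catch (ReflectiveOperationException e) {
            // The field name is an implementation detail and may not exist.
            throw new IllegalStateException("internal layout changed", e);
        }
    }

    public static void main(String[] args) {
        ToyLoader loader = new ToyLoader();
        appendEntry(loader, "secondary.dex");
        System.out.println(loader.visiblePath());
    }
}
```

The catch block hints at why this kind of hack is fragile: it depends on a private field name that the owner of the class is free to change, which is only checkable when the source is available to read.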

Facebook are not saying that Android/Dalvik has a broken API. They are saying
that due to the size of their app they hit an internal limit, that this buffer
limit increased between platform versions, and that they needed to create a
robust hack to get their code working on older platforms. They were able to do
this by taking advantage of the open-source nature of the platform, and they
did not need a Google engineer on hand to help them, though they did get to
use a Google test lab (which is not fair on smaller companies). Also, Google
have not inserted code into Android/Dalvik to deal with the Facebook trickery.

~~~
cek
As far as I know Google has not inserted code to deal with backwards compat.
Yet.

------
chuinard
_"choose between cutting significant features from the app or only shipping
our new version to the newest Android phones (ICS and up)"_

The ICS move wouldn't have been a bad option. It would've helped a lot of
users convert to ICS+ devices, which Google probably would've loved.

The APIs from 4+ are much more robust than pre-ICS, and it probably would've
saved more headaches while building the new version in addition to the one
already described in the post.

~~~
chucknthem
Android 2.2 and 2.3 devices are still being released on low end phones,
especially in developing countries, so the number of pre-ICS users is
actually growing, not shrinking.

~~~
psionski
Bought one a year ago (2.3.3, I think). It's a Wildfire S, very small and
lightweight, has a web browser. Can make calls and probably can send short
messages. I was using Windows Mobile 2003 before that and thought I was
finally on the "bleeding edge" of mobile technology, only to find out a month
later that Android 2.3 is considered "legacy" and they are talking about not
supporting it? What, are we supposed to buy new phones every five years now?

~~~
dhugiaskmak

      are we supposed to buy new phones every five years now?
    

No. Every two.

------
cbsmith
Honestly, even on my far more powerful Nexus 4, running modern Jelly Bean, the
Facebook app is worse than the Web app. I think their problems are way bigger
than they appreciate.

------
pbiggar
This seems like it will paint the Android core developers into a corner,
supporting this particular hack for years to come, like the way Microsoft had
to for Windows. If I were an Android core developer, this would not be a good
day.

~~~
kijiki
This hack is specific to ancient versions of Android, and is not used on
current or future versions. So there is no need to deal with it in any
codebase that isn't completely dead.

~~~
objclxt
> _"specific to ancient versions of Android"_

Ancient _but still shipping_. I can go into a store and pick up a Samsung
Galaxy Ace running 2.3 today. A significant minority of 2.3 based phones are
still being sold, especially for budget/Pay As You Go users.

This hack will need to remain in the codebase for a significant time.

------
vidoc
Is it just me, or does Facebook tend to celebrate itself in most of its tech
blog posts? I remember them posting a similar piece a few years back on their
memcached scaling hacks - the article had a very similar tone, it was equally
modest (i.e., not), and was actually nothing but software engineering imposture
aimed at impressing a non-tech audience.

------
ajtaylor
I can't tell if I should be excited for the developers to have been so clever,
or horrified with all the hacks they had to incorporate to make it work.
Either way, the new FB app is definitely an improvement over the older version
(lag central!). Is it reasonable for the OS to make such a low-level component
configurable so that these kinds of convolutions aren't required?

~~~
aredington
You should be angry that they have managed to persuade you that theirs was the
only way out. The odds that properly modularizing their source tree such that
they didn't provoke the limitations of the VM they were running on was
impossible are vanishingly small, and the odds that they correctly hacked the
VM are also vanishingly small.

Facebook's Android Engineers chose the ego stroking solution which involved
clever hacks. This only reinforces my being happy to have uninstalled the FB
android app 2 years ago.

------
imrehg
_"We were certainly glad and grateful that Android is open source—otherwise,
this change wouldn’t have been possible."_

In return, maybe you could open-source some of your app, for example?

~~~
nuclear_eclipse
We do already open source a wide variety of projects, for those who aren't
already aware: <https://github.com/facebook>

~~~
ansible
We use flashcache on one of our build servers. It has been good so far.
Thanks!

------
buster
From my experience the FB app is one of the worst apps on my phone. The
messaging notifications are crap, the reloading of my feed is crap (aka
doesn't load notifications, or does so in an unpredictable manner). I have
much more complex apps on my phone which show no such behaviour, yet it
somehow is a Dalvik problem? Mhhh... How many Twitter apps and FB apps are
there that are snappy and behave well ( = much better than the FB app!)..

------
gingerjoos
>>> It seemed like we would have to choose between cutting significant
features from the app or only shipping our new version to the newest Android
phones (ICS and up). Neither seemed acceptable. We needed a better solution.

 _This_. "There has to be a better solution", a quality which defines hackers
- is it not? It's that pushing of the boundaries for the sake of getting
things done that drives progress, isn't it? It's tempting to take the easy
route out. In this case they could have justified supporting only ICS and up;
Gingerbread and below do not have a lot of market share, I assume. But they
chose to persist.

It's that persistence that we need to cultivate to become successful hackers.

~~~
mbetter
Problem: You can't develop your film outside at night because the moon is too
bright.

Solution: Destroy the moon.

------
thefreeman
Should we be worried that they are able to override things in the OS like
this? I assume there is no built in "permission" prompt for this type of
access.

~~~
julianz
It's not in the OS, it's in the individual VM that Facebook is running in.
Every Android app gets its own VM - if they mess it up badly enough then
it'll just crash.

------
yeureka
I don't get it. FB is so proud of "hiring the best engineers" and then they
have to resort to these kinds of hacks to run their app?

------
navpatel
This issue has been a colossal pain in the ass! I need this! Are they planning
on releasing this code to replace the buffer?

~~~
klewelling
I am working on a product that may solve this problem for you. The product is
designed for a different purpose but it should work for you as well. The only
caveat would be the need to break your app into multiple pieces that run in
separate processes. Give me a shout at kenneth@inappsquared.com if you are
interested.

------
jcfrei
it's great for employees at facebook that the company allows their engineers
such tinkering (to an absolute extreme in this case). but I think fb
engineers should work on bigger problems than hacking the dalvik vm - like
stopping the erosion of their younger user base.

------
exabrial
...too bad they reinvented the wheel with Dalvik. It'd be awesome to be
running this stuff under openjdk or hotspot

