I can't believe an app requires 8M RAM just for method names! That's 3% of my entire usable memory (what's left after Motorola Blur takes its share) -- just for the method names for a single app. Surely I must be misunderstanding what goes into that buffer. It's unimaginable to me that they really need such a vast amount of space. It's unimaginable that they really expect their app to work at all on an old phone without choking it to death.
If I grasp this correctly, printing just the method names would create a book double the size of the King James Version of the Bible. Maybe you are going to argue that having so many methods is good programming practice, not bloat. But don't argue that you "support" a system when just the method names take up around 3% of that system's usable memory. And there are a lot of devices with considerably less memory than what I've got.
Why don't you release a stripped-down "lite" version of the app that works on an Android 2 phone even if it has more than 5 applications installed? Some phones don't even allow uninstalling Facebook. Their owners are simply stuck with the bloat (I mean, good programming practices :-).
It would be interesting to know what it is doing to be that complex. I'm guessing lots of third party libraries for ads, logging, etc?
They clearly say that "this idea seemed completely insane" so it's not like they took this road lightly. They searched for a solution and became rather desperate. I wonder if they asked Google for help.
Having said that, I'm not sure I would be so open about such a crazy hack for a slow behemoth of an app that is only considered "fast" because the previous version was insultingly slow.
1. They bloated the app so badly, they would have had to monkey patch the OS to let it run at all.
2. They actually did have the stonkingly huge arrogance to monkey patch the OS.
What they should have done is shatter it into a constellation of related apps and background processes. This app is FB news, that app is FB messaging, and so on. Bonus: people could pick and choose the parts of the system they cared about.
But that last point is what they wanted to avoid, I think. Facebook à la carte means they can't foist unwanted "services" on you.
And so they hacked.
Wisdom of a three-million-method app aside, I wonder why they didn't fix this in the compilation toolchain rather than by poking around in native memory areas during app startup. Facebook wrote a PHP->C++ compiler; changing the Dalvik compiler to more aggressively inline small methods seems trivial in comparison.
"Even though I disagree with most of your comments (to the extent that it is any more than unrelated editorializing) and everyone talking about "too many methods" etc, the actual bug that was triggered is here and provides a test case: https://code.google.com/p/android/issues/detail?id=22586 showing that multiple interface inheritance triggers this easily."
Issue 22586 - android - Dexopt fails with "LinearAlloc exceeded" for deep interface hierarchies
I have a small background in static analysis of Java, and between reflection, the class loader, and the huge amount of polymorphism that you find in java, it is very very difficult to inline basically anything statically.
A favourite of mine is Jari Komppa's article on his process of porting Death Rally from DOS to Windows: http://sol.gfxile.net/dr_gdm.html.
Does anyone have any other articles along the same lines? I'd love to read them!
And the article links to other promising articles at the top, though I haven't checked them out.
The general question remains: how would we go about finding similar stuff?
Oh, one more thing, if you don't know yet, you will find great gems in Raymond Chen's blog, http://blogs.msdn.com/b/oldnewthing/
Not everything is archeological (a lot is still interesting though), but the articles often link to a whole bunch of other articles, which in turn do the same. I suggest, for example, http://blogs.msdn.com/b/oldnewthing/archive/2012/06/29/10325... . I deliberately chose a later article in the series, since it also links to previous ones.
The limit in older Dalvik VMs is 64K methods (per dex file), not 1 million methods, and 64K is a realistic count for a large app to hit (even a decent-sized app with a lot of third-party libraries).
This limitation has been known for a while, and Google engineers have talked about it multiple times.
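For the arithmetic behind the 64K figure: as I understand the dex format, invoke instructions name their target through a 16-bit unsigned method-reference index, so one dex file can address at most 2^16 = 65,536 methods (references to framework methods count too). A minimal Java sketch; the class and helper names are made up for illustration, and `minDexFiles` is only a lower bound, because real splits duplicate some references across files:

```java
// Illustrative only: why a 16-bit method index caps one dex file at 65,536 method references.
final class DexMethodLimit {
    // Width of the method-reference index used by dex invoke instructions.
    static final int METHOD_INDEX_BITS = 16;

    // Number of distinct method references a single dex file can address.
    static int maxMethodRefs() {
        return 1 << METHOD_INDEX_BITS;
    }

    // Hypothetical helper: lower bound on dex files needed (ceiling division),
    // assuming a perfect split with no references duplicated across files.
    static int minDexFiles(long methodRefs) {
        long cap = maxMethodRefs();
        return (int) ((methodRefs + cap - 1) / cap);
    }

    public static void main(String[] args) {
        System.out.println(maxMethodRefs());         // 65536
        System.out.println(minDexFiles(1_000_000L)); // 16
    }
}
```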
Then I was going to say "I hope they move towards a more permanent solution" when again my inner pragmatist popped up and said "Yes, they did; unsupport Android 2 in the future." So... yeah... nasty, nasty hack, probably an enormous win almost all the way around. So the wheel turns.
Also, inlining is probably easier said than done. It's not enough to inline the methods, you need to make them entirely disappear, and ISTR reading within the last few days somebody else commenting that Java still has to keep the metadata about the methods around (which is the problematic part, not the methods themselves, if I'm reading this right) because reflection may demand them. You'd need something more sophisticated to do it at the source level, and with an imperative language with unrestricted side effects, while that is certainly possible, it's also very much easier said than done. That's not a weekend hack either.
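To make the reflection point concrete, here is a minimal Java sketch with hypothetical classes (`Feed`, `ReflectionLookup`): the only caller of `renderStory()` is a name assembled at runtime, so an offline pass that deletes "uncalled" methods, or strips their names, would break it, and nothing in the bytecode would warn you.

```java
import java.lang.reflect.Method;

// Hypothetical class: nothing in the bytecode calls renderStory() directly.
final class Feed {
    public String renderStory() {
        return "story";
    }
}

final class ReflectionLookup {
    // Finds a public no-arg method by a name assembled at runtime and invokes it.
    // The name exists only as string data, so a static "unused method" pass sees no caller,
    // and the method's name metadata must survive for getMethod to find it.
    static Object callByName(Object target, String prefix, String suffix) {
        try {
            Method m = target.getClass().getMethod(prefix + suffix);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("method removed or renamed: " + prefix + suffix, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName(new Feed(), "render", "Story")); // story
    }
}
```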
Assuming Java has something like the Cecil library for MSIL, it should be a fairly straightforward task. Although, as you point out, so is fixing up a number.
I'm not familiar with Java compilers but I'd hazard a guess that to exploit this optimization (if it were possible at all) they would probably have to significantly rework their code.
Actually, even in plain, reflection-less Java it may be quite complicated, because subclass methods can be invoked through superclass-typed variables. For every method invocation you need to find all possible subclasses whose objects can be assigned to the variable on which the method is invoked. You need to know which subclasses are ever passed as superclass/interface arguments to methods. You need to know which subclasses are assigned to superclass/interface member/static variables. You need to know which subclasses are stored in superclass collections. And so on.
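A small Java illustration of the problem, using hypothetical `Story` types: the interface call in `renderAll` could land in any implementation, including one loaded later from another dex file. `renderAllInlined` shows what an inliner could emit after proving that only two receiver types are possible; note the fallback virtual call, which means the original methods (and their metadata) can't be deleted anyway.

```java
import java.util.List;

// Hypothetical types illustrating why a call through an interface can't be inlined blindly.
interface Story {
    String render();
}

final class TextStory implements Story {
    public String render() { return "text"; }
}

final class PhotoStory implements Story {
    public String render() { return "photo"; }
}

final class Renderer {
    // Statically, s.render() may be TextStory.render, PhotoStory.render, or any class
    // loaded later. Proving the full receiver set means tracking every assignment,
    // argument, field and collection that can hold a Story.
    static String renderAll(List<Story> stories) {
        StringBuilder sb = new StringBuilder();
        for (Story s : stories) {
            sb.append(s.render()).append(';');
        }
        return sb.toString();
    }

    // Shape of the code after guarded inlining: a type test per known receiver,
    // plus a fallback virtual call for anything else.
    static String renderAllInlined(List<Story> stories) {
        StringBuilder sb = new StringBuilder();
        for (Story s : stories) {
            if (s instanceof TextStory) {
                sb.append("text");           // inlined TextStory.render
            } else if (s instanceof PhotoStory) {
                sb.append("photo");          // inlined PhotoStory.render
            } else {
                sb.append(s.render());       // fallback keeps the real methods alive
            }
            sb.append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Story> feed = List.of(new TextStory(), new PhotoStory());
        System.out.println(renderAll(feed));        // text;photo;
        System.out.println(renderAllInlined(feed)); // text;photo;
    }
}
```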
In Java land these kinds of fancy optimizations usually happen in the JVM JIT at runtime. Would this help the size of those tables in Android's Dalvik/dexopt setup? In any case you can't ship a custom version of Dalvik with your app.
You've got class foo, with a method "DoBar" that calls "DoBazStep1(), DoBazStep2(), SomeHelper.CalcPosition()".
You use a Java bytecode toolkit and identify leaf functions, small functions, whatever the criteria need to be. Then you check every callsite for those functions, and if found, remove the call and inline the code, doing fixup on locals and parameters. Now "DoBar" has a lot more bytecode in it, as the bodies of DoBazStep1 and the others are now contained in it. The small functions are completely removed.
This process can be done totally offline, on the compiled Java code. The runtime just sees that you wrote one big function, versus a bunch of small ones.
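The before/after below sketches that transformation in Java source form, reusing the names from the comment (the method bodies are invented for illustration; a real tool would rewrite bytecode rather than source):

```java
// "Before": the shape described above, with hypothetical method bodies.
final class SomeHelper {
    static int CalcPosition(int x) { return x * 2; }
}

final class FooBefore {
    private int total;

    int DoBar(int x) {
        DoBazStep1(x);
        DoBazStep2();
        return SomeHelper.CalcPosition(total);
    }

    private void DoBazStep1(int x) { total += x; }
    private void DoBazStep2() { total += 1; }
}

// "After": what an offline inliner could emit. Callee bodies are copied into DoBar and
// parameters become locals; if nothing else calls the small methods, they can be deleted.
final class FooAfter {
    private int total;

    int DoBar(int x) {
        // inlined DoBazStep1(x): its parameter maps directly onto DoBar's own x
        total += x;
        // inlined DoBazStep2()
        total += 1;
        // inlined SomeHelper.CalcPosition(total): its parameter becomes a fresh local
        int calcPositionArg = total;
        return calcPositionArg * 2;
    }
}

final class InlineDemo {
    public static void main(String[] args) {
        System.out.println(new FooBefore().DoBar(3)); // 8
        System.out.println(new FooAfter().DoBar(3));  // 8
    }
}
```

One caveat the sketch glosses over: copying a body into another class only works if the callee touches nothing private to its own class, since access rules are still checked at the new call site.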
The only real work is making sure you got everything, and perhaps being clever with stack local allocations. As the FB team says they moved to a new style with lots of small functions, I'm assuming they aren't doing all sorts of metaprogramming and inheritance, but have just split things up at the function level.
Here's a story from my past, copied from a blog post I wrote last year:
My first job at Microsoft was providing developer support for the early Windows SDKs. To do my job well, I spent hours studying the Windows SDK documentation, the Windows source code, and writing sample applications. I then spent hours poring over customers’ (such as Lotus, WordPerfect, and Computer Associates) code helping them figure out what was not working.
This gave me a deep appreciation for API design early in my career. I saw clear examples of elegant APIs as well as horrific monstrosities. I once created a sample for selecting a font in GDI that I called “ReallyCreateFont” because both of the built-in Windows APIs (CreateFont() and CreateFontEx()) were basically broken.
The CreateFont case taught me first hand the pain resulting from exposing an API publicly, having customers use it in ways not expected, and then having to do unnatural acts to avoid breaking backwards compatibility.
I was supporting WordPerfect’s development of WordPerfect for Windows. This was sometime in early 1991 if I remember correctly. I had the WordPerfect engineer on the phone. He had given me access to the WordPerfect source code, but I couldn’t give him access to the Windows code. So we were each stepping through WordPerfect in the debugger, him in Utah and me in Bellevue. I could see what his code was doing and what Windows was doing, but he could only see disassembly for Windows.
The problem he was seeing had to do with font rendering. In the debugger we stepped into CreateFontEx(), which calls into CreateFont(), which calls into some internal GDI function, which calls into the device driver’s Escape() function (I can’t believe I actually remember all this crap like it was yesterday). Somewhere in this call stack I came across a block of code in Windows with comments that read something like
// This is a hack to deal with Adobe’s Type Manager.
// ATM *injects code* into GDI replacing our code with their own.
// In order to retain backwards compatibility we detect
// ATM’s attempt to do this, let it do it, and then trick it
// into thinking it actually hacked us this way.
It turns out that the way WordPerfect was using CreateFontEx() was reasonable, but pushing the envelope of what anyone had done before, effectively exposing a bug caused not by the original API design or implementation, but something the API provider had to do to ensure app compatibility because of something another 3rd party had done!
Holy shit! Let me get this straight:
* A 3rd party app (Adobe Type Manager) used CreateFontEx() in a way the API designer failed to anticipate.
* The API designer needed to retain backwards compatibility so it had to resort to putting 3rd party app specific code in the API implementation.
* Another 3rd party comes along and the complexity of the ‘fix’ caused another bug to surface.
Welcome to the world of a true virtuous platform...
One day, I was tracing through the OS code that handled a context switch and was surprised to find a bit of code that looked up the 32-bit creator code of the current application and compared it to 'WORD'. What the heck is this? I wondered. Turns out it was a hack added by the OS engineers at Apple to keep some other hacks in Microsoft Word working. If I recall correctly, they had to determine if Word was getting switched in or out so they could enable or disable the necessary hacks. So, in effect, they had a hack that would live-hack the OS to add hacks for Word's hacks and then live-unhack the hacks for everything else.
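For anyone who never met classic Mac OS: creator and type codes are four ASCII characters packed into one 32-bit value, so a check like the one described is a single integer compare. A minimal Java sketch, assuming big-endian packing as on the 68k Macs; `isWord` is a hypothetical stand-in, not Apple's code, and 'WORD' is taken from the story as remembered:

```java
import java.nio.charset.StandardCharsets;

final class FourCharCode {
    // Packs a four-character ASCII code into a 32-bit int, first character in the high byte.
    static int pack(String code) {
        byte[] b = code.getBytes(StandardCharsets.US_ASCII);
        if (b.length != 4) {
            throw new IllegalArgumentException("need exactly 4 ASCII chars: " + code);
        }
        return (b[0] & 0xFF) << 24 | (b[1] & 0xFF) << 16 | (b[2] & 0xFF) << 8 | (b[3] & 0xFF);
    }

    // Hypothetical stand-in for the context-switch check: one int compare, no string work.
    static boolean isWord(int currentCreator) {
        return currentCreator == pack("WORD");
    }

    public static void main(String[] args) {
        System.out.printf("0x%08X%n", pack("WORD")); // 0x574F5244
        System.out.println(isWord(pack("WORD")));    // true
    }
}
```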
I mean, when you're hacking the system to hack the system on behalf of someone who already hacked the system but for an older version of the system that's no longer there, now you're doing some hacking.
Oh, yeah. And then I basically had to add my hacks to that mess.
> I had the WordPerfect engineer on the phone. He had given me access to the WordPerfect source code, but I couldn’t give him access to the Windows code. So we were each stepping through WordPerfect in the debugger, him in Utah and me in Bellevue. I could see what his code was doing and what Windows was doing, but he could only see disassembly for Windows.
FTA: Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source —otherwise, this change wouldn’t have been possible.
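A minimal Java sketch of the general technique the quote describes, reaching into a private structure via reflection. Everything here is hypothetical: `PlatformLoader` and its `searchPath` field stand in for the class loader's internals, and Android's real field names and layout, which differ between versions, are not reproduced:

```java
import java.lang.reflect.Field;
import java.util.Arrays;

// Hypothetical stand-in for a platform class whose search path is private with no setter.
final class PlatformLoader {
    private String[] searchPath = {"classes.dex"};

    String[] snapshot() { return searchPath.clone(); }
}

final class PathInjector {
    // Appends an entry to the private array field "searchPath" via reflection.
    static void append(PlatformLoader loader, String extra) {
        try {
            Field f = PlatformLoader.class.getDeclaredField("searchPath");
            f.setAccessible(true); // bypasses 'private'; breaks if the field is renamed
            String[] old = (String[]) f.get(loader);
            String[] grown = Arrays.copyOf(old, old.length + 1);
            grown[old.length] = extra;
            f.set(loader, grown);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("internal layout changed", e);
        }
    }

    public static void main(String[] args) {
        PlatformLoader loader = new PlatformLoader();
        append(loader, "secondary-1.dex");
        System.out.println(Arrays.toString(loader.snapshot())); // [classes.dex, secondary-1.dex]
    }
}
```

The field is looked up by name at runtime, so a renamed or retyped field fails only when the code runs, which is why a hack like this has to be checked against every platform version it targets.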
Facebook are not saying that Android/Dalvik has a broken API. They are saying that, due to the size of their app, they hit an internal limit; this buffer limit increased between platform versions, and they needed to create a robust hack to get their code working on older platforms. They were able to do this by taking advantage of the open-source nature of the platform, and they did not need a Google engineer on hand to help them, though they did get to use a Google test lab (which is not fair on smaller companies). Also, Google have not inserted code into Android/Dalvik to deal with the Facebook trickery.
If the documentation has to go off and spend much verbiage on side cases, something's wrong. There's an imbalance in the API that's reflected in the documentation.
If the first few sentences of the documentation read like committee work, or sound like the explanation of the explanation, then something's wrong. The API is not going to be compatible with its users.
If the documentation talks more about specific use-cases instead of classes of behaviour, then something's wrong.
If the documentation tells you to do this, then that, then that, something's wrong. It should talk about useful, describable atoms, not things that make sense only as a sequence.
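A tiny Java illustration of that point, with hypothetical font-creation APIs (no relation to the real GDI calls mentioned above): the first only works if callers follow a documented sequence, the second is a single describable atom.

```java
// Hypothetical API, sequence style: correct only if callers follow the documented order.
final class SequenceFontApi {
    private String face;
    private Integer size;

    void setFace(String face) { this.face = face; }
    void setSize(int size) { this.size = size; }

    // Fails unless setFace and setSize were both called first.
    String create() {
        if (face == null || size == null) {
            throw new IllegalStateException("call setFace and setSize first");
        }
        return face + "/" + size;
    }
}

// Same capability as one atom: everything it needs arrives in a single call.
final class AtomFontApi {
    static String create(String face, int size) {
        return face + "/" + size;
    }
}

final class FontApiDemo {
    public static void main(String[] args) {
        SequenceFontApi seq = new SequenceFontApi();
        seq.setFace("Arial");
        seq.setSize(12);
        System.out.println(seq.create());                    // Arial/12
        System.out.println(AtomFontApi.create("Arial", 12)); // Arial/12
    }
}
```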
There are a few more.
All this doesn't help to improve an API before release, at least not directly. But reading the documentation makes it easy to detect API brokenness while it can still be unfucked.
Having always-connected clients and the ability to force-update code changes the game. I wonder if the backwards-compatibility culture is still prevalent in the Microsoft Windows division, given that they could always push a fix via Windows Update, or tell Adobe, etc., to fix their stuff.
This list usually only grows longer (and includes older versions of MS' own software too). There's also plenty of software where the vendor isn't around anymore which is still widely used. You can't just tell them to fix their code because they don't exist anymore. Raymond Chen oftentimes has to argue that part and given that he worked on application compatibility on Windows for quite some time I'm inclined to believe him.
If this observation is incorrect, write a comment so the rest of us can understand WHY it's incorrect, if it's correct then upvote it, but there's no excuse to downvote!
So Linus had to add a hack to unbreak this other application's hack to work around a kernel bug that no longer existed: http://lwn.net/Articles/494993/
Advantages of open source may not apply to closed source.
Unless you believe all software must be open source (which is hopelessly unrealistic), the question becomes how we can bring about the most openness. Surely closed software on an open platform is better than open software on a closed platform? But doing stuff like this takes a lot of resources. A group of volunteers may not have the resources to do it in their spare time. So we may all be better off because we have good, highly paid engineers with lots of resources working on a platform as important as the OS.
TL;DR: Reading the whole comment before snarking is recommended.
But let's address the original point. Suppose I want to run Steam on Linux. Some bug in Steam requires me to put an ugly hack in the kernel to address it. +1 ugly hack in the kernel, the same as it would be on Windows. Meanwhile in a hundred other cases where the thing with the bug was open source, I can fix it on the other end and it doesn't require an ugly hack to be put in the kernel. So the Linux kernel gets one ugly hack while the Windows kernel gets a hundred and one.
It doesn't stop being an advantage just because there are exceptions to the rule.
EDIT: or cek, Charlie Kindel :)
I seem to recall the WP guy just backing out his approach and going another way. Oftentimes in dev support situations like that, we didn't get to help with the solution, just the identification of the problem.
The ICS move wouldn't have been a bad option. It would've helped a lot of users convert to ICS+ devices, which Google probably would've loved.
The APIs from 4+ are much more robust than pre-ICS, and it probably would've saved more headaches while building the new version in addition to the one already described in the post.
Maybe in a couple of years, but not yet. Pre-ICS Android still has more than 50% of Android's share.
Are we supposed to buy new phones every five years now?
Ancient but still shipping. I can go into a store and pick up a Samsung Galaxy Ace running 2.3 today. A significant minority of 2.3 based phones are still being sold, especially for budget/Pay As You Go users.
This hack will need to remain in the codebase for a significant time.
Facebook's Android Engineers chose the ego stroking solution which involved clever hacks. This only reinforces my being happy to have uninstalled the FB android app 2 years ago.
In return, maybe you could open-source some of your app, for example?
This. "There has to be a better solution": is that not a quality that defines hackers? It's that pushing of boundaries for the sake of getting things done that drives progress, isn't it? It's tempting to take the easy way out. In this case they could have justified supporting only ICS and up; Gingerbread and below doesn't have a lot of market share, I assume. But they chose to persist.
It's that persistence that we need to cultivate to become successful hackers.
Solution: Destroy the moon.