It's interesting to see the parallels between Android and Windows in this regard.
Here's a story from my past, copied from a blog post I wrote last year:
My first job at Microsoft was providing developer support for the early Windows SDKs. To do my job well, I spent hours studying the Windows SDK documentation, the Windows source code, and writing sample applications. I then spent hours poring over customers’ (such as Lotus, WordPerfect, and Computer Associates) code helping them figure out what was not working.
This gave me a deep appreciation for API design early in my career. I saw clear examples of elegant APIs as well as horrific monstrosities. I once created a sample for selecting a font in GDI that I called “ReallyCreateFont” because both of the built-in Windows APIs (CreateFont() and CreateFontEx()) were basically broken.
The CreateFont case taught me first hand the pain resulting from exposing an API publicly, having customers use it in ways not expected, and then having to do unnatural acts to avoid breaking backwards compatibility.
I was supporting WordPerfect’s development of WordPerfect for Windows. This was sometime in early 1991 if I remember correctly. I had the WordPerfect engineer on the phone. He had given me access to the WordPerfect source code, but I couldn’t give him access to the Windows code. So we were each stepping through WordPerfect in the debugger, him in Utah and me in Bellevue. I could see what his code was doing and what Windows was doing, but he could only see disassembly for Windows.
The problem he was seeing had to do with font rendering. In the debugger we stepped into CreateFontEx(), which calls into CreateFont(), which calls into some internal GDI function, which calls into the device driver’s Escape() function (I can’t believe I actually remember all this crap like it was yesterday). Somewhere in this call stack I came across a block of code in Windows with comments that read something like
// This is a hack to deal with Adobe’s Type Manager.
// ATM *injects code* into GDI replacing our code with their own.
// In order to retain backwards compatibility we detect
// ATM’s attempt to do this, let it do it, and then trick it
// into thinking it actually hacked us this way.
I am not making this up (although that comment is paraphrased from my memory).
It turns out that the way WordPerfect was using CreateFontEx() was reasonable, but pushing the envelope of what anyone had done before, effectively exposing a bug caused not by the original API design or implementation, but something the API provider had to do to ensure app compatibility because of something another 3rd party had done!
Holy-shit! Let me get this straight:
* A 3rd party app (Adobe Type Manager) used CreateFontEx() in a way the API designer failed to anticipate.
* The API designer needed to retain backwards compatibility so it had to resort to putting 3rd party app specific code in the API implementation.
* Another 3rd party comes along and the complexity of the ‘fix’ caused another bug to surface.
Welcome to the world of a true virtuous platform...
Funny, mirror-image story. In the early 1990s I was writing utility software for Macintosh, then on the 68K platform. The utility basically had to patch the OS as it was running, a common practice back then. To figure out where to place the patches, I spent lots of time looking at disassembled 68K code in MacsBug, stepping through the OS as it ran.
One day, I was tracing through the OS code that handled a context switch and was surprised to find a bit of code that looked up the 32-bit creator code of the current application and compared it to 'WORD'. What the heck is this? I wondered. Turns out it was a hack added by the OS engineers at Apple to keep some other hacks in Microsoft Word working. If I recall correctly, they had to determine if Word was getting switched in or out so they could enable or disable the necessary hacks. So, in effect, they had a hack that would live-hack the OS to add hacks for Word's hacks and then live-unhack the hacks for everything else.
I mean, when you're hacking the system to hack the system on behalf of someone who already hacked the system but for an older version of the system that's no longer there, now you're doing some hacking.
Oh, yeah. And then I basically had to add my hacks to that mess.
I don't have any similar story to share ([un]fortunately?), but I have to say that parent and grandparent comments are what make HN that interesting and it's the reason why comments are often more valuable than the articles/stories they comment (and why I sometimes read comments without even checking the OP link at all). Thank you for sharing your tidbits, cek and tmoertel!
> I had the WordPerfect engineer on the phone. He had given me access to the WordPerfect source code, but I couldn’t give him access to the Windows code. So we were each stepping through WordPerfect in the debugger, him in Utah and me in Bellevue. I could see what his code was doing and what Windows was doing, but he could only see disassembly for Windows.
FTA: Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source —otherwise, this change wouldn’t have been possible.
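For anyone who hasn't seen this kind of reflection trick, here is a minimal sketch of the general technique in plain Java. It is not Facebook's code and it does not touch Dalvik; `ToyLoader`, its `searchPath` field, and the file names are hypothetical stand-ins for "some internal structure you were never meant to modify".

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ReflectionPatchSketch {
    // Hypothetical stand-in for a loader that keeps a private search path.
    public static class ToyLoader {
        private final List<String> searchPath = new ArrayList<>();

        public ToyLoader() {
            searchPath.add("primary.dex");
        }

        public String describe() {
            return String.join(",", searchPath);
        }
    }

    // Reach into the private field via reflection and append an extra entry.
    public static void appendEntry(ToyLoader loader, String entry) {
        try {
            Field field = ToyLoader.class.getDeclaredField("searchPath");
            field.setAccessible(true); // bypass the 'private' modifier
            @SuppressWarnings("unchecked")
            List<String> path = (List<String>) field.get(loader);
            path.add(entry);
        } catch (ReflectiveOperationException e) {
            // A renamed or removed field in a later version ends up here.
            throw new IllegalStateException("internal layout changed", e);
        }
    }

    public static void main(String[] args) {
        ToyLoader loader = new ToyLoader();
        appendEntry(loader, "secondary.dex");
        System.out.println(loader.describe());
    }
}
```

The catch block is the fragile part: the field name is not a public contract, so this only stays safe against versions whose internals you have actually read.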
Facebook are not saying that Android/Dalvik has a broken API. They are saying that, due to the size of their app, they hit an internal limit; this buffer limit increased between platform versions, and they needed to create a robust hack to get their code working on older platforms. They were able to do this by taking advantage of the open-source nature of the platform, and they did not need a Google engineer on hand to help them, though they did get to use a Google test lab (which is not fair on smaller companies). Also, Google have not inserted code into Android/Dalvik to deal with the Facebook trickery.
I think it's also worth noting that this kind of evil hack is a lot safer when you're only targeting older OS releases. That essentially gives you a fixed target. Changes in a new release won't break your hack, because your hack doesn't apply to new releases anyway. It's still awful, but much less scary.
Came here to opine on the exact same topic! I was going to dig up some Raymond Chen blog entries but your CreateFont example illustrates it equally well. I guess hacks are inevitable when you are dealing with that kind of scale/complexity.
It's less of a scale issue and more of an ease of update issue. Back in the days someone who bought a copy of Windows would take it home, and if it broke some existing software they used, they were going to blame the OS and return it to the store (they surely are not returning WordPerfect or Adobe product that they bought eons ago).
Having always-connected clients and ability to force-update the code changes the game. I wonder if the backwards compatibility culture is still prevalent in Microsoft Windows division, provided they could always push a fix via Windows Update, or tell Adobe, etc. to fix their stuff.
Windows Vista shipped with around 6000 fixes for applications, ranging from "this is a game, add it to the Games Explorer", through "report the Windows version as XP because they broke their checks", to "let HeapAlloc allocate a few more bytes each time because they overrun a buffer at some point". The nice thing about those shims is that you don't have to sully your API design with individual hacks and fixes for dozens, hundreds or thousands of applications. You just enable certain behaviour externally which then replaces the API function by a wrapper (or another function entirely) for that process.
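To make the shim idea concrete, here is a rough sketch of the shape of it in Java. This is not how the Windows compatibility layer is implemented; the `SHIMS` table, `resolveGetVersion`, and the process name `oldgame.exe` are all hypothetical. The point is only that the real function stays clean and the per-application override lives in an external lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ShimSketch {
    // The "real" API, free of per-application special cases (6.0 is Vista).
    public static String realGetVersion() {
        return "6.0";
    }

    // Hypothetical shim database: process name -> replacement function.
    private static final Map<String, Supplier<String>> SHIMS = new HashMap<>();
    static {
        // This app's version check is broken, so it is told it runs on XP (5.1).
        SHIMS.put("oldgame.exe", () -> "5.1");
    }

    // Pick the function a given process should see: a shim if one is
    // registered for it, otherwise the real implementation.
    public static Supplier<String> resolveGetVersion(String processName) {
        return SHIMS.getOrDefault(processName, ShimSketch::realGetVersion);
    }

    public static void main(String[] args) {
        System.out.println(resolveGetVersion("oldgame.exe").get());
        System.out.println(resolveGetVersion("notepad.exe").get());
    }
}
```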
This list usually only grows longer (and includes older versions of MS' own software too). There's also plenty of software where the vendor isn't around anymore which is still widely used. You can't just tell them to fix their code because they don't exist anymore. Raymond Chen oftentimes has to argue that part and given that he worked on application compatibility on Windows for quite some time I'm inclined to believe him.
As soon as you mentioned "Adobe Type Manager" I remembered reading about this case some time ago. But knowing it was you that discovered this makes it all the better. If only we could create perfect APIs the first time around...
I seem to recall the WP guy just backing out his approach and going another way. Often times in dev support situations like that, we didn't get to help with the solution: Just the identification of the problem.
One of the advantages of open source is that things like that don't tend to happen. If a change to X breaks Y, the X maintainers don't include a horrible hack to avoid breaking it, they just submit a patch to Y. Then the package manager makes sure nobody installs new X without installing patched Y.
The flip side of this is that you have to keep up with breaking changes all the time. I've developed numerous in-house apps that run on Linux, and it is hard to keep up with every minute detail that changes in all parts of such a system.
It is exceedingly hard to justify spending a day figuring out what they changed in gtk that now makes the text appear a bit different, or why the init.d script that has never had a problem now sometimes fails to stop the service, etc.
ISVs, at least small ones, simply cannot keep up with the frequent external changes on such platforms, and the fewer people there are developing applications (in-house, open source, proprietary or what not), the fewer people will use that platform.
Dunno - I think there was a fairly lovely hack a while back to deal with the fact that the autofs ABI for 32-bit applications on 64-bit kernels wasn't compatible with the ABI for 32-bit applications on 32-bit kernels, and this was breaking stuff. Someone inserted a hack to fix this, which then broke another application which had its own hack to work around the kernel bug.
Unless you believe all software must be open source (which is hopelessly unrealistic), the question becomes how we can bring about the most openness. Surely, closed software on an open platform is better than open software on a closed platform? But to do stuff like this takes a lot of resources. A group of volunteers may not have the resources to do so in their spare time. So we may all be better off because we have good engineers, highly paid and with lots of resources, to work on such an important platform as the OS.
TL;DR: Reading the comment fully recommended before snarking.
Let's put aside that what you're saying is a red herring. (RedHat makes a lot of money and to do it they pay professionals to develop free and open source software. Samsung, Intel, Google, IBM and Oracle make a lot of money and to do it they pay professionals to develop free and open source software. "Open source running on open source" is entirely practical and just because closed source software exists doesn't mean you can't refrain from using it, see Debian et al.)
But let's address the original point. Suppose I want to run Steam on Linux. Some bug in Steam requires me to put an ugly hack in the kernel to address it. +1 ugly hack in the kernel, the same as it would be on Windows. Meanwhile in a hundred other cases where the thing with the bug was open source, I can fix it on the other end and it doesn't require an ugly hack to be put in the kernel. So the Linux kernel gets one ugly hack while the Windows kernel gets a hundred and one.
It doesn't stop being an advantage just because there are exceptions to the rule.
Uninstalled. I checked, and Facebook took up more memory than any other application I have installed, even more than TSF Shell. (By the way, a MUCH more "feature-rich," useful, and snappy app than facebook!) Facebook isn't doing anything comparable to what TSF Shell does, not for me anyway. I bet this is why my phone recently started taking 30 seconds to make a phonecall (!).
I can't believe an app requires 8M RAM just for method names! That's 3% of my entire usable memory (what's left after Motorola Blur takes its share) -- just for the method names for a single app. Surely I must be misunderstanding what goes into that buffer. It's unimaginable to me that they really need such a vast amount of space. It's unimaginable that they really expect their app to work at all on an old phone without choking it to death.
If I grasp this correctly, printing just the method names would create a book double the size of the King James Version of the Bible. Maybe you are going to argue having so many methods is good programming practice, not bloat. But don't argue that you "support" a system when just the method names total around 3% of usable memory for that system. And there are a lot of devices with considerably less memory than what I've got.
Why don't you release a stripped down, "lite" version of the app that works on an Android 2 phone even if it has more than 5 applications installed? Some phones don't even allow uninstalling facebook. They are simply stuck with the bloat (I mean, good programming practices :-).
Yes, I'm in the same boat. On my Android 2.2 phone (HTC Legend) it is hard to fit some current apps and facebook is a system app so I can't uninstall it. However I can uninstall the updates, so my facebook is the original factory version. I won't be trying out this latest version.
It would be interesting to know what it is doing to be that complex. I'm guessing lots of third party libraries for ads, logging, etc?
The only reason to root my old HTC Desire that I still use happily was to be able to uninstall Facebook, Weather, Twitter and a few more apps I didn't need. Needless to say, the phone suddenly felt much snappier and my battery life doubled.
Wisdom of a three-million-method app aside, I wonder why they didn't fix this in the compilation toolchain rather than by poking around in native memory areas during app startup. Facebook wrote a PHP->C++ compiler; changing the Dalvik compiler to more aggressively inline small methods seems trivial in comparison.
Three million methods? Having both programmed for Android and having used the Facebook App for Android, I find it very difficult to believe that app has anywhere near 3 million methods, even if each one is basically a one-liner. If the number is correct, that really is some kind of crazytown code.
Still seems like a ridiculous number, though of course far less ridiculous especially for Java code with its legacy of setters and getters for even basic properties and FactoryOfFactoriesFactory classes.
I have a small background in static analysis of Java, and between reflection, the class loader, and the huge amount of polymorphism that you find in java, it is very very difficult to inline basically anything statically.
I'll just paste the same thing here that I wrote in response to commenters on FB misunderstanding the problem:
"Even though I disagree with most of your comments (to the extent that it is any more than unrelated editorializing) and everyone talking about "too many methods" etc, the actual bug that was triggered is here and provides a test case: https://code.google.com/p/android/issues/detail?id=22586 showing that multiple interface inheritance triggers this easily."
Issue 22586 - android - Dexopt fails with "LinearAlloc exceeded" for deep interface hierarchies
iirc, one of the primary reasons they wrote the compiler was to not have to deal with legacy php code. Is this some sort of pattern at FB? If the code doesn't work, let's fix the OS/compiler instead?
1. They bloated the app so badly, they would have had to monkey patch the OS to let it run at all.
2. They actually did have the stonkingly huge arrogance to monkey patch the OS.
What they should have done is shattered it down into a constellation of related apps and background processes. This app is FB news, that app is FB messaging, and so on. Bonus: people could pick and choose the parts of the system they cared for.
But that last is what they wanted to avoid, I think. Facebook à la carte means that they can't foist unwanted "services" on you.
I really love reading recounts like this. Seeing other developers try, fail, and try again to come up with creative solutions for difficult problems, and then finally arriving at a solution, is so satisfying. It really demonstrates how creativity is manifested in programming, and is truly inspiring.
Not everything is archeological (a lot is still interesting though), but the articles often link to a whole bunch of other articles, which in turn do the same. I suggest, for example, http://blogs.msdn.com/b/oldnewthing/archive/2012/06/29/10325... . I deliberately chose a later article in the series, since it also links to previous ones.
Certainly if there's a lot of small methods, they could have an optimization pass that inlines some of them and bring their overall method count down. I wouldn't be surprised if that was a small performance gain, too.
I thought of that, then the pragmatic engineer in me observed that they only discovered this after cracking their app into so many small methods. Given the time choice of 1. uncracking their app into small methods (which presumably wasn't atomic, so now that's all mixed in with other development, so it's not like it can just be reverted) or 2. hacking the system to handle more methods, I am forced to admit that choice 2 is two or possibly three orders of magnitude faster, in man-hours. (Probably two or three of their best engineers for a couple of weeks, vs. "the entire team", from dev to QA, tasked for months, on work that results in no new features). Not better in all possible ways, but it's hard to justify "do it right" against that sort of cost/benefit ratio.
Then I was going to say "I hope they move towards a more permanent solution" when again my inner pragmatist popped up and said "Yes, they did; unsupport Android 2 in the future." So... yeah.. nasty, nasty hack, probably an enormous win almost all the way around. So the wheel turns.
For as nasty as the hack is, it still boils down to twiddling with a few numbers.
Also, inlining is probably easier said than done. It's not enough to inline the methods, you need to make them entirely disappear, and ISTR reading within the last few days somebody else commenting that Java still has to keep the metadata about the methods around (which is the problematic part, not the methods themselves, if I'm reading this right) because reflection may demand them. You'd need something more sophisticated to do it at the source level, and with an imperative language with unrestricted side effects, while that is certainly possible, it's also very much easier said than done. That's not a weekend hack either.
I'm not sure inlining is that difficult. Java bytecode is a relatively straightforward language is it not? Especially if it doesn't have to be for all cases, which, if you're looking at eliminating small methods, it does not have to be.
Assuming Java has something like the Cecil library for MSIL, it should be a fairly straightforward task. Although, as you point out, so is fixing up a number.
It becomes hard when the app uses reflection. When you want to remove some method foo, you not only have to make sure that it isn't referenced by some code, but also make sure that no code is passing the string "foo" to reflection API. This may be hard for many apps and is undecidable in general.
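A small sketch of why the string "foo" is the problem. The class and method names here are made up for illustration; the reflection calls (`getMethod`, `invoke`) are the standard `java.lang.reflect` ones. Nothing in `callByName` mentions `foo` at all, so a tool that searches for call sites would conclude the method is dead.

```java
import java.lang.reflect.Method;

public class ReflectionHidesCallers {
    public static class Greeter {
        // No code calls this method directly.
        public String foo() {
            return "hello from foo";
        }
    }

    // The method name is assembled at run time, so no literal "foo"
    // appears at the call site for a static tool to find.
    public static String callByName(Object target, String prefix, String suffix) {
        try {
            Method method = target.getClass().getMethod(prefix + suffix);
            return (String) method.invoke(target);
        } catch (ReflectiveOperationException e) {
            // This is what happens if an optimizer removed the method.
            throw new IllegalStateException("method not found: " + prefix + suffix, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName(new Greeter(), "f", "oo"));
    }
}
```

If the two halves of the name came from a config file or the network instead of literals, no amount of analysis of the code alone could tell you which methods are safe to remove.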
Actually, even in plain, reflection-less Java it may be quite complicated because subclass methods can be invoked by superclass-typed variables. For every method invocation you need to find all possible subclasses whose objects can be assigned to the variable on which the method is invoked. You need to know which subclasses are ever passed as superclass/interface arguments to methods. You need to know which subclasses are assigned to superclass/interface member/static variables. You need to know which subclasses are stored in superclass collections. And so on.
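A minimal illustration of the dispatch problem, with hypothetical `Shape`, `Circle` and `Square` types. The single call site `s.name()` has no one body that could be pasted in its place, because the target depends on what ended up in the list.

```java
import java.util.Arrays;
import java.util.List;

public class VirtualDispatchSketch {
    public interface Shape {
        String name();
    }

    public static class Circle implements Shape {
        public String name() { return "circle"; }
    }

    public static class Square implements Shape {
        public String name() { return "square"; }
    }

    // One call site, several possible targets: which name() runs is
    // decided by the runtime class of each element, not by the static type.
    public static String describeAll(List<Shape> shapes) {
        StringBuilder out = new StringBuilder();
        for (Shape s : shapes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(s.name());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeAll(Arrays.asList(new Circle(), new Square())));
    }
}
```

To inline `name()` here, a tool would have to prove which implementations can ever reach `describeAll`, and that set changes as soon as someone adds another `Shape`.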
The compiler would have to see all code that is linked with the app, including the Android runtime that calls into the app's code in various ways.
In Java land these kinds of fancy optimizations usually happen in the JVM JIT at runtime. Would this help the size of those tables in Android's Dalvik/dexopt setup? In any case you can't ship a custom version of Dalvik with your app.
You've got class foo, with a method "DoBar" which calls "DoBazStep1(), DoBazStep2(), SomeHelper.CalcPosition()".
You use a Java bytecode toolkit and identify leaf functions, small functions, whatever the criteria needs to be. Then you check every callsite for those functions, and if found, remove the call, and inline the code, doing fixup on locals and parameters. Now, "DoBar" has a lot more bytecode in it, as the bodies of DoBazStep1 and the others are now contained in it. The small functions are completely removed.
This process can be done totally offline, on the compiled Java code. The runtime just sees that you wrote one big function, versus a bunch of small ones.
The only real work is making sure you got everything, and perhaps being clever with stack local allocations. As the FB team says they moved to a new style with lots of small functions, I'm assuming they aren't doing all sorts of metaprogramming and inheritance, but have just split things up at the function level.
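Here is roughly what that transformation amounts to, shown at the source level rather than on bytecode so it is easy to read. The method bodies are invented for illustration (the names follow the example above, in Java casing), and a real tool would do this on compiled classes, which this sketch does not attempt.

```java
public class InliningSketch {
    public static class SomeHelper {
        public static int calcPosition(int value) {
            return value + 1;
        }
    }

    // Before: doBar delegates to three small methods.
    public static class FooBefore {
        private int total = 0;

        public int doBar(int x) {
            doBazStep1(x);
            doBazStep2();
            return SomeHelper.calcPosition(total);
        }

        private void doBazStep1(int x) { total += x; }

        private void doBazStep2() { total *= 2; }
    }

    // After: the helper bodies are pasted into doBar and the small
    // methods no longer exist in this class.
    public static class FooAfter {
        private int total = 0;

        public int doBar(int x) {
            total += x;        // was doBazStep1(x)
            total *= 2;        // was doBazStep2()
            return total + 1;  // was SomeHelper.calcPosition(total)
        }
    }

    public static void main(String[] args) {
        System.out.println(new FooBefore().doBar(3));
        System.out.println(new FooAfter().doBar(3));
    }
}
```

Both versions return the same value for the same input; the second simply declares two fewer methods. The helpers here are private or static, so there is exactly one possible target per call, which is the easy case this comment is assuming.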
Bought one a year ago (2.3.3, I think). It's a Wildfire S, very small and lightweight, has a web browser. Can make calls and probably can send short messages. I was using Windows Mobile 2003 before that and thought I was finally on the "bleeding edge" of mobile technology, only to find out a month later that Android 2.3 is considered "legacy" and they are talking about not supporting it? What, are we supposed to buy new phones every five years now?
This seems like it will paint the Android core developers into a corner, supporting this particular hack for years to come, like the way Microsoft had to for Windows. If I were android core developers, this would not be a good day.
I think the limitation is only on older versions, so they would probably only do the hack if it was pre Gingerbread. They mention that ICS works as expected, so I would be more than a bit mortified if they did this hack for new working versions too.
Ancient but still shipping. I can go into a store and pick up a Samsung Galaxy Ace running 2.3 today. A significant minority of 2.3 based phones are still being sold, especially for budget/Pay As You Go users.
This hack will need to remain in the codebase for a significant time.
In addition to what others said about this only applying to already dead versions of Android, this is one of the upsides of distributing apps online. Nobody is installing an old boxed copy of Facebook 2.0 - google can just ask facebook to remove the hack when they've fixed it, and instantly everyone who installs the app has the fix, and for anyone else the update is a tap away. This wasn't possible with the hacks to keep old Windows/DOS apps running.
The first paragraph is just rampant self praise. Basically these engineers decided to be really clever, painted themselves into a corner, smashed a hole in the wall so as to escape from said corner, and then bragged about how great they are.
As a programmer if you feel clever, most of the time you're doing something wrong. There might be 1% of code that should or can be clever but I prefer my code to be robust.
They clearly say that "this idea seemed completely insane" so it's not like they took this road lightly. They searched for a solution and became rather desperate. I wonder if they asked Google for help.
Having said that I'm not sure I would be frank about such a crazy hack for a slow behemoth app that is only considered "fast" because the previous version was insultingly slow.
I can't tell if I should be excited for the developers to have been so clever, or horrified by all the hacks they had to incorporate to make it work. Either way, the new FB app is definitely an improvement over the older version (lag central!). Is it reasonable for the OS to make such a low-level component configurable so these kinds of convolutions aren't required?
You should be angry that they have managed to persuade you that theirs was the only way out. The odds that properly modularizing their source tree such that they didn't provoke the limitations of the VM they were running on was impossible are vanishingly small, and the odds that they correctly hacked the VM are also vanishingly small.
Facebook's Android Engineers chose the ego stroking solution which involved clever hacks. This only reinforces my being happy to have uninstalled the FB android app 2 years ago.
From my experience the FB app is one of the worst apps on my phone. The messaging notifications are crap, the reloading of my feed is crap (aka doesn't load notifications, or does so in an unpredictable manner). I have much more complex apps on my phone which show no such behaviour, yet it somehow is a Dalvik problem? Mhhh... How many Twitter apps and FB apps are there that are snappy and behave well ( = much better than the FB app!)..
Is it just me, or does Facebook tend to celebrate itself in most of its tech blog posts? I remember them posting a similar piece a few years back on their memcached scaling hacks - the article had a very similar tone, it was equally modest (i.e. not), and was actually nothing but software engineering imposture aimed at impressing a non-tech audience.
>>> It seemed like we would have to choose between cutting significant features from the app or only shipping our new version to the newest Android phones (ICS and up). Neither seemed acceptable. We needed a better solution.
This. "There has to be a better solution" is a quality which defines hackers, is it not? It's that pushing of the boundaries for the sake of getting things done that drives progress, isn't it? It's tempting to take the easy route out. In this case they could have justified supporting only devices on ICS and up; Gingerbread and below does not have a lot of market share, I assume. But they chose to persist.
It's that persistence that we need to cultivate to become successful hackers.
It's great for employees at facebook that the company allows their engineers such tinkering (to an absolute extreme in this case). But I think fb engineers should work on bigger problems than hacking the dalvik vm - like stopping the erosion of their younger user base.
I am working on a product that may solve this problem for you. The product is designed for a different purpose but it should work for you as well. The only caveat would be the need to break your app into multiple pieces that run in separate processes. Give me a shout at email@example.com if you are interested.