Hacker News new | past | comments | ask | show | jobs | submit login
Saving Data: Reducing the Size of App Updates by 65% (android-developers.blogspot.com)
315 points by jor-el on Dec 7, 2016 | hide | past | web | favorite | 136 comments

It's a bit humbling to think that my "quick hack" is yielding bandwidth savings measured in petabytes per day.

I wonder how it translates to the money savings for Google and end users.

A quick estimate based on prices quoted by Google Cloud Engine for Egress transfers ($0.08/GB) gives $183 million in yearly savings.

Surely Google pays much less than that for their own transfers, but even if we assume they are paying 1% of that price it is still respectable. Pretty sure they are paying more than that though.

EDIT: Found better estimates - Amazon Cloudfront is charging as low as $0.02/GB (on demand pricing), which translates to $46 million. Given very good knowledge Google has about their users and their vast infrastructure I'm guessing their costs are fraction of this. Anyone with domain knowledge that can comment on this?

I'm very much not in the know (my personal experience tops out at renting gigabit lines at 95th percentile), but it's my understanding that at extreme scale, you're doing peering arrangements directly with ISPs, and provisioning a fixed amount of bandwidth with them.

It wouldn't surprise me if Google was scheduling the majority of updates to take place in off peak times, when the bandwidth would simply be otherwise unused. If so, the cost to Google would be a rounding error of your estimate based on GCE traffic prices.

For end users: Google is well aware of how many android phones are being used in the developing world. Something like 97% of the smartphone market in Myanmar is on android 4.2 and later. 1GB of data paid via scratch card is a lot of money for some people.

I just tried bsdiff, and can I complain a little? I tried to diff two 1 GB files, and within a few seconds bsdiff was using enough memory to lock up my Ubuntu desktop. 10 minutes later I managed to get a text console and kill it. Maybe I should blame Ubuntu for letting a single process hose the system, but it was still annoying.

Edit: bsdiff is documented to use memory 17x the file size.

> Maybe I should blame Ubuntu for letting a single process hose the system

Speaking of: is there a good way to ask an OS to "reserve memory and CPU-time" for emergency interactive maintenance (e.g. spawning a console on a thrashing box), in a similar way to how ext2/3/4 "reserves space" for the root user for emergency maintenance?

The only real way I can think of to do this (that doesn't involve already knowing what processes could thrash and cgroup'ing them) is to run a hypervisor and treat the dom0 as the emergency console. That still doesn't let you get inside the thrashing instance to fix it, though.

If you use systemd you can assign services to slices which can be assigned cgroup resource reservations / limits. I don't think any distro does too much with this out of the box yet (IIRC by default all services run under a single "system" slice, and user sessions under a user slice), but it should be possible to run an "emergency environment" slice in which the processes are always guaranteed some small amount of resources necessary to recover.

The LinkedIn people report that it's not that simple.

* https://engineering.linkedin.com/blog/2016/08/don_t-let-linu...

It won't help with memory usage but you can nice a process to reduce its share of time in the process scheduler.

isn't "Ctrl + Alt + F1" to F7 essentially that? (on linux)

Interesting, I had assumed this (runaway process and unable to do anything) only happens on Windows. (speaking in general)

This happens to me quite often on Linux and I've always wondered if there's a way to fix it. Is there?

Usually this happens on Linux because an application is using too much memory and the kernel is spending tons of resources reclaiming memory to prevent a system OOM.

The system can easily slow to a crawl as memory accesses and allocations are constantly triggering reclaim and causing the system to thrash. The kernel has no way to know that it should prioritize the terminal you are trying to open to kill things rather than try its hardest to keep the offending application running.

The best way to prevent this, if you know an application will use lots of memory, is to run it in a memory cgroup. With a memory cgroup you can limit the applications total allowed memory, reserving some for the rest of the system.

Unfortunately, I don't know of a super nice CLI, but here's an example using cgcreate/cgexec (https://packages.debian.org/stretch/cgroup-tools):

    $ # Create a memory cgroup named 'mygroup', allowing $USER to change the parameters and run tasks in the cgroup.
    $ sudo cgcreate -a $USER -t $USER -g memory:mygroup
    $ # Set the memory limit to 10MB
    $ echo 10000000 > /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes
    $ # Run a program in the cgroup
    $ cgexec -g memory:mygroup ls

> The kernel has no way to know that it should prioritize the terminal you are trying to open to kill things rather than try its hardest to keep the offending application running.

This is one of the problems multitask OS's were invented to solve. The answer is always let other tasks run too.

man ulimit?

ssh to the box then kill the process.

I've often been unable to connect to SSH on a host for lack of available CPU/memory to spawn the SSH receiver process + the shell.

It's especially a problem with AWS's tN.micro instances: when something thrashes them for long enough, they ratchet down to such a low CPU allocation that it will likely take hours before SSHD gets context-switched to enough times to start your process. By then, all sorts of timeouts will likely have killed your session.

That's worth making persistent noise about, I think. Because you just described something that can be accurately described as a type of DoS.

Considering the number of AWS credentials out there, I can see malicious users logging in, forkbombing the instance, getting it to throttle back, and then walking away whistling until the person with dashboard login wakes up and can reboot the machine.

> I just tried bsdiff, and can I complain a little?

It's HN. Do you even have to ask?

Without even opening this article, I knew that they would be using bsdiff.

Thanks, truly, for creating such a tremendously useful tool.

I don't want to diminish your work, but today's improvement is not in using bsdiff (which they already did), but in using it file-by-file. That's where bsdiff can really shine, because both inputs looks way more similar.

Sure, but running file-by-file is how bsdiff was designed to operate. (I wrote it for FreeBSD Update, which operates on a file-by-file basis, because I was paranoid about the problem I'd seen on Windows systems where an update was registered as "installed" but the system had not in fact been properly updated.)

Ah, that's where it got its name :)

Really awesome utility.

I believe it stands for Byte Subtraction Diff, or something like that. As I remember, something like Bentley McIlroy diff is used to identify identical runs, and where there are short runs of bytes separating two runs that are identical between the two files, the bytewise subtraction of the short run is stored.

The key observation is that when performing updates, large runs of machine code will be identical, with only relative offsets and absolute addresses changing. Code changes tend to shift huge pieces of binary code in parallel (by the same offset), so the bytewise subtraction expresses these shifts as many repeats of the same bytewise difference, which is then more easily compressed by zlib DEFLATE (or maybe another compressor... I forget the details).

These programs were originally named bdiff and bpatch, but the large number of other programs using those names lead to confusion; I'm not sure if the "bs" in refers to "binary software" (because bsdiff produces exceptionally small patches for executable files) or "bytewise subtraction" (which is the key to how well it performs). Feel free to offer other suggestions.

Ah, thanks for the information. I must admit I have never used it and never looked at what makes it different from other diff algorithms. Will definitely have a look at it !

I find bsdiff works fine on concatenated files, because it searches for common sections over the whole file, rather than a traditional diff which just searches in order.

It can work on concatenated files (if you avoid compression, which screws up everything) but it's usually more efficient to run bsdiff on individual matched pairs. The heuristic bsdiff uses for identifying "mostly-matching" regions is just that -- a heuristic -- and the larger the amount of data you have the higher the chance that bsdiff will try to build a mostly-matching region out of a coincidental sequence of matching bytes.

The archive tools (like 7z) have a special flag to allow for diff over the entire archive by external tools.

The flag will modify the storage format, -kinda packing files/chunk together instead of just compressing into a huge block- so that tools can detect diff and same sections over the full archive.

It's not well know but it is an absolute must known. e.g. for backup archive that goes over rsync, and change little over time.

I looked in the 7z manual but couldn't find anything obviously related to this functionality. Would you mind giving a reference?

'gzip --rsyncable' for gzip.

I couldn't find the equivalent for 7z in a quick glance. It supports at least 5 compression algorithms so there is probably not a general option. Maybe one has to play with the advanced settings per algorithm, if they support it.

gzip --rsyncable does make gzip set boundaries so one change doesn't affect the entire rest of the file, just one block.

However, the resulting differences don't seem nearly as amenable to the bsdiff algorithm as the changes to the underlying uncompressed data. In almost all cases, you'll likely get better results with bsdiff on the uncompressed data (and bsdiff internally compresses its result).

Don't know about how bsdiff operate, maybe it's naive.

Good deduplication and diff algorithms have adaptive block size detection.

If I'm not mistaken, an APK file is a zip archive, meaning that bsdiff was previously run on compressed stuff. As you can see in the article compressing content tends to make the output greatly differ (so much that gzip has a special option to make it more easily diffable, typically for rsync use). So bsdiff wasn't previously run on a concatenation of file; it was not used optimally.

It should in theory work similarly on a tar file, no? (tar without compression)

Apple has also been using bsdiff for incremental app updates (on device). I discovered bsdiff while poking around in one of their update files.

One wonders how much more they'd save if they could use the more sophisticated version you developed. ;-)

I'm reminded of Courgette: https://www.chromium.org/developers/design-documents/softwar...

It's used to deliver unbelievably small Chromium OS updates.

To quote the link above:

  Full update:      10,385,920
  bsdiff update:    704,512
  Courgette update: 78,848
The reason it's not being used here is because Courgette's disassembler/symbol resolver only works on x86/x86_64/ARM assembly language.

But it got me thinking - how much bulk does binary code take up in an Android APK? If it's noteworthy, a JVM disassembler could produce really interesting results.

If it's mostly images and sounds then things are pretty much as optimized as they already are. (I have no idea myself.)

> But it got me thinking - how much bulk does binary code take up in an Android APK?

It's going to depend greatly on the type of packages, and the most problematic (bulky) ones are games, where the vast majority of the package is assets.

> a JVM disassembler

Note that Dalvik bytecode is very very different from the Java bytecode read by JVMs. For one, Dalvik is a register machine (like LuaJIT's internal bytecode) and the JVM is a stack machine (like Python bytecode).

Oooh, that might be one of the reasons why LuaJIT is so fast.

I did some reading about Forth some time ago, and learned that modern CPUs have minor struggles with stack-based interpreters. I understand it's almost negligible but has a small impact. (Forth is stack-based.)

There are many reasons why LuaJIT is false, Mike Pall has talked pretty extensively about it (e.g. http://lambda-the-ultimate.org/node/3851).

I don't recall being register-based coming up as a first-order reason, though it may have allowed optimisations which a stack-based machine would not allow (e.g. the JIT doing register hinting and renaming).

What do you mean by "LuaJIT is false"?

Thanks heaps for that link though, reading through that thread has been really interesting.

> What do you mean by "LuaJIT is false"?

Meant to write "LuaJIT is fast", but was probably thinking about something else and the fingers went haywire, sorry about that.

> Thanks heaps for that link though, reading through that thread has been really interesting.

Pall has written in other venues (mostly LuaJIT mailing lists but I think I remember comments of his on reddit or HN, though I might be mistaken), comments of his are always interesting reads.

Ah, I've had the same kinds of finger-"autocorrect" issues myself. Thanks!

I've been trying to find more info about Mike, he seems to be a really private/mysterious type. What he writes about is definitely interesting though.

Disclaimer: I used to work at Google exactly on this project.

I'm so happy this finally made it to daylight! =) I was the original one researching for the feasibility of this, but at the time (3.5 years ago), the re-compression burden made it a no-go even for the most modern phones so we decided to table it.

Possibly dumb question: why doesn't the hash verification take place on the uncompressed contents, moving recompression from the updating-critical-path to the nice-to-have-later-to-save-disk-space category?

My dream updater would just be a bunch of stream processors, so the downloading / decompression / disassembly (a la courgette), patching / recompression happens all at once, as the data packets come in (thus utilizing both CPU and IO). If done right an update shouldn't take much longer to complete than the download step, no?

Many (most?) Android phones are chronically short of flash space, even in the status quo storage shortage often prevents applying app updates. There is really no room to make storage use more lax.

Instead of faffing about re-compressing all the App data in order to compare signatures, wouldn’t it would be simpler to store two signatures alongside each App in the first place: one for the compressed version & one for the uncompressed one?

Then computing the signature after using the compressed diffs to upgrade an existing App in place would just require walking over the upgraded files & comparing the hash to the previously computed one. No CPU intensive re-compression required.

(Clearly you’d have to sign the diffs in some way to prevent bad actors injecting data into the system, but that’s a separate problem to the 'does this data match what the App developer sent us in the first place' question which Google is currently solving by re-compressing everything at great CPU expense.)

I really wish Google would let us download pre-compiled native blobs on fast WiFi connections. Downloading is by far the fastest part of app installation / update for me, the time spent recompiling the bytecode can measure in the minutes for some apps.

Even worse is the forced-recompilation of every single app for the tiniest system patches, which can take upwards of an hour.

That's no longer the case with Nougat. Apps are now interpreted with a JIT compiler and a profile is created for performing optimized AOT compilation when the device is charging and unused for subsequent executions.

That's great news for the 0.4% of phones with Nougat. :)


Does this have to be turned on somewhere? I updated to Nougat last night and it still recompiled all my apps on the first boot. I didn't notice any improvements while updating apps from the play store either.

The upgrade from Marshmallow to Nougat should be the last time you see "Optimizing Apps".

So they are recompressing the files after applying the patches and hope that it's the same? Wouldn't it make more sense to work on having the signature work on uncompressed files?

Also it's kind of shocking that almost everyone is using deflate in with compression level 6 or 9. Why is 6 even used? Why does almost nobody use zopfli which achieves 3–8% higher compression rates. Why aren't other compression formats supported like lzma, brotli, etc.?

> Wouldn't it make more sense to work on having the signature work on uncompressed files?

No! That would create an additional security risk, in addition to potential performance issues.

Ideally, unchecked data should only ever hit the signature check and nothing else. Otherwise, it could also attack the decompression, not just through security holes, but also through denial of service (by injecting something that explodes in size on decompression).

This is a general principle! You should neither decompress, nor decrypt, nor unpack/untar, nor parse (XML/JSON/...), nor anything else before the signature check. See also:

"Preauthenticated decryption considered harmful"


However, I agree that it makes sense to have an additional check after decompression. However, that check usually happens in most decompression tools anyway, and usually through fast, lightweight checksums. Because, again, the point here is to detect bugs in the decompressor, not to protect from malicious tampering, which should always happen before complex stuff like decompression.

> Otherwise, it could also attack the decompression, not just through security holes, but also through denial of service (by injecting something that explodes in size on decompression).

> This is a general principle! You should neither decompress, nor decrypt, nor unpack/untar, nor parse (XML/JSON/...), nor anything else before the signature check.

Google have stated that they're going to be decompressing, patching and recompressing data in an attempt to recreate the original data stream - simply to perform a signature check.

So, whilst minimising the attack surface is a great goal in general, your advice is not relevant here. They're already vulnerable to having their decompression, patching and compression software attacked.

Your old apk is already signed. I guess it would make sense to sign the patch too (if they don't do it already).

Yes, but the payload to check here is the diff, not the full APK.

What happens today is that the diff is verified, then applied, then the full APK checksum is verified. So the diff is checked before being applied; whether the checksum is verified at APK level or at file level, we have passed the signature checking.

But the original question was whether the signature check should happen after applying the patch instead of before, and the answer is clearly "No!". You can and should clearly do both, but if you'd have to decide either-or, always verify first, as early as possible.

> Why is 6 even used?

The zlib default is 6. The "maximum compression" is 9. Most people would either use the default, or request maximum compression.

> Why does almost nobody use zopfli which achieves 3–8% higher compression rates.

Most people are probably using Google's tools to create the APK, and these tools probably use zlib.

> Why aren't other compression formats supported like lzma, brotli, etc.?

Most compression formats with higher compression ratios (like lzma) need more power and memory to uncompress. As for brotli, not only IIRC it's newer than the APK format, but also most of its gains come from a predefined dictionary, which works only for text files.

Also, the APK format is AFAIK based on Java's JAR format, which uses Deflate exclusively. Many other recent formats also use a JAR-like container, like OpenDocument and EPUB.

I'd add that lzma memory usage is accounted in [multiple?] GB per core for the highest compression levels. That amount of memory is a big no-no for mobile devices.

That’s for compression only isn’t it? I think lzma decompression requires a max of ~64Mb.

(Checks manpage) Yup:

  Preset   DictSize   CompCPU   CompMem   DecMem
  -0     256 KiB       0        3 MiB    1 MiB
  -1       1 MiB       1        9 MiB    2 MiB
  -2       2 MiB       2       17 MiB    3 MiB
  -3       4 MiB       3       32 MiB    5 MiB
  -4       4 MiB       4       48 MiB    5 MiB
  -5       8 MiB       5       94 MiB    9 MiB
  -6       8 MiB       6       94 MiB    9 MiB
  -7      16 MiB       6      186 MiB   17 MiB
  -8      32 MiB       6      370 MiB   33 MiB
  -9      64 MiB       6      674 MiB   65 MiB
Default of -6, so 8Mb. Doesn’t seem too unreasonable on a modern device.

Isn't it usually just the compression stage that's very demanding with these algorithms?

Yes for the most common algorithms. The compression takes a lot more memory than the decompression.

But... you still have to be careful with the decompression if you intend to support small devices. Be it a tiny embedded device (obvious issue) or an old generation $500 android phone where the JVM processes are limited to xx MB (less obvious issue).

> So they are recompressing the files after applying the patches and hope that it's the same? Wouldn't it make more sense to work on having the signature work on uncompressed files?

I was thinking the exact same thing when I read that portion of the article. Trying to recreate the exact APK seems entirely irrelevant.

All the user (and developer) cares about is that the installed product is identical to the one the developer submitted via the Google Play developer console. This is much more easily achieved by generating a signature on the decompressed content. It also gives Google greater freedom to change compression algorithms etc.

Because then to check the signature you'd have to decompress the app. Android runs on lots of hardware, much of which is not very powerful. I would imagine this is also why they use deflate, perhaps others achieve x% better compression if it'd a lot slower on certain hardware then maybe it's not worth it.

Two signatures: one on the compressed APK which is used on the initial install & one for the concatenation of the uncompressed data which is used to check in-place upgrades.

Obviously the diffs themselves also need to be signed in some way to prevent bad actors injecting data into the decompression system, but that’s a separate issue to checking that the updated App matches the data originally delivered to Google by the App developer.

Now if someone could figure out how to reduce the size of installed apps. And preferably explain why most Android apps seem to be about 60MB even for trivial things.

I have worked on a few mobile games where there was a big developer push for reducing the app size but it was shot down by management to add more new features instead. A lot of features had expected performance targets to hit before they got the go ahead and it was hard for reducing the app size to have a sizeable difference on KPIs until it got to critical levels.

I think there is also a certain amount of developer apathy towards application size.I was recently talking to a colleague who had worked on a new banking app and they saw no problem with developing, what was essentially a web front end, with the Unity game engine. When I asked if he was concerned the effect on the app size including a game engine would have he said it was what they where used to using and that was more important.

A banking app in Unity??!?

Come to think of it, my banking app takes forever to start up, I wonder why...! I've never bothered to check how big it is, though, which possibly proves your colleague's point. :(

Well, if they are a C# shop, that might be a really easy way to get their app on phones while being able to reuse code that might already exist, while also not having to deal with phone specific toolkits.

Might have Crosswalk or similar browser based view embedded

> banking app

> Unity game engine

Bruce Schneier just had a heart attack

3 items needed to login: Username, Password, beat this Candy Crush level in under 2 minutes.

Which bank is that?

I believe it was Atom bank. https://www.atombank.co.uk/

And the app can be found here confirming it was created in Unity https://play.google.com/store/apps/details?id=com.atombank.r...

Which trivial app is 60MB? Most apps I use are 1-10MB. The games are, of course, much larger.

I actually did exactly the same thing over a year ago. I needed to download Cynogen nightlies quite frequently which was not feasible at that time as I was on limited data. So I used to download the nigthly on my digitalocean droplet and calculate patch over their. Patch sizes were just around 12MBs. I would then download the patch file and patch the old nightly on my local system. Voila! phone upgraded by downloading just 12MBs! I did thought why Google can't do the same thing with Android applications.:)

Slightly offtopic: What algorithm does git use for binary diffs in packfiles? Is it bsdiff, or the improved algorithm form Colin's thesis, or something else?

As often is the case, SO has an answer: http://stackoverflow.com/questions/9478023/is-the-git-binary...

Google figured out you meant Colin Percival, but I'm not quite sure how/where to find this improved algorithm you speak of, which sounds very interesting.

> Disclaimer: if you see different patch sizes when you press "update" manually, that is because we are not currently using File-by-file for interactive updates, only those done in the background.

Out of curiosity, does anybody know why that would be the case? I had assumed automatic app updates used exactly the same code paths / APIs as manually pressing "update" in the Google Play app did, just triggered automatically. But it sounds like they actually go via a different route?

The article implies they prioritise installation time for manual installs (which sounds sensible to me, since waiting time is the biggest source of frustration for users, and decompression is the bottleneck there, especially on older devices.

This kind of problems could be made public.

Small or big company, government or organisation could define problem, give test input, expected output and performance points (delta size, cpu usage etc); submissions open for all.

Similar to programming competitions but instead of made-up problems, there is a real world problem to be solved.

This could be good for everybody - companies, because they'd get (I hope) the best solutions (there could be some price for top solutions, but still fraction of in-house r&d cost), for people, because they could be hired or at least would have a chance to test their skills on real problems; it could be used as part of hiring process to skip early nonsense etc.

The biggest win, IMHO, would be to have government organisations getting involved in this kind of "Open R&D" - open for contributions and also open for new, not yet articulated ideas.

It already is public. That's why Google is using bsdiff, an algorithm developed for FreeBSD, and which they are legally able to use due to the way in which it is licensed. We don't need some sort of central clearinghouse of problem instances.

bsdiff predates the entire concept of Android by many years. To the extent that Google wasn't using it from day 1, it isn't because the solution didn't exist somewhere. (It sounds like it was due to problems that would have existed for any clever solution and they chose deliberately to ship down un-diffed-in-any-way updates.)

There's actually a profound lesson about how open source really works here. And I'm not really saying this as an open source evangelist... it's more an observation that our instinctive (and probably genetic) instinct in favor of centralization is not always an accurate model of how the world either does or should work. There's not much some sort of centralized contest could do in the present that wasn't already done in the real world over a decade ago without it.

On a side note: I "love" when I have to download 100+ MB update of the mobile app with only changelog description: "minor fixes".

Or worse, opening up the app on the train just to be greeted "This version is outdated. Please update to continue using."

It's great when it happens with Uber and you're outside, with a crappy 3G connection.

Why recompress at all?

Rather than using .zip, an improved APK format could be more like a tarball. You'd tar up the files, sign it, and compress the result.

To verify, you decompress (streamily, without saving the result anywhere) in a tight sandbox and verify the signature on the output. Then you decompress again to install. Call it APK v3.

This adds a second pass at decompression, but decompression is generally quite fast. In exchange, you avoid a much slower recompression step.

Yes, it definitely seems like fixing the flaws in the original APK design would be a win! Just signing the uncompressed data, as many others here have pointed out, would allow most of your suggestion to be implemented.

I wonder if it's a sign of communication problems between the Android and Google Play teams? The Play team seems to have spent years bending over backwards to build this really awkward solution, which could have been done a lot better with some low-level Android changes (i.e. a new APK format).

I bet there's a valid historical reason for the current design: APK looks a lot like JAR and Android is kind-of-sort-of Java. Java random-accesses its JARs, so the awkward format makes sense in the context. Android may, too, to a limited extent, but Android has no concept of running an APK in place, so it doesn't need this capability.

Well, exactly, both APKs and JARs are just zipfiles with some extra conventions over format and contents.

Why not update the format? Android has been around long enough to make the advantages and disadvantages pretty clear.

Perhaps I'm misunderstanding what you're suggesting, but IIRC installed Android apps are still compressed on disk, to save storage space, hence the recompression.

Enabled only for automatic updates because it's CPU intensive.

Auto update is the default and I'm sure it's what most people is using. I disabled it many years ago to prevent app abusively add permissions and to prevent crashes. It was pretty common for new versions not to work properly on many devices. I remember checking the reviews before updating. This hasn't happened anymore for at least a couple of years and Android 6 solved the issue with the permissions but I didn't lose the habit. I'll download the full update. Furthermore I keep the phone in airplane mode at night. I want to sleep.

Before Marshmallow it would refuse to auto-update apps if they added a permission. It would just show you a notification about the new permission and whether to accept/decline the update.

So I did it because of the crashes. I forged the other memory. Thanks.

Doesn't your phone have a sleep mode where all notifications etc are disabled for the time period you set. As this is the only phone I have I can't disable it completely have just whitelisted important numbers that would call in an emergency.

It's got a do not disturb mode, I didn't check if it has a whitelist. People that could call me for emergencies have my land line number which is over my home Internet connection. If I had only a mobile phone I'd probably do like you do.

You can whitelist them if you go to the notification settings for the app: http://iob.imgur.com/uYGg/Jj0DmTtRUy

If you have enabled Power Notification Controls then you can also blacklist from the same place: https://imgur.com/VgDowW3

I don't really understand this bit. Are packages stored on Android devices compressed? If not, why are they recompressing data?

The APK signature is over the compressed file and needs to match.

Aha! They should fix that. :-)

How long does it take to do this, and what is the impact on battery life? I'd like to know for a high end device like the S7 or Pixel, but also the low end $99-$199 phones.

We need to love the work those PhDs able to invert a binary tree on a whiteboard are able to come up with.

They moved away from pure AOT compilation to JIT/AOT, because it was taking too much time and processing power.

So now they come up with unpacking and recompressing APKs on the device using CPU intensive algorithms.

It's all about the user experience. The old AOT-only system made system updates take ages, during which time the phone was unusable.

This new app update system is slower and more CPU intensive, but doesn't affect the UX because it only happens when the phone is idle and charging - the perfect time to do something CPU intensive.

And in both cases, Apple and Microsoft do it server side via the store, instead of destroying the user experience on the devices.

How can you implement file-by-file based binary diffs without recompressing anything?

Using something like App Thinning introduced in iOS 9?

If I understand it correctly it seems a bit orthogonal thing to binary diffs.

Slicing will prepare an app bundle per target from source app bundle.

Bitcode will be compiled to target spec.

On-demand resources are self-explanatory.

In Android AFAIK you have to do all those things by yourself. With this difference that you can't compile Java bytecode yourself and send it to the user. If you use NDK you have to do this. But I would argue that in most cases it does not result in major size differences. The biggest difference would be in case of app using lots of graphical assets that could be delivered with different DPI depending on device. However, as I said, AFAIK you can do this.

The bundle, the compiled binary and resources have to be somehow delivered to the user. The files probably in many cases would be delivered in compressed form. Either they will stay compressed and be decompressed on demand or be decompressed after download. Most probably it will be the former.

You can send whole data each time or give some kind of delta to whatever user has. In case of delta you can diff either compressed or uncompressed data. In case of former you will have the same issue as Google had with their bsdiff between APKs. In case of latter you will have to decompress the data and it would be a waste of space not to compress it again later. The results would be the same as what Google currently do.

If Apple would like to optimize the size of updates they would have to do the same (I don't know if they are not doing it already). To achieve this the data would have to be recompressed.

This is great for games like Hearthstone that currently requires a re-download of the whole app which takes 2 GB whenever there's an update.

> This is great for games like Hearthstone


Hearthstone is a Unity game, and the issue (for mobile packaging at least) is Unity creates a giant bundle of all the assets, so it's a single binary file which tends to greatly change from one release to the next, even with per-file diffing you're not getting any improvement[0].

Other developers get around that by downloading assets separately from the main application bundle à la Supercell. I don't know that Supercell uses Unity, but they do follow that pattern, they have an application bundle of code & trivial assets, but most of the assets are downloaded at the loading screen.

Checking on my phone, Hearthstone is ~2GB with 10.5MB of "documents and data" (= a ~2GB base app bundle), clash royale is ~450MB with 336MB of data (= a ~100MB base app bundle), jetpack joyride is similar with ~330MB split between a ~90MB base bundle and ~250MB "documents and data".

The default unity approach makes sense for rarely updated applications (Monument Valley or République), it's an absolute bear for dynamic frequently updated on-line games like Hearthstone.

[0] IIRC per-file diffing is what apple does on the appstore, it does not work at all for HS

Unity has asset bundles precisely for use cases like streaming and downloading of assets:


Blizzard just doesn't seem to be using them...

Unity aside, the Play Store has a standard method for shipping large assets (expansion files) but it's badly designed and I think won't work with this new scheme.

There's maximum file size for APKs, and if you exceed that you have to put extra stuff in one or two (why only two?) expansion files. They recommend that these files be compressed, but they're explicitly Opaque Binary Blobs so the decompress/diff/recompress hack won't work. There is a standard format they recommend -- a disk image -- but it's so buggy on Android as to be unusable. (I finally saw some action on the bug tracker recently, so it might be fixed, or nearly fixed, in the latest version of Android.)

Even before bsdiff and this new file-by-file patching mechanism I find Android updates much faster than iOS. I don't know if iOS updates are just bigger becuase application are bigger, or is just that apple has a worst CDN (I live in Argentina).

TL:DR? Compress diffs, don't diff compressed things?

Could rsync algorithm be used to prepare binary diffs? How would it compare to bsdiff?


Thanks for the downvote.

There is a tool using rsync algorithm for binary diffs. It's called rdiff [1]. I found Master's thesis Binary Differencing for Media Files by Vladimir Komsiyski [2]. What I take from there is that rdiff is fast, but bsdiff produces smaller patches.


> bsdiff is marginally better than xdelta as far as compression is concerned. rdiff and especially edelta generate much larger patches. The upload gains from the viewpoints of the server are exactly the same as the compression rates and the saved disk space is proportional to it. For example, an expected number of versions per file of 3.7 (which is the case in Dataset A) and differencing with xdelta result in 70% less required disk space.

> As expected, bsdiff ’s slow differencing time results in 3 times slower upload from the viewpoint of the client when compared to the server’s experienced time. Apart from that, the added time due to the differencing is smaller compared to the transfer of the produced patch and lead to up to 21 times faster transfer than when no differencing is done. Even the worst performing tool edelta reaches a speed-up with a factor of 3. xdelta, showing the best results out of the four tools, achieves a 21x speedup when transferring a file to the server. The second best tool, bsdiff , is only 9 times faster. However, its coefficient may dramatically increase with increasing the file size (Adobe Photoshop files) because of the non-linear runtime of this tool. edelta is severely outperformed by all other tools. The fastest tool rdiff loses its lead due to the bigger patches it produces.

> For Adobe Documents (and most likely other image editing software files) the binary differencing tool xdelta shows the best run-time for the compression it achieves. rdiff is faster, but its patches are bigger. bsdiff shows better compression, but its non-linear runtime makes it unusable for files of this size. The benefits of xdelta are considerable - decrease of the file size with more than 96% for Adobe InDesign files, causing up to 30 times faster file transfer. Adobe Photoshop documents also show substantial gain despite their much larger size.

[1] https://en.wikipedia.org/wiki/Rsync#Variations https://linux.die.net/man/1/rdiff

[2] http://scriptiesonline.uba.uva.nl/document/490827

I wonder how this deals with renamed files.

If it doesn't, one should avoid renames of bigs assets when working on updates of an already released app.

The question, why took this so long ? 8 years for figuring out how to make differential updates...

Is this basically something like git? Isn't it crazy that something like this hasn't been implemented until today?

I am not sure what you mean by git. This is binary data diffing using bsdiff, applied at the individual file level instead of the whole package (single file).

Even in git there are lot of challenges: http://githubengineering.com/how-we-made-diff-pages-3x-faste...

Nice work there but who really talks like this?

"in order to provide users with great content, improve security, and enhance the overall user experience"

The author is listed right above that: Posted by Andrew Hayden, Software Engineer on Google Play.

The author of this post now works at Amazon!


They deserve serious kudos for managing to finish and ship their product at Google first. (Another commenter here said the project started 3.5 years ago, wow)

If only they spent this much effort in getting Android OS updates to consumers.

How does this differ to the delta updates that came out over four years ago?


It's in the article: that worked on the compressed data, this change works on uncompressed data (and so can be much more effective).

Read the article before commenting?

Which part of the article covered the 2012 announcement? There was the piece near the top: `earlier this year, we announced that we started using the bsdiff algorithm (by Colin Percival).`

But earlier this year would be 2016, not 2012 so I assumed that this was something else.

And clicking on the link in your quoted text would give you this:

> Google Play has used delta algorithms since 2012, and we recently rolled out an additional delta algorithm, bsdiff

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact