A quick estimate based on prices quoted by Google Compute Engine for egress transfers ($0.08/GB) gives $183 million in yearly savings.
Surely Google pays much less than that for its own transfers, but even if we assume it pays only 1% of that price, the savings are still respectable. Pretty sure it pays more than that 1%, though.
EDIT: Found better estimates - Amazon CloudFront charges as low as $0.02/GB (on-demand pricing), which translates to $46 million. Given the very good knowledge Google has about its users, and its vast infrastructure, I'm guessing their costs are a fraction of this. Anyone with domain knowledge who can comment on this?
It wouldn't surprise me if Google scheduled the majority of updates for off-peak times, when the bandwidth would otherwise go unused. If so, the cost to Google would be a rounding error on your estimate based on GCE traffic prices.
Edit: bsdiff is documented to use memory 17x the file size.
Speaking of: is there a good way to ask an OS to "reserve memory and CPU-time" for emergency interactive maintenance (e.g. spawning a console on a thrashing box), in a similar way to how ext2/3/4 "reserves space" for the root user for emergency maintenance?
The only real way I can think of to do this (that doesn't involve already knowing what processes could thrash and cgroup'ing them) is to run a hypervisor and treat the dom0 as the emergency console. That still doesn't let you get inside the thrashing instance to fix it, though.
The system can easily slow to a crawl as memory accesses and allocations constantly trigger reclaim, causing the system to thrash. The kernel has no way to know that it should prioritize the terminal you are trying to open to kill things, rather than trying its hardest to keep the offending application running.
The best way to prevent this, if you know an application will use lots of memory, is to run it in a memory cgroup. With a memory cgroup you can limit the application's total allowed memory, reserving some for the rest of the system.
Unfortunately, I don't know of a super nice CLI, but here's an example using cgcreate/cgexec (https://packages.debian.org/stretch/cgroup-tools):
$ # Create a memory cgroup named 'mygroup', allowing $USER to change the parameters and run tasks in the cgroup.
$ sudo cgcreate -a $USER -t $USER -g memory:mygroup
$ # Set the memory limit to 10MB
$ echo 10000000 > /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes
$ # Run a program in the cgroup
$ cgexec -g memory:mygroup ls
This is one of the problems multitasking OSes were invented to solve. The answer is to always let other tasks run too.
It's especially a problem with AWS's tN.micro instances: when something thrashes them for long enough, they ratchet down to such a low CPU allocation that it will likely take hours before sshd gets scheduled enough times to start your process. By then, all sorts of timeouts will likely have killed your session.
Considering the number of AWS credentials out there, I can see malicious users logging in, forkbombing the instance, getting it to throttle back, and then walking away whistling until the person with dashboard login wakes up and can reboot the machine.
It's HN. Do you even have to ask?
Thanks, truly, for creating such a tremendously useful tool.
Really awesome utility.
The key observation is that when performing updates, large runs of machine code will be identical, with only relative offsets and absolute addresses changing. Code changes tend to shift huge pieces of binary code in parallel (by the same offset), so the bytewise subtraction expresses these shifts as many repeats of the same byte value, which then compresses extremely well (bsdiff actually compresses the patch with bzip2).
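A toy illustration of that effect in Python (this isn't bsdiff itself; the "shift every byte by the same amount" transform is just a stand-in for relocated addresses):

import os, zlib

old = os.urandom(64 * 1024)               # stand-in for the old binary
new = bytes((b + 13) % 256 for b in old)  # same code, addresses shifted in parallel

# Bytewise subtraction turns the parallel shift into one long run of
# identical bytes, which the patch compressor squeezes to almost nothing.
delta = bytes((n - o) % 256 for n, o in zip(new, old))

print(len(zlib.compress(new)))    # ~64 KiB: random-looking bytes don't compress
print(len(zlib.compress(delta)))  # tiny: a single repeated byte value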
The flag modifies the storage format (roughly, packing files/chunks together instead of compressing everything into one huge block) so that tools can detect differing and identical sections across the full archive.
It's not well known, but it's an absolute must-know, e.g. for backup archives that go over rsync and change little over time.
I couldn't find the equivalent for 7z at a quick glance. It supports at least 5 compression algorithms, so there is probably no general option. Maybe one has to play with the advanced settings per algorithm, if they support it.
However, the resulting differences don't seem nearly as amenable to the bsdiff algorithm as the changes to the underlying uncompressed data. In almost all cases, you'll likely get better results with bsdiff on the uncompressed data (and bsdiff internally compresses its result).
Good deduplication and diff algorithms have adaptive block size detection.
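For instance, here's a minimal sketch of content-defined chunking (the hash is a crude stand-in for a real rolling hash like a Rabin fingerprint):

def chunks(data, mask=0x1FFF, min_size=1024):
    # Cut whenever the low bits of a content-derived hash are zero, so
    # boundaries move with the content: an insertion early in the file
    # only disturbs nearby chunks instead of shifting every fixed block.
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF   # old bytes age out as bits shift off
        if i - start >= min_size and (h & mask) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]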
It's used to deliver unbelievably small Chromium OS updates.
To quote the link above:
Full update: 10,385,920 bytes
bsdiff update: 704,512 bytes
Courgette update: 78,848 bytes
But it got me thinking - how much bulk does binary code take up in an Android APK? If it's noteworthy, a JVM disassembler could produce really interesting results.
If it's mostly images and sounds, then things are pretty much as optimized as they're going to get. (I have no idea myself.)
It's going to depend greatly on the type of packages, and the most problematic (bulky) ones are games, where the vast majority of the package is assets.
Note that Dalvik bytecode is very very different from the Java bytecode read by JVMs. For one, Dalvik is a register machine (like LuaJIT's internal bytecode) and the JVM is a stack machine (like Python bytecode).
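A toy illustration of the difference, interpreting "a + b" both ways (plain Python, obviously not real JVM or Dalvik code; the mnemonics in comments are the real instruction names):

def stack_add(a, b):
    # JVM-style: an implicit operand stack, each instruction is tiny.
    stack = []
    stack.append(a)                       # iload_0
    stack.append(b)                       # iload_1
    rhs, lhs = stack.pop(), stack.pop()   # iadd pops two operands...
    stack.append(lhs + rhs)               # ...and pushes the result
    return stack.pop()                    # ireturn

def register_add(a, b):
    # Dalvik-style: one wider instruction names its registers explicitly.
    v0, v1 = a, b
    v2 = v0 + v1                          # add-int v2, v0, v1
    return v2                             # return v2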
I did some reading about Forth some time ago, and learned that modern CPUs handle stack-based interpreters slightly worse. I understand the impact is small, almost negligible. (Forth is stack-based.)
I don't recall the register-based design coming up as a first-order reason, though it may have enabled optimisations that a stack-based machine would not allow (e.g. the JIT doing register hinting and renaming).
Thanks heaps for that link though, reading through that thread has been really interesting.
Meant to write "LuaJIT is fast", but was probably thinking about something else and the fingers went haywire, sorry about that.
> Thanks heaps for that link though, reading through that thread has been really interesting.
Pall has written in other venues (mostly the LuaJIT mailing lists, but I think I remember comments of his on Reddit or HN, though I might be mistaken); his comments are always interesting reads.
I've been trying to find more info about Mike, he seems to be a really private/mysterious type. What he writes about is definitely interesting though.
I'm so happy this finally made it to daylight! =) I was the original one researching for the feasibility of this, but at the time (3.5 years ago), the re-compression burden made it a no-go even for the most modern phones so we decided to table it.
My dream updater would just be a bunch of stream processors, so the downloading, decompression, disassembly (à la Courgette), patching and recompression all happen at once, as the data packets come in (thus utilizing both CPU and IO). If done right, an update shouldn't take much longer to complete than the download step, no?
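Something like this generator-pipeline sketch (zlib as a stand-in for the real codecs; apply_patch, network_chunks and write_to_disk are hypothetical):

import zlib

def decompress(chunks):
    d = zlib.decompressobj()
    for c in chunks:
        yield d.decompress(c)
    yield d.flush()

def recompress(chunks, level=6):
    c = zlib.compressobj(level)
    for chunk in chunks:
        yield c.compress(chunk)
    yield c.flush()

# Stages chain lazily, so nothing is ever buffered in full:
#   for block in recompress(apply_patch(decompress(network_chunks))):
#       write_to_disk(block)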
Then computing the signature after using the compressed diffs to upgrade an existing App in place would just require walking over the upgraded files & comparing the hash to the previously computed one. No CPU intensive re-compression required.
(Clearly you’d have to sign the diffs in some way to prevent bad actors injecting data into the system, but that’s a separate problem to the 'does this data match what the App developer sent us in the first place' question which Google is currently solving by re-compressing everything at great CPU expense.)
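A minimal sketch of that check, assuming a manifest of per-file SHA-256 hashes signed by the developer at upload time (verify_tree and the manifest format are made up here for illustration):

import hashlib, os

def verify_tree(root, manifest):
    # manifest: {relative_path: expected_hex_digest}, assumed already
    # signature-checked, so we only need to hash the installed files.
    for rel, expected in manifest.items():
        h = hashlib.sha256()
        with open(os.path.join(root, rel), "rb") as f:
            for block in iter(lambda: f.read(1 << 16), b""):
                h.update(block)
        if h.hexdigest() != expected:
            return False
    return True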
Even worse is the forced-recompilation of every single app for the tiniest system patches, which can take upwards of an hour.
Also it's kind of shocking that almost everyone is using deflate with compression level 6 or 9. Why is 6 even used? Why does almost nobody use zopfli, which achieves 3–8% higher compression rates? Why aren't other compression formats supported, like lzma, brotli, etc.?
No! That would create an additional security risk, in addition to potential performance issues.
Ideally, unchecked data should only ever hit the signature check and nothing else. Otherwise, it could also attack the decompression, not just through security holes, but also through denial of service (by injecting something that explodes in size on decompression).
This is a general principle! You should neither decompress, nor decrypt, nor unpack/untar, nor parse (XML/JSON/...), nor anything else before the signature check. See also:
"Preauthenticated decryption considered harmful"
That said, I agree that it makes sense to have an additional check after decompression. That check usually happens inside most decompression tools anyway, typically via fast, lightweight checksums. Because, again, the point there is to detect bugs in the decompressor, not to protect against malicious tampering; that protection should always happen before complex stuff like decompression.
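In code, the ordering looks something like this (an HMAC stands in for a real public-key signature; the point is only that untrusted bytes never reach the decompressor unauthenticated):

import hashlib, hmac, zlib

def checked_decompress(key, payload, tag, max_out=100 * 1024 * 1024):
    # 1. Authenticate the raw compressed bytes first.
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("bad signature; refusing to even decompress")
    # 2. Only then decompress, with an output cap as a second line of
    #    defence against decompression bombs.
    d = zlib.decompressobj()
    out = d.decompress(payload, max_out)
    if d.unconsumed_tail:
        raise ValueError("decompressed output exceeds limit")
    return out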
> This is a general principle! You should neither decompress, nor decrypt, nor unpack/untar, nor parse (XML/JSON/...), nor anything else before the signature check.
Google have stated that they're going to be decompressing, patching and recompressing data in an attempt to recreate the original data stream - simply to perform a signature check.
So, whilst minimising the attack surface is a great goal in general, your advice is not relevant here. They're already vulnerable to having their decompression, patching and compression software attacked.
What happens today is that the diff is verified, then applied, then the full APK checksum is verified. So the diff is checked before being applied; whether the checksum is verified at APK level or at file level, we have passed the signature checking.
The zlib default is 6. The "maximum compression" is 9. Most people would either use the default, or request maximum compression.
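For a rough feel of what that choice buys, a quick experiment with Python's zlib (toy, repetitive data rather than a real APK):

import zlib

data = b"the quick brown fox jumps over the lazy dog " * 20000
for level in (1, 6, 9):
    print(level, len(zlib.compress(data, level)))
# On real-world data, 9 usually buys only a few percent over 6,
# at noticeably more CPU cost - which is why 6 is the default.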
> Why does almost nobody use zopfli which achieves 3–8% higher compression rates.
Most people are probably using Google's tools to create the APK, and these tools probably use zlib.
> Why aren't other compression formats supported like lzma, brotli, etc.?
Most compression formats with higher compression ratios (like lzma) need more power and memory to decompress. As for brotli, IIRC it's not only newer than the APK format, but most of its gains also come from a predefined dictionary, which only really helps text files.
Also, the APK format is AFAIK based on Java's JAR format, which uses Deflate exclusively. Many other recent formats also use a JAR-like container, like OpenDocument and EPUB.
(Checks manpage) Yup:
Preset DictSize CompCPU CompMem DecMem
-0 256 KiB 0 3 MiB 1 MiB
-1 1 MiB 1 9 MiB 2 MiB
-2 2 MiB 2 17 MiB 3 MiB
-3 4 MiB 3 32 MiB 5 MiB
-4 4 MiB 4 48 MiB 5 MiB
-5 8 MiB 5 94 MiB 9 MiB
-6 8 MiB 6 94 MiB 9 MiB
-7 16 MiB 6 186 MiB 17 MiB
-8 32 MiB 6 370 MiB 33 MiB
-9 64 MiB 6 674 MiB 65 MiB
But... you still have to be careful with the decompression if you intend to support small devices. Be it a tiny embedded device (an obvious issue) or an old-generation $500 Android phone where the JVM processes are limited to xx MB (a less obvious issue).
I was thinking the exact same thing when I read that portion of the article. Trying to recreate the exact APK seems entirely irrelevant.
All the user (and developer) cares about is that the installed product is identical to the one the developer submitted via the Google Play developer console. This is much more easily achieved by generating a signature on the decompressed content. It also gives Google greater freedom to change compression algorithms etc.
Obviously the diffs themselves also need to be signed in some way to prevent bad actors injecting data into the decompression system, but that’s a separate issue to checking that the updated App matches the data originally delivered to Google by the App developer.
I think there is also a certain amount of developer apathy towards application size. I was recently talking to a colleague who had worked on a new banking app, and they saw no problem with developing what was essentially a web front end with the Unity game engine. When I asked if he was concerned about the effect that including a game engine would have on the app size, he said it was what they were used to using, and that was more important.
Come to think of it, my banking app takes forever to start up, I wonder why...! I've never bothered to check how big it is, though, which possibly proves your colleague's point. :(
> Unity game engine
Bruce Schneier just had a heart attack
And the app can be found here, confirming it was created in Unity.
Out of curiosity, does anybody know why that would be the case? I had assumed automatic app updates used exactly the same code paths / APIs as manually pressing "update" in the Google Play app did, just triggered automatically. But it sounds like they actually go via a different route?
A small or big company, government, or organisation could define a problem, provide test input, expected output, and performance criteria (delta size, CPU usage, etc.); submissions would be open to all.
Similar to programming competitions but instead of made-up problems, there is a real world problem to be solved.
This could be good for everybody: for companies, because they'd get (I hope) the best solutions (there could be some prize for top solutions, but still a fraction of the in-house R&D cost); for people, because they could be hired or at least would have a chance to test their skills on real problems; it could even be used as part of a hiring process to skip the early nonsense, etc.
The biggest win, IMHO, would be to have government organisations getting involved in this kind of "Open R&D" - open for contributions and also open for new, not yet articulated ideas.
bsdiff predates the entire concept of Android by many years. To the extent that Google wasn't using it from day 1, it isn't because the solution didn't exist somewhere. (It sounds like it was due to problems that would have existed for any clever solution, and they deliberately chose to ship completely un-diffed updates.)
There's actually a profound lesson about how open source really works here. And I'm not really saying this as an open source evangelist... it's more an observation that our instinctive (and probably genetic) bias in favor of centralization is not always an accurate model of how the world does or should work. There's not much some sort of centralized contest could do in the present that wasn't already done in the real world over a decade ago without it.
Rather than using .zip, an improved APK format could be more like a tarball. You'd tar up the files, sign it, and compress the result.
To verify, you decompress (streamily, without saving the result anywhere) in a tight sandbox and verify the signature on the output. Then you decompress again to install. Call it APK v3.
This adds a second pass at decompression, but decompression is generally quite fast. In exchange, you avoid a much slower recompression step.
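A rough sketch of what the verify side might look like (lzma/tarfile as stand-ins; the fixed 64-byte signature header and signature_ok() are invented for illustration):

import hashlib, lzma, tarfile

def verify_apk3(path, signature_ok, chunk=1 << 16):
    # Pass 1: stream-decompress (inside a sandbox) and hash the tar bytes;
    # nothing is written to disk and no archive member is parsed yet.
    with lzma.open(path, "rb") as f:
        sig = f.read(64)                       # assumed signature header
        h = hashlib.sha256()
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    if not signature_ok(sig, h.digest()):
        raise ValueError("bad signature")
    # Pass 2: decompress again, now trusting the content, and unpack.
    with lzma.open(path, "rb") as f:
        f.read(64)                             # skip the signature
        with tarfile.open(fileobj=f, mode="r|") as tar:
            tar.extractall("/data/app/example")  # hypothetical target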
I wonder if it's a sign of communication problems between the Android and Google Play teams? The Play team seems to have spent years bending over backwards to build this really awkward solution, which could have been done a lot better with some low-level Android changes (i.e. a new APK format).
Why not update the format? Android has been around long enough to make the advantages and disadvantages pretty clear.
If you have enabled Power Notification Controls then you can also blacklist from the same place: https://imgur.com/VgDowW3
They moved away from pure AOT compilation to JIT/AOT, because it was taking too much time and processing power.
So now they come up with unpacking and recompressing APKs on the device using CPU intensive algorithms.
This new app update system is slower and more CPU intensive, but doesn't affect the UX because it only happens when the phone is idle and charging - the perfect time to do something CPU intensive.
Slicing will prepare an app bundle per target from the source app bundle.
Bitcode will be compiled to the target spec.
On-demand resources are self-explanatory.
In Android, AFAIK, you have to do all those things yourself. The difference is that you can't compile the Java bytecode yourself and send it to the user; if you use the NDK, you do have to do this. But I would argue that in most cases it doesn't result in major size differences. The biggest difference would be for an app using lots of graphical assets that could be delivered at different DPIs depending on the device. However, as I said, AFAIK you can already do this.
The bundle, the compiled binary, and the resources have to be delivered to the user somehow. In many cases the files would probably be delivered in compressed form. Either they stay compressed and are decompressed on demand, or they are decompressed after download. Most probably it will be the former.
You can send the whole data each time, or send some kind of delta against whatever the user has. With a delta, you can diff either the compressed or the uncompressed data. In the former case you get the same issue Google had with bsdiff between APKs. In the latter case you have to decompress the data, and it would be a waste of space not to compress it again later. The result would be the same as what Google currently does.
If Apple wanted to optimize the size of updates, they would have to do the same (I don't know whether they're already doing it). To achieve this, the data would have to be recompressed.
Hearthstone is a Unity game, and the issue (for mobile packaging at least) is that Unity creates a giant bundle of all the assets. It's a single binary file which tends to change greatly from one release to the next, so even with per-file diffing you're not getting any improvement.
Other developers get around that by downloading assets separately from the main application bundle, à la Supercell. I don't know whether Supercell uses Unity, but they do follow that pattern: an application bundle of code & trivial assets, with most of the assets downloaded at the loading screen.
Checking on my phone, Hearthstone is ~2GB with 10.5MB of "documents and data" (= a ~2GB base app bundle), clash royale is ~450MB with 336MB of data (= a ~100MB base app bundle), jetpack joyride is similar with ~330MB split between a ~90MB base bundle and ~250MB "documents and data".
The default Unity approach makes sense for rarely updated applications (Monument Valley or République); it's an absolute bear for dynamic, frequently updated online games like Hearthstone.
IIRC per-file diffing is what Apple does on the App Store; it does not work at all for Hearthstone.
Blizzard just doesn't seem to be using them...
There's a maximum file size for APKs, and if you exceed it you have to put the extra stuff in one or two (why only two?) expansion files. They recommend that these files be compressed, but they're explicitly Opaque Binary Blobs, so the decompress/diff/recompress hack won't work. There is a standard format they recommend -- a disk image -- but it's so buggy on Android as to be unusable. (I finally saw some action on the bug tracker recently, so it might be fixed, or nearly fixed, in the latest version of Android.)
Thanks for the downvote.
There is a tool using the rsync algorithm for binary diffs, called rdiff. I found the Master's thesis "Binary Differencing for Media Files" by Vladimir Komsiyski. What I take from it is that rdiff is fast, but bsdiff produces smaller patches.
> bsdiff is marginally better than xdelta as far as compression is concerned. rdiff and especially edelta generate much larger patches. The upload gains from the viewpoint of the server are exactly the same as the compression rates, and the saved disk space is proportional to it. For example, an expected number of versions per file of 3.7 (which is the case in Dataset A) and differencing with xdelta result in 70% less required disk space.

> As expected, bsdiff's slow differencing time results in 3 times slower upload from the viewpoint of the client when compared to the server's experienced time. Apart from that, the added time due to the differencing is small compared to the transfer of the produced patch, and leads to up to 21 times faster transfer than when no differencing is done. Even the worst-performing tool, edelta, reaches a speed-up factor of 3. xdelta, showing the best results of the four tools, achieves a 21x speedup when transferring a file to the server. The second best tool, bsdiff, is only 9 times faster. However, its coefficient may increase dramatically with file size (Adobe Photoshop files) because of this tool's non-linear runtime. edelta is severely outperformed by all other tools. The fastest tool, rdiff, loses its lead due to the bigger patches it produces.

> For Adobe documents (and most likely other image-editing software files) the binary differencing tool xdelta shows the best run-time for the compression it achieves. rdiff is faster, but its patches are bigger. bsdiff shows better compression, but its non-linear runtime makes it unusable for files of this size. The benefits of xdelta are considerable: a decrease in file size of more than 96% for Adobe InDesign files, giving up to 30 times faster file transfer. Adobe Photoshop documents also show substantial gains despite their much larger size.
https://en.wikipedia.org/wiki/Rsync#Variations
https://linux.die.net/man/1/rdiff
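If you want to play with bsdiff from Python, the third-party bsdiff4 package (pip install bsdiff4) wraps it; here's a toy comparison of patch size versus shipping the (compressed) new file outright:

import os, zlib
import bsdiff4  # third-party: pip install bsdiff4

old = os.urandom(50_000)
new = old[:20_000] + b"one small edit" + old[20_000:]

patch = bsdiff4.diff(old, new)
print(len(patch), "vs", len(zlib.compress(new)))  # the patch is far smaller
assert bsdiff4.patch(old, patch) == new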
If it doesn't, one should avoid renaming big assets when working on updates of an already released app.
"in order to provide users with great content, improve security, and enhance the overall user experience"
But earlier this year would be 2016, not 2012 so I assumed that this was something else.
> Google Play has used delta algorithms since 2012, and we recently rolled out an additional delta algorithm, bsdiff