A better zip bomb (bamsoftware.com)
483 points by masklinn on July 4, 2019 | 131 comments

I recently created a zip validator and decoder for scanning email attachments, @ronomon/zip.

It's not yet open-sourced but it has defenses against excessive compression ratios, mismatching local and central directory headers, ambiguous filenames, directory traversals and symlink traversals, and anything ambiguous that could exploit differences in zip implementations, e.g. some zip implementations decode from the front while others decode from the back. Most importantly, it balks at deviations from the zip format, including any kind of overlapping or sparseness, buffer bleeds etc.

At least it detected all three of the author's samples as malicious.

Now might be a good time to open-source it, even if it doesn't "feel ready" yet. There may be developers who would install it today, as zip bombs are on their mind, and upgrade it in the near term, but would otherwise forget about zip vulns entirely as they go about their days.

Thanks for the encouragement:


Make a Show HN!

If it does not "feel ready" I wouldn't want to expose it to random emails or HTTP. There are a lot worse things than zip bombs that can be used to attack and those avenues are basically open to the internet.

If I recall my PKZip lore, being able to append an updated file to the end of a zip was considered a feature. That leaves a dead copy of the file earlier in the archive, but saves you having to do the floppy shuffle to update one file.

But I don’t recall ever mentioning that fact to someone who already knew it. While you could probably get away with rejecting that file (who still uses that? Some sort of streaming protocol?), it was a feature at one point.

Yes, I made a decision to reject incrementally updated archives. It was a feature back in the day, but as you say it also leaves dead zones in the archive. For a rarely used feature, it's a dangerous feature.

These "invisible" dead zones can be used for malware stuffing, or to exploit ambiguity across different zip implementations, those that parse forwards (using the local file header as the canonical header) and those that parse backwards (using the central directory header as the canonical header).

For example, a malware author might put an EXE in the first version of a file, and a TXT in the second version of that same file. Those that parse forwards get the EXE. Those that parse backwards get the TXT. Of course, the spec advocates parsing backwards according to the central directory record, but implementations exist that don't do this.

The goal of @ronomon/zip is to scan email attachments at the gateway and reject zip archives that might prove dangerous to more vulnerable zip software running downstream (MS Office, macOS, etc.).

Also, as you say, I don't think incrementally updated archives are used much. From what I could see, there were no false positives for rejecting gaps between referenced local files on a small sample of 5000 archives.
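
To make the forward/backward ambiguity above concrete, here is a minimal Python sketch. It is not how @ronomon/zip works (that code is not public); the helper names and the sample file names are made up for illustration, and the "forward parser" is a deliberately naive signature scan.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"  # local file header signature


def names_from_front(data: bytes) -> list:
    """Naive forward parse: take the name from every local file header found."""
    names = []
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + 30 <= len(data):
        # The name length is a little-endian uint16 at offset 26 of the
        # 30-byte fixed header; the name itself follows at offset 30.
        (name_len,) = struct.unpack_from("<H", data, pos + 26)
        names.append(data[pos + 30 : pos + 30 + name_len].decode("cp437"))
        pos = data.find(LOCAL_SIG, pos + 4)
    return names


def names_from_back(data: bytes) -> list:
    """Backward parse: trust only the central directory at the end of the file."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def make_zip(name: str, payload: bytes) -> bytes:
    """Hypothetical helper: build a one-entry, uncompressed archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


# A second archive appended to the first: the first becomes a "dead zone"
# that only the forward parser sees.
sample = make_zip("invoice.exe", b"MZ") + make_zip("invoice.txt", b"hello")
hidden = set(names_from_front(sample)) - set(names_from_back(sample))
print("entries visible only when parsing forwards:", sorted(hidden))
```

A gateway that wants to be strict could reject any archive where the two views disagree.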

I understand that you don't want to release prematurely but what I don't understand: Why don't you release your code from the first line written? I start committing to Github right from the start. Others might already want to follow your project or even contribute PRs. I'd definitely be interested. Maybe at least create the repo so we can subscribe to activities ;) Best!

It's like asking an artist to show their initial drafts and sketches that are probably looking pretty bad, at a point they don't yet know where it's going or if they will even finish the piece.

Some have enough confidence and are willing to do it, others would rather only show the final piece and be free to not publish anything if they aren't happy with the result. Both stances are ok.

Pretty powerful! Lots of breakage with things that touch this file. Simply downloading it in Chrome caused issues, Chrome began extracting it to a temp folder (presumably for some malware scanning?) and quickly started filling the disk.

Windows 10 then began doing the same thing for Windows Defender, but some sane limits aborted it after a few seconds.

See, I don’t understand this. Zip is a streamable format. I don’t understand why you would extract the archive before checking the contents?

I worked on productizing a code signing tool a while back and I believe the first thing I did after we got it working was change it so nothing touched the disk until after the signature had been validated (in this case the signers had business relationships with each other. This would be necessary but insufficient for downloads from the internet).

There were already well known CERT advisories about how relative paths can confuse archive tools, email tools and web servers from Microsoft. Know history or repeat it.

I didn’t know “zip bomb” as a phrase but I knew a good bit about compression, so when I needed to fix a problem with zips over 2G I managed to make myself a test fixture that was around 80k without modifying the file format. I think it was just 2.01G of white space.

> Zip is a streamable format

Not really. For a normal, created in one-shot ZIP, yes. But the types of ZIPs in the article are going to act differently if you tried streaming them. The core idea of the article is overlapping the various files within the ZIP, s.t. they share bytes. But this is only apparent if you're using the central directory, which you can't if you're streaming, since it appears after all the data. If you're streaming, you're using the local file headers (LFHs), but for ZIPs such as those in the article, you will see far fewer LFHs than had you looked them up in the central directory, b/c they overlap. (In the streaming case, I think you'd see exactly 1 file. It would still be huge, given the other tricks in the article.)

Also, I think you can "append" to ZIPs (this is why the central directory is at the end, s.t. it can be overwritten by new data, and then re-appended.) I think this approach allows tools to also "delete" data by simply removing the entry from the central directory, and re-appending it w/o, so the central directory is essentially the authoritative source for the ZIP's contents. (Though I suppose a streaming decompressor could decompress to a temporary location and then only move non-deleted entries into their final place.)

The Wikipedia page echoes this:

> Tools that correctly read ZIP archives must scan for the end of central directory record signature, and then, as appropriate, the other, indicated, central directory records. They must not scan for entries from the top of the ZIP file, because (as previously mentioned in this section) only the central directory specifies where a file chunk starts and that it has not been deleted. Scanning could lead to false positives, as the format does not forbid other data to be between chunks, nor file data streams from containing such signatures.


Why is everybody fixating on my preface?

> I don’t understand why you would extract the archive before checking the contents?

Because a curious part of written internet culture is that of fixating on small mistakes that aren't even important to the overall context or meaning of a post, and ignoring the actual content. Why, I'm unsure, but I've been noticing it a lot more now that I'm aware of it!

Been thinking the same, but it also often feels like “waiting for your turn to talk”.

> See, I don’t understand this. Zip is a streamable format.

Zip is a streamable format but it also supports random access through a table of contents (the "central directory") located at the end of the file. This bomb works by overlapping the file offsets in the table of contents.

> I don’t understand why you would extract the archive before checking the contents.

If by "checking" you mean to examine the archive structure to determine whether it's corrupt, and "extracting" you mean "write the archive's contents to disk" then they are fundamentally the same thing. The only difference is that checking sends the content to /dev/null instead of a file.

As to why they're writing the contents to disk I can only speculate. Perhaps they're using a library that doesn't expose an "extract to memory" feature, or maybe it's an anti-zipbomb measure to avoid out-of-memory/denial-of-service attacks.

Have you ever done any work in this area? Because it sounds like you know what you’re talking about, except it’s all nonsense.

Zip format can be de/compressed progressively, which is one reason why it’s nice for HTTP transport encoding. The file format is decompressed one record at a time and many or most libraries can give you this as a stream, so it never has to hit disk or be “sent to /dev/null”.

If you take responsibility for streaming the records to disk (trivial), then you can check the canonical path before writing, and any other filesystem sanity tests you want to do.
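
A rough Python sketch of that "check the canonical path before writing" step. The function name `safe_target` is hypothetical, and this covers only path traversal, not the size problem this thread is about.

```python
import os


def safe_target(dest_dir: str, member_name: str) -> str:
    """Return the absolute path a member would be written to, or raise if it
    would land outside dest_dir (via '..', an absolute name, or a symlink)."""
    root = os.path.realpath(dest_dir)
    target = os.path.realpath(os.path.join(root, member_name))
    # commonpath() compares whole path components, so "/out-evil" is not
    # mistaken for something inside "/out".
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"unsafe path in archive: {member_name!r}")
    return target
```

The caller would run this on each record's name as it comes off the stream, and only then open the output file.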

Last year I implemented zip reading and zip writing in a hobby project of mine. I'm not an expert, but I know enough to write a working zip reader/writer.

> Zip format can be de/compressed progressively, which is one reason why it’s nice for HTTP transport encoding.

Do you mean HTTP transfer encoding? If so then it's not the zip archive format that's used, but rather the deflate compression algorithm (which zip also uses.)

> The file format is decompressed one record at a time

But not necessarily in the order they appear.

> many or most libraries can give you this as a stream, so it never has to hit disk or be “sent to dev/null”.

My point is that the compressed bytes have to be decompressed and checksummed in both extraction and checking, but after that the bytes may either be written or discarded.

> If you take responsibility for streaming the records to disk (trivial), then you can check the canonical path before writing, and any other filesystem sanity tests you want to do.

That's true but there's nothing wrong with the paths in this case.

Why would Chrome automatically begin unzipping the file?

I'm afraid to even download it now...

I accidentally downloaded it without realising this. I'd assume it performs malware scanning. Luckily I was able to end the process from the chrome task manager (shift+escape) without disrupting the rest of the browser.

TIL chrome has a task manager. Thanks for posting the shortcut! +1

Or it could do the same as Safari, open "safe files" by default (which for zip archives means "decompress them").

Safari always unzips any downloaded archive, wraps it in a folder and puts it into ~/Downloads.

I prefer this functionality as most of the time I do want to unarchive it. I can rearchive it later (or remember to use another browser) when I need to.

If you download a Wordpress plugin, you need to re-up it zipped -- not a major use-case but I could see other similar situations (downloading tarballs). It's an interesting choice.

MS Windows' transparent zip treated as folders works well (it's the same on Kubuntu), one could just do that and pre-cache an uncompressed version; then you get to use the file or open the folder with minimal friction?

It's not always, there's an option to turn this off in the Safari Preferences

And if someone does not want this there is a checkbox in Preferences under "General": "Open 'safe' files after downloading". Unchecking it will prevent Safari from auto-extracting.

Have had this set for many years. I recall some early macOS malware would be present in malformed PDF files that could be downloaded in the background of an infected page and then would open/execute automatically for users who had 'Open safe files' set.

How can you for instance verify the checksum of the file when it's deleted?

It should be noted, when Safari deletes an extracted archive, it isn’t deleting it; it’s putting the archive in the Trash. You can just turn around and fish it back out. That’s one of the Trash’s roles in macOS: to serve as a place for the OS to put things that you probably don’t want, “but if you do, here’s an opportunity to grab them before they’re gone.” (I’m honestly surprised that every time you Cmd+C, the previous contents of the clipboard don’t end up as a file in the Trash. It’d be perfectly in line with the metaphor they’re going for.)


Also, fun bonus fact re: checksumming:

Apple uses the https://en.m.wikipedia.org/wiki/Xar_(archiver) file format for their own software downloads (e.g. App Store downloads; and developer-tools packages from the Apple developer website; and downloading Safari extensions, before those were rolled into the App Store; etc.). Despite Apple being seemingly the sole user of .xar, it’s not an Apple-specific format; rather, it’s developed by OpenDarwin. So you can use it too (for, at least, your macOS-targeted downloads), if you like.

A .xar file contains embedded checksums (both for the archival representations of each file, and for their extracted representations); when Safari auto-unpacks a .xar, the .xar unpacker (Archive Utility?) verifies those checksums as it does so. IIRC, if the verification fails, the extraction stops, what has been extracted so far is deleted, and the user is told the archive is broken and asked whether they want to keep it or move it to the Trash.

A neat thing about .xar extraction, is that it seemingly tags the extracted files with an xattr declaring that they’ve already been checksummed. Apple ships applications like “Install macOS Whatever.app” as a .xar containing an .app bundle containing several mountable .dmg files; normally those .dmg files would do their own checksumming when they mount, but since they came out of a .xar, they know they’ve already been checksummed recently, so they just skip the internal checksumming step. (I think this is one of the main reasons Apple chose to move to .xar; they wanted to be able to make the macOS Installer run faster, by having it not have to do any checksums of its support .dmg files during install.)

So that’s the deeper answer to your question: ultimately, Apple expects people who want archives with checksums, to use .xar or a format like .xar, that does checksumming during extraction.

> I’m honestly surprised that every time you Cmd+C, the previous contents of the clipboard don’t end up as a file in the Trash.

Purely my hypothesis: it’s too high a risk that someone accidentally leaves a password or private info in their clipboard. People don’t expect a clipboard to persist, so you’d need to re-educate everyone to avoid this “bug”, just like browser history and incognito mode.

Apple is notorious for assimilating popular third party extensions. Screencapture, night shift, colour picker , why no stack based clipboard? It’s too useful to have been overlooked. Must have been a conscious decision.

You don't. Though to Apple's credit, users who are concerned with verifying checksums probably overlap with those that take cursory steps to harden their browser by, among various steps, disallowing Safari to open "safe" documents (that's what they're called in Safari's option).

Would it be prejudice to think that the checksum verifying users are not using safari?

Might be, I've no idea. Anecdotally I use Safari - faster, I prefer the UI, and I don't trust Chrome.

The only reason I've Chrome around is for the growing number of sites that only work with Chrome. Apps made by Google, in particular, increasingly don't support macOS/Safari. Which I find infuriating, but that's another topic.

Chrome is the new IE.

Yes. Chrome is surveillance-ware now. I’d rather use Safari or Firefox than Chrome.

Convenience over security

Wouldn't the decompression fail in that case?

The idea is to publish the checksum of the archive separately. After downloading the archive you can calculate its checksum and compare with the published checksum. If they differ you know something is up (possibly bad).

When a browser helpfully decompresses the archive you can no longer perform this check.
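
For reference, the manual check looks roughly like this in Python. SHA-256 is an assumption here (sites publish various hash types), and the function names are made up.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB blocks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare against the hex digest copied from the download page."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

This only works on the archive exactly as downloaded, which is why auto-extraction gets in the way.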

If it fails to decompress because the file is corrupt, the browser would more than likely keep the archive?

But if someone replaces the archive with a malicious file that decompresses normally, he will also probably change the listed checksum on the download page....

There are signature schemes that can fix that issue, and cases where it's useful anyway, like:

Many linux distro isos are available from several different mirrors. Having a secure hash on the original site with the links to mirrors means I don't have to trust the mirror(s).

Another case where the archive hash is useful is when there's some public key crypto involved. I can have a public key from a publisher (gotten either out-of-band or in the past) and the hash can be signed so I can verify it. These schemes would mean that an attacker would at the least need to have compromised a site for an extended period of time (if I have history with the site, the first visit it doesn't do anything extra), or in the case of out-of-band key sharing, multiple communication methods might need to be compromised for an attack to succeed.

But yes, in the common case a hash next to a file link hosted on the same domain really doesn't do anything.

The actual download might be hosted by 3rd party mirrors.

You can compare the checksum to the one on the author's site to ensure the mirror provider didn't alter the file.

Because Autorun so gud. Will people ever learn?

>Why would Chrome [...] //


If windows defender breaks scanning when it encounters a zip bomb, could that be used to mask malware later in the file?

It probably flags it as unsafe and triggers smart screen. Meaning you have to go out of your way to use the file.

As for malware that would be unzipped when using an external zip file; first you would need to trigger the zip bomb on defender but not the external tool, and second defender will still scan the individual files getting unzipped by that tool.

Maybe do a self extractor that's specially crafted to ignore the zip-bomb part and only extract the malware part? Then the self extractor can execute the malicious code after extracting to memory.

I work at a threat intelligence firm which provides services for aggregating desktop antivirus engines & specialized malware detection tooling; I've submitted dozens of public and manually constructed compression-bombs for analysis (including this one). Many antivirus engines, even very small vendors, handle this case better than you might imagine.

I'm just thinking of strategies for dealing with it. Sandboxed whitebox fuzzing seems like one way of dealing with executables that could take some unknown commands directing the malicious behaviour - but you introduce the possibility of a sandbox breach making the antivirus itself the vector for execution.

The other possibility I see harks back to algorithms - build a DAG from the archive's interfile dependencies and run an iterative deepening search through the structure against some heuristic checking for malicious design.

I've never gotten into anything security related (other than reading Schneier's blog) but the cat and mouse game is fascinating.

In such a case could anything actually extract that malware and run it?

Sure, as long as that other thing knew not to treat the file as a zip, or had a mechanism to skip over or blank out the first chunk of the file.

I wonder how Windows Defender would treat something that looked like a self-extracting archive? Perhaps the archive portion could be this zip bomb affair, but the executable portion had a small change in it to bypass that and do something else nefarious instead, eg hand execution control to a point later in the file.

I don't see why not; just have the payload stored after the bomb - if there are no shared dependencies (which there wouldn't be) there's no reason why they shouldn't be independently extractable.

Just don’t go posting about this on YouTube!

Is there a way to automatically send this to SSH spammers/directory scanning bots?

SSH has some kind of compression, so you could write a Twisted SSH server that sends the file as a compressed SSH packet. For detection, fail2ban provides a plugin architecture that allows it to take any action once it notices abuse, so you could swap the regular SSH implementation for your tricky one on the fly.

Fun project.

One should also do a lua nginx plugin for that: aggressive crawler ? Comment spammer ? Take this nice gzip HTTP response...

You don't need a zip file; just send the other side a gzip response with an endless amount of '<div>' inside. I've had some fun with that and some bots truly just stop responding after about 8 minutes of downloading div tags with no end.

This file exploits the zip container format, not the actual compression algorithm. SSH only uses the latter, not the former, so it is not applicable.

Would it be possible to do it just exploiting the algorithm?

You can make some limited deflate bombs, but they are nowhere near as massive as this one.
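
To put a number on "limited": as the linked article explains, DEFLATE alone cannot exceed a ratio of 1032:1, since a single back-reference costs at least 2 bits and emits at most 258 bytes (258 × 8 / 2 = 1032). A quick sketch to see how close zlib gets on the most compressible input; the 10 MiB size is arbitrary.

```python
import zlib

original = bytes(10 * 1024 * 1024)       # 10 MiB of zero bytes
compressed = zlib.compress(original, 9)  # zlib wrapper around a DEFLATE stream
ratio = len(original) / len(compressed)
print(f"{len(original)} -> {len(compressed)} bytes, about {ratio:.0f}:1")
```

The zip in the article gets far beyond that by reusing the same compressed bytes for many files, which a bare DEFLATE stream cannot do.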

Does it work with gzip?

Not sure, but let's see if I can try without crashing my laptop.

EDIT: nope, streaming doesn't work, the zip relies on the fact it contains many files, and gzip assumes there is only one big blob.

EDIT 2: tried with zlib but it expects a different header. So my guess is you really need to open it as an archive.

gzip and zlib (and tar) are "streaming" formats, the essay notes that "streaming" zip libraries are not affected as this bomb exploits the relationship between the central directory and the individual files.

Put it wherever in a "secret.zip" file? If you have the logs, just add a dummy page that they usually scan (/wp-admin or whatever) that automatically makes the client download the file.

SSH however works with zero trust. Clients are protected from bad servers just as servers are protected from bad clients. It shouldn't be possible to send a file. If it is, it is a serious ssh vulnerability.

It's not about downloading, it's about unzipping.

Most bots won't unzip a file they download.

But they will deflate a SSL packet.

SSL packets do not use the zip container format which this targets, though. They only use the deflate compression algorithm.

That would be something new! A website crashing your browser because http or tls compression is sending „zip bombs“.

Not sure if your post is sarcastic but if not it already exists: https://blog.haschek.at/tools/bomb.php

Usually aimed against bots though: https://hackaday.com/2017/07/08/dropping-zip-bombs-on-vulner...

Interesting. My immediate thought was, "that's awesome", followed by a plan to implement it on my own website, which gets regularly scanned for vulnerabilities.

But then two questions sprang to mind:

1. Does this eventually get your domain marked as potentially harmful in Firefox/Chrome/other browser?

2. What happens if you're fronted by a CDN like Cloudflare? I mean, I assume nginx won't be screwed over by this but, even then, will it infuriate your CDN provider and put you at risk of getting your account shut down?

My fit of vengeful glee has therefore somewhat abated for the time being.

1. You put it in a URL marked as "noindex-nofollow". Google will avoid it. You are supposed to only serve the page to identified spam bots anyway.

2. You create an exception so that they never cache the page and don't proxy this exact URL.

> 1. You put it in a URL marked as "noindex-nofollow".

Better yet, mark it Disallow in robots.txt - to see "noindex, nofollow", they'd still need to request the URL, running the risk to be served with the bomb.

> 2. You create an exception so that they never cache the page and don't proxy this exact URL.

They work as reverse proxies on host-basis, I don't think you can exclude a single URL. CF at least will never cache text/html (unless specifically told to), but I don't know whether they will unpack (and possibly cross-compress to a better suited compression algorithm) the content while transmitting.

I put, as a test and for fun, a "Disallow" entry in my robots.txt (with a campy name to be honest) and not a single crawler hit that dir in more than three years, don't know if others had the same experience.

I was suggesting Disallow to make sure Google doesn't request it ;) I don't know if any bots look at robots.txt to see potentially interesting URLs. I do when I take a better look at sites, but I usually don't qualify as a bot.

My experience is that most bots just hit the usual suspects, /wp-login.php, /phpmyadmin/ etc, regardless whether they are in robots.txt or not.

> My experience is that most bots just hit the usual suspects, /wp-login.php, /phpmyadmin/ etc, regardless whether they are in robots.txt or not.

Yeah, basically what I see in my logs. To be more clear, the disallow is for a non-existent path in the document dir. I somewhat expected to find at least one script to actively crawl it, but it makes sense, as no sane people would put secrets on a website and protect them with a robots.txt... ^__^;

Thank you (both you and GP)!

You may be interested in an SSH tarpit like Endlessh [1].

[1]: https://github.com/skeeto/endlessh

This specifically exploits peculiarities of the PKZIP archival format (in combination with DEFLATE). SSH compression doesn't use zip, so unless the scanning bots also start downloading and processing / decompressing files they find on your honeypot, this is not going to apply (you can still use normal compression bombs to try and crash them: https://hackaday.com/2017/07/08/dropping-zip-bombs-on-vulner...)

Just add a file named "passwords.zip", or "creditcard_numbers.zip".

Hmmm.. wouldn't it be nice to add this to a honeypot webserver? if someone's looking for wp-login directories, just return zblg.zip. It would be nice if there's a 10TB option of about 0.5 - 1mb.

Facebook accesses links shared in private messages... not sure if they'll unzip it

lovely... it's in my "collection" along with the "one square kilometer" pdf :


Compressed files that contain themselves:


Add the USB kill there too.


I remember being so excited when the zip format came out. Prior to zip, we suffered with the arc format. At 2400 baud, the difference in compression was very very significant.

I always preferred ARJ; I remember it having both better compression and more useful options than ZIP.

When I was a kid someone gave me a game compressed with ARJ...

Internet didn't exist yet in my country, took me 3 years to figure out how to open that file, and in the meantime I infected my computers multiple times with tons of viruses (seemingly packing viruses in unzippers was popular... the one with most viruses was "pkunzip" or something like that)

And later came RAR

RAR was great at the time because it offered "solid" archiving like a tar/gz combo. Zip and ARJ both compressed at the file level rather than the archive level.

> Zip and ARJ both compressed at the file level rather than the archive level.

No. You may be thinking of gzip, a spiritual successor and replacement for the Unix compress/uncompress (and even earlier pack/unpack).

But both ZIP and ARJ, and the earlier ARC, all made multiple-file archives.

Or did you mean that the compression was “carried over” from file to file inside the archive?

The latter.

And now, IMHO, it is great because of its Fastest mode - it is really fast and at the same time compresses quite well.

And now 7z.

ARJ came after ZIP. Us poor eighties kids had to suffer with ARC.

Yea but when you got on a bbs that supported zmodem transfers...

Oh hell yes. Zmodem was the shit. Resumable transfers!

+1 :-)

Not currently detected by anyone, according to VirusTotal: https://www.virustotal.com/gui/url/d98f43e8a91f1ddb4980d4602...

A lot of modern formats - jar, Apple's pages, etc, come to mind - are just zip files with a different extension.

So which of these files which are really zip do browsers or mail programs auto-open? Anyone think of any?

DOCX, XLSX are also just ZIP files with the extension renamed.

> A lot of modern formats - jar, Apple's pages, etc, come to mind - are just zip files with a different extension.

They use zips as embedded file systems with well-specified structures, I wouldn’t expect them to blindly decompress everything. The format usually defines specific entry points (specially named files) which serve as pointers to the relevant information and link to other files. The bomb part would mostly be ignored filler there.

Snapchat messages were zip files at one point. Not sure if that's still true.

Out of curiosity, does Google open zip files to properly index what's inside them when it crawls websites?

Did anyone notice how neat the HTML was for this article? For once HTML5 elements are in use, sections, asides, figures, correct use of tables - very nice!

Only this week did I discover that yahoo and gmail don't let you send zip attachments. I thought this was a bit silly but now I am agreeing with them!

Gmail do let you send zip attachments. It just depends what's in them.

Well it wasn't happy with some source code files I tried to send the other day. None of which were natively executable and all text only. They may have had the 'x' flag set in the permissions. But I was surprised by that, not allowing me to send a bunch of php files.

Is there any way to detect these things before unzipping them?

Whenever you download any untrusted file and attempt to parse or unpack it (and this includes zips, tarballs, PDFs, even images) your program should fork off some kind of sandbox which limits CPU time, memory, disk space, and access to local resources such as the filesystem and system calls.

There are various sandboxing technologies from using simple rlimit, or a cgroup, or even running a full VM (see libvirt-sandbox) depending on the threat level and the amount of effort you want to put in vs the perceived risk.
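
As a sketch of the simplest option (plain rlimits), assuming a Unix system and Python's standard `resource` module. The specific limit values and the `run_limited` name are arbitrary illustrations, and rlimits alone do not restrict filesystem or syscall access.

```python
import resource
import subprocess
import sys

# Illustrative ceilings for the child process; tune to the workload.
LIMITS = {
    resource.RLIMIT_CPU: 10,                    # seconds of CPU time
    resource.RLIMIT_AS: 2 * 1024 ** 3,          # bytes of address space
    resource.RLIMIT_FSIZE: 100 * 1024 ** 2,     # bytes per file written
}


def _apply_limits():
    """Runs in the child between fork and exec, so the parent is unaffected."""
    for res, wanted in LIMITS.items():
        _, hard = resource.getrlimit(res)
        if hard != resource.RLIM_INFINITY:
            wanted = min(wanted, hard)  # never try to exceed the existing hard cap
        # Lowering both soft and hard limits means the child cannot raise them again.
        resource.setrlimit(res, (wanted, wanted))


def run_limited(argv, timeout=30):
    """Run an untrusted parser or unpacker as a resource-limited subprocess."""
    return subprocess.run(
        argv, preexec_fn=_apply_limits, capture_output=True, timeout=timeout
    )


if __name__ == "__main__":
    result = run_limited([sys.executable, "-c", "print('unpacked ok')"])
    print(result.returncode, result.stdout)
```

The wall-clock `timeout` is separate from `RLIMIT_CPU`: the former catches a child that sleeps or blocks, the latter one that spins.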

It looks reasonably easy - the main thing you're looking for is local file headers that are referenced multiple times by the central directory.

This is not invalid itself of course - some compression programs likely deduplicate files with this technique. But if it seems excessive, or it's the only thing in the archive then you've got a zip bomb.

You could probably come up with some techniques to obfuscate this of course but it'll increase the size of the archive.
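
A minimal Python sketch of that check, using only what the central directory reports. The function names are hypothetical, and the overlap test is conservative: it assumes a local header is at least its fixed 30 bytes, without reading the real name and extra-field lengths.

```python
import zipfile
from collections import Counter

LOCAL_HEADER_MIN = 30  # fixed part of a local file header, in bytes


def suspicious_entries(entries):
    """entries: iterable of (name, header_offset, compress_size) tuples."""
    entries = sorted(entries, key=lambda e: e[1])
    findings = []
    # Case 1: several central directory records point at the same local header.
    counts = Counter(offset for _, offset, _ in entries)
    for offset, n in counts.items():
        if n > 1:
            findings.append(f"{n} entries share the local header at offset {offset}")
    # Case 2: an entry's header plus data runs into the next entry's header.
    for (name, off, size), (next_name, next_off, _) in zip(entries, entries[1:]):
        if off != next_off and off + LOCAL_HEADER_MIN + size > next_off:
            findings.append(f"{name!r} overlaps {next_name!r}")
    return findings


def scan_zip(fileobj):
    """Collect offsets and sizes from the central directory without extracting."""
    with zipfile.ZipFile(fileobj) as zf:
        return suspicious_entries(
            (i.filename, i.header_offset, i.compress_size) for i in zf.infolist()
        )
```

An empty result does not prove an archive is safe; it only rules out these two structural tricks.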

I run unzip -l, it tells you which files are inside and how big they are.

All deflators need memory allocation and CPU counters that trigger errors on massive resource utilization. Guessing it is uncomputable like the Halting Problem to detect all zip bombs.

It's an interesting point. For classical ZIPs I actually disagree that it's needed. And then again I do agree for some other potential compression algorithms :)

In case of ZIPs: an implementation can look through the Central Directory index of files and sum up "uncompressed size" fields of all the files, and then check the sum vs the set limits - no prior decompression is needed (this is neither CPU intensive nor requires a lot of memory allocations).

The obvious "gotcha" here is that the "uncompressed size" might be declared low, while the actual data inside the compressed stream might be much higher - this is detectable only when trying to decompress, so it would seem we would fall into your idea (memory allocation / CPU counters). But that actually is not needed, as all good decompression libraries have functions to "decompress at most N bytes" - so the implementation just uses the previously declared "uncompressed size" as the limit, and therefore guarantees that the actual total decompressed size is within the checked (in previous step) total limit.

That said, I do recognize that some decompression algorithms might have possible inputs which get really CPU intensive even for a single byte, though that's not the case for typical "DEFLATE"-using ZIPs (i.e. you probably might structure the decompression stream in a way that does a lot of cache misses, but that's about it).

For non-DEFLATE compression YMMV and your method comes to mind as a decent solution.

Gmail detects this as an unsafe file.

Gmail detects almost every zip as an unsafe file...

Great hack! It will take a long time before all installations everywhere are updated to check for such overlaps. This is a powerful weapon with many uses.

Are you able to theoretically look at compressed data without size metadata and estimate its final size without actually uncompressing it?

At worst you can just run the decompression algorithm while piping the output into the void, only counting the length. If you want to be slightly more efficient you can probably write an optimized stream parser for the compressor's format and sum up the lengths of dictionary/literals/back-references while skipping any other processing, but that depends on how amenable to fast-forwarding the stream format is.
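
The "pipe into the void" version might look like this in Python, assuming a raw DEFLATE stream and the standard zlib module. The function name, chunk size and optional limit are all hypothetical choices.

```python
import zlib


def inflated_length(raw, chunk=65536, limit=None):
    """Count how many bytes a raw DEFLATE stream inflates to, discarding
    the output as it is produced.

    Output is pulled `chunk` bytes at a time, so memory use stays
    bounded. If `limit` is given, counting stops with an error as soon
    as the total goes past it.
    """
    d = zlib.decompressobj(-zlib.MAX_WBITS)  # negative wbits: raw DEFLATE
    total = 0
    data = raw
    while not d.eof:
        out = d.decompress(data, chunk)
        total += len(out)
        if limit is not None and total > limit:
            raise ValueError("inflated size exceeds limit")
        data = d.unconsumed_tail
        if not out and not data:
            break  # truncated stream: no input left and no progress made
    return total


def raw_deflate(data):
    """Helper for the demo: compress `data` as a raw DEFLATE stream."""
    c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    return c.compress(data) + c.flush()
```

This still does the full decompression work; it only avoids storing the result.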

If the goal is to avoid zip bombs you can't get any more sloppy than that because then an attacker could exploit inaccuracies in your estimator.

I was thinking that you could just go through the process without writing anywhere, and show a growing "estimate". I feel like if you can get a size of each kernel and follow the instructions you could do so much faster, but I don't know enough about the ins and outs of the DEFLATE algorithm and the edge cases.

Nice. Decades ago we used zip bombs to destroy BBSes that were not well protected.

what happens if you put this in dropbox? (does dropbox open archives?)

The comparison table would look a little better if it nested a copy of the Cox quine inside, so that the recursively decompressed size was also infinite. :)

I have a Nigerian prince in my inbox who might be in for a treat....

In my time as a scambaiter, one of the most important lessons taught to everyone was: "Don't weaponize them"

Honest question: why? Seems like there is a nice story behind this claim.

Because anything you give them to fool them (such as fake passports or fake documents to comply with their initial demand), they will turn around and use on innocent victims.

Same goes for files that can crash your computer: they can send those to non-complying victims. The key is to waste as much of their time as possible while giving them as little value as possible.

Can it be packaged as a png?

No. This ZIP bomb's workings rely on two important facts:

1. ZIP archive has multiple files.

2. ZIP is an "index+pointers" based format (meaning the Central Directory index of archive files is basically a table of pointers - or rather offsets - telling where to look for data inside the file).

Thanks to these two properties David could create a very clever compressed stream that could be (partially) re-used by multiple files inside the archive.

While one could argue that PNGs do meet the first criterion (multiple compressed separate blocks - vide https://www.w3.org/TR/PNG/#10CompressionOtherUses - do note that multiple IDATs make a single compressed stream, so one has to use these other separate blocks like iTXt, iCCP or zTXt; YMMV for animated PNG extensions), it certainly doesn't meet the second one - it's a block/chunk format (and by definition blocks are unable to overlap).

One note here is that in case of a faulty block/chunk format parser implementation - one with integer signedness/overflow problems related to block size - one might be able to pull an overlapping block trick (see Bug 2 in https://gynvael.coldwind.pl/?id=533 for an example in a different file format).

afk, sending some mails ;)

Surprised nobody commented on this before:

> A final plea

> It's time to put an end to Facebook. Working there is not ethically neutral: every day that you go into work, you are doing something wrong. If you have a Facebook account, delete it. If you work at Facebook, quit.

> And let us not forget that the National Security Agency must be destroyed.

Personally I do agree. By the way, I got to meet the author (David Fifield) in person. An extremely bright mind!

Yeah, was quite a pleasant surprise, albeit the outlook is not very comforting.


No, maybe, sorta, no, definitely.
