
Defending a website with Zip bombs - ridgewell
https://blog.haschek.at/2017/how-to-defend-your-website-with-zip-bombs.html
======
colanderman
I would think Transfer-Encoding would be a better choice than Content-
Encoding. It's processed at a lower level of the stack and _must_ be decoded –
Content-Encoding is generally only decoded if the client is specifically
interested in whatever's inside the payload. (Note that you don't have to
specify the large Content-Length in this case as it is implied by the transfer
coding.)

Also worth trying is an XML bomb [1], though that's higher up the stack.

Of course you can combine all three in one payload (since it's more likely
that lower levels of the stack implement streaming processing): gzip an XML
bomb followed by a gigabyte of space characters, then gzip that followed by a
gigabyte of NULs, then serve it up as application/xml with both Content-
Encoding and Transfer-Encoding: gzip.

(Actually now that I think of it, even though a terabyte of NULs compresses to
1 GiB [2], I bet _that_ file is itself highly compressible, or could be made
to be if it's handcrafted. You could probably serve that up easily with a few
MiB file using the above technique.)
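At a small scale the double-compression effect is easy to demonstrate; a sketch in Python (the sizes here are deliberately modest stand-ins for the gigabyte-scale payloads described above):

```python
import gzip

# Scaled-down version of the layered payload: a run of NUL bytes
# compressed once, then the compressed output compressed again.
payload = b"\x00" * (10 * 1024 * 1024)  # 10 MiB of NULs
once = gzip.compress(payload, compresslevel=9)
twice = gzip.compress(once, compresslevel=9)

# The first pass shrinks the run ~1000:1; the second pass shrinks
# the (highly repetitive) gzip stream substantially again.
print(len(payload), len(once), len(twice))
```

The second pass works because deflate's encoding of a long constant run is itself a very repetitive byte stream.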

EDIT: In fact a 100 GiB version of such a payload compresses down to ~160 KiB
on the wire. (No, I won't be sharing it as I'm pretty sure that such reverse-
hacking is legally not much different than serving up malware, especially
since black-hat crawlers are more likely than not running on compromised
devices.)

[1]
[https://en.wikipedia.org/wiki/Billion_laughs](https://en.wikipedia.org/wiki/Billion_laughs)

[2] [https://superuser.com/questions/139253/what-is-the-
maximum-c...](https://superuser.com/questions/139253/what-is-the-maximum-
compression-ratio-of-gzip/579290)

~~~
masklinn
> EDIT: In fact a 100 GiB version of such a payload compresses down to ~160
> KiB on the wire. (No, I won't be sharing it as I'm pretty sure that such
> reverse-hacking is legally not much different than serving up malware,
> especially since black-hat crawlers are more likely than not running on
> compromised devices.)

[http://www.aerasec.de/security/advisories/decompression-
bomb...](http://www.aerasec.de/security/advisories/decompression-bomb-
vulnerability.html) has a triple-gzipped 100GB file down to 6k; the double-gzipped version is 230k.

I'm trying on 1TB, but it turns out to take some time.

~~~
Filligree
One useful trick is that, for gzip, d(z(x+y)) = d(z(x) + z(y)).

So you don't need to compress the entire terabyte.
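That property (RFC 1952 allows a stream of concatenated gzip members, which decompress to the concatenation of their contents) is easy to check; a sketch:

```python
import gzip

# Two independent gzip members, concatenated byte-for-byte...
a = gzip.compress(b"\x00" * 1000)
b = gzip.compress(b"\x01" * 1000)

# ...decompress to the concatenation of their contents, so a huge
# stream can be assembled by repeating one small compressed chunk.
combined = gzip.decompress(a + b)
assert combined == b"\x00" * 1000 + b"\x01" * 1000
```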

~~~
masklinn
I'd expect that to give a worse compression ratio, though it may not matter
given the additional followup gzips.

The compression finally finished after 3h (on an old MBP), "dd if=/dev/zero
bs=1m count=1m | gzip | gzip | gzip" yields a bit under 10k (10082 bytes), and
adding a 4th gzip yields a bit under 4k (4004 bytes). The 5th gzip starts
increasing the size of the archive.

~~~
Filligree
It does, though I once used that trick to create a file containing more
"Hello, World" lines than there are atoms in the universe. By, hmm, quite a
large factor. It probably isn't a serious concern.

It still fit on a floppy disk. :)

------
geek_at
How come when I posted this (my blog post) here I only got 2 points?
[https://news.ycombinator.com/item?id=14704462](https://news.ycombinator.com/item?id=14704462)
:D

~~~
z3t4
I'm also intrigued by this. It also happens in comments: the top post with
20+ upvotes can have the same content as the most down-voted post with -3
points. But that's not as common as a repost getting to the front page while
the original only got 2 points.

After submitting something to HN I like to watch the HTTP logs. I get a lot
of visits from bots, but it's actually only around 10-20 real people who
actually read your blog. I don't know enough statistics to explain it well,
but as 20 people is such a small share of the total HN readership, it's
basically _luck_. And those who read the "new" section might be a bit
unrepresentative of those who only read the front page. If you want to help
HN get better with more interesting content, you can help by actually
visiting the "new" section.

~~~
mncharity
We know how to deal with this, and have for years. A bot which instruments and
invokes humans, learning about content and individuals both. Few humans are
needed each time, and those need not be experts, if used well. 20 people is
much more than enough. A candy machine, in an undergraduate lounge, can grade
CS 101 exams. [1] Ah, but discussion support - from Usenet to reddit (and on HN
too), incentives do not align with need. Decades pass, and little changes.
Perhaps as ML and crowdsourcing and AR mature? Civilization may someday be
thought worth the candles. Someday?

[1] [http://represent.berkeley.edu/umati/](http://represent.berkeley.edu/umati/)

Edit: tl;dr: Future bot: "I have a submission. It has a topic, a shape, and
other metrics. It's from a submitter, with a history. Perhaps it has comments,
also with metrics, from people also with histories. I have people available,
currently reading HN, who all have histories. That's a lot of data - I can do
statistics. Who might best reduce my optimization function uncertainty? I
choose consults, draw the submission to their attention, and ask them
questions. I iterate and converge." Versus drooling bot: "Uhh, a down vote
click. Might be an expert, might be eternal September... duh, don't know,
don't care. points--. Duh, done."

~~~
mncharity
Hmm, two downvotes. My first. So comments would be especially welcome.

Context: The parent observed that with a small number of people up/down
voting, the result was noisy. I observed the numbers were sufficient, if the
system used more of the information available to it. And that the failure to
do so is a long-standing problem.

Details, in reverse order: "Civilization": Does anyone _not_ think the state
of decision and discussion support tech is a critical bottleneck in
engineering, business, or governance? "AR": A principal difficulty is always
integrating support tech with existing process. AR opens this up greatly. At a
minimum, think slack bots for in-person conversations. "crowdsourcing": or
human computation, or social computing, is creating hybrid human-computer
processes, where the computer system better understands the domain and the
humans involved, and better utilizes the humans, than a traditional system does.
"ML": a common way to better understand a domain. As ML, human computation,
textual analysis, etc, all mature, the cost and barrier to utilizing them in
creating better discussion systems declines. "Usenet": Usenet discussion
support tooling plateaued years before it declined. Years of having a problem,
and it not being addressed. Of "did person X ever finish their
research/implementation of tool Y". "decades": that was the mid-1990s, two
decades ago. "little changes": Discussion support systems remain an "active"
area of research - for a very low-activity and low-quality value of "active".
I'm unclear on what else could be controversial here.

For anyone who hasn't read the paper, it's fun - it was a best paper at SIGCHI
that year, and the web page has a video. A key idea is that redundant use of a
small number of less-skilled humans (undergraduates grading exam questions)
can, if intelligently combined, give performance comparable to an expert human
(graduate student grader). Similar results have since been shown in other
domains, such as combining "is this relevant research?" judgments from cheaper
less-specialized doctors with more-expensive specialized ones. On HN, it's not
possible to have a fulltime staff of highly skilled editors. But it _is_
technically plausible to use the existing human participants to achieve a
similar effect. That we, as a field, are not trying very hard, reflects on
incentives.

------
Scryptonite
Reminds me of a time I once wrote a script in Node to send an endless stream
of bytes at a slow & steady pace to bots that were scanning for vulnerable
endpoints. It would cause them to hang, preventing them from continuing on to
their next scanning job, some remaining connected for as long as weeks.

I presume the ones that gave out sooner were manually stopped by whoever
maintains them or they hit some sort of memory limit. Good times.

~~~
reitanqild
I do the same to "Microsoft representatives" that call me because I have "lots
of malware on my computer".

Keep them on line by being a very dumb customer until they start cursing and
hang up on me. : - )

~~~
Gaelan
1-347-514-7296 is a phone number that automates this. Add it to a conference
call, and frustrate the caller with no additional work.
[http://Reddit.com/r/itslenny](http://Reddit.com/r/itslenny) is the closest
thing it has to an official site.

~~~
throwanem
It just rang forever when I tried it...

~~~
Gaelan
Ah, apparently you need to be whitelisted, my bad. There’s more information in
the Reddit link.

EDIT: Nope, apparently the person who runs it takes it down at night for some
reason. Maybe to minimize people using it as a prank call?

~~~
cdubzzz
You do have to get whitelisted though, last I checked. I was able to use it
one time and now if I call it I get a message about whitelisting. Same for
JollyRoger.

------
ruytlm
Interesting and related re attacks on a Tor hidden service:
[http://www.hackerfactor.com/blog/index.php?/archives/762-Att...](http://www.hackerfactor.com/blog/index.php?/archives/762-Attacked-
Over-Tor.html)

And the follow up:
[http://www.hackerfactor.com/blog/index.php?/archives/763-The...](http://www.hackerfactor.com/blog/index.php?/archives/763-The-
Continuing-Tor-Attack.html)

~~~
vgb2k11
Very interesting read indeed. I have a question about it: the article is about
defeating malicious crawlers/bots hitting a Tor hidden service, so how might
the author differentiate bot requests from standard client requests on a
request-by-request basis? I mean, can I assume that many kinds of requests
arrive at a hidden service through shared/common relays? Would this mean other
fingerprinting methods (user agent etc.) would be important, and if so, what
options remain for the author if the attackers dynamically change/randomise
their fingerprint on a per-request basis?

------
compguy
Wait a minute... He is doing the exact same thing as the former RaaS
(ransomware as a service) operator Jeiphoos (he operated Encryptor RaaS). It's
known that Jeiphoos is from Austria. Exactly one year after the shutdown of the
service, someone from Austria is publishing exactly the same thing an Austrian
ransomware operator was doing a year ago.

~~~
jagermo
Aha! The Hacker News detectives are on the case!

------
avaer
Does anyone know if this kind of white hat stuff has been tested by law?

Because it seems in the realm of possibility that if a large botnet hits you
and your responses crash a bunch of computers you could do serious time for
trying it. I'm hoping there's precedent against this...

~~~
vbezhenar
There are laws allowing a person to shoot an intruder in their house. And I
can't serve NULs from my own web server? That would be ridiculous.

~~~
test1235
From what I've read, in some parts of America it seems okay to shoot at
intruders running away from your house, which I find unreasonable.

A farmer here in the UK stirred up a whole load of shit when he shot two burglars
[1] trying to escape from his property.

[1]
[https://en.wikipedia.org/wiki/Tony_Martin_(farmer)](https://en.wikipedia.org/wiki/Tony_Martin_\(farmer\))

~~~
rmc
The UK (or English?) law about self-defence is "back to the wall", i.e. you
can use lethal force to defend your own life when your back is against the
wall, when you have no other option and no way to escape. In other words, if
you can retreat from the situation, then you must retreat.

Some places in the USA have "stand your ground" laws. These say you aren't
required to retreat, that you can "stand your ground", and that you can
(legally) use lethal force without requiring that your back be against the wall.

~~~
rocqua
As I recall, stand-your-ground laws, based on the castle doctrine, mean that
"but you could have fled your own home" does not invalidate self-defence. I
think you are still required to retreat when on the street.

As for people running away, the only way I see self defence working is when
they still pose an 'imminent threat to life' which seems rather hard to argue.

~~~
test1235
Your last line is the bit I've never been able to understand. If someone is
running from you, do you have any legal argument for killing them?

~~~
suneilp
What if they have stolen your property. Do you not have the right to get it
back by force? Does the value of the property matter? If so, who gets to
decide that in the moment?

~~~
chillydawg
You have zero right to kill someone for stealing.

~~~
technofiend
In Texas you may make use of your weapon to stop the execution of a crime if
you yourself are not also engaged in criminal activity. It's far larger than
castle doctrine because it applies anywhere.

I'm not arguing for actually using the law to shoot people; I don't ever want
to be in that situation myself. But depending on the situation, you do in fact
have the law on your side.

------
matt_wulfeck
This is why web crawlers are built with upper boundaries on _everything_!

Nobody malicious brings down crawlers. It's just unexpected things you find
out on the internet.

~~~
jacquesm
> Nobody malicious brings down crawlers.

You're wrong about that. I've more than once brought down crawlers on purpose,
especially the ones that didn't respect robots.txt.

------
eyuelt
The article says that 42.zip compresses 4.5 petabytes down to 42 bytes. It
should say 42 _kilobytes_.

I don't see a way to comment on the article itself, but hopefully the author
reads this.

~~~
ben174
Thank you. I was going crazy trying to think of what the contents of that 42
bytes would have been.

~~~
rootlocus
Without any headers, metadata or padding, and using RLE with one byte for the
repeated value and 8 bytes for the run length, 10^15 easily fits in 9 bytes
and can describe a file filled with one petabyte of zeroes.
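A sketch of that hypothetical 9-byte record (this encoding is invented purely for illustration, not a real compression format):

```python
import struct

PETABYTE = 10 ** 15

# Hypothetical record: 1 byte for the repeated value, 8 big-endian
# bytes for the run length -- 9 bytes describing a petabyte of zeroes.
record = bytes([0x00]) + struct.pack(">Q", PETABYTE)
assert len(record) == 9

# A decoder would read it back as (value, run_length).
value = record[0]
run_length = struct.unpack(">Q", record[1:])[0]
assert (value, run_length) == (0x00, PETABYTE)
```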

~~~
gberger
Well, yes, but that's not ZIP encoding.

------
TekMol
I don't think this "Defends" your website. If anything, it draws attention to
it.

Might also be used for some kind of reflection attack. Want to kill some
service that let's users provide a url (for an avatar image or something) -
point it to your zip bomber.

~~~
oelmekki
To be fair, people wanting to do that don't need the author to have created a
zip bomber; they can make one themselves.

Actually, I don't see how to defend against this. Is there any way to ask a
gzip file what size it will be once unzipped, without needing to decompress it?
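One partial answer: a gzip stream's trailing ISIZE field (RFC 1952) records the uncompressed length, readable without decompressing. But it holds the length only modulo 2^32, and a hostile server can write anything there, so it's a hint rather than a defence. A sketch:

```python
import gzip
import struct

data = gzip.compress(b"x" * 100_000)

# RFC 1952: the last 4 bytes of a gzip member (ISIZE) hold the
# uncompressed length modulo 2**32 -- readable without decompressing,
# but attacker-controlled and meaningless for payloads over 4 GiB.
isize = struct.unpack("<I", data[-4:])[0]
assert isize == 100_000
```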

~~~
danesparza
I think this is exactly what the HTTP 'HEAD' verb is for:
[https://developer.mozilla.org/en-
US/docs/Web/HTTP/Methods/HE...](https://developer.mozilla.org/en-
US/docs/Web/HTTP/Methods/HEAD)

~~~
oelmekki
Wouldn't a HEAD give the size of the zipped content, and not the size of the
content once decompressed?

~~~
danesparza
This is a great question. I'm actually not sure, since gzip encoding/decoding
is built into several webservers and browsers.

------
jacquesm
A friend of mine has a very useful little service that tracks attempts to
breach servers from all over the world:

[https://www.blockedservers.com/](https://www.blockedservers.com/)

It's a lot more effective to kill the connection rather than to start sending
data if you're faced with a large number of attempts.

~~~
leipert
Interesting page :) I'll write your friend an email with improvements for the
site. The text shadow in the code blocks makes them barely readable, and the
map color coding is bad for color-blind people.

~~~
jacquesm
I'm sure he will appreciate that. He's sysadmin at a Dutch IPSP and this is
his side project.

------
cypherpunks01
This is like the soft equivalent of leaving a USBKill device in your backpack,
to punish anyone who successfully steals it and tries to comb through your
data.

~~~
Pitarou
Way to make friends with the TSA. ;-)

~~~
rocqua
Is this actually illegal? I'm sure they'd arrest you, but could they actually
charge you?

I guess if you see them try to use the USB killer, you'd be obligated to
report it. Otherwise I don't think its an issue.

~~~
Pitarou
Regardless of what the law says, in practice, breaking TSA equipment,
interfering with TSA duties and, above all else, pissing off a TSA officer are
all arrestable offences.

If there’s any question about culpability, all they have to do is ask you, “Is
there anything in your baggage you think we should know about?” and if you
don’t disclose it then, you’re screwed.

What? You say you were never asked such a question? You say you even tried to
warn them? Well I have sworn testimony from a TSA officer that says they
ALWAYS ask that question, and you’re the guy who was caught carrying a piece
of equipment designed for trickery and vandalism. Case dismissed.

------
bberenberg
I think the natural next step is to make this into a Wordpress plugin.

------
hirsin
This would be an entertaining way of dealing with MITM agents as well, over
HTTP. As long as the client knows not to open the request, you could trade
them back and forth with the MITM spy wasting tons of overhead.

~~~
steego
It would be an interesting way of streaming data if both sides used a custom
decompression algorithm that skipped n bytes without allocating it anywhere.

The payload could be encrypted text of two chat bots talking gibberish.

~~~
mirimir
Now _that's_ very interesting. Maybe hack a custom ssh with this feature.
Adversaries that intercepted data or attempted MitM would be inconvenienced.

Edit: Or even more useful, bbcp. Which is the _best_ file transfer app that
I've ever used.

~~~
danesparza
This is similar to kippo or cowrie, SSH honeypots:

[https://github.com/desaster/kippo](https://github.com/desaster/kippo)

[https://github.com/micheloosterhof/cowrie](https://github.com/micheloosterhof/cowrie)

~~~
mirimir
I meant more than a honeypot. But rather, a functional SSH app that messes
with standard SSH apps and libraries.

------
petre
Another method is wasting attackers' time by sending out a character per
second or so. It works so well for spam, that OpenBSD includes such a _spamd_
honeypot.
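A minimal single-connection tarpit in that spirit might look like this sketch (the port, delay, and byte count are all illustrative):

```python
import socket
import time

def tarpit(port: int = 2222, delay: float = 1.0, max_bytes: int = 10) -> None:
    """Accept one connection, then drip one byte per `delay` seconds."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    try:
        for _ in range(max_bytes):
            conn.sendall(b"x")  # keep the client blocked on its read
            time.sleep(delay)
    finally:
        conn.close()
        srv.close()
```

A real spamd-style tarpit would loop over connections and stall each one far longer; this only shows the drip mechanic.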

~~~
hossbeast
I would love to know how to configure this for ssh connection attempts

~~~
dredmorbius
fail2ban, in a general sense.

~~~
pmlnr
No, this is a tarpit. Fail2ban simply rejects or drops via iptables.

~~~
dredmorbius
[https://gist.github.com/Belphemur/82d27b1b6dfd675d15f2](https://gist.github.com/Belphemur/82d27b1b6dfd675d15f2)

Tarpit Action for Fail2ban with rate limit

------
banku_brougham
We need some legal advice in this thread.

What if the compressed file is plausibly valid content? How could intent be
malicious if a request is served with actual content?

~~~
eksemplar
In this day and age, finding a vulnerability in a system like a mistakenly
open API and running a script to call it a few times to investigate the
weakness is considered hacking.

It probably shouldn't be, but law is funny that way.

Intentionally sending a zip bomb could potentially get you in trouble as well.
Especially if you're just one private person or a small company without a
legal division to brush it off.

There isn't a real black/white interpretation though, at least not outside the
US (where there may be history to influence rulings on the subject), and
obviously most victims wouldn't report you, but more often than not you
wouldn't want to test the interpretation of IT-related law.

------
ajarmst
Reminds me a bit of Upside-Down-Ternet: [http://www.ex-parrot.com/pete/upside-
down-ternet.html](http://www.ex-parrot.com/pete/upside-down-ternet.html)

------
em3rgent0rdr
Defending by throwing things back at the attacker, instead of simply locking
your door.

~~~
ajarmst
This is more like the thief being parked at your front door permanently trying
to pick the lock, so you replace the valuables he's looking for with big chunks
of lead.

~~~
bulatb
More like hungry bears.

------
Theizestooke
A great way... to provoke a war with people running botnets.

------
ianai
This could also be seen as a bug on the browser side. I'd also be interested
in the browser results for the petabyte version.

I wonder if there's room to do this with other protocols? Ultimately we want
to crash whatever tool the scriptkiddy uses.

~~~
tyingq
I thought of http2's hpack. It does have built-in protection, though: the
client sets a maximum header table size, which encourages client
implementations to think about it.

------
ioquatix
About a month ago one of my websites was being scraped. They were grabbing
JSON data from a mapping system.

I replaced it with a GZIP bomb. It was very satisfying to watch the requests
start slowing down, and eventually stop.

------
DamonHD
Interesting!

That also crossed with another thought about pre-compressing (real!) content
so that Apache can serve it gzipped entirely statically with sendfile() rather
than using mod_deflate on the fly. So unless I've misunderstood, I think bot
defences can be served entirely statically to minimise CPU demand. I don't
mind a non-checked-in gzip -v9 file of a few MB sitting there waiting...

[http://www.earth.org.uk/note-on-site-
technicals.html](http://www.earth.org.uk/note-on-site-technicals.html)

------
merricksb
Similar topic a couple of months ago:

[https://news.ycombinator.com/item?id=14280084](https://news.ycombinator.com/item?id=14280084)

------
dveeden2
Directly serving /dev/zero or /dev/urandom also gives interesting results. (Be
aware of bandwidth costs)

~~~
vgb2k11
Oh, this seems quite an interesting experiment. Curious, though, whether this
defence poses additional risks (besides bandwidth) to the server. I mean, is
there any significant chance that the random data could trigger a glitch in
the server implementation?

------
Pitarou
Wow, you killed Tails!

I tried visiting the payload site with Tails OS (a Linux distro for the
privacy-minded) and the whole OS froze.

------
jscheel
Both ZIP and GZIP record the uncompressed file size (ZIP in its headers, GZIP
in its trailing ISIZE field). You could stream and check these fields to
determine if a zip bomb is being delivered. Obviously something script kiddies
aren't going to do, but the scripts they use can be improved and redistributed
fairly easily.
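For the ZIP side, the declared sizes live in the central directory and can be read without extracting anything; a sketch with Python's zipfile (remembering that a hostile server controls those declared values):

```python
import io
import zipfile

# Build a small in-memory ZIP containing 1 MB of zeroes...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("zeros.bin", b"\x00" * 1_000_000)

# ...then read the declared sizes from the directory, no extraction.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    info = z.infolist()[0]
    ratio = info.file_size / max(info.compress_size, 1)
    print(info.file_size, info.compress_size, round(ratio))
```

A cautious client could refuse any entry whose declared ratio is extreme, while still limiting actual decompression in case the fields are forged.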

~~~
vgb2k11
Could the header be spoofed so that it says 1MB, or are clients/bots typically
strict about ensuring header values are valid? I think the issue you raise is
important though, and any serious client/bot should ignore files with
1KB -> 1GB decompression ratios.

------
iopuy
Was there a reduction in IPs that fail2ban would have picked up, since they
were instead treated to the zip bomb?

------
brian-armstrong
Do browsers protect against media served with Content- or Transfer-Encoding
like this? If you use something that lets you embed images, what's to stop you
from crashing the browser of anyone who happens to visit the page your "image"
is on?

~~~
TazeTSchnitzel
Nothing. I mean, crashing browsers with a client-side DoS is possible in many
ways.

With some horrible WebGL code I've crashed the macOS compositor before.

------
wooptoo
A similar `slow bomb` could be created for attempted ssh connections to a host
using an sshrc script. For example, for clients which do not present a key,
just keep them connected and feed them garbage from time to time. Or rickroll
them.

------
oceanbreeze83
doesn't this incur large bandwidth data charges for the defender?

~~~
zspitzer
no, it's just sending a tiny zip file, decompression occurs at the other end

~~~
chamakits
10 MB (the compressed GZIP given in the example) can be considerable. Even
more so if you consider just how frequently bots are hitting those wp
endpoints.

~~~
toomuchtodo
Non-cloud providers don't rake you over the coals for transfer.

------
ilurkedhere
Wouldn't all but the most naive scanners use time-out settings, maximum
lengths on bytes read etc?

~~~
vgb2k11
> Wouldn't all but the most naive scanners use time-out settings, maximum
> lengths on bytes read etc?

A time-out or a cap on bytes read wouldn't save a scanner from crashing. The
defence can send the ~100KB of zipped data in a matter of seconds; the client
then decompresses the zipped data, which expands to gigabytes, causing crashes
by out-of-memory.

~~~
ilurkedhere
Was thinking more about a maximum length for the decompression stage.
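That cap is straightforward to implement: bound the bytes emitted during streaming decompression. A sketch using Python's zlib (the 10 MiB ceiling and chunk size are illustrative):

```python
import zlib

def bounded_gunzip(compressed: bytes, limit: int = 10 * 1024 * 1024) -> bytes:
    """Decompress a gzip stream, refusing to emit more than `limit` bytes."""
    d = zlib.decompressobj(wbits=47)  # 47 = auto-detect gzip/zlib header
    out = bytearray()
    for i in range(0, len(compressed), 4096):
        # max_length bounds how much this call may emit.
        out += d.decompress(compressed[i:i + 4096], limit - len(out) + 1)
        if len(out) > limit:
            raise ValueError("decompressed size limit exceeded")
    return bytes(out)
```

Legitimate responses under the limit decompress normally; a bomb trips the check after a few input chunks instead of exhausting memory.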

~~~
vgb2k11
User ruytlm has posted links to the Hacker Factor blog, and it seems some
sophisticated scanners (e.g., Eddie) were crashed by the exploit. In that blog
the author postulates that Eddie is a nation-state-level (not script-kiddie)
scanner, so the answer to this question depends on your definition of naive.
It's tempting to qualify any scanner which crashes on this as naive, though,
and I'd agree, especially going forward with the publicity of this post/topic.

Actually, from memory the author of the blog was doubtful whether this exploit
crashed Eddie or not, but it did crash the other bots (Eddie V1 did go
offline, possibly due to a crash), so it would appear you are correct: only
truly naive bots are likely to be affected by this.

------
glenscott1
What are good strategies for protecting your website against ZIP bomb file
uploads?

------
justusthane
Ironically, it looks like the site has been DoS'd by HN.

------
a1exus
you better have unlimited bandwidth to try 10G))

~~~
DamnInteresting
In the example, the server only sends 10MB. The data is 10GB only after
unzipping, which occurs on the client.

------
late2part
brilliant

------
vacri
Interesting: on FF54 the test link pegs a CPU but memory doesn't rise.
Eventually it stops and CPU returns to normal. But then I did a 'view source',
and memory use rose until the browser got OOM-killed (20GB free RAM + swap).

~~~
bluedino
I wonder if the browser is smart enough not to decompress when viewing, or
whether it just treats it as a stream?

------
aluhut
* Firefox: Memory rises up to 6-7gig, then just loads endlessly. Tab closable.

------
futang44
Just tried it using piedpiper's middle-out algorithm and seeing astonishing
results. It's so simple! D2F.1 = D2F.2, D2F.3 = D2F.4

