
Exploding Git Repositories - ingve
https://kate.io/blog/git-bomb/
======
hathawsh
I wonder what the author means by "a lot" of RAM and storage. I tried it for
fun. The git process pegged one CPU core and swelled to 26 GB of RAM over 8
minutes, after which I had to kill it.

~~~
wscott
Yeah, I tried it too. Killed at 65G. Disappointed that Linux killed Chrome
first.

    
    
        Oct 12 15:47:52 x99 kernel: [552390.074468] Out of memory: Kill process 7898 (git) score 956 or sacrifice child
        Oct 12 15:47:52 x99 kernel: [552390.074471] Killed process 7898 (git) total-vm:65304212kB, anon-rss:63789568kB, file-rss:1384kB, shmem-rss:0kB
    

Edit:

Interesting. Linux didn't kill Chrome; it died on its own.

    
    
        Oct 12 15:42:21 x99 kernel: [552060.423448] TaskSchedulerFo[8425]: segfault at 0 ip 000055618c430740 sp 00007f344cc093f0 error 6 in chrome[556188a1d000+55d1000]
        Oct 12 15:42:21 x99 kernel: [552060.439116] Core dump to |/usr/share/apport/apport 16093 11 0 16093 pipe failed
        Oct 12 15:42:21 x99 kernel: [552060.450561] traps: chrome[16409] trap invalid opcode ip:55af00f34b4c sp:7ffee985fb20 error:0
        Oct 12 15:42:21 x99 kernel: [552060.450564]  in chrome[55aeffb76000+55d1000]
        Oct 12 15:47:52 x99 kernel: [552390.074289] syncthing invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
    

Seems Chrome faulted first; it was probably capturing all signals and didn't
handle OOM. Then syncthing's allocation invoked the oom-killer, which
correctly selected 'git' to kill.

~~~
Tharre
> [..] and didn't handle OOM.

How would Chrome 'handle' an OOM anyway? As far as I'm aware, malloc doesn't
fail with ENOMEM when the system runs out of memory, only when you hit
RLIMIT_AS and the like.
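
A quick way to see that distinction, as a sketch assuming a bash-like shell
and python3 (the subshell keeps the limit from sticking to your session):

    
    
      # Cap address space at ~500 MB, then try to allocate 1 GiB.
      # The allocation fails because RLIMIT_AS is hit, not because
      # the machine is actually out of memory.
      (ulimit -v 512000; python3 -c "bytearray(1024 * 1024 * 1024)")
      # -> MemoryError
    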

~~~
exikyut
Or when you hit 4G VIRT on 32-bit.

Took me a good day's worth of debugging before some bright spark piped up and
said "wait, you said you were on x86-32...?"

...yeah, I use really old computers.

~~~
katastic
I'm setting up my last machine for my wife for gaming. Athlon X4 630, and 16
GB of RAM. I loaded Windows up and it said it had ~2 GB free, and I was like
"oh crap, the RAM sticks must be dead" (because the last motherboard that I
just replaced broke some RAM slots).

I fixed my old video card, a GTX 560, and wanted to see what it could run. I
loaded Steam and PUBG said "invalid platform error". It took me a moment. I
hit Alt-Pause/Break and, presto, 32-bit Windows. Whoops.

Hadn't had that problem in a long time, except at clients running ancient
Windows Server versions, complaining about why Exchange 2003 won't work with
their iPhones anymore: "it used to work and we didn't change anything!"
(Yeah... but the iPhone DID change--including banning your insecure 2003
Exchange protocols.)

------
timdorr
I'm curious how this was uploaded to GitHub successfully. I guess they do less
actual introspection on the repo's contents than I thought. Did it wreak havoc
on any systems behind the scenes (similar to big repos like Homebrew's)?

~~~
enzanki_ars
I too was curious about this.

[https://github.com/Katee/git-bomb/commit/45546f17e5801791d4bc5968b91253a2f4b0db72](https://github.com/Katee/git-bomb/commit/45546f17e5801791d4bc5968b91253a2f4b0db72)
shows:

"Sorry, this diff is taking too long to generate. It may be too large to
display on GitHub."

...so they must have some kind of backend limits that may have prevented this
from becoming an issue.

I wonder what would happen if it was hosted on a GitLab instance? Might have
to try that sometime...

~~~
ballenf
Since GitHub paid a bounty and OK'd the release, perhaps they've already
patched some aspects of it. It might be impossible to recreate the issue now.

My naive question is whether CLI "git" would need or could benefit from a
patch. Part of me thinks it doesn't, since there are legitimate reasons for
each individual aspect of creating the problematic repo. But I probably don't
understand god deeply enough to know for sure.

~~~
mnx
is this a git->god typo, or a statement about your feelings towards Linus?

~~~
warent
Please don't let Linus read this

------
JoshMnem
Because that page is AMP by default, it takes about 7 seconds to load on my
laptop. AMP is _really_ slow in some cases.

Edit: see my comment below before you downvote me.

~~~
katee
Huh, I've tested on a bunch of devices/connections and haven't encountered
that. Do you know what causes AMP to be that slow for you? I'll take a look at
serving non-AMP pages by default. It will require tweaking how image inclusion
works.

~~~
JoshMnem
For people who use extensions or browsers that block third-party JS, AMP
pages will take many seconds to load in non-mobile Web browsers.

Here is information about some of the other problems with AMP:

[https://www.theregister.co.uk/2017/05/19/open_source_insider...](https://www.theregister.co.uk/2017/05/19/open_source_insider_google_amp_bad_bad_bad/)

[https://danielmiessler.com/blog/google-amp-not-good-thing/](https://danielmiessler.com/blog/google-amp-not-good-thing/)

[https://ethanmarcotte.com/wrote/ampersand/](https://ethanmarcotte.com/wrote/ampersand/)

[https://css-tricks.com/need-catch-amp-debate/](https://css-tricks.com/need-catch-amp-debate/)

[https://daringfireball.net/linked/2017/01/17/schreiber-amp](https://daringfireball.net/linked/2017/01/17/schreiber-amp)

~~~
xpaulbettsx
Fix your browser /shrug

~~~
JoshMnem
It isn't just my browser. AMP performs very badly in some non-mobile browsers
(no extensions).

------
pmoriarty
Why not just always run git under memory limits?

For example:

    
    
      %  ulimit -a
      -t: cpu time (seconds)              unlimited
      -f: file size (blocks)              unlimited
      -d: data seg size (kbytes)          unlimited
      -s: stack size (kbytes)             8192
      -c: core file size (blocks)         0
      -m: resident set size (kbytes)      unlimited
      -u: processes                       30127
      -n: file descriptors                1024
      -l: locked-in-memory size (kbytes)  unlimited
      -v: address space (kbytes)          unlimited
      -x: file locks                      unlimited
      -i: pending signals                 30127
      -q: bytes in POSIX msg queues       819200
      -e: max nice                        30
      -r: max rt priority                 99
      -N 15:                              unlimited
      %  ulimit -d $((100 * 1024)) # 100 MB
      %  ulimit -m $((100 * 1024)) # 100 MB
      %  ulimit -l $((100 * 1024)) # 100 MB
      %  ulimit -v $((100 * 1024)) # 100 MB
      %  git clone https://github.com/Katee/git-bomb.git
      Cloning into 'git-bomb'...
      remote: Counting objects: 18, done.
      remote: Compressing objects: 100% (6/6), done.
      remote: Total 18 (delta 2), reused 0 (delta 0), pack-reused 12
      Unpacking objects: 100% (18/18), done.
      fatal: Out of memory, malloc failed (tried to allocate 118 bytes)
      warning: Clone succeeded, but checkout failed.
      You can inspect what was checked out with 'git status'
      and retry the checkout with 'git checkout -f HEAD'

------
ericfrederich
Run this to create a ~40K file which expands to 1 GiB:

    
    
      yes | head -n536870912 | bzip2 -c > /tmp/foo.bz2
    

I would imagine you could do something really creative with ImageMagick to
create a giant PNG file that'll make browsers, viewers, and editors crash as
well.
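
Something like this, perhaps (a sketch assuming ImageMagick's `convert` is
installed and its policy.xml resource limits don't get in the way):

    
    
      # A 30000x30000 solid-black canvas: tiny on disk as a PNG,
      # but roughly 2.7 GB once decoded to raw RGB in memory.
      convert -size 30000x30000 xc:black /tmp/huge.png
    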

~~~
tedunangst
PNG has its dimensions in the header, so the decoder should know when it has
decompressed enough.
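
Specifically, the IHDR chunk right after the 8-byte signature stores width
and height as big-endian 32-bit integers (bytes 16-23 of the file), so a
decoder can size its buffers up front. A quick way to peek at them, assuming
`xxd` is available:

    
    
      # Dump the 8 bytes holding width and height.
      xxd -s 16 -l 8 image.png
    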

------
warent
Odd. It's surprising to me that this example runs out of memory. What would
be a possible solution?

Admittedly I don't know that much about the inner workings of git, but off
the top of my head, perhaps something like traversing the tree depth-first
and releasing resources as you hit the bottom?

~~~
ericfrederich
You need a problem to have a solution to it. What do you consider to be the
problem here?

This is essentially something that can be expressed in relatively few bytes
that expands to something much larger.

Imagine I had a compressed file format for blank files (0x00 the whole way
through). It is implemented by writing, in ASCII, the size of the
uncompressed file.

So the contents of a file called terabyte.blank would just be the ASCII
"1000000000000", and the contents of a file called petabyte.blank would be
"1000000000000000".

I cannot decompress these files... what is the solution?
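
(About the only one is to never materialize the expansion at all - stream it.
A sketch for that hypothetical .blank format, where `consumer` is a
placeholder for whatever actually needs the bytes:)

    
    
      # The file's entire contents are the uncompressed size in ASCII.
      size=$(cat petabyte.blank)
      # Stream that many zero bytes without ever holding them all at once.
      head -c "$size" /dev/zero | consumer
    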

~~~
geezerjay
> You need a problem to have a solution to it. What do you consider to be
> the problem here?

> This is essentially something that can be expressed in relatively few
> bytes that expands to something much larger.

That seems to be the problem. If an object expands to something so much
larger that it crashes services through the sheer volume of resources it
takes, that is pretty much the definition of a denial-of-service attack
vector.

~~~
TeMPOraL
There is a problem here, but it's not with data. It's with the service.

Being able to express trees efficiently in a data format is an useful feature,
but it requires the code processing it not to be lazy and assume people will
never create pathological tree structures.

------
gwerbin
Would this be possible with a patch-based version control system like Darcs or
Pijul? Does patch-based version control have other analogous security risks,
or is it "better" in this case?

~~~
fanf2
If the patch language includes a recursive copy then it's possible to
reproduce this problem in that setting.

~~~
geezerjay
If I understood correctly, this problem isn't caused by recursive copies but
simply by expanding references. The example shows that the reference expansion
leads to an exponential increase in resources required by the service.

~~~
TeMPOraL
They mean the same thing in this context; if it were just expanding
references one by one while walking the tree, this would not happen - the
bomb requires copies of expanded references to be stored in memory.
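
To make the blow-up concrete, here's a minimal sketch using git plumbing in a
scratch repository (the fan-out and depth are illustrative, not the actual
git-bomb parameters):

    
    
      # One blob, referenced four times in a leaf tree.
      blob=$(echo hello | git hash-object -w --stdin)
      tree=$(for i in 0 1 2 3; do
        printf '100644 blob %s\tf%s\n' "$blob" "$i"
      done | git mktree)
      # Each level reuses the previous tree four times, so three levels
      # of fan-out 4 already mean 4^4 = 256 files on checkout, built
      # from just a handful of objects.
      for level in 1 2 3; do
        tree=$(for i in 0 1 2 3; do
          printf '040000 tree %s\td%s\n' "$tree" "$i"
        done | git mktree)
      done
      git commit-tree -m "mini bomb" "$tree"  # prints a commit id
    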

------
emeraldd
Bare for the win.

    
    
        git clone https://github.com/Katee/git-bomb.git --bare
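
A bare clone skips checkout entirely, so you can still poke at the objects
safely afterwards, e.g.:

    
    
      cd git-bomb.git
      # Pretty-print the root tree without expanding anything.
      git cat-file -p 'HEAD^{tree}'
    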

------
TeMPOraL
Going to the second level on GitHub breaks the commit name for me - it gets
stuck on the "Fetching latest commit..." message. Curiously, go one level
deeper and the commit message is correct again.

[https://github.com/Katee/git-bomb/tree/master/d0/d0](https://github.com/Katee/git-bomb/tree/master/d0/d0)

(INB4: the article suggests GitHub is aware of this repo, so I have no
qualms posting this link here.)

------
infinity0
Directory hard links would "fix" this issue, since `git checkout` could just
create a directory hard link for each duplicated tree. I wonder why
traditional UNIX doesn't support this on any filesystem.

(Yes, you would need to add a loop detector for paths and resolve ".."
differently, but it's not like doing this is conceptually hard.)

------
breakingcups
Has anyone tried to see how well Bitbucket and GitLab handle this?

------
Retr0spectrum
What happens if you try to make a recursive tree?

~~~
ethomson
As in a tree that points to itself? You cannot, since the tree would have to
point to its own SHA1 - you would need to know your own tree's SHA before
creating it, and embed it in the tree itself.

~~~
mv4
Reminded me of the GIF that displays its own MD5 hash:

[https://twitter.com/bascule/status/838927719534477312](https://twitter.com/bascule/status/838927719534477312)

~~~
jwandborg
So it's possible, but impractical?

~~~
mv4
I think it's possible.

------
porfirium
If we all click "Download ZIP" on this repo we can crash GitHub together!

Just click here: [https://codeload.github.com/Katee/git-bomb/zip/master](https://codeload.github.com/Katee/git-bomb/zip/master)

~~~
abritinthebay
Wouldn't that just do a `git fetch` and therefore not have the issue?

~~~
minitech
"Download ZIP" downloads the repository’s files as a zip. No Git involved for
the downloader.

~~~
chii
I expect the download-zip to be implemented as running 'git archive --format
zip | write-http-response-stream'.
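
The local half of that guess is runnable inside a clone; the HTTP streaming
part is the hypothetical bit:

    
    
      git archive --format=zip HEAD > git-bomb-master.zip
    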

~~~
mschuster91
Hmm I'd hope they do a caching step in between ;)

------
kowdermeister
I thought it would self-destruct after cloning or forking, before clicking :)

