
How to dismantle a compiler bomb - gregorymichael
https://codeexplainer.wordpress.com/2018/01/20/how-dismantle-compiler-bomb/
======
PaulHoule
I had a famous machine learning professor send me a C program that crashed in
init on a 32-bit machine because it allocated an 8GB array...

It was particularly strange to set a breakpoint at the beginning of main()
with gdb and see the program never got there. Oddly enough, he never actually
used the 8GB array, even though he had no problem allocating the array on the
POWER workstation he was using.

~~~
Aaargh20318
Not sure what OS he used, but some operating systems, Linux for example,
overcommit memory. You can allocate an 8GB array just fine, and as long as you
don’t touch it, no physical memory actually gets used.

That’s why it worked on his machine. It didn’t work on yours because of the
lack of address space.

~~~
kragen
The typical cause of this problem is not that the program runs out of memory
but that it overflows its stack. In this case, of course, 8GiB would wrap
right back around to the start — but I would think that would result in
failing to generate an executable, not generating a crashing executable,
unless the compiler was super slapdash.

~~~
pjc50
Compilers rarely have any idea how large your stack space is. Quite often it's
determined at runtime in various ways, especially in multithreaded
environments.

(There are some surprising exceptions, such as PIC hardware stacks, where you
might be allowed exactly 8 call frames and your whole program's call graph
must be a DAG with no recursion.)

~~~
kragen
Okay, but in this particular case, we're talking about an 8GiB array on a
32-bit platform. Its size truncated to 32 bits is 0x00000000. If the compiler
doesn't detect that, it's a compiler bug.

------
pjc50
I'm reminded of the C++ exploding error competition:
[https://tgceec.tumblr.com/post/74534916370/results-of-the-gr...](https://tgceec.tumblr.com/post/74534916370/results-of-the-grand-c-error-explosion)

------
kochb
Also interesting are the other compiler bomb submissions:

[https://codegolf.stackexchange.com/questions/69189/build-a-c...](https://codegolf.stackexchange.com/questions/69189/build-a-compiler-bomb)

All are short bits of code in a variety of languages that expand to massive
files.

------
tadeegan
This would be an interesting DDoS attack on open source CI systems that check
pull requests. Especially when combined with some sort of distributed network
build cache like Bazel, you could easily fill the cache by making a few pull
requests with this.

------
nur0n
The main practical takeaway is to prefer iterators over pre-generated arrays.

Also, TIL according to the C standard, 'main' is not a reserved identifier!
([https://stackoverflow.com/questions/34764796/why-does-declar...](https://stackoverflow.com/questions/34764796/why-does-declaring-main-as-an-array-compile))

If anyone can clarify: I assumed that gcc read literals directly into their
'target' type, but it seems like some literals (such as '-1u') are read as
signed integers first and then cast to the target type?

~~~
derefr
> according to the C standard, 'main' is not a reserved identifier!

In fact, main() is just a convention of libc. You can have C without libc.
(Such as when writing a kernel!)

Now, attempting to link a standalone executable without a '_start' symbol, on
the other hand...

~~~
shakna
> In fact, main() is just a convention of libc.

No, it's not just a convention. 'main' is defined as the execution entry point
in at least the C11 [0] and C99 [1] standards. Both define these two forms:

    int main(void) {}

    int main(int argc, char* argv[]) {}

You don't have to follow that convention, but then it becomes implementation-
defined behaviour.

C without libc can still expect to have a main. One doesn't imply the other.
It's just that without a main, you also have to manually link to _start.

[0] [http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)

[1] [http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)

------
bcatanzaro
I'm a little baffled to discover that -1u is exactly 0xFFFFFFFF, the maximum
value of unsigned int (where unsigned int is 32 bits).

I would have expected a type error.

Likely there's some important code out there that relies on this strange
behavior.

~~~
thomasrognon
This is common knowledge for C programmers. In the embedded/firmware space,
I've seen #define UINT_MAX (unsigned int)(-1) very often. It's convenient
because it is always the maximum unsigned integer value regardless of whether
int is 16, 32 or 64 bits.

~~~
dfox
According to the standard, (unsigned int)(-1) is undefined behavior (as is
signed overflow) because the machine can use some other representation of
signed integers than two's complement. On the other hand, you will probably
never find a non-two's-complement architecture in any vaguely production use
today.

~~~
comex
Nope, casting signed to unsigned is well-defined. The C standard requires it
to act like two’s complement regardless of what the machine actually uses:

> Otherwise, if the new type is unsigned, the value is converted by repeatedly
> adding or subtracting one more than the maximum value that can be
> represented in the new type until the value is in the range of the new type.

[https://stackoverflow.com/questions/50605/signed-to-unsigned...](https://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe)

~~~
popmatrix
~0u is also a safe equivalent if there is unwarranted fear around wrapping.

------
jotux
Why doesn't the {1} in this case initialize the array to all 1s?

~~~
bArray
It's because you can initialize the array with particular values, like: `type
a[3] = { 1, 2, 3 };`.

But I agree that it would make sense to fill the array (or have an easier
method to do that); it may just be an argument of speed. I think OSes
generally hand over uninitialized memory zeroed out to prevent reading of
memory from previous programs, so it's a case of allocating the memory space
and then continuing, as opposed to setting values for each position.

~~~
jotux
Until now, I always thought this:

    int array[5] = {X};

could be used to set all values of the array to X, but I have never used it
for anything other than 0. That's somewhat surprising behavior.

------
early
Or when you use range instead of xrange with big numbers (Python)...

~~~
slig
Not an issue anymore on Python 3.

~~~
pjmlp
Because xrange became range.

------
tyingq
Mildly interesting at best. Other languages have range operators you can
abuse, or other similar tricks with macros, constants, etc. I'm not sure this
is a problem that needs attention.

~~~
bpicolo
If you're compiling/running random unverified source you have plenty of other
problems already. :)

------
flavio81
TL;DR: a one-liner source file tells the compiler to set main as a 4GB array;
in the process, the compiler might run out of memory.

~~~
waynecochran
s/4GB/4GBx4 = 16GB/

~~~
vog
To add to this, a "tl;dr" is only useful if written by people who actually
read the whole article. That is, written by people who did not apply "tl;dr"
themselves.

In this particular case, the article clearly says:

 _> The array will contain 4294967295 integers, each with the size of four
bytes, taking up 17179869180 bytes in total._

Later, the article stresses again that you have to multiply by four:

 _> having the size of 10000 integers, that is, 40000 bytes_

It's really hard to imagine how somebody who actually did read the article
could have missed that.

~~~
giancarlostoro
It's easy: it was too long, so they didn't read it; they skimmed.

~~~
khedoros1
Well, right. So we're back to the starting proposition: That you shouldn't
write a TL;DR if you didn't read the article.

Anyhow, this was more of a short blog post than a full article. It was a
2-minute read (not exactly a novel). And the thought experiments at the end
are interesting and illuminate some important points about how the language
operates. Definitely worth the read.

~~~
nicky0
I think their interpretation of "tl;dr" is: "it was too long so I didn't read
it, and here is what I think anyway".

