
A primer on some C obfuscation tricks - simonpure
https://github.com/ColinIanKing/christmas-obfuscated-C/blob/master/tricks/obfuscation-tricks.txt
======
nneonneo
I publicly reverse-engineered a cute bit of C code many years ago that employs
some of these tricks:

[https://stackoverflow.com/questions/15393441/obfuscated-c-co...](https://stackoverflow.com/questions/15393441/obfuscated-c-code-contest-2006-please-explain-sykes2-c)

It's a digital clock (which has to be compiled and run once per second to be
accurate).

Of course, if you're interested in obfuscated C, you can't miss the
International Obfuscated C Code Contest, which is where most of these evil
tricks show up. Submissions for this year's IOCCC are still open:
[https://www.ioccc.org/](https://www.ioccc.org/)

The IOCCC has been running since 1984, and there are some absolutely marvelous
gems: [https://www.ioccc.org/years.html](https://www.ioccc.org/years.html). A
great rabbit-hole to dive down if you're stuck at home ;)

------
moonchild

      ({_:&&_;});
    
      ({});
    
      ({;});
    

All of these rely on a gcc extension ('statement expressions').

________________________________

    
    
      a = '-'-'-'
    

You can also do:

    
    
      a = '/'/'/'
    

To generate a 1 instead of a 0.

________________________________

    
    
      printf("%d %d\n", 0 == sizeof(count = 2, count++), count);
    

This works because:

1\. There are two forms of sizeof; sizeof(T) where T is a type, and sizeof x
where x is an expression.

2\. There are 'comma expressions'; if you have (x, y) where x and y are
expressions, then x is executed and the expression evaluates to y.

3\. Parameters to sizeof are not evaluated (which is important because
otherwise the value of the 3rd argument would be undefined, since there's no
sequence point between evaluation of function parameters).
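
Points 2 and 3 can be checked with a small sketch. The helper name `sizeof_leaves_count_alone` is made up for illustration, and the operand is an ordinary (non-VLA) expression, so the compiler only looks at its type.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if `count` is untouched after sizeof
   is applied to an expression that looks like it has side effects. */
int sizeof_leaves_count_alone(void)
{
    int count = 0;

    /* The comma expression has the type of its right operand (int),
       so sizeof yields sizeof(int). Neither the assignment nor the
       increment is executed. */
    size_t n = sizeof(count = 2, count++);

    return n == sizeof(int) && count == 0;
}
```

Some compilers warn that the operand has side effects that will never happen, which is a fair summary of the trick.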

~~~
saagarjha
> Parameters to sizeof are not evaluated

Oh, but they are ;) Try running sizeof on a VLA sometime.

~~~
_kst_
The argument to sizeof is evaluated _if and only if_ it's of a variable length
array type.

(If it's a parenthesized type name, it's unclear what it means to "evaluate"
it.)
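
A minimal sketch of the VLA case, assuming a compiler that supports variable length arrays (they are optional since C11). The function name is invented for the example.

```c
#include <stddef.h>

/* Hypothetical helper: returns the value of n after sizeof has been
   applied to a VLA type whose size expression increments n.
   Returns -1 if the computed size is not what we expect. */
int n_after_vla_sizeof(void)
{
    int n = 3;

    /* int[n++] is a variable length array type, so the size expression
       is evaluated at run time: s is 3 * sizeof(int) and n becomes 4. */
    size_t s = sizeof(int[n++]);

    return (s == 3 * sizeof(int)) ? n : -1;
}
```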

------
saagarjha
Is it bad that I've written some of these in my normal code? In particular, I
use the concepts in these when they're relevant:

    
    
      *(c ? &x : &y) = v;
    
      return (char *[]){"No", "Yes"}[!!x];
    

(Note: for the second one, I use the "selecting from a temporary array" and !!
separately; if I only had two options then I'd use x ? "Yes" : "No" of
course.)
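
For anyone who wants to poke at the two idioms, here is a small sketch assuming C99 (for the compound literal). The function names are mine, not from the original list.

```c
/* Store v into *x or *y depending on c, using a conditional
   expression that selects which pointer to dereference. */
void store_selected(int c, int *x, int *y, int v)
{
    *(c ? x : y) = v;
}

/* Pick a label from a temporary array (a C99 compound literal).
   !!x maps any non-zero x to 1, so the index is always 0 or 1.
   The returned pointer refers to a string literal, which outlives
   the temporary array itself. */
const char *yes_no(int x)
{
    return (const char *[]){"No", "Yes"}[!!x];
}
```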

~~~
userbinator
I don't find those obfuscatory at all either, and instantly understood them
upon first reading. I think they are a form of "data-oriented" style of
programming, where array indexing and selection via ?: are preferred over
"code-oriented" control flow statements. In my experience, the former actually
tends to create more concise and maintainable code, since it often means a lot
of the logic becomes table-driven and easily modified. The latter tends to
result in very long and "branchy" code with lots of if/else statements that
ultimately contain tiny bodies that don't do much.

Combining multiple booleans into an integer and using that as a switch or
index is another related technique.
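
A rough sketch of what combining booleans into an index might look like. The flags and labels are invented for the example; the comment above does not give one.

```c
/* Hypothetical example: describe an access mode from two flags.
   Bit 1 of the index is "readable", bit 0 is "writable". */
const char *access_label(int readable, int writable)
{
    static const char *const labels[4] = {
        "none",        /* index 0: neither flag set */
        "write-only",  /* index 1: writable only    */
        "read-only",   /* index 2: readable only    */
        "read-write",  /* index 3: both flags set   */
    };

    /* !! normalises each flag to 0 or 1 before it is packed. */
    return labels[(!!readable << 1) | !!writable];
}
```

Adding a new combination means editing the table, not adding another if/else branch.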

To modify an old saying slightly, perhaps "one person's obfuscation is another
person's simplicity and elegance."

~~~
jackewiehose
> I don't find those obfuscatory at all either, and instantly understood them
> upon first reading

I understood the whole list of these obfuscation tricks upon first reading but
it's still obfuscation.

Creating an array instead of an if-else-switch like the second line is fine
but for just two elements and in combination with !!x (and !!x in general) it's
just nonsense and doesn't help at all.

I'm all for code-density and avoiding repetition so I too use ?: whenever I
can but not on the left side of an assignment. That's just an obfuscated if-
else-statement. If normal code results in very long and "branchy" code you can
rearrange your code in a better way.

~~~
gpderetta
FWIW, !! is a fairly idiomatic way to cast a value to bool.

~~~
jackewiehose
Ints are automatically converted to bool, there is no need to cast. What this
does is it restricts the int to 1 or 0. The clean way to write this is x ? 1 :
0. This is understandable even if you don't know that the Boolean inversion of
0 is 1 (which isn't self-evident; in Basic, for example, it's -1).

~~~
gpderetta
They are converted to bool in bool context. Outside of that you have to force
it.

    
    
       int countNonZero = 0;
       for (int i = 0; i < values.size(); ++i) {
           countNonZero += !!values[i];
       }
    

Also c++ has explicit operator bool();

edit: not a great example, as values[i] > 0 would be better.

~~~
jackewiehose
You don't have to enforce a conversion from int to bool if you just want a
bool. In this example you don't want to increment by 'bool', you want to
increment by 1 or 0 (which are ints). So you use !!x to convert to bool and
back to int.

The reason I'm pointing this out is that often enough I had to work with code
where this distinction was unknown to the author and you come across code like
if (!!x) which is, again, just nonsense.
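
The "restrict to 0 or 1, then use it as an int" case from this subthread, written as plain C. This is only a sketch and the function name is made up.

```c
#include <stddef.h>

/* Count the non-zero entries of an int array. !!v is 1 for any
   non-zero v (negative values included) and 0 for zero. */
size_t count_nonzero(const int *values, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += !!values[i];   /* same result as (values[i] != 0) */
    return count;
}
```

Inside a condition such as `if (x)` the extra `!!` changes nothing, which is the point being argued above.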

------
ramshorns
Hey, I've got one! Use the exponent operator.

    
    
      if (2^3 == 8)
          puts("two cubed is eight");
      if (5^2 == 25)
          puts("five squared is twenty-five");

~~~
bear8642
That's Bitwise Xor _not_ exponentiation.

2^3 => 1

5^2 => 7

~~~
ramshorns
It works though! And if you do

    
    
      if (2^3 != 1)
          puts("two cubed is not one");
      if (5^2 != 7)
          puts("five squared is not seven");
    

it prints those messages too.

The trick is operator precedence.
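
Spelled out: `==` and `!=` bind tighter than `^`, so `2^3 == 8` parses as `2 ^ (3 == 8)`, which is `2 ^ 0`, which is non-zero. A small sketch with made-up function names:

```c
/* What a reader probably expects to be tested: (2 XOR 3) compared to 8. */
int xor_then_compare(void)
{
    return (2 ^ 3) == 8;   /* 1 == 8, so 0 */
}

/* What the compiler actually sees without the parentheses. */
int as_actually_parsed(void)
{
    return 2 ^ 3 == 8;     /* 2 ^ (3 == 8), i.e. 2 ^ 0, so 2 */
}
```

The same reasoning makes every condition in both snippets non-zero, so all four messages print. Compilers with parenthesis warnings enabled tend to flag the second form.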

~~~
bear8642
Cool - I thought you had the operator confused.

------
faehnrich
That index[x] was used in one of my favorite IOCCC one-liners.

[http://faehnri.ch/have-fun/](http://faehnri.ch/have-fun/)

------
xnhbx
From bulletin #4: "-2147483648 is positive. This is because 2147483648 cannot
fit in the type int, so (following the ISO C rules) its data type is unsigned
long int. Negating this value yields 2147483648 again."

This is not true. If a decimal integer constant value cannot be represented in
type "int", the next candidate type is "long int". If the value cannot fit in
"long int" either, the next type to try is "long long int" in C99 and
"unsigned long int" only in C89.

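One way to see which type a compiler actually picks is C11's `_Generic`. This is a sketch with invented function names; the answer depends on the platform (for example `long` where long is 64 bits, `long long` where long is 32 bits).

```c
/* Names the type the compiler gives to the unsuffixed decimal
   constant 2147483648. Requires C11 for _Generic. */
const char *type_of_2147483648(void)
{
    return _Generic(2147483648,
                    int: "int",
                    long: "long",
                    long long: "long long",
                    unsigned long: "unsigned long",
                    default: "other");
}

/* Returns 1 when -2147483648 compares less than zero, which is the
   case whenever the constant has a signed type wide enough to hold it. */
int minus_2147483648_is_negative(void)
{
    return -2147483648 < 0;
}
```
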
------
inetknght
Today I learned... _magic_

    
    
        #include <cstdio>
        
        void sw(int s)
        {
            switch (s) while (0) {
                case 0:
                    printf("zero\n"); continue;
                case 1:
                    printf("one\n"); continue;
                case 2:
                    printf("two\n"); continue;
            }
        }
    
    

[0] [https://gcc.godbolt.org/z/Q26LWG](https://gcc.godbolt.org/z/Q26LWG)

~~~
lostmsu
Huh, what does it do?

~~~
strbean
Looking at the link, it generates the exact same assembly (and behavior of
course) as

    
    
        void sw(int s) noexcept
        {
            switch (s) {
                case 0:
                    printf("zero\n"); break;
                case 1:
                    printf("one\n"); break;
                case 2:
                    printf("two\n"); break;
            }
        }

~~~
lostmsu
Is noexcept here a consequence of the while? Or does noexcept simply do
nothing at all?

Not a C expert, so just curious.

~~~
eMSF
AFAIK noexcept is a C++ specifier that isn't valid C. Even so, it has nothing
to do with the loop, but in C++ it would cause the program to terminate if an
exception occurred in the sw function.
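
As for what the loop does: the case labels sit inside the body of `while (0)`, and `switch` jumps straight to the matching one. `continue` then jumps to the loop condition, which is 0, so the loop ends and control leaves the switch, much as `break` would. A C sketch of the same shape, returning a string instead of printing (the function name is mine):

```c
/* Same structure as the snippet above. If no case matches, the whole
   switch body is skipped and "other" is returned. */
const char *name_of(int s)
{
    const char *name = "other";

    switch (s) while (0) {
        case 0: name = "zero"; continue;  /* re-test while (0): loop exits */
        case 1: name = "one";  continue;
        case 2: name = "two";  continue;
    }
    return name;
}
```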

------
Sharlin

        > return (char *[]){"No", "Yes"}[!!x];
     

I prefer

    
    
        return (!x<<2)+"Yes\0No";

~~~
bear8642
Clever! Though I feel having "No" relate to x = 0 is slightly clearer

~~~
Sharlin
But surely the goal here is the opposite of clarity? :)
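
For anyone decoding the one-liner: the literal holds both words back to back, and the expression computes an offset into it. A sketch with an invented function name:

```c
/* "Yes\0No" is laid out as  Y e s \0 N o \0  at offsets 0 to 6.
   !x is 1 when x is zero, so (!x << 2) is 4 and points at "No".
   For non-zero x the offset is 0 and the string ends at the first \0. */
const char *yes_no_packed(int x)
{
    return (!x << 2) + "Yes\0No";
}
```

Some compilers warn about adding an integer to a string literal, which is probably a sign the obfuscation is working.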

------
jackhalford
> x[index] is *(x+index)
> index[x] is legal C and equivalent too

Can't be unseen, I can't believe I never thought of that.

~~~
guitmz
I find that after learning assembly language, things like this become very
obvious when seen in different languages, especially C

~~~
roywiggins
The main reason that works is historic, I think.

It could just as easily throw a compilation error to index a constant with an
array rather than the other way around. I don't think this works in Rust even
if the resulting machine code for array indexing is the same.

~~~
eMSF
The main reason is that in C, "indexing" an array is purely syntactic sugar
for pointer arithmetic, which itself is commutative; that is, ((A)[B]) is
equivalent to (*((A)+(B))), which itself is equivalent to (*((B)+(A)))
(assuming one of them has an integer type and the other a pointer to complete
object type).

Now, of course an array type isn't a pointer type, but as "indexing" isn't one
of the very few cases where an expression that has an array type isn't
converted to an expression with a pointer type, you aren't really indexing an
array, but a pointer to its first element.
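
A quick sketch of the equivalence, with made-up function names:

```c
/* All four expressions read the same element, because a[i] is defined
   as *(a + i) and pointer-plus-integer addition is commutative. */
int all_forms_agree(void)
{
    int a[4] = {10, 20, 30, 40};
    int i = 2;

    return a[i] == 30 && i[a] == 30 && *(a + i) == 30 && *(i + a) == 30;
}

/* The same works on a string literal, which also decays to a pointer. */
char second_char(void)
{
    return 1["abc"];   /* 'b' */
}
```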

~~~
wahern
Another way to look at it is that C syntax was designed to be extremely simple
to parse, and C semantics to simplify code generation. Early C compilers
immediately generated code as they parsed each expression, keeping minimal
state. (No AST!) Also consider that in B the only data type was the machine
word, so the types of the operands were irrelevant to the code you generated.
In early C the biggest difference was structures, which required some minimal
bookkeeping (very minimal when all members were in the same namespace), but a
struct dereference is just syntactic sugar for an (address + offset)
expression, so underneath the covers the compiler was still just chewing
through identifiers, left to right, and emitting simple assembly for addition
and multiplication, because each identifier was just a symbol for an integer.

So index[array] isn't an historical accident. It might not have been
deliberate, but it follows naturally from the nature of the language.

Go very much follows the same discipline. Speed and simplicity of compilation
constrain the syntax, most notably the lack of generics. Goroutines, channels,
etc, only require minimal syntactic and compiler support. Contrast that with
Rust--Rust front loads _everything_ into the parsing phase--lifetimes, async,
etc. Deep AST analysis and transformation is everything for Rust. Of course
these days people abhor even the possibility of allowing something like
index[array], so even a compiler like Go goes out of its way to disallow it.

------
_kst_
> or since int is default type

Not since the 1999 ISO standard.

> int main(){ return linux > unix; }

A conforming compiler must diagnose "linux" and "unix" as undeclared
identifiers. Many C compilers are not conforming by default.

~~~
ben0x539
I suspect the post is about GNU C rather than ISO C.

------
fortran77
I wonder when there'll be the Obfuscated Rust Programming Contest.

~~~
gallier2
No need for that, any Rust program is obfuscated.

------
dvfjsdhgfv
I like if (val && ~val) as an alternative to if(val) based on the fact that
for non-zero ints ~val will also be non-zero.
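
One value is worth checking by hand: on a two's-complement machine every bit of -1 is set, so ~(-1) is 0 and the two conditions disagree for val == -1. A small sketch (the helper name is invented):

```c
/* Returns 1 when `if (val)` and `if (val && ~val)` would take the
   same branch for this value, 0 when they differ. */
int same_branch(int val)
{
    int plain  = val ? 1 : 0;
    int tricky = (val && ~val) ? 1 : 0;
    return plain == tricky;
}
```

So the trick behaves like `if (val)` for every int except -1, assuming two's complement.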

------
waltpad
That probably shows (once more) that the C language really needs an overhaul,
with a stricter grammar disallowing those sorts of tricks.

Or maybe not.

~~~
platinumrad
This is a valid Rust program:

    
    
      fn main() {
          return return return return return return return
      }
    

You can do silly things in any language.

~~~
waltpad
Indeed, anyone can also write code over-abusing lambdas in a functional
language, but lambdas are also quite useful in the majority of cases. On the
other hand, can you point out a single situation where swapping _int_ and
_typedef_ in a type definition brings anything good?

Likewise, in your example, I doubt that this could bring anything. Is there a
practical obfuscation method based on that quirk in Rust, or a reason for
keeping it in the syntax? I cannot tell, but maybe you can?

Overall, the problem that I have with this is not that it is silly, it is that
it makes it harder to understand and maintain. In some cases, it requires
active engineering to fix the issue, but the language should be designed so
that most of these problems are taken care of by design.

Also, many people seem to enjoy the fact that C can be bent that way. I don't
mean to remove that from them, I just think that for system programming, it
should be less permissive. Perhaps a 'strict mode' could be devised, not at
the syntax level like in JavaScript (which I suppose couldn't be avoided), but
as a compiler flag (like the C++ people did it).

------
Nevermark
I keep my code interesting, to make work more enjoyable for future
maintainers, by keeping all my code ...

inline with_all_ColinIanKing_standards()

------
freefal
What does the provided example do?

5) Surprising math:

int x = 0xfffe+0x0001;

~~~
monocasa
It doesn't compile.

It's trying to parse the e as scientific notation I think.

    
    
      surprising_math.c: In function ‘main’:
      surprising_math.c:4:10: error: invalid suffix "+0x0001" on integer constant
        int x = 0xfffe+0x0001;
                ^~~~~~~~~~~~~

~~~
inetknght
Indeed, I think you're exactly right. Changing it to `0xffff+0x0001` lets it
compile.

~~~
shaklee3
Doesn't that defeat the purpose then? It's no longer using scientific
notation.

~~~
gallier2
The initial intent was to add 2 hex values, but using 0xfffe tripped the
parser.
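
A bit more detail on why: the preprocessor first splits the source into "preprocessing numbers", and that grammar lets a number continue through `e+` or `e-` (it was written with floating-point exponents in mind). So `0xfffe+0x0001` becomes one token, which is then rejected as an invalid constant. Whitespace around the `+` is enough to split it. A sketch, with a made-up function name:

```c
/* With spaces, the tokens are 0xfffe, +, 0x0001 and the sum is 0xffff.
   Without spaces the same characters form a single invalid token.
   0xffff+0x0001 compiles either way because "f+" is not an
   exponent-style pair. */
int spaced_sum(void)
{
    return 0xfffe + 0x0001;
}
```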

------
TimSchumann
Thanks for sharing this!

------
m463
how about this one?

    
    
      int main() {
        fork();
        printf("choo");
      }

~~~
saagarjha
What's the trick (besides calling functions after fork that you shouldn't be
calling?)

~~~
oddlama
I think parent wanted to write the following:

int main() { printf("choo"); fork(); }

Here, "choo" can be printed twice, even though we fork after printing. This is
a result of output buffering: "choo" is still sitting in the stdio buffer when
the process forks, so the flush happens after the fork, in both processes.
Essentially, the output buffer is copied when forking and therefore
duplicated.
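
Here is a rough way to observe the duplication without relying on a terminal. It is a sketch that assumes a POSIX system and a stdio that buffers "choo" until exit. The helper name is invented, and it redirects a child's stdout into a pipe so the parent can count the bytes that come out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs printf("choo"); fork(); in a child whose
   stdout is a pipe, and returns how many bytes arrive (-1 on error). */
long bytes_after_print_then_fork(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(stdout);                 /* keep our own pending output out of the child */
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);

        printf("choo");             /* no newline: stays in the stdio buffer */
        pid_t grandchild = fork();  /* the buffer now exists in two processes */
        if (grandchild > 0)
            waitpid(grandchild, NULL, 0);
        exit(0);                    /* exit() flushes the buffer in each process */
    }

    close(fds[1]);
    long total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;                 /* read until both writers have exited */
    close(fds[0]);
    waitpid(child, NULL, 0);
    return total;
}
```

With the buffering behaviour assumed above, the function returns 8: four bytes printed once, flushed twice.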

~~~
m463
oh yes, that's what I meant! your example and mine print the same output.

