
Main is usually a function – when is it not? - b0b_d0e
https://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html
======
fugyk
The reason that I use C many of times is not because of its speed but because
it (almost) completely reveals it inner working even when there is clear
reason not to do so.

(1) Though it is bad for most of time but still I feel more contended
sometimes with it, especially for complex code.

(2) Even if this transparency is at superficial level with optimising
compilers, but there are very few high level optimisation applied to C code
and that too in the form of loop unrolling, tail call optimisation, loop
interchange etc.

~~~
pjmlp
For this type of stuff, I rather use a Macro Assembler.

~~~
jerf
For this type of stuff, I use... _NOTHING_. _NOTHING AT ALL_. I DON'T DO IT.
_BECAUSE IT IS INSANE_.

Gloriously, wonderfully insane, though. My compliments to the chef.

------
ejk314
Reminds me of this 'Hello World':
[http://stackoverflow.com/questions/15182496/why-does-this-
co...](http://stackoverflow.com/questions/15182496/why-does-this-code-using-
random-strings-print-hello-world)

I bet these two ideas would go well together.

~~~
vortico
The seed for the random number generator is not executed though.

~~~
ejk314
I was thinking more along the lines of making the main function (array) call
some other function which does this in such a way that it looks like you
implemented a random number generator but it actually prints "Hello World".
And you could hide the fact that 'main' was a piece of code by making it the
initial seed for for the random number generator and using macros in a
separate header file to rename it to something like 'SEED'.

------
mmastrac
Next challenge... make it a real string!

[https://0xcd80.wordpress.com/2011/04/29/linux-x86-shellcodin...](https://0xcd80.wordpress.com/2011/04/29/linux-x86-shellcoding-104/)

[https://code.google.com/p/alpha3/](https://code.google.com/p/alpha3/)

~~~
jzwinck

      const char main[] =
        "UH\x89\xe5\xb8\1\0\0\0\xbb\1\0\0\0g\x8d"
        "5\x10\0\0\0\xba\r\0\0\0\xf\5\xb8\x3c\0\0\0"
        "1\xdb\xf\5Hello World!\n]\xc3";

~~~
mmastrac
I should have been more clear -- something more alphanumeric than ASCII.

------
DanielBMarkham
Not sure the title's question was answered. At the end he just said _" And all
this time we’ve been ignoring the warning message about main not being a
function :)"_

I remember 20 years or ago writing my first C/C++ function where I popped out
to inline ASM. I was like, wow! I'm in complete control of this box! (It was
mostly copied code. My ASM-foo is weak) I really liked the fact that I could
either code in these really abstract types -- or be concerned with what the
stack looks like on entering a function. So I _think_ I know the answer to the
question.

Instead of ignoring the error, I was thinking that you could just mangle the
link file and make the entry point whatever you want, right? I could have a
program that consisted of "void x", then continue along the lines of the
articles with the number list.

Heck, once you start pulling the threads apart like this, the entire structure
falls apart. Your source code could look like anything you want it to look
like. And that's cool.

------
userbinator
I believe you could get around the problem of finding the address of the
string by pushing 4 (8?)-byte pieces of it onto the stack and then doing "mov
esi, esp".

On the topic of "executable ASCII", the EICAR test file is an interesting
example:
[http://en.wikipedia.org/wiki/EICAR_test_file](http://en.wikipedia.org/wiki/EICAR_test_file)

~~~
Spl3en
There is another known trick :

Place your string after a "call" instruction, then when you are inside the
call, the address of the string is on the stack, you can pop it in any
register you want.

------
SixSigma
I like this little project of how one person delved into the fun.

But as far as the TA goes, most people who have an interest in C and
programming will look at the source and say to themselves "Oh, byte values of
machine code being executed, yes very clever but do the assignment again
properly."

~~~
VBprogrammer
If that is the case then the education system is thoroughly broken. It is
obvious that a person who took the time to do that learned far more than they
would than had they copied one of millions of Hello World examples from the
internet.

That being said, I wouldn't rule out a bored TA opening the file not seeing a
printf and assigning a grade based on submitting something.

~~~
_sigma
Not at all; TAs are paid poorly, have other more interesting work (i.e., their
research) to do, and spending the time to deal with 'clever' is not a good
usage of time. Sure the student learnt more, but that doesn't mean this is
what they should have handed in. Have fun on your own time, but don't waste
the TAs time.

~~~
VBprogrammer
We might have to agree to disagree. If TAs aren't interested in students
diving deeper into the course material then I'm not sure they should be paid
at all.

But perhaps the less radical approach would be to pay them enough such that
they can take the time to judge the work on it's merit.

~~~
_sigma
>But perhaps the less radical approach would be to pay them enough such that
they can take the time to judge the work on it's merit.

I assume you have not TAed: I'd like to agree with you, but the realities are
it is poor pay for crappy work. Most students don't care about their work.
Most students seem to be unable to follow basic instructions. I go out of my
way to help students if they care and stay late at labs for these students,
but I have no patience for wasting my time because someone wants to show off
and deliberately make my life harder. As is, when I'm being paid for 6hrs of
lab + marking/week, and I am already spending 8+hrs a week because students
can't format their assignments properly, don't write in complete sentences,
etc, I have zero interest in patting this guy on the back. Instead, I have to
do more work for no pay so he can feel clever.

------
vortico
I wonder if one could write a header file containing definitions that would
make the following possible.

    
    
      char main[] = {
      	movl(1, eax),
      	movl(1, ebx),
      	movl(message, esi),
      	movl(13, edx),
      	syscall(),
      	movl(60, eax),
      	xorl(ebx, ebx),
      	syscall(),
      };
    

Obviously there are some technical difficulties like handling literal values
and code sections, but it could be a fun hack, and I've love to see what
someone could come up with.

~~~
comex
It's possible, but once you're not obfuscating the code - and already
depending on OS details when you assume that global data can be executable,
whether const or not - I think you may as well just use the inline assembler
feature of your compiler. Well, if there is one... MSVC bizarrely doesn't
support on x86-64 what was a perfectly good feature on x86.

~~~
TazeTSchnitzel
It's not bizarre: inline asm hurts performance and requires more effort from
compiler authors than is worth it in our modern age of intrinsics.

~~~
comex
I admit I'm not too familiar with the old MSVC inline assembly system, but the
way GCC/Clang do it certainly allows emitting identical code to what you can
get with intrinsics, although using naive constraints might hurt you. However,
the main reason I personally use inline asm is not for performance, but to
access instructions which are not provided as intrinsics or require special
register handling. For example, I recently wrote some code that did syscalls
directly (because it was patching memory so the normal syscall functions might
not be accessible); I could have linked a separate .S file, but inline
assembly made the output look much nicer. Or, while I haven't written this
myself, it can be used to write out nops which zero-cost tracing facilities
will then patch, something which separate assembly files cannot do.

As for effort from compiler authors, I'd like to hear more about why it is
supposedly so hard. (MSVC is also the compiler which has to prioritize their
choice of C++11-17 features to implement over the years, while the competitors
have _complete_ C++11 and 14 implementations. I guess that's somewhat ad
hominem on the team, but ...)

~~~
TazeTSchnitzel
Theoretically they lead to identical code, but I believe a block of inline ASM
results in constraints on surrounding code and prevents some types of compiler
optimisation.

------
Someone
See also
[http://www.ioccc.org/1984/mullender.c](http://www.ioccc.org/1984/mullender.c)
(30 years ago). Hint at
[http://www.ioccc.org/1984/mullender.hint](http://www.ioccc.org/1984/mullender.hint)

~~~
sirclueless
Not having access to a PDP-11 (maybe this works on an emulator?) is it
cheating to ask what the output of this program is?

~~~
Someone
According to [http://www.ioccc.org/faq.html](http://www.ioccc.org/faq.html),
it is equivalent to

    
    
      main(){
        int i=512;
        do
          write(1,"  :-)\b\b\b\b",9),usleep(i);
        while(--i);
      }
    

So, it moves a smiley on the screen, at increasing speed.

------
lelandbatey
I'm impressed with the authors efforts, and share much of his pain when it
comes to doing weird things with C.

For other obfuscated fun, see this poem/tree printer in 505 bytes[0], or this
program:

    
    
        main(G){10<putchar(11^--G?89-(0x1F&2846984999544829581>>5*G):10)&&main(2+G);}
    

[0] -
[https://github.com/lelandbatey/tiny_tree_printer](https://github.com/lelandbatey/tiny_tree_printer)

~~~
gjm11
OK, your one-liner is simple enough that maybe it's worth a few minutes to
unpack it here.

Let's begin with that big constant. The values of

    
    
      89-(0x1F&2846984999544829581>>5*G)
    

for successive choices of G are: 76 69 76 65 78 68 66 65 84 69 89 74 87.
"Obviously" these are character indices; the characters are
L,E,L,A,N,D,B,A,T,E,Y,J,W. (I remark that Leland was the first name of the
founder of Stanford University and conjecture that the author of the code
works at, or studied at, Stanford. And then I look up at the username of the
person who posted it and feel a bit stupid.)

Now, what's the argument to _putchar_? It has the form

    
    
      11^--G ? thing1 : thing2
    

so it decrements G, and then if the result is anything other than 11 it
evaluates thing1; but if the result is 11 it evaluates thing2 instead. In this
case, thing1 is our LELANDBATEY generator and thing2 is 10 (= newline).

So far, so good. One more step out. What comes before the && inside _main_? It
looks like this:

    
    
      10<putchar(stuff)
    

and the return value from _putchar_ is (barring I/O failures) the same as
what's fed into it. So, putting the above together, when G isn't 12 we'll
decrement G, write out a character from LELANDBATEYJW (indexed by G), and this
thing will be true; when G _is_ 12 we'll decrement G, write out a newline, and
this thing will be false.

Nearly there. Now the whole thing looks like

    
    
      main(G) { thing_above && main(2+G); }
    

so when G isn't 12 it will decrement G, write out a character from
LELANDBATEY, and immediately call _main_ with G+2 (which is 1 more than G
started out as); but if G is 12 on entry to _main_ it will decrement G, write
out a newline, and then return (perhaps to a previous stack frame which will
also return, and so on until exit).

OK. So, I guess this program is intended to be invoked with no command-line
args. Then G (= argc) will be 1. We'll write out characters 0, 1, 2, ..., 10
of the LELANDBATEY string (getting exactly that far) at which point G will be
12 on entry to the next _main_ call. Newline, exit. Done.

(In case it isn't clear: it outputs "LELANDBATEY\n".)

No offence, I hope, but this is _really tame_ compared with typical short-
short IOCCC entries. Here's an example from the 22nd IOCCC. It's short enough
to tweet.

    
    
      char a;float b,c;main(d){for(;d>2e3*c?c=1,scanf(" %c%f",&a,&c),d=55-a%32*9/5,b=d>9,d=d%13-a/32*12:1;a=2)++d<24?b*=89/84.:putchar(a=b*d);}
    

See
[http://www.ioccc.org/2013/endoh3/hint.html](http://www.ioccc.org/2013/endoh3/hint.html)
for more about this; it interprets tunes written in a simple musical notation
and generates raw audio data.

~~~
lelandbatey
The one-liner I posted is really meant to be just understandable enough to
someone who's taken a solid first year computer science course with C and is
looking for more. At least that's where I was approximately when I first found
it. So yes, it's mean to be tame, more of a brainteaser than indecipherable.

Also, I wrote a program to generate these. So here's one for you:

    
    
        main(O){10<putchar(5^--O?77-(31&2278424678>>5*O):10)&&main(2+O);}
    

In fact, I originally found this style of tiny C program on Hacker News nearly
two years ago. Here's the link to the original:
[https://news.ycombinator.com/item?id=5524428](https://news.ycombinator.com/item?id=5524428)

------
ygra
I remember something similar that enabled you to embed machine code in VB6
programs, which involved a magic constant that was sort of an entry point into
a string containing opcodes. I don't remember the specifics, it's been a
while, but nowadays it's very likely a horrible no-go anyway because you're
executing data.

Still, was a nifty trick that allowed you to get more performance out of a
language that wasn't very fast to begin with.

~~~
evincarofautumn
I remember QuickBasic (the precursor of VB: VB 1.0 was effectively QB 8.0) had
a CALL ABSOLUTE statement, which let you execute machine code in a string. I
used it to great effect in graphics demos, once upon a time.

------
wheaties
Wait, what kind of company forces one of their programmers to take an intro
level comp sci course? If you're going to let people hire someone to do a job,
maybe you should either trust their judgement or have someone senior work with
them to fix obvious defects?

I don't know, something about that just didn't sit right with me.

~~~
Kudos
That's almost certainly not what happened here. He was probably doing a
Masters or something in a related field which had this requirement.

------
ctdonath
On a possibly comparable note: in C++, programs can be run entirely from the
constructors of static global objects ... letting one write a program with

    
    
      #include "X.h"
      X x;
      void main(){}

------
mncolinlee
This is very cool. For the next assignment, work in a Duff's Device.

[http://en.wikipedia.org/wiki/Duff%27s_device](http://en.wikipedia.org/wiki/Duff%27s_device)

------
derefr
I recall from somewhere:

    
    
       int main = 0;
    

This code compiles (cleanly!), but segfaults.

~~~
vortico
I think that sets the address of main to NULL, so it segfaults as soon as
`_start` jumps to `main`. It's known for being one of the smallest compilable
C programs.

EDIT: Your original post contained

    
    
      int main=0;

~~~
comex
Not quite - what matters (normally) is the address of the symbol, not the
bytes located there, since in the case of a real function those bytes would be
the instructions. So this will either execute the bytes at &main as
instructions (4 zero bytes, and whatever follows), or, more likely, crash due
to memory protections, as described in the article.

~~~
zwegner
Yup. You can get a working program with the simple:

    
    
      const int main = 0xC3;
    

...which is just a return. Or you can get fancy and make it exit successfully
by clearing eax first:

    
    
      const int main = 0xC3C031;

------
belovedeagle
> My problem solving process is typically the same thing I imagine most
> programmers do.

I'm afraid I may be a victim of Poe's law here, but this attitude is...
distressing. It completely discounts the value of problem solving, which, once
upon a time, was the entire point of the profession.

~~~
mden
It does not. It just moves what "the problem" is.

An example of that is the article itself which leverages the gained
information from SO to solve the real problem.

Also arcane knowledge is not the same thing as problem solving.

~~~
belovedeagle
> If not solved, try a different query and repeat.

I'm not sure if you even bothered to read the article... since that was the
very next sentence. I, too, wanted to give the author the benefit of the doubt
and assume it was mere information gathering, which is a good thing, but the
next sentence shows that the intent is to solve the problem from start to
finish with Google.

------
dmritard96
In python it is still a function but it is also an object?

~~~
ekimekim
Well more properly python doesn't have a "main" function. You just directly
execute the initial script, which will import other modules, define functions
and classes, etc, and finally (hopefully) execute something.

------
Buge
Wouldn't this be considered academic fraud? His coworker turning in code that
he didn't write?

~~~
b0b_d0e
Probably not, since this isn't a case where my co worker can just copy paste
my code and hope it works for two reasons. The first is I am not entirely sure
their testing environment is using 64 bit linux, it could be 32 bit linux
since I'm not in the class, I just assumed its 64bit so I could test it on my
machine. The next reason is they don't actually have a Hello World assignment
at all! The reason I wrote this article is so I could share something I
learned, and my co worker (who has already completed the first half of the
semester's assignments) could use this as a starting point for how to write
his own version for an assignment.

~~~
kps
> The first is I am not entirely sure their testing environment is using 64
> bit linux, it could be 32 bit linux

Maybe it's ARM, or MIPS, or… Followup problem: write a sequence of bytes that
performs a jump on one architecture, and something harmless or reversible on
the other(s).

------
ecesena
Not working on Mac, but I'm gonna tweet it anyway :)

~~~
0x44
I spent a bit getting it to work on OS X. The ASM becomes:

    
    
        movq $0x2000004, %rax  ; // BSD syscalls are divided into classes.
        movq $1, %rdi          ; // 64-bit registers use %rdi instead of %ebx
        lea message(%rip), %rsi; // Relative Address of message
        movq $13, %rdx         ; // Same Length
        syscall                ; // x86-64 ASM syscall
    
        movq $0x2000001, %rax  ; // Exit Syscall
        movq $0, %rdi;
        syscall
    
        message: .ascii "Hello World!\n";

~~~
neoteo
Nice. Thx.

