
What happens before main() is executed in C and why is it important? - arash9k
http://mymicrocontroller.com/2018/04/03/what-happens-before-main-function-is-executed-in-c-and-why-is-it-important/
======
leggomylibro
One other way to observe these steps a bit more clearly is to look at the
startup code provided by a chip's manufacturer or generated by an IDE that
targets the chip, such as Keil.

Most will put this type of code in '.S' startup assembly files, which often
also contain information like the memory addresses for hardware interrupts to
use, and linker scripts for telling the compiler about the memory available on
the chip.

For example, ST's 'CubeF0' package has some example projects for their simpler
ARM chips:

[http://www.st.com/en/embedded-software/stm32cubef0.html](http://www.st.com/en/embedded-software/stm32cubef0.html)

I think that it is good practice with microcontrollers to tell GCC not to
include generic startup logic in order to ensure that you are doing the right
thing for your particular chip, by using flags such as _--specs=nosys.specs_,
and even _-nostdlib_ with _-lgcc_ and _-lc_ only if necessary.

~~~
monocasa
It's not just "good practice", but instead the startup code is so specific to
the platform that there really isn't "generic" startup code.

~~~
makomk
The generic compiler-provided startup code is fine for AVRs under most
circumstances, since they're only produced by a single manufacturer and don't
have fancy features like PLLs which need to be initialized. It's the more
complicated chips where it gets a little hairy, especially if you have to deal
with things like external RAM.

~~~
monocasa
I guess I rarely see the case for an AVR when you're not prototyping on an
Arduino. You can get chips that are an order of magnitude more powerful per
$/watt/and any other metric you'd want to use.

~~~
Bromskloss
Have you read "The Amazing $1 Microcontroller" [0], which might have been
posted here before? It's a survey of available microcontrollers, with the
intent of helping you choose one. If you have read it, are you in agreement
with the article, for example on how AVRs compare to others?

[0]
[https://jaycarlson.net/microcontrollers/](https://jaycarlson.net/microcontrollers/)

~~~
monocasa
Yeah, he put a lot of work into that, and I love that he published all of the
information.

My biggest complaint though is that he expresses GPIO cycle time and interrupt
latency in cycles, rather than normalized to real time.

Particularly when comparing a STM32F0 to an AVR, the perf looks almost even,
until you realize that the STM32F0 is clocked 2.5x faster.
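
To make the normalization concrete: for an operation that takes the same number of cycles N on both parts, and taking the 2.5x clock ratio above at face value, the wall-clock time works out as follows (a back-of-the-envelope sketch, not measured data):

```latex
t = \frac{N}{f}, \qquad
\frac{t_{\mathrm{STM32F0}}}{t_{\mathrm{AVR}}}
  = \frac{N / (2.5\, f_{\mathrm{AVR}})}{N / f_{\mathrm{AVR}}}
  = \frac{1}{2.5} = 0.4
```

So an equal cycle count would mean the faster-clocked part finishes in 40% of the time.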

~~~
Bromskloss
Good point. Have you, or someone else, mentioned this to the author?

------
popmatrix
It should be noted that relying on static initialization as suggested in the
article, while at times useful, can open a new set of problems until
initialization order is well-defined (it's out of scope for the article, so I
imagine the author is aware and just opting for brevity). Some compilers
provide an explicit ordering mechanism (clang and GCC) and there are
proposals in the C++ standards at the moment, but until then relying on static
initialization is something one should think about potentially solving
differently. Depending on the architecture, there also tend to be well-defined
hooks for functions to run pre- and post-main, which is another
consistent model for initialization/shutdown.

Edit: spelling

~~~
monocasa
I had a really good experience designing an RTOS initialization system that
depended on static initialization but didn't have issues with ordering
(and in fact didn't even have a main; post init just went right into
the idle thread).

Basically I only let children of one class be statically initialized (and
enforced this with a tool that goes through the generated binary and type
information). These 'subsystem' classes then get callbacks after all of the
static initializers have run, which allows them to find the other subsystems
they depend on, and then that dependency graph is walked to initialize the
full system (the dependency graph has to be a DAG or the system faults).
Combined with a code review rule that static initializations only happen in
one compilation unit in an anonymous namespace, and nothing else can happen
there, this means that no one can really touch other subsystems before they've
all been initialized, and therefore the order doesn't matter.

I was really happy with how it turned out, despite being completely off the
wall compared to how C++ normally works.

~~~
pnathan
> despite being completely off the wall

embedded systems software in one phrase. :-D

------
kbumsik
Why not just look at the source code?

I attached an example from the Arduino Zero variants [1] (it is directly from
Atmel's SDK). It is quite straightforward to see what happens before main().
In an embedded system, you can even change it easily if you want.

In this example, the stack pointer is updated at the very first point after
powering up the device [2]. And the underscore-prefixed variables are defined
in its linker script [3].

[1]: [https://github.com/arduino/ArduinoCore-samd/blob/master/boot...](https://github.com/arduino/ArduinoCore-samd/blob/master/bootloaders/sofia/Bootloader_D21_Sofia_V2.1/src/ASF/sam0/utils/cmsis/samd21/source/gcc/startup_samd21.c#L161)

[2]: [https://github.com/arduino/ArduinoCore-samd/blob/master/boot...](https://github.com/arduino/ArduinoCore-samd/blob/master/bootloaders/sofia/Bootloader_D21_Sofia_V2.1/src/ASF/sam0/utils/cmsis/samd21/source/gcc/startup_samd21.c#L108)

[3]: [https://github.com/arduino/ArduinoCore-samd/blob/master/boot...](https://github.com/arduino/ArduinoCore-samd/blob/master/bootloaders/sofia/Bootloader_D21_Sofia_V2.1/src/ASF/sam0/utils/linker_scripts/samd21/gcc/samd21j18a_flash.ld)

------
dicroce
Personally, I think globals with constructors are kind of an anti-pattern.
I've noticed a high correlation between crappy apps and apps with a lot of
pre-main() code. Especially if some of your globals have threads: because
then you potentially have threads running AFTER main exits... I've seen a
number of apps with random crashes at shutdown because a thread that was
started in a global was still running and, since main had returned, globals were
being cleaned up and the thread used something that had just been cleaned up.

~~~
pensono
This article seems to be targeted toward embedded programming, so global state
isn't that big of a deal because you have complete control of the chip. It's
quite common to not even have threads or an operating system at all.

------
doomjunky
To me one curious thing about the main() function without arguments is, these
arguments are still pushed onto the stack. You can find them by dereferencing
a pointer of another stack variable and playing around with the offset.

    
    
        // Just tested with mingw-gcc on Win8.1
        // Maybe you need to play around with the offset, +28
        // Note: this reads outside of i, which is undefined behavior in C
        #include <stdio.h>

        int main() {
            int i; // Another stack variable
    
            printf("%p\n", (void *)((char *)&i + 28)); // &argv
            printf("%p\n", *(void **)((char *)&i + 28)); // argv
            printf("%s\n", ((char **)*(void **)((char *)&i + 28))[0]); // argv[0]
    
            return 0;
        }
    

Disclaimer: I don't know how this behaves in an embedded environment.

~~~
saagarjha
Here's what works on my Mac, compiled with clang -m32.

    
    
      #include <stdio.h>
      
      int main() {
      	int i;                                // Another stack variable
      	printf("%s\n", **(char ***)(&i + 8)); // argv[0]
      	return 0;
      }
    

Of course, this won't work at all on x86_64 because arguments are passed in
registers instead of on the stack.

~~~
doomjunky
On Windows this depends on the calling convention. The calling convention
determines whether function arguments are passed in order or in reverse
order, who takes care of cleaning the stack (caller or callee), and even
whether the arguments are passed on the stack or in CPU registers.

In Visual Studio you can control this behavior with the __cdecl, __clrcall,
__stdcall, __fastcall, __thiscall, __vectorcall [1] calling convention
function prefixes, but this also depends on the optimization options [2] of
the compiler.

Passing arguments via CPU registers has a huge performance benefit, but also
some drawbacks.

[1]: [https://msdn.microsoft.com/en-us/library/984x0h58.aspx](https://msdn.microsoft.com/en-us/library/984x0h58.aspx)

[2]: [https://msdn.microsoft.com/en-us/library/46t77ak2.aspx](https://msdn.microsoft.com/en-us/library/46t77ak2.aspx)

~~~
saagarjha
> Passing arguments via CPU registers has a huge performance benefit, but also
> some drawbacks.

Like what?

~~~
pjmlp
On architectures like x86, the set of registers is limited and some are
special, so there is some contention going on.

------
barbegal
In embedded applications a lot of things happen before you can even attach a
debugger. Most microcontrollers go through a non-programmable hardware
initialisation sequence which initialises enough hardware for the processor to
run. The processor then runs from a section of read-only (or at least
write-protected) memory. This code performs a further stage of initialisation and
patching. This can be used by the manufacturer to implement their own
bootloader or to perform configuration of the hardware to get around certain
hardware bugs. Finally this code jumps to a known memory/flash location where
the code that the engineer has written is located. It is from this point that
you can usually start debugging.

~~~
RealityVoid
Is this even public for us as developers, can we influence it and it influence
us? If not, should we even think about it?

I had a lot of curiosity about how JTAG works and how I can build my own
debugger for general targets, but it seems documentation for this is thin. I
should look over openocd to see how those guys are doing stuff.

~~~
barbegal
Most manufacturers do in broad terms disclose how the boot sequence works.
Here is an example of the information disclosed about a Cypress
microcontroller [1]. In some applications the startup time can be critical so
this needs to be known. The code that runs before the engineer's code doesn't
usually have any effect and it puts the microcontroller into a known,
documented state so you don't really need to think about it.

[1]
[https://community.cypress.com/community/psoc-6/blog/2017/05/...](https://community.cypress.com/community/psoc-6/blog/2017/05/04/psoc-6-boot-sequence)

------
usr1106
This is a misleading headline for the general audience.

What he writes is an introduction to bare metal programs for somebody who has
worked under an operating system before.

In both cases a lot of code is run before main(). But very different code. In
bare metal you more likely need to care what it does. In the operating system
case the developers of the OS more likely have done more for you than you will
ever need to know about.

~~~
usr1106
A couple of years ago there was a conference presentation (FOSDEM?) about what
a program does before reaching main(). The presenter (a Brit IIRC) went pretty
fast, did not go into too much detail and it took him 30-40 minutes. I think
the study was done on a BSD. Maybe somebody can provide a link, the video is
online, but Google refuses to help me...

~~~
sedachv
That sounds like Brooks Davis' talk that he has given at a few conferences:

[https://www.youtube.com/watch?v=yWCMy5EiNkQ](https://www.youtube.com/watch?v=yWCMy5EiNkQ)
[https://people.freebsd.org/~brooks/talks/eurobsdcon2016-hell...](https://people.freebsd.org/~brooks/talks/eurobsdcon2016-helloworld/20160924-eurobsdcon-helloworld.pdf)

~~~
usr1106
Exactly! The name of the speaker helps to find what I was looking for
[https://mirrors.dotsrc.org/fosdem/2017/Janson/hello_world.vp...](https://mirrors.dotsrc.org/fosdem/2017/Janson/hello_world.vp8.webm)

------
nebulous1
This seems to be concentrating on what the system loader is doing without
mentioning whatever the statically linked standard library gets up to before
calling the main function?

~~~
jwr
On microcontrollers, you often don't have a standard library, or if you do, it
is limited, and doesn't do things before main. The article describes pretty
well what happens on a microcontroller.

I would add that in some (most?) environments, the code that gets executed
before main() is easily accessible: it sits in files within the project
(written in assembly most of the time).

~~~
nebulous1
Cool, he is talking about the loader though, which would imply that this is
all language independent? Except for where he mentions things getting executed
before main in C++, which I'm assuming would be the doings of the C++ standard
library.

~~~
monocasa
It's still pretty language independent. Qemu for instance is a C codebase that
depends on static initialization functions that run before main to register
the different components.

------
nassyweazy
These articles are so incomplete and always make it to the top on HN... I need
some explanation.

~~~
saagarjha
It's not hard to take a look at doing this yourself: just run a program, break
at main, and check what's in your stack backtrace.

------
dvirsky
You can have pre-main functions in C as well (as a GCC extension). If you
define functions with `__attribute__((constructor))`, they will be called
before main(). You can even set the priority on them, and set post-main
destructor callbacks. This works even for shared libraries loaded at run time.

I've used this as a trick to automatically run unit tests, though, not for any
real work.

~~~
bakztfutur3
"Yeah, yeah, but your scientists were so preoccupied with whether or not they
could that they didn't stop to think if they should."

~~~
dvirsky
It can be useful for registering things automatically, like Go's init()
functions which I've used on occasion.

------
textmode
I am a neophyte x86 asm user.

I link using gcc and -nostdlib plus a very simple start.S:

    
    
       .globl _start
       _start:
       call main
       movl $1,%eax
       xorl %ebx,%ebx
       int $0x80
    

fasm 1.asm;gcc -nostdlib start.S 1.o lib1.a;a.out

"lib1.a" contains only the functions needed by a.out instead of, e.g., every
function in libc.a

I use ar -M < 1.mri to script changes to lib1.a

(Did I forget about crt0.o?)

~~~
textmode
Below is a sample "1.asm" source file which in one expt yields a 2.2K static
binary.

"main" has been renamed to "xyz".

    
    
        format ELF
        section '.text' executable
        public xyz 
    
        extrn _exit
        extrn write
    
        xyz: 
        mov eax,len00
        push eax
        push char00
        mov eax,1
        push eax
        call write
        call _exit
    
        section '.data' writeable
    
        char00 db "21 symbols", 0xA 
        len00 = $ - char00

~~~
textmode
(1.3K stripped)

------
jtchang
I love the XML question (whether your app does stuff with XML).

There have historically been such big security holes when parsing XML that it
is a security code smell now if you are working with it (especially in lower
level languages like C or C++).

------
jononor
The linker script is what runs the global constructors in C++. Nothing
prevents you from using a custom linker script in C to call some `premain()`
function.

~~~
baruch
At least in gcc you can use constructor attribute to run code before main.
I've seen this used to initialize modules and create a dependency tree of
modules to initialize before main even starts.

------
ddtaylor
Is it strange that C++ is able to do this but C itself cannot? Sometimes I
hear the argument that "C is lower level" and there is the idiom that "C++ is
C but better" (not that I believe these are true)

~~~
Retric
It's somewhat misleading when they say the C++ code runs before main. What's
going on is after the same initialization steps the first code of your program
is executed.

C and C++ have different syntax for what that code is, but it's pure semantics
as far as the computer is concerned and C++ gives you zero extra power.

------
x0054
That's funny. In Swift I needed to make sure something runs as early as
possible and I had to do the same trick. Put the code in the declaration of a
variable like this:

var loadTheme: Bool = { Style.loadTheme(); return true; }()

------
dandigangi
Found this article really interesting! It got me to write my first C program.

------
Myrmornis
> Command line arguments are received.

Nit: that’s not the best wording. If the executable is being executed by
something like `exec` then these are just “arguments”; the command line (i.e.
a shell) isn’t involved. Do embedded systems support something like `exec`?

~~~
saagarjha
They're still called command-line arguments out of convention:
[https://en.wikipedia.org/wiki/Command-line_interface#Argumen...](https://en.wikipedia.org/wiki/Command-line_interface#Arguments)

------
postalrat
This is like what happened before the big bang.

~~~
saagarjha
No, because what happens before main is observable.

------
jancsika
Did someone on HN downvote everything except the bullet points in this
article?

Seriously-- why is the font greyed out? What possible purpose does this serve?

~~~
slarrick
They have this in their stylesheet across the website

p{color:#999;line-height:1.4;margin-bottom:.75em}

so paragraph tags, sadly, have the ugly grey font color by default

------
classics2
Strange they looked into the topic and came to the completely incorrect
conclusion.

~~~
Osiris
Can you elaborate for those of us unfamiliar with the topic?

~~~
monocasa
Doing some of that kind of super low level code in a C++ static initialization
constructor is almost certainly a poor choice. Like setting up memory and the
stack pointer. Doing hardware init there can make a lot of sense, but you have
to be careful from an architectural perspective.

~~~
tedunangst
The article is not advocating initializing the stack pointer with a C++
constructor.

~~~
monocasa
> * Memory segments are initialized. Memory segments such as .bss (for
> uninitialized data), .data (for initialized data such as static variables,
> global variables, local static variables, addresses of functions, and
> function pointers), and .text (where the actual code resides) are
> initialized and a valid stack is set up.

> * Command line arguments are received. This may not be relevant in embedded
> systems as in embedded systems we don’t usually call main() with arguments

> * The stack pointer is configured. This is necessary because the program
> needs to know where to start from. Note that some microcontrollers may
> require a start.c or cstart file that has the initialization code (whether
> this file is manually created or auto generated).

> Now that we know what happens before main(), we might wonder if there’s a
> way we can control what happens before main, or if there is a way we can
> write our own version of what happens before main(). In C, the answer is
> largely no. Whatever happens before main() is largely dependent on your
> architecture and compiler. However this is in fact possible in C++.

> One way you can do this in C++ is by declaring a global class object. Since
> global variables are initialized before main(), the constructor of the class
> you have initialized as global will run before main(). You can therefore
> place the code you want to run before main() in such a constructor.

Certainly looks like it is.

~~~
pjc50
You can't really initialize the stack pointer in a constructor, or indeed in
C++ at all; there's no syntax for it and the compiler may use the stack to
allocate local variables in the function prologue.

~~~
monocasa
You can if you're super duper careful and throw in a little inline asm,
depending on the architecture. ARM for instance is very likely not to need to
spill to the stack on tiny little leaf functions.

It's an absolutely terrible idea, but I've seen engineers be so afraid of asm
files that they'd try something like this.

~~~
makomk
The ARM Cortex-M series of chips is I believe kind enough to initialize the
stack pointer for you before your code even executes, by copying your chosen
stack pointer value from a special reserved location in the interrupt vector
table. So in principle you could write all your hardware init code in C.

~~~
thr0w__4w4y
Yes, exactly this. As someone who has written startup code in many flavors of
assembly (PIC, ARMv4, MIPS, PowerPC, AVR...), I appreciated what ARM did with
the design of the Cortex-M architecture -- they designed it so that you could
write fully-functional embedded software (firmware) without a line of
assembly.

Normally, the 2 places you can't avoid assembly are (1) the startup code
(because you're doing things like disabling interrupts and setting the stack
pointer) and (2) interrupt service routines - usually there is a little bit of
magic on the front and back ends (for example on an older ARM7 chip, the CPU
didn't automatically push / save any registers onto the stack, you had to do
it yourself if you needed that).

With the Cortex-M, the CPU design and its microcode took care of all that, so
all of the messy assembly stuff went away. Now, as someone who started writing
6502 ASM as a kid, I kind of miss it, but as someone who has to build lots of
systems and ship products on deadlines, I like the change.

~~~
monocasa
Eh, on the M4s, they borked it. There's an erratum where the floating point
spill doesn't take into account the divide pipeline, so it may not wait long
enough for the pipeline flush, and can corrupt the register save. So you have to
write your own asm interrupt prologue anyway. : /

