Hacker News
What happens before main() is executed in C and why is it important? (mymicrocontroller.com)
440 points by arash9k on May 4, 2018 | 103 comments

One other way to observe these steps a bit more clearly is to look at the startup code provided by a chip's manufacturer or generated by an IDE that targets the chip, such as Keil.

Most will put this type of code in '.S' startup assembly files, which often also contain information like the hardware interrupt vector table, alongside linker scripts that tell the linker about the memory available on the chip.

For example, ST's 'CubeF0' package has some example projects for their simpler ARM chips:


I think it is good practice with microcontrollers to tell GCC not to include generic startup logic, to ensure that you are doing the right thing for your particular chip, by using flags such as --specs=nosys.specs, and even -nostdlib with only -lgcc and -lc if necessary.

It's not just "good practice", but instead the startup code is so specific to the platform that there really isn't "generic" startup code.

The generic compiler-provided startup code is fine for AVRs under most circumstances, since they're only produced by a single manufacturer and don't have fancy features like PLLs which need to be initialized. It's the more complicated chips where it gets a little hairy, especially if you have to deal with things like external RAM.

There actually isn't any reason STM or NXP or whatever ARM vendor couldn't provide AVR-like usability and an awesome libc. For example, the assembly blobs used on most STM32s are very simple.

It's simply that they want to push people to their proprietary IDEs (to increase vendor lock-in). STM used to provide an okayish libc implementation but then decided to bury it inside Cube(MX).

If your design becomes complicated enough, sure, go ahead and change the startup code. But that's quite a small portion of all uses, and for the rest we should have AVR-like ease of use.

Board-support choices like crystal frequency need to be tweaked per board, really early in the boot process.

I guess I rarely see the case for an AVR when you're not prototyping on an Arduino. You can get chips that are an order of magnitude more powerful per dollar, per watt, or by any other metric you'd want to use.

Have you read "The Amazing $1 Microcontroller" [0], which might have been posted here before? It's a survey of available microcontrollers, with the intent of helping you choose one. If you have read it, are you in agreement with the article, for example on how AVRs compare to others?

[0] https://jaycarlson.net/microcontrollers/

Yeah, he put a lot of work into that, and I love that he published all of the information.

My biggest complaint though is that he expresses GPIO cycle time and interrupt latency in cycles, rather than normalized on real time.

Particularly when comparing an STM32F0 to an AVR, the perf looks almost even, until you realize that the STM32F0 is clocked 2.5x faster.

Good point. Have you, or someone else, mentioned this to the author?

Low latency IO and computational performance isn't always needed.

If it's not, you can drop the cost. There are AVR-like 8051s that you can get for 10 cents.

I can't find any 8051 for less than 30 cents, and Digikey sells ATtinys for 17 cents.

I haven't confirmed what you said yet, but it makes sense, since 8051/8052 MCUs are generally used for their compatibility with legacy code.

I've been working on an agricultural spray controller that uses some SiLabs part only because there is already assembly code written for it. The company that sells the controller doesn't want to spend the time rewriting it in C and moving to ARM or PIC.

> digikey

There's your problem. Digikey is only really meant for low volume.

That doesn't explain why you suggested a more expensive chip.

What's cheaper or not depends on your volume at that level.

It should be noted that relying on static initialization as suggested in the article, while at times useful, can open a new set of problems unless initialization order is well-defined (it's out of scope for the article, so I imagine the author is aware and just opting for brevity). Some compilers (clang and GCC) provide an explicit ordering mechanism, and there are proposals before the C++ committee at the moment, but until then relying on static initialization is something one should think about potentially solving differently. Depending on the architecture, there also tend to be well-defined hooks for functions to run pre- and post-main, which is another consistent model for initialization/shutdown.

Edit: spelling

I had a really good experience designing an RTOS initialization system that depended on static initialization but didn't have ordering issues (and in fact didn't even have a main; after init it went right into the idle thread).

Basically I only let children of one class be statically initialized (and enforced this with a tool that goes through the generated binary and type information). These 'subsystem' classes then get callbacks after all of the static initializers have run, which allows them to find the other subsystems they depend on; then that dependency graph is walked to initialize the full system (the graph has to be a DAG or the system faults). Combined with a code review rule that static initializations only happen in one compilation unit, in an anonymous namespace, and that nothing else happens there, this means no one can really touch other subsystems before they've all been initialized, so the order doesn't matter.

I was really happy with how it turned out, despite being completely off the wall compared to how C++ normally works.

> despite being completely off the wall

embedded systems software in one phrase. :-D

> relying on static initialization is something one should think about potentially solving differently

This is such a notorious problem, it even has a name: Static Initialisation Order Fiasco (see: https://isocpp.org/wiki/faq/ctors#static-init-order)

The symmetrical Static Destruction Order Fiasco is also a very fun problem to deal with. The solution proposed for SIOF in the C++ FAQ does nothing to help you deal with this one, though.

Perhaps worse, lots of new languages, for example Go, copied this mistake from C++. :(

Is the "static initialization order fiasco" a thing that can happen in go? The order of package level variable initialization is well defined in the spec and because go doesn't allow import cycles it seems like the issue mentioned in the linked page is not possible.

Static initialization order may be defined, but that doesn't mean it's easy to reason about.

Actually, if you mean package initialization blocks, they were already present in Mesa derived languages.

In embedded you can usually modify every step from boot to main. They are just pieces of code you can interact with; you can make them do whatever you want and initialize things in whatever order you want, provided you know what you're doing.

Deviating with a war story now: I once modified some startup code so it self-detected its boot location and initialized the RAM data according to this location shift. It was fun, but debugging relocated code is a pain because you need to somehow relocate the symbols as well.

What was the project for where you wrote code that self-detected its boot location?

Well, a bootloader and the self-detection was useful for the bootloader self-update. The update bootloader would be copied in a different section and on reset would start and copy itself over the original bootloader section.

I also wanted to maintain full functionality even at shifted location in case the self-copy process somehow failed. Never did fail, at least to my knowledge.

In the embedded software context, you typically do your own static initialization in a startup.S file or equivalent. There should be 0 ambiguity as to when your DATA section is copied to RAM.

There's generally a difference between "static initialization" copying .data from ROM to RAM, and running static initialization functions out of the table. That table doesn't really have ordering constraints generally.

> Depending on the architecture, there tends to also be well-defined hooks for functions to run pre and post-main which is another consistent model for initialization/shutdown.

Is atexit the post-main hook to which you were referring, or is that just the posix hook into a larger class of post-main hooks?

Why not just look at the source code?

I attached an example for Arduino Zero variants [1] (it is directly from Atmel's SDK). It is quite straightforward to see what happens before main(). In embedded systems, you can even change it easily if you want.

In this example, the stack pointer is updated at the very first point after powering up the device [2]. The underscore-prefixed variables are defined in its linker script [3].

[1]: https://github.com/arduino/ArduinoCore-samd/blob/master/boot...

[2]: https://github.com/arduino/ArduinoCore-samd/blob/master/boot...

[3]: https://github.com/arduino/ArduinoCore-samd/blob/master/boot...

Personally, I think globals with constructors are kind of an anti-pattern. I've noticed a high correlation between crappy apps and apps with a lot of pre-main() code. Especially if some of your globals have threads: then you potentially have threads running AFTER main exits... I've seen a number of apps with random crashes at shutdown because a thread started in a global was still running, and since main had returned, globals were being cleaned up and the thread used something that had just been cleaned up.

This article seems to be targeted toward embedded programming, so global state isn't that big of a deal because you have complete control of the chip. It's quite common to not even have threads or an operating system at all.

This is a real problem; I'm not sure rooting all mutable state on the stack will help in this case without language-level metadata restricting frees, passing references, and dereferencing.

It seems like the Rust answer of making it difficult to share mutable state is a solid one: you explicitly manage concurrent state rather than implicitly allowing sharing. In addition, Rust has some interesting run-once constraints you can add to closures that work well with global initialization. I'll admit I haven't seen enough Rust to see it work well in action, but the pieces are there.

> some of your globals have threads

Creating threads in pre-main code is generally a code smell, rather than running pre-main code.

To me, one curious thing about a main() without arguments is that these arguments are still pushed onto the stack. You can find them by dereferencing a pointer to another stack variable and playing around with the offset.

    // Just tested with mingw-gcc on Win8.1
    // Maybe you need to play around with the offset, +28
    #include <stdio.h>

    int main() {
        int i; // Another stack variable

        printf("%lld\n", (long long)&i + 28);                 // &argv
        printf("%p\n", *(void **)((long long)&i + 28));       // argv
        printf("%s\n", (*(char ***)((long long)&i + 28))[0]); // argv[0]

        return 0;
    }
Disclaimer: I don't know how this behaves in an embedded environment.

> Disclaimer: I don't know how this behaves in an embedded environment.

The handful of times I've written startup code for an embedded micro, I've always passed zero arguments to the program's "command line", not even its own program name.

I.e., from the perspective of _start, declare the application's entry point as int main(int argc, char **argv), call it with main(0, NULL), and ignore the return value.

ARM passes the first several arguments in registers anyway, so your UB trick wouldn't reveal anything interesting.

Here's what works on my Mac, compiled with clang -m32.

  #include <stdio.h>
  int main() {
  	int i;                                // Another stack variable
  	printf("%s\n", **(char ***)(&i + 8)); // argv[0]
  	return 0;
  }
Of course, this won't work at all on x86_64 because arguments are passed in registers instead of on the stack.

On Windows this depends on the calling convention. The calling convention determines the order in which function arguments are passed, who takes care of cleaning up the stack (caller or callee), and whether arguments are passed on the stack or in CPU registers.

In Visual Studio you can control this behavior with the __cdecl, __clrcall, __stdcall, __fastcall, __thiscall, __vectorcall [1] calling convention function prefixes, but this also depends on the optimization options [2] of the compiler.

Passing arguments via CPU registers has a huge performance benefit, but also some drawbacks.

[1]: https://msdn.microsoft.com/en-us/library/984x0h58.aspx

[2]: https://msdn.microsoft.com/en-us/library/46t77ak2.aspx

> Passing arguments via CPU registers has a huge performance benefit, but also some drawbacks.

Like what?

On architectures like x86, the set of registers is limited and some are special, so there is some contention going on.

I don't know much about AMD64, but if the arguments are passed in registers, how is it possible to retrieve argc/argv later, since other functions are likely to touch the registers?

They're pushed onto the stack if another function will touch them.

Something surprising about C is that an empty argument list doesn't mean "no arguments", it means "unspecified arguments".


Though that is unlikely to be related to the arguments passed to main.

The C standard says main() should be defined as:

  int main(void)

or

  int main(int argc, char *argv[])

  (or equivalent, e.g. int can be replaced by a typedef name
  defined as int, or the type of argv can be written as
  char ** argv).
(From http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, section "Program startup".)

EDIT: apologies for the atrocious formatting, but it wasn't clear to me how to put two asterisks in a sequence here on HN without preformatted text.

It actually has a whole other half to the sentence, following the semi-colon.

* http://jdebp.eu./FGA/legality-of-void-main.html

I took a detour at the footnote reference, and never came back. Interesting to note, thank you.

In embedded systems, many manufacturers' default startup code doesn't even implement argc/argv. In that case this won't work, or you need to implement your own startup code.

totally unportable and undefined behavior

I have a strong feeling that while this obviously works on i386 Linux (and probably any Unix, for that matter), if it works on amd64 at all it's pure coincidence.

It shouldn't work on amd64, because the x64 calling convention puts the first few arguments in registers.

In embedded applications a lot of things happen before you can even attach a debugger. Most microcontrollers go through a non programmable hardware initialisation sequence which initialises enough hardware for the processor to run. The processor then runs from a section of read only (or at least write protected) memory. This code performs a further stage of initialisation and patching. This can be used by the manufacturer to implement their own bootloader or to perform configuration of the hardware to get around certain hardware bugs. Finally this code jumps to a known memory/flash location where the code that the engineer has written is located. It is from this point that you can usually start debugging.

Depending on the platform, one can get one to three levels of loaders, which have some range of configuration items (pins/register config) and programmability before you even get to 'application' code. (Trying to recall if I've ever hit more than three...) And it's often accessible to developers, not just the manufacturer. Different mfrs range from open to openly annoying about the details of access at that level, but you can often get debugger access at those early bootloader stages, though most embedded dev environments sensibly default to later stages.

Sometimes strange bugs manifest due to misalignments between configs at early boot stages and what the later code needs, so you want to be able to review and change those stages.

Is this even public for us as developers? Can we influence it, and does it influence us? If not, should we even think about it?

I had a lot of curiosity about how JTAG works and how I could build my own debugger for general targets, but it seems documentation for this is thin. I should look over openocd to see how those guys are doing stuff.

Most manufacturers do, in broad terms, disclose how the boot sequence works. Here is an example of the information disclosed about a Cypress microcontroller [1]. In some applications the startup time can be critical, so this needs to be known. The code that runs before the engineer's code doesn't usually have any effect, and it puts the microcontroller into a known, documented state, so you don't really need to think about it.

[1] https://community.cypress.com/community/psoc-6/blog/2017/05/...

It depends heavily on the SoC in question. On an unlocked STM32F4, you can attach JTAG pretty much immediately even if it's way off in the weeds executing garbage, but the BCM283x (RPi) has to set up the pinmux for the JTAG first from the ARM core, so you have to already be running some of your own code.

This is a misleading headline for the general audience.

What he writes is an introduction to bare metal programs for somebody who has worked under an operating system before.

In both cases a lot of code is run before main(). But very different code. In bare metal you more likely need to care what it does. In the operating system case the developers of the OS more likely have done more for you than you will ever need to know about.

A couple of years ago there was a conference presentation (FOSDEM?) about what a program does before reaching main(). The presenter (a Brit, IIRC) went pretty fast, didn't go into too much detail, and it took him 30-40 minutes. I think the study was done on a BSD. Maybe somebody can provide a link; the video is online, but Google refuses to help me...

That sounds like Brooks Davis' talk that he has given at a few conferences:

https://www.youtube.com/watch?v=yWCMy5EiNkQ https://people.freebsd.org/~brooks/talks/eurobsdcon2016-hell...

Exactly! The name of the speaker helps to find what I was looking for https://mirrors.dotsrc.org/fosdem/2017/Janson/hello_world.vp...

This seems to concentrate on what the system loader is doing, without mentioning whatever the statically linked standard library gets up to before calling the main function?

On microcontrollers, you often don't have a standard library, or if you do, it is limited, and doesn't do things before main. The article describes pretty well what happens on a microcontroller.

I would add that in some (most?) environments, the code that gets executed before main() is easily accessible: it sits in files within the project (written in assembly most of the time).

Cool, he is talking about the loader though, which would imply that this is all language independent? Except for where he mentions things getting executed before main in C++, which I'm assuming would be the doings of the C++ standard library.

It's still pretty language-independent. QEMU, for instance, is a C codebase that depends on static initialization functions that run before main to register the different components.

In any other context, yes, you're missing C library predecessors to main(). But it's specifically referencing embedded systems, which is why hardware initialization and memory segments are mentioned.

These articles are so incomplete and always make it to the top of HN... I need some explanation.

It's not hard to take a look at this yourself: just run a program, break at main, and check what's in your stack backtrace.

You can have pre-main functions in C as well (as a GCC extension). If you define functions with `__attribute__((constructor))`, they will be called before main(). You can even set the priority on them, and set post-main destructor callbacks. This works even for shared libraries loaded at run time.

I've used this as a trick to automatically run unit tests, though, not for any real work.

"Yeah, yeah, but your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should."

It can be useful for registering things automatically, like Go's init() functions which I've used on occasion.

I am a neophyte x86 asm user.

I link using gcc and -nostdlib plus a very simple start.S:

   .globl _start
   _start:
   call main
   movl $1,%eax
   xorl %ebx,%ebx
   int $0x80
fasm 1.asm;gcc -nostdlib start.S 1.o lib1.a;a.out

"lib1.a" contains only the functions needed by a.out instead of, e.g., every function in libc.a

I use ar -M < 1.mri to script changes to lib1.a

(Did I forget about crt0.o?)

Below is a sample "1.asm" source file which in one expt yields a 2.2K static binary.

"main" has been renamed to "xyz".

    format ELF
    section '.text' executable
    public xyz

    extrn _exit
    extrn write

    xyz:
    mov eax,len00
    push eax
    push char00
    mov eax,1
    push eax
    call write
    push 0
    call _exit

    section '.data' writeable

    char00 db "21 symbols", 0xA 
    len00 = $ - char00

(1.3K stripped)

Why even have the overhead of a function call to main in that case?

Because you need a syscall to exit the process.

Is that not what this part of the code does?

  movl $1,%eax
  int $0x80

I love the XML question (whether your app does stuff with XML).

There have historically been some big security holes when parsing XML that it is a security code smell now if you are working with it (especially in lower level languages like C or C++).

The linker script is what runs the global constructors in C++. Nothing prevents using a custom linker script in C to call some `premain()` function.

At least in GCC you can use the constructor attribute to run code before main. I've seen this used to initialize modules and to create a dependency tree of modules to initialize before main even starts.

To be picky, the linker script lays out a sequence of calls to the global constructors so that the C runtime startup knows where to find and execute them.

Is it strange that C++ is able to do this but C itself cannot? Sometimes I hear the argument that "C is lower level", and there is the idiom that "C++ is C but better" (not that I believe either is true).

It's somewhat misleading when they say the C++ code runs before main. What's going on is that, after the same initialization steps, the first code of your program is executed.

C and C++ have different syntax for what that code is, but it's pure semantics as far as the computer is concerned and C++ gives you zero extra power.

It is certainly possible to do this in C by editing the startup code. In embedded systems, the startup source code is provided and exposed by the manufacturer's SDK, so it is quite straightforward to do.

Besides that, I can't see much difference between doing something before main() and just putting some code at the very top of main().

That's funny. In Swift I needed to make sure something ran as early as possible and I had to do the same trick: put the code in the declaration of a variable, like this:

var loadTheme: Bool = { Style.loadTheme(); return true; }()

Found this article really interesting! It got me to write my first C program.

> Command line arguments are received.

Nit: that's not the best wording. If the executable is being executed by something like `exec`, then these are just "arguments"; the command line (i.e. a shell) isn't involved. Do embedded systems support something like `exec`?

They're still called command-line arguments out of convention: https://en.wikipedia.org/wiki/Command-line_interface#Argumen...

This is like what happened before the big bang.

No, because what happens before main is observable.

Did someone on HN downvote everything except the bullet points in this article?

Seriously-- why is the font greyed out? What possible purpose does this serve?

They have this in their stylesheet across the website


so paragraph tags, sadly, have the ugly grey font color by default

YMMD... seriously, I opened the article, only looked at the bullet points, and then quickly discarded it...

Strange they looked into the topic and came to the completely incorrect conclusion.

Can you elaborate for those of us unfamiliar with topic?

Doing some of that kind of super low level code in a C++ static initialization constructor is almost certainly a poor choice. Like setting up memory and the stack pointer. Doing hardware init there can make a lot of sense, but you have to be careful from an architectural perspective.

The article is not advocating initializing the stack pointer with a C++ constructor.

> * Memory segments are initialized. Memory segments such as .bss (for uninitialized data), .data (for initialized data such as static variables, global variables, local static variables, addresses of functions, and function pointers), and .text (where the actual code resides) are initialized and a valid stack is set up.

> * Command line arguments are received. This may not be relevant in embedded systems as in embedded systems we don’t usually call main() with arguments

> * The stack pointer is configured. This is necessary because the program needs to know where to start from. Note that some microcontrollers may require a start.c or cstart file that has the initialization code (whether this file is manually created or auto generated).

> Now that we know what happens before main(), we might wonder if there’s a way we can control what happens before main, or if there is a way we can write our own version of what happens before main(). In C, the answer is largely no. Whatever happens before main() is largely dependent on your architecture and compiler. However this is in fact possible in C++.

> One way you can do this in C++ is by declaring a global class object. Since global variables are initialized before main(), the constructor of the class you have initialized as global will run before main(). You can therefore place the code you want to run before main() in such a constructor.

Certainly looks like it is.

The article is conflating two things:

* What happens before main()

* What can you do to make code run before main()

It suggests only the avenue of C++ global constructors to make code run before main() (as others have noted, __attribute__((constructor)) is basically the same thing for C). But there are other ways to make code run before main, such as by use of linker scripts and assembly files to put code in _start that eventually calls main() (note that it's this latter way by which all of the things they mention are done).

You can't really initialize the stack pointer in a constructor, or indeed in C++ at all; there's no syntax for it and the compiler may use the stack to allocate local variables in the function prologue.

You can if you're super duper careful and throw in a little inline asm, depending on the architecture. ARM for instance is very likely not to need to spill to the stack on tiny little leaf functions.

It's an absolutely terrible idea, but I've seen engineers be so afraid of asm files that they'd try something like this.

The ARM Cortex-M series of chips is I believe kind enough to initialize the stack pointer for you before your code even executes, by copying your chosen stack pointer value from a special reserved location in the interrupt vector table. So in principle you could write all your hardware init code in C.

Yes, exactly this. As someone who has written startup code in many flavors of assembly (PIC, ARMv4, MIPS, PowerPC, AVR...), I appreciated what ARM did with the design of the Cortex-M architecture -- they designed it so that you could write fully-functional embedded software (firmware) without a line of assembly.

Normally, the 2 places you can't avoid assembly are (1) the startup code (because you're doing things like disabling interrupts and setting the stack pointer) and (2) interrupt service routines, where usually there is a little bit of magic on the front and back ends (for example, on an older ARM7 chip, the CPU didn't automatically push/save any registers onto the stack; you had to do it yourself if you needed that).

With the Cortex-M, the CPU design and its microcode took care of all that, so all of the messy assembly stuff went away. Now, as someone who started writing 6502 ASM as a kid, I kind of miss it, but as someone who has to build lots of systems and ship products on deadlines, I like the change.

Eh, on the M4s, they borked it. There's an errata that the floating-point spill doesn't take the divide pipeline into account, so it may not wait long enough for the pipeline flush and can corrupt the register save. So you have to write your own asm interrupt prologue anyway. :/
