A general overview of what happens before main() (2019) (embeddedartistry.com)
216 points by xept on Aug 22, 2022 | 42 comments



If you want a Blinkenlights visualization of what happens from BIOS boot to main() then watch the second video under this heading: https://justine.lol/sectorlisp2/#emulation



with a twist that officially Rust doesn't have "life before main()". There's no user code running before main(). There are no static initializers with run-time constructors, the kind behind C++'s static initialization order fiasco (SIOF).
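For contrast, here is a minimal sketch of user code running before main() in C. It assumes a GCC- or Clang-compatible toolchain, since `__attribute__((constructor))` is a compiler extension and not standard C. The names `early_init` and `init_ran_before_main` are made up for illustration.

```c
/* Set by the constructor below; stays 0 if nothing ran before main(). */
static int ran_before_main = 0;

/* GCC/Clang extension (an assumption about the toolchain): a function
 * marked with the constructor attribute is called by the startup code
 * before main() is entered. */
__attribute__((constructor))
static void early_init(void)
{
    ran_before_main = 1;
}

/* Reports whether early_init() has already run. */
int init_ran_before_main(void)
{
    return ran_before_main;
}
```

On such a toolchain, calling `init_ran_before_main()` as the first statement of main() should return 1, which is the kind of "life before main()" that Rust officially avoids.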


Related, how to print "Hello world" without libc: https://www.youtube.com/watch?v=b9OJYMFpHRU&ab_channel=Rubbe...


At 7:44 they go into the “magic numbers” that identify kernel syscalls, like the ones Cosmopolitan [0] uses to build things like RedBean [1].

0. https://github.com/jart/cosmopolitan

1. https://redbean.dev/



Also see the "From Zero to main()" series:

https://interrupt.memfault.com/blog/zero-to-main-1


> For example, OS X only has dynamically linked applications

Statically linked code is permissible on Intel.


Please provide a source for a statically linked application on MacOS, Intel or otherwise. When I wrote this article, Apple's own documentation said it was not supported. libSystem is also only provided as a dynamic library.


IMO it's worth distinguishing between what the OS allows and what the OS vendor supports. The macOS kernel still has a lot of FreeBSD heritage, including the ability to make direct syscalls, but Apple only guarantees compatibility for syscalls made via libSystem.

This is a reflection of historical differences in opinion about whether the stable ABI for an OS kernel should be the kernel itself (ala Linux), libc (ala Solaris or OpenBSD), or a language-agnostic library (Windows NT's ntdll).

macOS seems to be in a transition period toward the Windows model, with libSystem providing trampolines into the kernel. However, they don't yet enforce this model like OpenBSD does, so if you're willing to risk the syscall numbers changing you can do without libSystem.

Here's an example, lightly adapted from <https://john-millikin.com/unix-syscalls#darwin-x86-64>:

  .data
   .set L_STDOUT,        1
   .set L_SYSCALL_EXIT,  0x2000001
   .set L_SYSCALL_WRITE, 0x2000004
   L_message:
    .ascii "Hello, world!\n"
    .set L_message_len, . - L_message
  
  .text
   .global start
   start:
    # write(STDOUT, message, message_len)
    mov     $L_SYSCALL_WRITE, %rax
    mov     $L_STDOUT,        %rdi
    lea     L_message(%rip),  %rsi
    mov     $L_message_len,   %rdx
    syscall
  
    # exit(0)
    mov     $L_SYSCALL_EXIT, %rax
    mov     $0,              %rdi
    syscall

Assemble and link it into a static binary:

  $ as -arch x86_64 hello.S -o hello.o
  $ ld -arch x86_64 -o hello -static hello.o
  $ file hello
  hello: Mach-O 64-bit executable x86_64

It runs fine:

  $ ./hello
  Hello, world!
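
The same contrast between going through the libc wrapper and naming the syscall number yourself can be sketched in C. This is a rough illustration that assumes Linux, where the syscall numbers are the stable ABI; it is not a port of the macOS assembly above. Note that `syscall()` is itself a libc function, so this only shows who supplies the number, not a libc-free binary. The function names are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* asks glibc to declare syscall() (Linux assumption) */
#endif
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Writes msg to stdout through the libc wrapper write().
 * Returns the number of bytes written, or -1 on error. */
long write_via_libc(const char *msg)
{
    return (long)write(STDOUT_FILENO, msg, strlen(msg));
}

/* Writes msg to stdout by passing the syscall number (SYS_write)
 * explicitly to the generic syscall() entry point.
 * Returns the number of bytes written, or -1 on error. */
long write_via_syscall_number(const char *msg)
{
    return syscall(SYS_write, STDOUT_FILENO, msg, strlen(msg));
}
```

If the kernel ever renumbered `SYS_write`, only the second function would break, which is the risk described above for skipping libSystem on macOS.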


Hey John,

Thanks for commenting. I had read your article via another comment here. I had not thought about the syscall approach before, because as you note Apple does not (well, did not previously) guarantee syscall stability.



Don’t.



Thanks. It is a clever approach. But at best all I can do is amend the original statement to "only dynamically linked applications are supported". Apple does not guarantee syscall stability (as is evident from Go's big break, and eventual move to libSystem).


Apple actually guarantees syscall stability of x86-64 applications :)



Believe me, I know: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... But with Rosetta Apple seems to have committed to the ABI now.


TIL! I hadn't come across this change of heart yet.


I thought "statically linked" just meant that all of the dependency binaries were rolled into the main executable at build time - how could macOS even stop you doing that?

I assume my understanding of static linking is too simplistic though.


On some OSes (Linux and macOS for example) a dynamically linked executable specifies the runtime linker as its “interpreter”, and the runtime linker contains the initial entry point that the kernel starts the process’s execution from. A statically linked executable is then one that doesn’t specify any interpreter at all, and which directly receives control from the kernel.

This distinction is related to but not the same as statically or dynamically linking an individual dependency; only if all dependencies are statically linked can an executable then be statically linked.
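
As a rough sketch of the "interpreter" idea on Linux: an ELF executable names its runtime linker in a PT_INTERP program header, so you can tell the two kinds of executable apart by looking for that header. The code below assumes a 64-bit ELF file in the host's byte order and the `<elf.h>` header that Linux systems ship; `has_interp` is a made-up helper name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF file at `path` has a PT_INTERP program
 * header (it names a runtime linker), 0 if it has none, and -1 if the
 * file cannot be read or is not a 64-bit ELF file.
 * Assumption: the file uses the same byte order as the host. */
int has_interp(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int result = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        result = 0;
        /* Walk the program header table looking for PT_INTERP. */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;
                break;
            }
            if (ph.p_type == PT_INTERP) {
                result = 1;
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

On a typical distribution I would expect `has_interp("/bin/sh")` to return 1 and a binary built with `-static` to return 0, but that depends on how the system was built.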


One dependency would be libc, which is typically how you would call into the OS using system calls. macOS has not typically supported a stable interface for these.


It says _start (crt0.o) comes from libc. But note, if you run nm on a binary you can see _start is part of the binary, not libc.so.


The "libc" is more than libc.so. Your system headers, for example, must be shipped with your libc, even though everything that would change if you replace them with something nonsensical would be in "your" binary, not in libc.so. libc functions don't stop coming from the libc if you statically link them.

A C runtime is often provided by the same project as the C standard library (the other viable option is to have it provided by the compiler project).


> Your system headers, for example, must be shipped with your libc,

Not all. Typically stdarg.h is not. On Linux, <sys/*> is not.


It says "usually". It varies! But, in general, the libc implementations I have looked at handle this. This includes Newlib, glibc (they call it start.S), picolibc, musl (crt1), and my own libc.

The presence in your binary only means that crt0.o was statically linked (which may be an expectation of the loader on your system). If you linked against a static libc, you would also see those symbols as part of the binary.


I think your last paragraph is wrong. crt0.o is static even when linking to libc.so dynamically. The ELF format has something in the headers for an entry point which will point at _start in your own binary, not in libc.so.

I'm pretty sure it works that way on Windows too, but they call the symbol mainCRTStartup or some such.
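
One way to check the entry point claim from inside a running program, as a sketch that assumes Linux with glibc or musl: the kernel reports the executable's entry address in the auxiliary vector as AT_ENTRY, and `getauxval()` reads it. If the startup object was linked into the executable, that address should match `_start`. The helper name `entry_point_is_start` is made up.

```c
#include <sys/auxv.h>

/* _start is defined by the C runtime startup object (crt1.o or similar),
 * which the linker places in the executable itself. */
extern void _start(void);

/* Returns 1 if the entry address the kernel recorded for this executable
 * (AT_ENTRY in the auxiliary vector) equals the address of _start,
 * and 0 otherwise. */
int entry_point_is_start(void)
{
    unsigned long entry = getauxval(AT_ENTRY);
    return entry == (unsigned long)&_start;
}
```

This only says where the symbol lives at run time. Which project ships the object file that defines it is a separate question.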


What, exactly, is wrong?

The point I am making is that crt0.o usually comes from libc, even if it is linked separately.

I suspect that the loader expects the ELF entry point to be an address in the application, not a dynamically linked library, which is why crt0.o must be statically linked.


"nm" is your friend here.


A minor quibble: the Mac operating system is no longer called OS X. The name changed to macOS in 2016, to align with the branding of Apple's other operating systems.


I'd just like to interject for a moment. What you're referring to as Linux, is in fact, GNU/Linux, or as I've recently taken to calling it, GNU plus Linux. Linux is not an operating system unto itself, but rather another free component of a fully functioning GNU system made useful by the GNU corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX.

Many computer users run a modified version of the GNU system every day, without realizing it. Through a peculiar turn of events, the version of GNU which is widely used today is often called "Linux", and many of its users are not aware that it is basically the GNU system, developed by the GNU Project.

There really is a Linux, and these people are using it, but it is just a part of the system they use. Linux is the kernel: the program in the system that allocates the machine's resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Linux is normally used in combination with the GNU operating system: the whole system is basically GNU with Linux added, or GNU/Linux. All the so-called "Linux" distributions are really distributions of GNU/Linux.


If Stallman had not written gcc, gdb, his version of emacs, the basic gnu utils, and, most importantly, the GPL, there would probably be no "Linux". Torvalds would have had to write a complete OS, and he probably would have got a job and a life before he accomplished that.

It's too bad RMS got sidetracked with Hurd. But, the GNU system now runs with several kernels - the Linux kernel is just the most developed and best known one.


There are also operating systems using Linux that have no GNU at all. So it doesn't bother me at all when someone calls it Linux, and I'll continue to call it that myself unless there is additional context needed. Even then, I'd probably just reference specific GNU tools since there are other userspace tools on the system that are also necessary, but are not GNU.

https://www.glaucuslinux.org/

https://www.alpinelinux.org/


That ex X still has us perplexed.


marketing!


I'd just like to interject for a moment. What you're referring to as macOS, is in fact, Darwin/macOS, or as I've recently taken to calling it, Darwin plus macOS. Darwin is not an operating system unto itself, but rather another free component of a fully functioning Unix system made useful by the BSD corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX.

Many computer users run a modified version of the Darwin system every day, without realizing it. Through a peculiar turn of events, the version of Darwin which is widely used today is often called "macOS", and many of its users are not aware that it is basically the Darwin system, developed by Next Computer.

There really is a macOS, and these people are using it, but it is just a part of the system they use. XNU is the kernel: the program in the system that allocates the machine's resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Darwin is normally used in combination with the macOS operating system: the whole system is basically Darwin with macOS added, or Darwin/macOS. All the so-called "macOS" versions are really versions of Darwin/macOS.

---

Apple's engineers still refer to the OS as Mac OS X. Ventura is technically 10.18, despite the 13 major number in their marketing.


13.0 is not just a marketing number. It is the number stamped in the binaries produced with the Ventura SDK. It is the number in Ventura's SystemVersion.plist. It is the number used in the availability markup in the headers provided by the Ventura SDK. It is the number you use for runtime version checks when you use `#available` in Swift or `@available` in Objective-C. You will not find 10.18 in any of those build or runtime contexts (or anywhere else) because it is not the version number of macOS Ventura.


> Apple's engineers still refer to the OS as Mac OS X. Ventura is technically 10.18, despite the 13 major number in their marketing.

No.


Your comment, probably meant as satire, adds nothing of value to the discussion and invites a pointless debate about naming.


It is satire. It is a spin on a famous quote by Richard Matthew Stallman (RMS) about Linux (or "GNU/Linux").


Its parent was even more pedantic.


I can’t tell if you’re being intentionally funny or just funny.



