Behind “Hello World” on Linux (jvns.ca)
76 points by HieronymusBosch 9 months ago | 4 comments



Like I said in the blog post, I'd love suggestions for other things I've missed that happen when you run "hello world", _especially_ if there's a way to use a Linux spy tool to trace what exactly is happening.

Considering adding a few things that are happening in the kernel and trying to use `bpftrace` to trace them, but I need to figure out how to use bpftrace/kprobes and hunt down the relevant kernel functions.


> I’m honestly still a little confused about dynamic linking.

The software here is RTLD, the GNU RunTime LoaDer. An ELF executable has a program header of type PT_INTERP, which names the "program interpreter." Because yes, the Python interpreter itself has an interpreter, whose job is to interpret the ELF file and build a block of memory that can be executed correctly (aka "loading," which is why this program is called the "loader" outside of ELF-world).
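If you want to see that PT_INTERP entry for yourself, here is a minimal sketch that walks the program headers by hand. It assumes a 64-bit little-endian ELF (e.g. x86-64), and `find_interp` is a name I made up for illustration, not part of any library.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path (from <elf.h>)

def find_interp(data):
    """Return the PT_INTERP path of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 file header: we only need the program header table location and shape.
    (_ident, _etype, _machine, _version, _entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = struct.unpack_from(
        "<16sHHIQQQIHHHHHH", data, 0)
    for i in range(phnum):
        # ELF64 program header: type, flags, file offset, vaddr, paddr, sizes, align.
        (p_type, _p_flags, p_offset, _p_vaddr, _p_paddr,
         p_filesz, _p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            # The segment content is a NUL-terminated path string.
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None  # no interpreter requested, e.g. a statically linked binary
```

On a Linux box you would call it as `find_interp(open(path, "rb").read())` with the path of some dynamically linked executable; `readelf -l` reports the same information under "Requesting program interpreter".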

When you execve, after the kernel creates the program's address space it reads the contents of the file pointed to by PT_INTERP and writes it to the top of the program's address space, then calls RTLD_MAIN, which is accessible by a pointer in the new address space - now the new process is actually doing something in user space. RTLD_MAIN sets up the rest of the loader state, reads the contents of the ELF binary it's supposed to be loading, and only then does it start resolving dynamic library paths (if they exist). This is because the ELF object contains not only the libraries that need to be loaded alongside the executable, but also additional library search paths (via RPATH). The precedence order and cache behavior are documented (man ld-linux).

When talking about LD_LIBRARY_PATH, LD_PRELOAD, rpaths, etc., it's important to remember that there's a distinction between linking (compile-time behavior, used to create the ELF or Mach-O binaries that contain the references to other objects) and loading (run-time behavior, used to control how the information in the objects is interpreted). As a nit, this is why I don't like calling it the dynamic linker - it's a dynamic loader. The linking has already been done.
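To make the precedence order concrete, here is a rough sketch of the search order as I read it in the ld.so man page, for a library name that contains no slash. `search_order` and its parameters are hypothetical names, and it simplifies: it only splits on colons, skips empty entries, and ignores options such as `-z nodeflib`.

```python
def search_order(rpath=None, runpath=None, ld_library_path=None, secure=False):
    """Return (source, location) pairs in the order the loader would try them."""
    def split(value):
        # Simplification: colon-separated only, empty entries dropped.
        return [d for d in value.split(":") if d]

    steps = []
    # DT_RPATH is only honored when the object has no DT_RUNPATH.
    if rpath and not runpath:
        steps += [("DT_RPATH", d) for d in split(rpath)]
    # LD_LIBRARY_PATH is ignored in secure-execution mode (e.g. setuid binaries).
    if ld_library_path and not secure:
        steps += [("LD_LIBRARY_PATH", d) for d in split(ld_library_path)]
    if runpath:
        steps += [("DT_RUNPATH", d) for d in split(runpath)]
    # Then the cache built by ldconfig, then the default directories.
    steps.append(("cache", "/etc/ld.so.cache"))
    steps += [("default", "/lib"), ("default", "/usr/lib")]
    return steps
```

For example, `search_order(rpath="/opt/app/lib", ld_library_path="/tmp/lib")` puts the RPATH directory ahead of the LD_LIBRARY_PATH one, while passing `runpath=` as well drops the RPATH entry entirely.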

The loader maps the executable's binary into the rest of program memory. For external symbols from libraries, it resolves those symbols by searching the loaded dynamic libraries, loads that code into program memory, and then overwrites the callsites of those symbols with the now correct address of the symbol (which isn't known until this point).

(sidenote: even if you don't have a dynamically linked program, you still will need dynamic loading - modern statically linked binaries use a feature called Address Space Layout Randomization which (ab)uses the implementation of the dynamic loader to randomize where the symbols of your "statically" linked binary are going to be once the program is loaded into memory).

> I’m not going to talk about this because here I’m interested in general facts about how binaries are run on Linux, not the Python interpreter specifically.

Ah, but where is _start, and what happens before it? This is actually important to understand, because it's ultimately up to the loader - RTLD from glibc is one loader, and it does things quite differently from python3, which is also a loader and just happens to be loaded by the GNU libc loader if it was compiled and linked against glibc, or by a different loader otherwise (e.g., musl libc's loader, or the kernel's loader). Because loaders can be loaded by other loaders, "what is start" becomes important, and "what is being loaded" is deeply tied to the implementation language.

This is why there are those stray syscalls after loading: you're seeing what C programs are allowed to do before main. It's code inserted by the C compiler (for example, for the glibc runtime). Libraries are also allowed to insert code before main (for example, libpthread).


> modern statically linked binaries use a feature called Address Space Layout Randomization which (ab)uses the implementation of the dynamic loader to randomize where the symbols of your "statically" linked binary are going to be once the program is loaded into memory

Address randomization isn't a feature of the dynamic loader, but of the kernel's loader. As soon as you specify ET_DYN as the file type, the kernel adds up the fixed sizes of all the segments and the fixed gaps between them, and reserves space for them below the mmap base, which is randomized unless ASLR is disabled in the kernel. This occurs even if the program specifies no interpreter (e.g., if it's a dynamic loader being executed directly). The offsets between the program's symbols are fixed, even as their absolute addresses vary. If the program does specify an interpreter, then the interpreter has to obtain the program's entry point from AT_ENTRY in the auxiliary vector.
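For anyone who wants to poke at the auxiliary vector, here is a small sketch that decodes its raw bytes into a dict. The AT_* constants are the values from `<elf.h>`; `parse_auxv` is a hypothetical helper name, and the defaults assume a 64-bit little-endian process.

```python
import struct

AT_NULL, AT_BASE, AT_ENTRY = 0, 7, 9  # auxiliary vector keys from <elf.h>

def parse_auxv(raw, word_size=8, byteorder="<"):
    """Decode raw auxv bytes into {key: value}, stopping at the AT_NULL terminator."""
    fmt = byteorder + ("QQ" if word_size == 8 else "II")
    pair_size = struct.calcsize(fmt)
    # Drop any trailing partial pair so iter_unpack does not raise.
    raw = raw[:len(raw) - len(raw) % pair_size]
    entries = {}
    for key, value in struct.iter_unpack(fmt, raw):
        if key == AT_NULL:
            break
        entries[key] = value
    return entries
```

On Linux, `parse_auxv(open("/proc/self/auxv", "rb").read())` gives the vector of the running Python process: the AT_ENTRY value is the program's entry point and AT_BASE is where the interpreter was mapped. Running it a few times with ASLR enabled should show both values moving for an ET_DYN executable.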

(Also, even non-position-independent programs of type ET_EXEC are partially subject to ASLR: the loadable segments and program break appear at a fixed address, but the stack's address is always randomized unless ASLR is disabled. Meanwhile, the kernel always attempts to randomize the address of the vDSO, except that if the stack is at the top of the address space, as it usually is without ASLR, the vDSO is placed below the mmap base, which is predictable but depends on the current value of RLIMIT_STACK.)
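A quick way to tell which case a given binary falls into is to read `e_type` from the ELF header. This is only a sketch: `elf_type` is a made-up helper name, and it reads nothing beyond the first 18 bytes of the file.

```python
import struct

# e_type values from <elf.h>
ET_NAMES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}

def elf_type(data):
    """Return the symbolic e_type of an ELF image (e.g. "ET_EXEC" or "ET_DYN")."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) selects the byte order: 1 = little-endian, 2 = big-endian.
    order = "<" if data[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident,
    # at the same offset in both ELF32 and ELF64.
    (e_type,) = struct.unpack_from(order + "H", data, 16)
    return ET_NAMES.get(e_type, hex(e_type))
```

Position-independent executables (including static-pie ones) and shared libraries report ET_DYN, while fixed-address executables report ET_EXEC.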


> (sidenote: even if you don't have a dynamically linked program, you still will need dynamic loading - modern statically linked binaries use a feature called Address Space Layout Randomization which (ab)uses the implementation of the dynamic loader to randomize where the symbols of your "statically" linked binary are going to be once the program is loaded into memory).

This is news to me. Wouldn’t a statically linked binary be just machine code with all the libraries and the addresses resolved at link time (i.e., no symbols to resolve at runtime) and bundled within the single executable file by the linker? Or are you referring to specific language implementations of “static” (since it’s in quotes) binaries?

I haven’t read in-depth about ASLR, but I thought it was mainly for libraries loaded dynamically (at runtime) where better security could possibly be achieved by the randomization of addresses. The OS has had the capability to load the (main) executable at any address even before ASLR.

Any pointers to deeper reading material on this?



