Using main as the entry point might not be the best option: that way some initialization code is skipped, and for instance not even the program arguments will be easily accessible. Using _start from crt1.o instead should be the better choice:
gcc -fPIC -shared -o test test.c -Wl,--entry,_start -D 'PT_INTERP="/lib64/ld-linux-x86-64.so.2"' /usr/lib64/crt1.o
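For reference, the test.c this expects would look roughly like the following. This is only a sketch: the .interp section trick is how glibc makes its own libc.so.6 executable, the function names are mine, and PT_INTERP here is the macro supplied by the -D flag above.

    /* test.c -- a shared object that can also be run directly.
       PT_INTERP is the interpreter path passed in via -D. */
    #include <stdio.h>

    /* Embed a PT_INTERP program header so the kernel can find the
       dynamic loader when this .so is executed as a program. */
    const char my_interp[] __attribute__((section(".interp"))) = PT_INTERP;

    /* An ordinary exported function for callers that link against
       this object as a library. */
    int answer(void) { return 42; }

    /* Reached via crt1.o's _start, so argc/argv/environ are set up. */
    int main(int argc, char **argv) {
        printf("%s run as a program, answer() = %d\n", argv[0], answer());
        return 0;
    }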
It's interesting that libraries and executables always have been considered different things by the mainstream operating systems.
In Oberon systems, object files are executable files. Every object file can have subcommands (the way the git command has init, add, commit, etc.), but it also contains library functions (the way libgit does) in that same object file. In Oberon all those concepts are the same thing.
It's the other way around that can be suboptimal. If some application wants to integrate git, it could execute it in a subshell (which is frowned upon for good reasons) or use libgit instead. This works great in the case of git, but for all the other useful stuff that doesn't have its own lib providing that basic functionality, you're out of luck.
> It's the other way around that can be suboptimal. If some application wants to integrate git, it could execute it in a subshell (which is frowned upon for good reasons) or use libgit instead.
What's wrong with invoking an executable or using libgit?
It seems you're totally missing the whole point. The problem was never that you could not consume libraries from an executable. The actual problem is that you're somehow expecting to consume a third-party package in a way that the project maintainers do not support, let alone maintain.
Adding a build flag that supports a convoluted and arguably useless use case of linking against symbols shipped in third-party executables changes nothing.
Maybe it looks weird now because libraries started out as collections of subroutines, then evolved into sub-programs (some spawn their own threads to do their job), and eventually got so big that they became the main program into which you plug your own moving parts: frameworks.
Moreover, it is interesting to note that at the same time the opposite trend exists: some libraries used to be the "meat" of a program, but because connecting two programs in flexible ways is not always easy, that core was extracted and made available directly to other programs. Well-known examples are libcurl and zlib.
If I import a library, I probably don't want to pay the overhead of importing a GUI I won't be using. Code stripping could handle it, but that assumes you know how to strip the libraries safely.
Another way to look at it is that other needs outstrip the mild convenience of packaging library and exe code together.
What GUI? I think you're misunderstanding. No one is proposing changing how things link.
The concept here is that executables and objects are the same structures and there's no reason why we shouldn't be able to link to a routine found in /bin/ls as we would link to a routine found in libm.so.
This has nothing to do with "packaging library and exe code together."
I suppose I extrapolated a bit too far, but I don't really see the point if not to package them together. If you wanted to link against an executable in a way that was safe to do, why not go further and split out a library that the executable and everyone else can use anyway?
With executables you know they have one and only one entry point that conforms to the OS’s process execution conventions, which is a useful property to have be immediately apparent. Libraries can be seen as either a superset (any number of such entry points, most often zero) or as the separate category of binaries having no such entry point.
In the world of client-side rendering, one could ask a somewhat similar question of why HTML and JS files are a different thing, and why you can’t just load a JS file to run your SPA in a browser.
And in the opposite direction Windows executables can always be loaded as libraries. There's also rundll32 for libraries designed to be run like executables.
For classic .NET, yes, all executables are fully valid libraries. (For .NET Core, the executables actually compile to DLLs, with a native runner executable sometimes included alongside them, depending on the exact build command used.)
For win32, loading executables as code libraries is not officially supported. Now, it does sort-of work, but there are limitations that are totally undocumented. What is officially supported is dynamically loading executables to access resources.
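That officially supported path looks roughly like this; a minimal sketch in which the EXE path and the string resource ID are just placeholders:

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        /* Load an EXE for its resources only; none of its code runs. */
        HMODULE mod = LoadLibraryExA("C:\\some\\app.exe", NULL,
                                     LOAD_LIBRARY_AS_DATAFILE);
        if (!mod) return 1;

        char buf[256];
        /* 1 is a made-up string-table ID; real IDs depend on the EXE. */
        if (LoadStringA(mod, 1, buf, sizeof buf))
            printf("string resource: %s\n", buf);

        FreeLibrary(mod);
        return 0;
    }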
ubuntu@ip-10-16-0-17:~$ /lib/aarch64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.31-0ubuntu9.9) stable release version 2.31.
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 9.4.0.
libc ABIs: UNIQUE ABSOLUTE
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.
Also apparently libpthread, which I guess is part of glibc:
$ /lib/x86_64-linux-gnu/libpthread-2.31.so
Native POSIX Threads Library
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Forced unwind support included.
I'm going to echo the other comments here that Windows has this concept via rundll32, and I believe the PE file format doesn't make much of a distinction either (both EXEs and DLLs can have both imports and exports). My experience has been mainly on the Windows side, so this may bias my perspective, but I've always thought the way shared libraries work on Unix-like systems was unusual. Perhaps it's a result of reusing the linking concept of static libraries and bolting it onto a system which originally never had the concept of dynamic linking. As for PE vs ELF, the former is definitely a far simpler format than the latter (and Apple's Mach-O is even weirder).
Ah, yes, the format that includes code for systems that haven't been modern in 30 years, and is built on top of like 5 different legacy formats, is much simpler...
Certain symbols are provided by the dynamic linker, such as __tls_get_addr for certain forms of TLS access, or the __rseq_* symbols describing the restartable sequences area maintained by the kernel.
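You can see this for yourself; a small sketch (the ld.so path is an x86-64 glibc assumption) that asks the already-loaded dynamic linker for one of those symbols:

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        /* ld.so is already mapped into every dynamically linked process,
           so dlopen just returns a handle to the existing copy. */
        void *ld = dlopen("/lib64/ld-linux-x86-64.so.2", RTLD_NOW);
        if (!ld) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        printf("__tls_get_addr = %p\n", dlsym(ld, "__tls_get_addr"));
        return 0;
    }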
It's even reasonable to call the loader a linker (a "dynamic" one), as its main job is updating addresses in jumps etc. to match the locations where it loads modules.
I've always liked the shared libraries that print version information when executed - it's a simple ergonomic convenience.
One of the central libraries in our system has, IIRC, its own main loop that's never run in normal operation - usually the process of injecting it into a target process also ensures it gets pointed at an entry point that'll actually do stuff.
But you can execute it and it'll just sit there spinning happily on its own like some kind of weird debugging singularity. There must be something we can use it for but I'm not sure what...
My guess is that you can fix both the "need to exit()" and all the other issues that you haven't run into but exist, by making your entry point _start instead of main (which is NOT the first thing that runs in your program; _start is!).
If you `readelf -h` an executable on your system and then look up the entry point it gives you in the executable's symbols you'll see it's indeed _start.
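If you'd rather check programmatically than with readelf, reading the ELF header directly shows the same fields; a quick sketch for 64-bit files:

    #include <elf.h>
    #include <stdio.h>

    /* Print the file type and entry point of a 64-bit ELF file,
       roughly the interesting part of `readelf -h`. */
    int main(int argc, char **argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 2; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1) { perror("fread"); return 1; }
        fclose(f);

        printf("e_type:  %u  (2 = ET_EXEC, 3 = ET_DYN, i.e. PIE or shared object)\n",
               (unsigned)eh.e_type);
        printf("e_entry: 0x%llx\n", (unsigned long long)eh.e_entry);
        return 0;
    }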
I have thought of an alternative to a C FFI using a schema-based binary serialisation format, but protobuf is useless for that because of the serialisation and copying overhead. Something like capnproto or flatbuffers would be much better.
The goal would be to use Capnproto RPC but rip the asynchronous part out by dynamically linking the library and rather than using a thread pool you just execute some sort of "message_receive" function with the serialised data in the library. Alternatively, regular capnproto RPC is used over TCP but in this case it wouldn't matter if the library is in the same process or a standalone executable on a different computer with which you can communicate over the network.
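To make that concrete, the library-side interface could be as small as the following. This is entirely hypothetical - the function name and signature are mine, not capnproto's - and the stub just echoes the request:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical entry point of a dynamically linked "service":
       the host passes in a serialised request (capnproto, flatbuffers, ...)
       and gets a serialised response back.  Whether this runs in-process,
       in another process, or on another machine is invisible to the caller. */
    size_t message_receive(const uint8_t *request, size_t request_len,
                           uint8_t *response, size_t response_cap) {
        size_t n = request_len < response_cap ? request_len : response_cap;
        memcpy(response, request, n);   /* trivial echo instead of real dispatch */
        return n;
    }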
Actually, can you go the other way, using an executable as a shared library?
Here's my usage scenario: I want to test a subset of the code in a large binary. One thing I could do is to build a test program from a subset of the application's sources, and then call the required code for the test. But that gets laborious in a large codebase.
So could one hack around that, by somehow turning the binary into a shared library, and telling LD to ignore the fact that there's a main() in there, and replace it with a different one?
That's the first part of the article. Yes, yes you can as long as the symbols are exported and you do a little magic. So, yes with the standard C toolchain. For other languages/stacks/whatever you're using, it would probably depend on your tooling.
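To sketch what that magic looks like, assuming a plain GNU toolchain: build the big binary as a PIE with its symbols exported (something like -pie -rdynamic), then link the test driver against it directly, e.g. gcc test_driver.c ./bigprog -o test_driver. The details vary with toolchain and glibc version, but the driver itself is trivial:

    /* test_driver.c -- call a routine that lives inside the application binary.
       parse_config is a made-up name standing in for whatever you want to test;
       the declaration has to match the real definition in ./bigprog. */
    #include <stdio.h>

    extern int parse_config(const char *path);

    int main(void) {
        printf("parse_config returned %d\n", parse_config("test.conf"));
        return 0;
    }

The driver's own main() is resolved locally at link time; the main() inside bigprog is just another exported symbol you never call.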
Offtopic, but I've been bitten often by linking problems caused by version mismatches in libc. Why can't the linker deal with newer versions of shared libraries which are backward compatible? Shouldn't the .so file format provide some way to express this?
The problem is not the .so, which has all versions, but the .o files, that don't have a way to represent the version they would prefer to use. C declarations also lack a way to specify those versions, and the last version is the only one that is guaranteed to correspond to what's declared in the headers (because the older versions could have a different ABI... in fact, if they did have the same ABI, a change in version wouldn't have been necessary in the first place).
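When you do need an older version from a .o, glibc lets you ask for it per symbol with a .symver directive. A sketch - the version name GLIBC_2.2.5 is the x86-64 baseline and is an assumption here; check the real ones with objdump -T:

    #include <string.h>
    #include <stdio.h>

    /* Bind our memcpy references to the old baseline version instead of the
       newest default.  May need -fno-builtin-memcpy so gcc emits a real call. */
    __asm__(".symver memcpy, memcpy@GLIBC_2.2.5");

    int main(void) {
        char dst[16];
        memcpy(dst, "hello", 6);
        puts(dst);
        return 0;
    }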
The fact that some libraries have a PT_INTERP entry and that PIE executables are not of type ET_EXEC makes it impossible to detect whether an ELF binary is a library or an executable. Not a good design decision, in my opinion.
And if you double-click a PIE executable in nautilus (the GNOME file manager), it won't run it. That's why Firefox didn't ship as PIE for a long time in Mozilla's pre-built tarballs (because a lot of users would run it, at least the first time, through a file manager). Workaround: a dummy non-PIE executable that exec()s the PIE one.
Edit: tangent: the excuse nautilus developers have for not having fixed this bug (for probably more than 10 years) is that people should use .desktop files... which don't work to execute an application extracted from a tarball either (because IIRC they want either an absolute path to the executable, or one that is in $PATH, or something along these lines).
To what extent can these steps be replaced by running ld-linux directly? AIUI, even Windows has its RUNDLL/RUNDLL32.EXE that can accomplish the same thing.
Could someone who knows about computers comment: how much of this text is specific to Linux, how much is specific to C, and how much is true for all computers?
rundll32.exe is just a way of doing FFI from the command line -- there are a few different options on UNIX-like systems for that. The most obvious is ctypes.sh [0].
There are, though, even more sophisticated options than just FFI, like the Witchcraft Compiler Collection [1], which includes among other things an interactive shell.
I'm just surprised there isn't a standard, even POSIX utility to run shared libs but by calling specific functions. It would avoid the need for executables that just wrap a few api's in the shared lib.
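A bare-bones version of such a utility is only a few lines. A sketch, with the big simplification that the target function takes no arguments (a real tool would need some way to describe signatures, which is exactly what ctypes.sh does):

    /* runso.c -- dlopen a shared object and call one named function.
       usage: ./runso <library.so> <function>   (build with: gcc runso.c -o runso -ldl) */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        if (argc != 3) { fprintf(stderr, "usage: %s <lib> <func>\n", argv[0]); return 2; }

        void *lib = dlopen(argv[1], RTLD_NOW);
        if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        void (*fn)(void) = (void (*)(void))dlsym(lib, argv[2]);
        if (!fn) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        fn();    /* assumes a no-argument function */
        dlclose(lib);
        return 0;
    }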
Standardization seemingly has windows of opportunity which, if missed, make it far less likely and harder. Even when a need is recognized - e.g., Python's years of struggle to create a central module repository. And especially when you hit "why would anyone want that?!?" and "we don't approach things that way" barriers.
I recall many years back... dlopen/dlsym can redirect through a lazily-set lookup table, so a seemingly obvious thing to want is to reset table entries for live reloading. Given a very simple lookup table, value setting didn't seem a bizarre ask. But "changing a binding?!?" and "there's no standard spec" and "intriguing, but why?" and... I'd not be surprised if it never happened.
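The usual workaround is to own the lookup table yourself and refresh it with dlclose/dlopen. A rough sketch - the plugin path and symbol name are placeholders, and dlclose only allows the old copy to be unloaded, it doesn't guarantee it:

    #include <dlfcn.h>
    #include <stdio.h>

    /* Our own lazily-set lookup table: one slot per reloadable function. */
    static void *plugin;
    static int (*do_work)(int);

    /* Drop the old mapping (if any) and rebind from a freshly loaded copy. */
    static int reload_plugin(const char *path) {
        if (plugin) dlclose(plugin);
        plugin = dlopen(path, RTLD_NOW);
        if (!plugin) return -1;
        do_work = (int (*)(int))dlsym(plugin, "do_work");
        return do_work ? 0 : -1;
    }

    int main(void) {
        if (reload_plugin("./plugin.so") == 0)
            printf("do_work(21) = %d\n", do_work(21));
        /* ...rebuild plugin.so, call reload_plugin() again to pick it up... */
        return 0;
    }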
I'd love to see a sketch of the "how hard is it to standardize something, and what you can do about it" space.
Not quite. When producing an executable (relocatable or not), link editors may assume that there is just one such executable in the process image. This enables optimizations, typically around global data access and TLS access. Two such executables cannot be loaded at the same time into the same process image.
This is hugely architecture-dependent. In CheriBSD for example - and probably also FreeBSD in general - the compiler doesn’t assume that, and that’s one of the features that make CHERI process colocation possible.