About Musl (libc.org)
218 points by peter_d_sherman on March 26, 2020 | 159 comments



The musl community is one of my favorite things on the internet. It's obvious that many members of it could be making a generous salary at any number of companies but they're just hanging out on IRC because they don't give a shit and they love software.

I don't get the sense that they think they are infallible, just that they are right far more often than not. You can't achieve the sort of spartan aesthetic that you see in the musl codebase by handling things like everyone else. The ability to distill things to their essence betrays a very high level of understanding.

As they say in the streets, the dope sells itself.


The libc situation is pretty unfortunate. You can go with glibc for compatibility, but ideally you don't use the upstream version; you copy Debian's, which carries crucial fixes that most of the world relies on but that never got upstreamed, because for a while there, glibc was maintained by the biggest asshole in all of software engineering.

On the other hand you have musl, and we can all appreciate its design goals, but it is similarly maintained by people who believe they and their work are utterly infallible. Hence why, to this date, they have not made it possible to detect musl, and would rather cite you a POSIX standards meeting from 1980 than fix an incompatibility everyone else handles differently. Just ctrl+F musl on http://landley.net/toybox/ for some insight.

The last time I felt like smashing the desk because of musl was when I realized you can't stacktrace a thread in a syscall, because they refuse to annotate their 10 lines of assembler doing the syscall with the required CFI directives. They would rather wait for someone to write a 200-line AWK script that does it ([1]) than fix debugging on platforms like arm64.

1: https://github.com/bminor/musl/blob/master/tools/add-cfi.x86...


The inability to detect musl is indeed frustrating.

One example of this problem is in detecting C11 thread support. This is a feature of the C library, not the compiler, but it's the compiler that sets C11 feature flags. For threads, the only signal the standard gives you is negative: __STDC_NO_THREADS__ tells you they aren't supported. But since that macro can't be defined retroactively in old versions of musl, its absence tells you nothing about whether C11 threads are actually there. And since you can't detect musl at all, you can't even just assume the version of musl is recent enough and turn threads on unconditionally.
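
A minimal sketch of the only check standard C gives you, and why it isn't enough (the caveat in the comment is the whole problem):

    /* the only portable compile-time signal C11 defines: */
    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
        !defined(__STDC_NO_THREADS__)
    /* supposedly safe to include <threads.h> here, but the compiler
       sets these macros while the header ships with libc, so on an
       old musl (or old glibc) this include can still fail */
    #include <threads.h>
    #endif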

The musl team seems to advocate guessing at supported features by detecting them with test compilation (a.k.a. configure scripts). They don't believe in depending on specific versions of musl, because features can be backported or disabled. Okay, fine, but then why not give us explicit means to detect features? Why not just define MUSL_C11_THREADS or something, so we can detect it at compile time instead of during a slow, fuzzy, non-portable configure step?
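
For concreteness, here is the kind of probe a configure script would compile and run to answer the question (a sketch; the conftest.c name is just the autoconf convention):

    /* conftest.c: does this toolchain actually provide C11 threads? */
    #include <threads.h>

    static int run(void *arg) { (void)arg; return 42; }

    int main(void) {
        thrd_t t;
        int res;
        if (thrd_create(&t, run, 0) != thrd_success) return 1;
        thrd_join(t, &res);
        return res == 42 ? 0 : 1;   /* exit status 0 = feature present */
    }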

Does anyone know of a workaround for this? Does musl define any kind of feature macro that can be used to detect C11 thread support?


Funny, GCC/glibc had this same problem (except you could work around it by detecting the version).


I've never had a problem with C++ on musl/Alpine. ZeroTier ports fine to that target. In fact we use Alpine/musl images to build static binaries for ancient Linux distributions that lack new enough distro-native compilers.


This is just C being generally bad at any feature detection. This is why "./configure" exists, and why you check for the existence of headers and functions there.


Depending on your compiler, __has_include("threads.h"), combined with a follow-up check for thrd_t or similar, should cover everything except the craziest cases.
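
Something along these lines, as a hedged sketch (the HAVE_C11_THREADS name is made up here):

    /* __has_include is a compiler extension (GCC 5+, clang), so guard it */
    #if defined(__has_include)
    #  if __has_include(<threads.h>)
    #    include <threads.h>
    #    define HAVE_C11_THREADS 1
    #  endif
    #endif
    #ifndef HAVE_C11_THREADS
    #  define HAVE_C11_THREADS 0
    #endif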


Thanks, this looks like it will work.


Do you have a real system that uses threads if available, but can also work without them? What does the architecture look like?


BEAM, the Erlang virtual machine, could be compiled with or without threads until release 21 (from 2018).


Plenty of embedded libraries offer optional thread support, though from what I've seen this option is explicitly enabled by setting a define specific to that library, and also by passing in other needed info/function pointers, since threading can vary so much on embedded systems.


There are many systems that I've worked on that do this in practice.

So firstly, a common use-case of threads is improving performance. A job system is a good mechanism for this that doesn't strongly couple the architecture to details like the number of threads. Often this can be done with a fork-join model: when you have a big chunk of work to do, you create a bunch of jobs, then either wait on them immediately or go off and do some other work first. Either approach works, since a job system can execute work on the waiting thread.

At that point it's pretty simple to build a fallback directly into the job system, where the work is done immediately either upon launching a job or upon waiting on it. Alternatively, it's trivial to detect the presence of threads in the calling code and replace the job dispatch with a simple for loop or something similar; see the sketch below.
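
A self-contained sketch of that fallback idea (the fork-join split here is arbitrary; HAVE_THREADS stands in for whatever detection you end up using):

    #include <stdio.h>

    typedef struct { int lo, hi; long sum; } job;

    static void do_job(job *j) {
        for (int i = j->lo; i < j->hi; i++) j->sum += i;
    }

    #ifdef HAVE_THREADS
    #include <pthread.h>
    static void *job_thunk(void *p) { do_job(p); return NULL; }
    #endif

    int main(void) {
        job jobs[4] = {{0,250,0},{250,500,0},{500,750,0},{750,1000,0}};
    #ifdef HAVE_THREADS
        pthread_t t[4];                    /* fork: one job per thread */
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, job_thunk, &jobs[i]);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);      /* join */
    #else
        for (int i = 0; i < 4; i++)
            do_job(&jobs[i]);              /* no threads: serial fallback */
    #endif
        long total = 0;
        for (int i = 0; i < 4; i++) total += jobs[i].sum;
        printf("%ld\n", total);            /* 499500 either way */
        return 0;
    }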

This is far from the only style of system I've seen that can work optionally with threads, but it's one I've seen working to good effect.


There are other threading options available besides C11.


Adding an alternative (perhaps not usable now by anyone but the interested reader): LLVM decided to start building their own libc last year: https://llvm.org/docs/Proposals/LLVMLibC.html


This is great news, thanks a lot!


Looks like llvm-libc is being written in C++. Which is... odd.

https://github.com/llvm/llvm-project/tree/master/libc/src/st...


Why is that odd? It makes lots of sense to manage resources via RAII, for example. C++ has really good C interop, too.


It forces you to include a C++ compiler in your toolchain, which might not be possible for older systems, etc.


Not a problem for LLVM, since its linker is always a cross-linker: if you're targeting an older platform (or only use it as a compile target), the initial cross-compile will be able to produce the binaries. If the older system can't support the LLVM toolchain, memory-wise or otherwise, then it's not really a problem either.

https://lld.llvm.org/


You only need a C++ compiler to build llvm-libc itself. You do not need a C++ compiler to use it in your C program.


Every system has a C++ compiler nowadays, and if one does not, you have plenty of options to compile one yourself.


No they don't.


Do you have an example? Keep in mind that we are not talking about the host platform here, but the target platform.


Microsoft did the same thing for the Universal CRT.

https://devblogs.microsoft.com/cppblog/the-great-c-runtime-c...


Why? libc is just an interface, you can implement it however you want.


Of course they did.


That musl refuses to implement dlclose() says it all, really. Musl's sin isn't all that different from old glibc's: both forget that the system serves programs, not the other way around. The musl author doesn't believe programs should unload code. Therefore, you're not allowed to do it.

Musl also removed access to Linux functionality because the author disagrees with the kernel's design:

"linux's sched_* syscalls actually implement the TPS (thread scheduling) functionality, not the PS (process scheduling) functionality which the sched_* functions are supposed to have. omitting support for the PS option (and having the sched_* interfaces fail with ENOSYS rather than omitting them, since some broken software assumes they exist) seems to be the only conforming way to do this on linux."

LOL. Do you really want to depend on a libc whose author thinks this way? Who removes functionality that the kernel provides because in his opinion the kernel's design is wrong?


> The musl author doesn't believe programs should unload code. Therefore, you're not allowed to do it.

Where did you get this? A rationale is given at [1], and it's basically:

1) dlclose() being a no-op is allowed by POSIX.

2) It's common for unloading and reloading libraries to trigger subtle/latent bugs, so making dlclose() a no-op leads to more reliable operation in the common case.

3) Managing thread-local storage in combination with unloading libraries is a complex problem, and arguably requires libc to either weaken error handling or leak memory.

This is certainly consistent with an opinionated-standards-lawyer-curmudgeon view of the world, but it seems pretty far from just saying that you're not allowed to do it because you shouldn't.

[1] https://wiki.musl-libc.org/functional-differences-from-glibc...


Yet the bumblebee flies: other libc implementations successfully implement code unloading and no amount of pointing at the standard will make me okay with musl not implementing code unloading, especially when part of the rationale is that other people write buggy code. It's not the place of libc to break functionality because the author believes other programmers can't use that functionality correctly.


Which ones? Last time I looked, that list couldn't have included glibc, which, while nominally supporting it, had several long-standing thread race bugs. The nexus of dynamic library loading and threading is a hot-spot for bugs in glibc.[1]

Excluding an interface is better than providing a buggy one, IMO.

[1] What glibc needs to do is finally merge libpthread into libc proper. The amount of stub functions, conditional logic, mutex reimplementation, and other chicanery glibc implements in an attempt to support runtime loading of libpthread[2] makes the code extremely difficult to navigate and debug. Which IME goes a long way toward explaining why there are years of unfixed races in glibc.

[2] By runtime I mean loading libpthread as a dependency of a shared library where the main binary wasn't linked against libpthread.


> nominally supporting it had several, long-standing thread race bugs

Name one.

And yes, pthreads should just be part of libc. Separating them is artificial and awkward.


I'm lazy so it's not specific to dlclose and unloading, here's one for dlerror I submitted in 2015 and which is still open: https://sourceware.org/bugzilla/show_bug.cgi?id=18192

I also point in that report to three other similar races which I didn't have the time to properly document (because, seriously, they're so numerous and the backlog so long, it's hard not to be cynical).[1]

Look, I like glibc. I applaud their focus on ABI compatibility. glibc is burdened by tremendous technical debt, much of it blameless. And the project doesn't see the investment and attention it should. musl by contrast is a breath of fresh air in terms of correctness and simplicity, but they achieve that in part by taking a different approach than glibc. And that approach isn't necessarily better, it's just different, and they each have their costs.

Pick your poison.

[1] And I don't like drive-by complaints. If I submit a ticket for a project I try to back it with substantial evidence and suggested remedies, such as this Python2 + FIPS OpenSSL bug analysis I submitted last year: https://bugs.launchpad.net/ubuntu/+source/python2.7/+bug/183... This is especially important for projects like glibc which are drowning in open bug reports.


> I'm lazy so it's not specific to dlclose and unloading, here's one for dlerror I submitted in 2015 and which is still open: https://sourceware.org/bugzilla/show_bug.cgi?id=18192

Okay, so the only bug you've managed to come up with during a discussion of dlclose has nothing to do with dlclose. I am very much unconvinced that dlclose has fundamental correctness problems.

> musl by contrast is a breath of fresh air in terms of correctness and simplicity, but they achieve that in part by taking a different approach than glibc. And that approach isn't necessarily better, it's just different, and they each have their costs.

Yes, musl has a different approach. And this approach makes musl unsuitable for anything I might want to do. The maintainer is too opinionated and thinks he can dictate software architecture to libc users. The glibc maintainers don't think so, not as much.


Okay, so here are some bugs which bit me in real life and at $WORK, around pthread_key_create + dlopen/dlclose:

1. libgobject (and anything that depends on it, i.e. anything GTK) uses pthread_key_create to allocate thread-local storage slots, but since threads can't be joined during dlclose(), there is no safe time to call the corresponding pthread_key_delete. This leads to bug reports like https://gitlab.gnome.org/GNOME/glib/issues/1311, and ultimately the GTK developers said they explicitly do not support dlclose()-ing their library because they can't see how it's possible: https://bugzilla.gnome.org/show_bug.cgi?id=733065. They suggest people dlopen() their library with RTLD_NODELETE, and good luck to those who depend on libgobject only indirectly.

2. On Android, libcxx (the C++ runtime lib) does the same thing: it intentionally leaks its pthread_key if you unload it. What's worse, the number of available pthread_keys is limited to 100, so after 100 iterations of dlopen()/dlclose() your process simply crashes. And one of our significant customers insisted our library must pass their "while true dlopen dlclose" test, because hey, this is supported by libc, right? Mind you, Android does not use glibc, it uses bionic.

These two instances make me believe that it is not possible to support true dlclose(), because, well, two of the most-used libc implementations tried and failed.
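
To make the failure mode concrete, a minimal sketch of the pattern both bugs boil down to (my illustration, not code from either project):

    /* a shared library that grabs a TLS slot in its constructor */
    #include <pthread.h>
    #include <stdlib.h>

    static pthread_key_t key;

    __attribute__((constructor))
    static void lib_init(void) {
        /* the destructor (free) runs at each thread's exit */
        pthread_key_create(&key, free);
    }

    /* there is no safe __attribute__((destructor)) counterpart: at
       dlclose() time other threads may still hold values, and their
       exit-time destructor would point into unmapped code, so the
       library must leak the key instead */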


Apple has also considered making dlclose() a no-op on non-macOS platforms (the presentation is 3 years old, so it may have happened by now):

https://news.ycombinator.com/item?id=20229723


No, dlclose still unloads the image if the reference count reaches 0: https://github.com/apple-open-source-mirror/dyld/blob/f033f5...


Point 2) is pretty much saying "you're not allowed to do it because you shouldn't"...


Actually, it seems pretty consistent with the original point that "the system serves programs, not the other way around". dlclose() being a no-op means the system is more reliable, so it sounds like the right choice.


The sched_* functions, as implemented by glibc, do not do what POSIX says they should. On linux, it is not possible to implement them in a way that is compliant with the standard, so musl does not implement them. Given the principles specified in this webpage, that seems like a pretty reasonable thing to do.

musl does expose the extent of functionality provided by the kernel through the corresponding pthread scheduling functions.

> Who removes functionality that the kernel provides because in his opinion the kernel's design is wrong?

The kernel doesn't provide the necessary functionality. The functionality that the kernel does provide is still available in musl.

I can understand that this is probably frustrating (I'm sure I would be frustrated if I had to deal with it), but I can respect musl's decision not to have sched_* do the wrong thing.


> I can respect musl's decision not to have sched_* do the wrong thing.

Particularly because this actually ensures that programs linked to musl are 100% portable (modulo bugs), rather than introducing subtle platform incompatibilities due to slightly differing platform semantics.


> Who removes functionality that the kernel provides because in his opinion the kernel's design is wrong?

The functionality is still available via pthread sched functions as specified by POSIX, so I'd argue that functionality was not removed, just moved:

https://git.musl-libc.org/cgit/musl/commit/?id=1e21e78bf7a5

IMHO this makes sense for a libc that explicitly emphasizes POSIX-correctness. POSIX defines these functions, and it is against the spec to have process scheduling functions work as thread scheduling functions (which are a separate set of functions in POSIX).

edit: Just realized that this might prevent changing scheduling parameters of another process by PID, though.


> LOL. Do you really want to depend on a libc whose author thinks this way? Who removes functionality that the kernel provides because in his opinion the kernel's design is wrong?

musl can't remove the functionality provided by the kernel. You're still free to use syscall(3) or otherwise invoke the kernel routine yourself.
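
For instance, a minimal sketch of bypassing the wrapper (Linux-specific, and note that the raw syscall acts on the calling thread, which is exactly the kernel/POSIX mismatch being discussed):

    #define _GNU_SOURCE
    #include <sched.h>          /* struct sched_param, SCHED_FIFO */
    #include <stdio.h>
    #include <sys/syscall.h>    /* SYS_sched_setscheduler */
    #include <unistd.h>         /* syscall(3) */

    int main(void) {
        struct sched_param p = { .sched_priority = 1 };
        /* pid 0 = the calling thread, with raw Linux semantics,
           regardless of what libc thinks of them */
        if (syscall(SYS_sched_setscheduler, 0, SCHED_FIFO, &p) == -1)
            perror("sched_setscheduler");
        return 0;
    }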

May I assume you abstained from using glibc prior to 2.30 on account of glibc not providing a wrapper for the gettid syscall? More apropos: do you dislike how glibc and musl both change the behavior of the Linux setuid syscall to make it standards-conformant (process-global instead of per-thread)? Should they simply provide the Linux semantics rather than the standard semantics? Has the kernel no culpability here for not providing the interfaces necessary to implement the standard-compliant semantics?


You seem to either forget, or not be aware in the first place, that the purpose of libc and POSIX is to allow portability between different operating systems. So it makes complete sense that Linux-specific capabilities are not exposed; it has nothing to do with disagreement about kernel design.


Except that the whole Linux ecosystem, starting with the kernel and going from there, simply doesn't really care about POSIX. Now, the fact that musl does is... well, it would have been, long ago... a breath of fresh air. But the ship has sailed. The Linux kernel ABI has won.

Solaris, Illumos, the *BSDs, Windows -- all tried to emulate the Linux kernel ABI, and could not do a good enough job, so they all pretty much abandoned the project. The Linux kernel ABI won and that's that.

Well, if we could convince devs that static linking is bad... But even then, the Linux kernel ABI leaks for all sorts of things, starting with /proc.

And yes, static linking is bad, for a variety of reasons, the biggest of which is that static linking is stuck with 1970s semantics that suck because accidental symbol interposition happens all too easily. That doesn't mean that static linking couldn't grow ELF-like semantics, but if it hasn't over the past 40+ years... Still, even assuming that, there would be other reasons not to love static linking. If we could convince people to ship only dynamically-linked executables, then the ABI to emulate would be smaller, but still huge.


If the ecosystem doesn't care about POSIX it can hardly complain about choices made by a library whose purpose is to implement the POSIX standard.

It also has nothing to do with ABIs - the idea of a standard is to provide a common subset of functionality supported by all platforms that it targets.

Nothing about musl or any other libc implementation prevents developers from including the linux headers and using whatever linux-specific functionality they like.


> Solaris, Illumos, the *BSDs, Windows -- all tried to emulate the Linux kernel ABI, and could not do a good enough job, so they all pretty much abandoned the project. The Linux kernel ABI won and that's that.

I think the fact that they tried to emulate the ABI is a much stronger argument for the ABI than the fact that they abandoned doing so…


Well, it means they felt the need to support Linux executables on non-Linux kernels, and that the Linux kernel ABI turned out to be too big and too fast a moving target to manage the feat.


The Linux ABI is actually quite small and quite well-defined. Perhaps the most so out of all common kernel interfaces today.


It is most certainly not small.

Besides the system calls, the ioctls, and the fcntls, including all the various driver ioctls, there's also /proc. /proc is quite large an interface.


It is, but you can get quite far with emulating only parts of it.


Yes, but not enough to make a business of it.


No, that is not the primary purpose of libc. The primary purpose of libc is to run the programs the user wants to run, and removing random things makes that harder. Whether the standard allows something is irrelevant: what matters is whether any given change is good for the user.


This says absolutely nothing: for any given change you can find a user who benefits from it and a user who doesn't; hence a coordination device called a standard was invented.


> the chrt command is broken when built against musl-libc because that project's maintainer decided he didn't like the system calls it depends on, so he removed them¹ from his libc.

This software seems to be using Linux-specific functionality through the C library. Wouldn't it be better to use Linux system calls directly in that case?

I think it's weird how Linux has no user space library like the kernel DLLs found in Windows. Everyone assumes the C library is the interface to the kernel, even the Linux kernel manuals. Then people end up being burned when the C library maintainers quote POSIX standards at them or omit new system calls people want.

I wrote a liblinux² when I first realized this, but it's currently unfinished. Eventually the kernel developers made an amazing nolibc.h file³ for their own tools, but it doesn't seem to be used outside the kernel itself. I wish they'd publish it as an official Linux system call library! (See the sketch after the links for what this looks like in practice.)

¹ https://git.musl-libc.org/cgit/musl/commit/?id=1e21e78bf7a5

² https://github.com/matheusmoreira/liblinux

³ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
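
What "the syscall ABI is the interface" means in practice: a freestanding x86-64 hello-world that talks to the kernel with no libc at all (a sketch; build with gcc -static -nostdlib):

    static long sys_write(long fd, const void *buf, long len) {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"(1 /* __NR_write */), "D"(fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");
        return ret;
    }

    static void sys_exit(long code) {
        __asm__ volatile ("syscall" :: "a"(60 /* __NR_exit */), "D"(code));
        __builtin_unreachable();
    }

    void _start(void) {
        sys_write(1, "hello from raw syscalls\n", 24);
        sys_exit(0);
    }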


Most operating systems define a libc ABI (or sometimes, merely API) as the interface to their kernel. Linux is the only OS--at least, the only major OS I'm aware of--that defines its interface boundary to be the system call layer itself.


I don't understand why they use libc for functionality that's specific to each kernel. A standard C library only makes sense for the POSIX stuff everyone follows. Even then people run into incompatibilities in practice because of implementation-defined or undefined behavior or bugs that have to be maintained.

In my opinion, Linux is the only operating system that did it right. The system call binary interface is clearly documented¹ at the processor architecture² level so it can be used from any language. Since everything in user space is replaceable, people could actually get rid of C and rewrite everything in Rust (or anything else) if they wanted to.

¹ http://man7.org/linux/man-pages/man2/syscalls.2.html

² http://man7.org/linux/man-pages/man2/syscall.2.html


> ... maintained by people that believe they and their work are utterly infallible. Hence why to this date they have not made it possible to detect musl ...

Yes, and they are every bit as fallible as anyone. Here is an example of a pretty pedestrian bug: https://github.com/build2/build2/issues/50

And that's a pity, really, since otherwise it looks like a reasonable implementation. But this dogmatic "we know better" attitude now triggers an allergic reaction.


I think we should cut the musl developers some slack; they're doing some really thankless work and giving it away.


This is what has always rubbed me the wrong way: (a) if it's given away for free, you do not have a right to complain about its quality, and (b) if you do complain, then why don't you go ahead and contribute. A gift horse, I know, but still...


I disagree; sometimes the presence of something (even if free) can add more work for someone else.

For example, imagine that you maintain a popular application and you start receiving issues that it doesn't work with musl. You try to fix it, but musl doesn't provide a reliable way to detect itself. You try to submit a patch, but it is rejected.

You absolutely have the right to complain; if musl didn't exist, you wouldn't be wasting time on making your application work with it.


If the quality is poor people have the right to say so, regardless of the cost.

If the project's existence is causing issues for some people, they have a right to complain, even if it is creating a free benefit for someone else.

Why would someone contribute to a project they think is inferior?

Basically, I don’t understand your objection.


I should have put (a) and (b) in quotes.


> (a) if it's given away for free, you do not have a right to complain about its quality

What? This does not make sense. Being free does not grant any passes.


Considering the exchanges mentioned above, with libc devs turning down feature requests, maybe they have tried.


Hypothetically, polishing turds and giving them away is also thankless work. Not saying musl is, don't know the first thing about it.


If it is, it's a turd people ask for.


Except that you don't have to do it if you don't want to.


> Just ctrl+F musl on http://landley.net/toybox/ for some insight.

Hahah wow that's some class A entertainment. I feel for you and yours!


> crucial fixes that most of the world uses that they didn't bother to upstream

If you are talking about eglibc, it's been merged back for years.


The specific anecdote I was thinking of is glibc not reloading resolv.conf automatically, making people think they have no internet connectivity for no good fucking reason for decades now.

Debian carries a patch since 2006: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=272265

Why is my Pidgin thinking I'm offline, 2007: https://developer.pidgin.im/ticket/2825

Firefox 2008: https://bugzilla.mozilla.org/show_bug.cgi?id=214538

Chef, 2015: https://github.com/chef/chef/issues/2894

Someone stumbled on this brokenness in Rust in 2017: https://github.com/rust-lang/rust/issues/41570

And then, drumroll: on Wed Aug 2 19:12:20 2017, glibc 2.26 was released with a patch. Maybe programs can rely on this behavior starting in 2027.


glibc's use of Intel TSX was also a bit of a tire fire, IMO.

It got turned on by default for all locks, broke quite a few applications (which were using locks wrong, but still), and then the instructions ended up being broken in every Intel architecture to date that has implemented them [1].

Most of the distros put in patches to disable the use of TSX, but the 'no-lock-elision' flag provided by glibc didn't actually disable all attempts at lock elision. The end result is that the glibc in Ubuntu 16.04 tries to use TSX for rwlocks but not for mutexes, which is unfortunate, because the mechanism to opt out of TSX was only implemented for mutexes, not rwlocks.

TSX only ever provided performance benefits in specific use cases, and at some point glibc upstream became sane and turned it all off by default and provided opt-in mechanisms instead.

[1]: skylake errata: "Using Intel TSX Instructions May Lead to Unpredictable System Behavior", https://www.intel.com/content/dam/www/public/us/en/documents...


TSX is supposed to fall back to an alternative implementation anyways if there's a transactional abort, right? What's the issue with keeping it in?


1. TSX has been broken to varying degrees in every architecture since Haswell, and not always in a nice way like "always abort." In various steppings of Haswell and early Broadwell, TSX was in various states of disablement: sometimes the instructions were advertised (in cpuid etc.) to work but had subtle bugs, sometimes they were advertised to work but were disabled such that any RTM instruction caused SIGILL, and sometimes they were correctly advertised to not work. Detecting the state of TSX was sufficiently unreliable that glibc ended up with a blacklist of CPU model numbers too problematic to even try using TSX on.

2. The pthread implementation of mutexes using TSX behaved differently from the prior implementation in cases of misuse. This was deemed acceptable because the impacted applications were using mutexes illegally, but the result was that a ton of software in all the distros started crashing randomly (IIRC the scenario was a double unlock: pre-TSX glibc didn't complain, post-TSX the program crashed, or something like that).

3. TSX was not generally beneficial. The glibc implementation did something like try a transaction 3 or 5 times before giving up. The worst-case performance of starting the transaction, getting as far into the critical section as possible, aborting, repeating a couple more times, and then finally taking the lock normally is much worse than just taking the lock.
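
A rough sketch of that bounded-retry elision pattern (not glibc's actual code; needs gcc -mrtm and TSX-capable hardware, which is rather the point):

    #include <immintrin.h>
    #include <pthread.h>

    void elided_lock(pthread_mutex_t *m, volatile int *lock_word) {
        for (int attempt = 0; attempt < 3; attempt++) {
            if (_xbegin() == _XBEGIN_STARTED) {
                if (*lock_word == 0)
                    return;        /* run the critical section transactionally;
                                      the matching unlock calls _xend() */
                _xabort(0xff);     /* lock actually held: abort and retry */
            }
            /* transaction aborted for any reason: fall through and retry */
        }
        pthread_mutex_lock(m);     /* worst case: three wasted attempts first */
    }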


Are there other examples besides this one, which was fixed almost three years ago?


The reason you don't want to detect musl is that the assumptions will break. Say there is some incompatibility, the issue is brought up to the Austin Group, which clarifies the behavior, and musl changes to match: you now have a never-ending number of versions you need to detect.


That might justify why you shouldn't detect musl, but it doesn't explain why you shouldn't detect features, a use case the musl devs are apparently also refusing to support.


> glibc was maintained by the biggest asshole in all of software engineering

You mean Roland? What did he do?


>> glibc was maintained by the biggest asshole in all of software engineering

> You mean Roland? What did he do?

Ulrich Drepper


Out of curiosity: What did Drepper do? A quick DDG/Google search didn't turn up anything.

EDIT: Actually, searching for "Ulrich Drepper asshole" did turn up some stuff. I'd still be interested, as the stuff I found was rather general.


Here's one notorious example of Drepper's behavior:

https://sourceware.org/bugzilla/show_bug.cgi?id=4980

(2007) "gethostbyname() etc break for /etc/hosts with both ::1 and 127.0.0.1 localhost entries". Initially closed by Drepper as "WONTFIX" or "NOTABUG" (can't find the state history and it's been a while!) with dismissive comments, and he continued to do so after people patiently explained how this was a real issue. It was eventually resolved (I don't know by whom).


You're going down an endless rathole...

Ulrich made tons of bad technical decisions, mistreated everyone who disagreed with him, was the reason Debian forked glibc for a long time, etc. There are some famous bugzillas that are utterly inexplicable.


It seems this could be upsetting to some people: https://udrepper.livejournal.com/7326.html


Did you bother to read? It's about the combinatorial explosion that happens when you need to support lots of configurations. It's apolitical.


Yeah, that's one of the few things I've seen from Drepper that actually seemed reasonable and defensible.


I'd never seen that one. Wow!!


Am I missing the part that makes it upsetting? I was expecting something full of cusses and ad-hominems.



Must refer to Drepper.


The biggest problem for musl isn't actually a problem _with_ musl at all; it's this: the C stdlib isn't an ABI. What this means is that something compiled for glibc doesn't necessarily work with musl and will just crash. This problem goes much further than you'd expect. For instance, if you use .NET 3.1 with Alpine, you can't use Google's gRPC client (thankfully, you can use Microsoft's).


I see similar issues with devs trying to use pre-compiled (originally w/ glibc) JVMs with Alpine. There are musl-built JVMs, but the dev teams seem unaware musl is even there.


Distros that use musl make my life quantifiably worse for this reason.

It actually bit me on the ass a couple weeks ago when installing a prebuilt Rust binary instead of using cargo install, because it shaved ~10 minutes off a CI pipeline. Had to fall back, because the binary wasn't linked against musl libc (it was compiled for the unknown-linux-gnu targets), and the authors don't distribute a musl-compatible binary.


This actually goes further than that, as different versions of glibc aren't ABI-compatible with one another.


They are backwards-compatible (which is all you can do of course).


Are they?

I've seen my share of old x64 binaries refusing to run on a modern Linux distro, spouting some undecipherable glibc error message on startup (something along the lines of "version `GLIBC blah' not found").


That message happens when a new binary is run on an old glibc, not the other way around.


Yes, you'd think so indeed.

Unfortunately, I've seen the exact opposite in production: an older binary running atop a newer glibc and crashing with that kind of message.

glibc abi compatibility is and has been a joke.


Another thing is NodeJS. Ever tried using nvm to install a specific Node.js version (lts/carbon, for example) in an Alpine CI image? It won't work, and worse, it doesn't even show what doesn't work: the error message will simply be "node: file not found".

That this message means not that the node binary is missing, but that the loader it needs is missing, is just mind-boggling. Furthermore, installing libc6-compat doesn't help; you actually have to set nvm to download Node.js from the "unofficial-builds" repository. What a load of bollocks.


The reason for the funny message is that it's not failing to find libc.so while doing dynamic linking; it's the kernel failing to find the interpreter designated in the ELF image (ld-linux.so on glibc systems, ld-musl-*.so.1 on musl). Unfortunately the kernel responds with ENOENT, which is indistinguishable to the caller from not finding the binary itself.
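
You can see which interpreter a binary demands by reading its PT_INTERP program header; a minimal ELF64-only sketch (readelf -l does the same, more robustly):

    #include <elf.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        if (argc < 2) { fprintf(stderr, "usage: %s <elf>\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }
        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1) return 1;
        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            fseek(f, eh.e_phoff + i * sizeof ph, SEEK_SET);
            if (fread(&ph, sizeof ph, 1, f) != 1) return 1;
            if (ph.p_type == PT_INTERP) {   /* the path execve() must find */
                char buf[256] = {0};
                fseek(f, ph.p_offset, SEEK_SET);
                fread(buf, 1,
                      ph.p_filesz < sizeof buf ? ph.p_filesz : sizeof buf - 1,
                      f);
                printf("interpreter: %s\n", buf);
            }
        }
        fclose(f);
        return 0;
    }

Run it against an Alpine-built binary and you'll see /lib/ld-musl-x86_64.so.1; against a glibc build, /lib64/ld-linux-x86-64.so.2, which doesn't exist on Alpine, hence the ENOENT.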


C and POSIX are defined APIs, though; the issue is glibc, and programs' use of internal symbols or glibc-specific APIs, and, for some reason, things like rawmemchr.


The Microsoft client was probably moved to an all C# solution, which skips over this problem.


Excerpt:

"Attention to correctness

musl was the first Linux libc to have mutexes safe to use inside reference-counted objects, the first to have condvars where newly-arrived waiters can't steal wake events from previous waiters, and the first to have working thread cancellation without race conditions producing resource-leak or double-close. All of these are requirements of the specification that were ignored by other implementations, and getting them right was a consequence of careful reading of those specifications.

musl's entire development history has been a process of reading specifications, seeking clarifications when corner cases aren't adequately covered, and proceeding with extreme caution when implementing functionality that's underspecified."
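
For what "mutexes safe to use inside reference-counted objects" means in practice, here is a sketch of the user-level pattern it refers to (my illustration, not musl internals):

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct {
        pthread_mutex_t lock;
        int refs;
    } obj;

    void obj_unref(obj *o) {
        pthread_mutex_lock(&o->lock);
        int last = (--o->refs == 0);
        pthread_mutex_unlock(&o->lock);
        /* the unlock above must not touch o->lock's memory after
           releasing it: another thread may drop the last reference
           and free(o) the instant it acquires the lock, so a libc
           whose unlock writes to the mutex afterwards has a race */
        if (last) {
            pthread_mutex_destroy(&o->lock);
            free(o);
        }
    }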


So basically a lawyer's view of software engineering.

It is very good to adhere to specs when possible, but not always.


So, as Einstein could have said it, "We should follow the specs as far as possible, but not any further."


I had some problems with musl last year and had to back out of using it for an embedded system. For example, 2 threads listening on their own sockets bound with SO_REUSEPORT wouldn't work: half the connections would arrive on one thread, and the second one wouldn't get anything. That scared me a bit as I was on a deadline to deliver, so I quickly recompiled my distro with glibc, and the problem disappeared.

I really wish it had worked, as I like the lean & mean & precise approach they've been using!
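
For reference, a minimal sketch of the setup being described, where each thread owns its own listening socket on the same port and the kernel spreads incoming connections between them (port 8080 is arbitrary):

    #define _GNU_SOURCE
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void *worker(void *arg) {
        int one = 1;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one);
        struct sockaddr_in a = { .sin_family = AF_INET,
                                 .sin_port = htons(8080),
                                 .sin_addr.s_addr = htonl(INADDR_ANY) };
        bind(fd, (struct sockaddr *)&a, sizeof a);
        listen(fd, 64);
        for (;;) {
            int c = accept(fd, NULL, NULL);   /* kernel picks which socket */
            printf("thread %ld got a connection\n", (long)arg);
            close(c);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[2];
        for (long i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        pthread_join(t[0], NULL);
        return 0;
    }

As the reply below notes, libc's only involvement here is the SO_REUSEPORT definition itself; distributing the connections is entirely the kernel's doing.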


Other than #define'ing SO_REUSEPORT, libc has no role whatsoever in its behavior. You can see this yourself by grep'ing for SO_REUSEPORT in the glibc and musl source code. And both glibc and musl implement 1:1 threading, so it's the kernel making all the thread scheduling and inbound connection queueing decisions.

Your problem lay elsewhere, unless you were using a really old version of musl that lacked the SO_REUSEPORT definition.


I do not have an explanation either. I'm more of a kernel guy, so I realized it "should" have worked, but it didn't. I suspect a thread-wakeup issue of some sort, as the connections were definitely queued on the listen socket. I know the same code worked perfectly with glibc!


Alpine Linux uses musl libc, and SourceHut runs Alpine on all hosts and VMs. It's lean and mean and fast, and together with Alpine makes for a wonderfully simple and auditable system. I wouldn't dream of using anything else in production.


Dependency management can be a pain. Want to install numpy? You have to use Alpine's version, because compiling it from pip requires glibc.


Also, annoyingly, there is still no platform tag for python wheels for alpine/musl, so you cannot get binary wheels for it from pypi.


Using anything other than distro packages is a pain. pip is an anti-pattern and one giant security risk.


Disagree, although pip should assume the --user option by default, which installs packages in the user directory instead of the system directory (some distros (was it Debian?) modify it to work that way).

The best way of using Python, though, is to create a virtualenv (since Python 3 the venv module is built in, which makes it straightforward: python -m venv <directory>) and install packages inside of it. This gives you control to use the exact versions of packages your applications need, and makes your application not tied to the OS, so system upgrades and Python version updates are much easier to perform.


> This gives you control to use exact versions of packages you need in your applications and makes your application not tied to the OS

Is this true for drivers, too? like the coupling of cuda versions to tensorflow versions?


Unfortunately that's one weakness, and packaging your application with the system packager (which comes with its own set of issues[1]) doesn't really escape it either.

If you have any dependencies that depend on system libraries there are two options:

- The libraries can be compiled in statically; these are called manylinux wheels. This generally works well, except when your dependencies overlap with Python's or other packages' dependencies. Most commonly this happens when a package depends on openssl: if the compiled-in openssl differs from the version on the system, Python might crash in certain circumstances. This is, for example, why psycopg2 was initially distributed as manylinux but now opts for the second method (they still provide psycopg2-binary, but they discourage its use; most of the time it works fine, but it is a problem on certain distros).

- Have the package build its bindings on installation. This makes the installation process do a small compilation to create bindings between the system library and Python, which makes the packages robust, but tied to the OS. In my experience this is not an issue, but it can be annoying, because you still don't have 100% control of your dependencies.

This issue is what made me investigate the Nix[2] package manager, because it gives you full control of all dependencies down to libc, making everything fully reproducible: you control the exact version of Python and of all system and Python dependencies.

[1] In one of my previous jobs, the team I joined was running its applications on CentOS 5, which was already EOL, because rebuilding the RPMs for CentOS 7 (the newest version at the time) was a lot of work. Another issue was that they were bound to the Python package versions that came with the system; they could have created their own RPMs, but no one did, because maintaining them was more work. I spent time converting the Python code to be packaged using setuptools. Once that was done, switching the OS was trivial. We could finally also use the latest versions of many of our dependencies.

[2] This is, IMO, a good article describing tooling that makes Nix much more enjoyable to use: https://christine.website/blog/how-i-start-nix-2020-03-08


By doing this, do you also suggest tying your dependency versions to the packages shipped with the distro version you're running?


Yes, or just shadow them with newer versions and run your own repo.


Can you do something like `pip install --no-binary :all:` (or equivalent pip.conf entry) ?


You normally will not get wheels with compiled binaries on Alpine anyway, because it is not compatible with any of the platform tags (manylinux* is glibc-only). I think they're saying numpy won't compile with musl off the shelf; you need Alpine's patches. pip just won't help you there.


It might be lean and mean, but isn't glibc generally faster? I'd be curious to hear if you've benchmarked cases where musl is faster.


musl, pronounced like the word “mussel” or “muscle”

Oh. I've been saying mew-sel this whole time.


I've been saying muzzle.


Alpine Linux is by far my favorite distro and I wish it had more support. Not only is musl nice, but it lacks a huge amount of obsolete cruft and the package manager is nice and simple and not over-engineered. It also lacks systemd.

It probably lacks a lot of the "enterprise" cruft in CentOS/RHEL or Debian, but that's a good thing for everyone else.


I use Alpine as the OS on my home server (it used to be SmartOS/Illumos) and I compile my entire stack myself using my own cross-platform (Solaris/Linux/OS X/FreeBSD/OpenBSD) build system similar to BSD ports. While there are some portability gotchas, by and large it's manageable.


I've been using musl on embedded systems for a while, and its simplicity is indeed quite a feature.

My biggest issue is the lack of ASan, or even HWASan. Valgrind is nice, but quite slow. I know Rich has been working on a new allocator that will make ASan easier to implement, but it's still a few months away.


We were using Alpine Docker images. Every few months the Docker build would fail with some new random glibc / musl / libc6-compat issue.

The docker file is full of comments like the following:

# libc6-compat needed by outdated grpc-tools # we get # sh: node_modules/grpc-tools/bin/protoc: not found # can be removed if we remove grpc-tools and assume an existing protoc for dev # i.e. developers would independently intall protoc in order to generate # grpc-web # we install separate from the packages above to avoid a failure noticed on # 2019-11-01 # apk update && apk add protobuf grpc gcompat libc6-compat # ERROR: unsatisfiable constraints: # musl-1.1.20-r4: # breaks: libc6-compat-1.1.24-r0[musl=1.1.24-r0] # satisfies: musl-utils-1.1.20-r4[musl=1.1.20-r4] #RUN apk add libc6-compat

Bottom line is that it's awesome to have something lean like Alpine, but that can hardly justify the effort to maintain it. It feels like Linux two decades ago.


    # libc6-compat needed by outdated grpc-tools 
    # we get 
    # sh: node_modules/grpc-tools/bin/protoc: not found 
    # can be removed if we remove grpc-tools and assume an existing protoc for dev 
    # i.e. developers would independently intall protoc in order to generate 
    # grpc-web 
    # we install separate from the packages above to avoid a failure noticed on 
    # 2019-11-01 
    # apk update && apk add protobuf grpc gcompat libc6-compat 
    # ERROR: unsatisfiable constraints: 
    # musl-1.1.20-r4: 
    # breaks: libc6-compat-1.1.24-r0[musl=1.1.24-r0] 
    # satisfies: musl-utils-1.1.20-r4[musl=1.1.20-r4] 
    #RUN apk add libc6-compat

    
formatted that for you


Thank you! I missed this doc https://news.ycombinator.com/formatdoc


And for mobile readers ;)

# libc6-compat needed by outdated grpc-tools

# we get

# sh: node_modules/grpc-tools/bin/protoc: not found

# can be removed if we remove grpc-tools and assume an existing protoc for dev

# i.e. developers would independently intall protoc in order to generate

# grpc-web

# we install separate from the packages above to avoid a failure noticed on

# 2019-11-01

# apk update && apk add protobuf grpc gcompat libc6-compat

# ERROR: unsatisfiable constraints: # musl-1.1.20-r4:

# breaks: libc6-compat-1.1.24-r0[musl=1.1.24-r0]

# satisfies: musl-utils-1.1.20-r4[musl=1.1.20-r4]

#RUN apk add libc6-compat


This feels like more of a criticism of libc/glibc to me. In either case, it makes me glad that Go doesn't lean so heavily on libc/glibc (even though everyone complains endlessly about bugs like the one where macOS made a syscall API change and users needed to update their binaries).


I think it was a good decision to cut out libc/MSVC on Linux and Windows, where they lead to dependencies that prevent you from freely building on one system and running on another. But for macOS this doesn’t make much sense; the facilities Go needs from libSystem are overwhelmingly backwards compatible and the macOS toolchain/ABI handles using weak linking to conditionally link to newer features very well. AFAICT using system calls directly on macOS is no better a way to accomplish those objectives than calling ntdll directly instead of Win32 APIs on Windows.


I guess I wasn't intending to advocate for hitting syscalls directly so much as I was advocating against targeting libc and calling it a day. Libc is a leaky interface (as previously discussed, things randomly break if the implementation isn't glibc which itself has lots of issues), and I like that Go targets each platform individually (even if there are issues with its macos bindings). It works out very well in practice, and even the issues with the macos bindings are nigh insignificant.


They could still target each platform individually even while using libc APIs where applicable only on macOS, since those are often the lowest level stable options available on Mac. My point is that there’s no reason you need to take exactly the same approach on each platform to achieve the same goals.


Yes, I completely agree—we’re saying the same thing.


Same with FreeBSD and most other systems really. It's very frustrating that the Go people decided that the Linux way is appropriate everywhere.

Directly touching syscalls instead of using libc sucks because a) it kills LD_PRELOAD hooks and b) it makes porting to new architectures extremely time-consuming. (e.g. I'm porting lots of stuff to FreeBSD/aarch64 — most languages were rather trivial to port, Go took months and several people)


That's the reason I don't use alpine for containers and just keep using debian.


Huh, nice domain name, is that new? It was musl-libc.org, with a dash, before...


It's fairly new, I remember there being a tweet about it but I can't seem to find it…



I've heard a lot about libc, but not about the stdlibs of other languages. I assume most languages that depend on C (such as Python, or the JVM) depend on libc as well. Are there any languages that implement their runtime in C but implement their own low-level interfaces?


Some k-family languages do that.

http://kparc.com/b/A.S


But why?


The k family likes to be the whole stack; they would view libc as an unnecessary huge[0] dependency.

[0] To k, every other language/library/program is too big.


Seeing that libc is undoubtedly already loaded in memory as a dynamic lib on virtually all systems, that doesn't make much sense. Moreover, kdb+/q links to libc, and it has very good performance and memory usage.


The bad thing is, it's pretty hard to compile Linux with musl. I don't remember the specifics, but it had to do with the #define mess that glibc uses.


But the kernel itself doesn’t use glibc?


Definitely not. The kernel has its own implementations of any standard C functions which it needs.


Is that because of circular dependencies of glibc on the kernel, or something else?


>Is that because of circular dependencies of glibc on the kernel

That but also because lots of code in glibc would simply not work in kernel space


Circular.


I also stayed away from musl because of compatibility concerns. I see the appeal, and Void even has a musl version. But I remember posts about software not being available, or sometimes just not being as easy to install. Thinking of Steam, for example, though I'm sure there are ways to make it run anyway.


You can just use Flatpak Steam if you want to use musl Void. There might still be a few issues, but that's not because of musl; it's because of the Flatpak version.


Alpine (the Docker distro) uses musl, but this means you can't use many Python wheels, because they aren't musl-based :-/


Alpine is not "the Docker distro". It predates Docker by quite some time and was used in lots of lightweight systems.

Python wheels compiled using the manylinux infrastructure actually do work on Alpine + musl.

That said, I wouldn't recommend running Python on musl, as I have seen incredibly weird behaviour, including corruption of Python's small-integer cache resulting in some truly awful consequences, i.e. amounts of currency suddenly being 10x what they should be.


Didn’t Docker hire the person who designed / maintained Alpine?

Edit: Yes: https://thenewstack.io/alpine-linux-heart-docker/


It's also known to be slower at running Python code: https://superuser.com/questions/1219609/why-is-the-alpine-do... I was able to reproduce and confirm these findings.


Presumably those manylinux wheels do not have C bindings; when they do, they are glibc-based.


manylinux wheels are exclusively binary wheels with compiled parts, so they're all glibc-based (as the manylinux ABI is glibc).


Then I can't see how they work on Alpine w/ musl.

@jpgvm Do you mean they work if you manually install them? Because pip won't install glibc wheels on Alpine.


There's no guarantee that they do and I certainly don't expect any random manylinux wheel off pypi to work on alpine. I maintain one that definitely doesn't. We build a musl wheel too but there's no musl platform tag so we don't have a good way to distribute it on pypi.


Wow, that's unfortunate. I've only seen mysterious crashes.



