Keep in mind that many of the complaints in this thread were posted in the context of that original URL.
Like sure. Today, a lot of the historical reasons for things seem silly and irrelevant. At one point, they did not seem silly and irrelevant. For compatibility with stuff sticking around from those days, we get some performance penalties that are not strictly necessary. I don’t think anyone is doing that to be an asshole, so the oddly antagonistic tone seems unjustified.
And yes, Windows with a module-level namespace is cleaner in this regard, but Windows design is entirely different and has plenty of its own skeletons. ELF does not, to me, feel significantly more horrible than PE. And I’m not speaking from inexperience; I have written a couple of ELF and PE parsers over time, most recently go-winloader.
Do we need to override symbols in the same library? Probably not... kind of. Your modules may in fact not need this. However, libc probably does. Take a look at what symbols libpthread exports on your system some time.
I hate to be the person to point this out, but please consider not approaching subjects from this position. It feels alienating, and I have no idea why it’s necessary to have such a tone.
Someone in a sibling thread said it's not bad to write like that for catharsis... I guess to blow off steam or something. But if the method of blowing off steam is belittling other smart people that don't always make perfect decisions then it's probably not a great way to go. If you need to write it for catharsis, go for it, but there's no need to publish it.
Otherwise, my questions on the technical side: Would this performance hit and the alternative option have been obvious at the time? If so, was there a reasonable trade off for why this approach was taken? Or was this choice only wrong in retrospect?
But that said, I don’t think dynamic linking is in the ELF spec. I believe that’s a de facto OS + dev tools thing rather than an ELF spec de jure thing. His points are still valid.
I'm not tone policing but contesting the premise that "aggressive tone" = "direct". For example:
>(Windows took a different approach and got it right. In Windows, it's okay for multiple DLLs to provide the same symbol, and there's no sad and desperate effort to pretend that a single namespace is still cool.)
>(Windows got this right, where multiple DLLs can provide the same symbol)
There you go. Shorter, and not wasting 3 lines to express your feelings, and _you can still say Windows got it right_.
My feeling is that you can go in and describe a thing succinctly and to the point, and actually get your opinion across! It will be more effective, shorter, and your opinions are backed up with fact! No fluff needed.
It drives the point home though, and keeps the energy levels higher.
Personally for me, concise and information-dense writing makes me sit up and pay attention.
Is the argument too weak to do that on its own without the abuse?
Arguments are seldom weak or strong based on their technical details.
Heck, the technical details about this were already known to many people including several involved in build setups, but nobody cared anyway.
Being right is no excuse for being an asshole.
The attitude will appeal to some. It will strike many others in the wrong way and put them on the defensive.
There's no reason to write this way. A concise, well-articulated, non-combative post will appeal to everyone and still convey the same information.
Attacking that inanimate object is not without emotional repercussions for those attached to it.
Agreed that we should not be excessively abrasive, and I think the Facebook post leans that way. But I don't think a world without criticism can work. At a certain point in any field you have to face reality, in which some things work and some don't, and protecting every person involved from "emotional repercussions" is impossible because people's beliefs and feelings are all over the map.
At any rate I'm not sure the author of LD_PRELOAD or ELF dynamic symbol interposition is scanning this thread - and after ~30 years' distance might have different opinions about them, or at least a thicker skin :)
Essentially a lot of criticism of the original article seems to be of the form of “reading strong, angry opinions like this make me feel insecure so please don’t do that”. And that might be a good reason to avoid writing like that. But tone policing, and insisting on emotionally desaturated writing has a cost for the reader and the writer. I think it makes us smaller. And it keeps us in our heads rather than in our hearts. That’s just not the way I want to live my life.
You'll find somebody who likes anything you can name. Even every atrocity has been committed by somebody.
Not attacking the person is of course the correct thing.
But if you can't even attack the atrocity or bad choice, we've gone too far with this sensitivity thing.
> But if you can't even attack...
Why do we need to attack something? If we can't explain and support our view that something is wrong or a bad decision without resorting to attacks, perhaps our argument isn't really that strong?
And that's the thing. I don't think the author really presented a strong argument; he tried to convince me by verbally trashing the other side, while the actual logical, coherent argument is buried in a sea of disdain. I think it's still not clear what the default should be. Do we optimize for performance, or for debuggability and tinkerability? I mean, I feel like that's one of the classic debates that we still have no hard answer for -- and probably never will.
Edited to add: I went back and read the linked Python bug tracker issue, which honestly I wish was what HN linked to. It's concise, explains the problem, explains why LD_PRELOAD isn't all that useful for libpython, specifically why this sort of performance degradation is even worse with a library like libpython, and makes sure to call out that this change only affects libpython and not any other shared libraries, where (implicitly) people might find LD_PRELOAD useful.
It's rather the opposite: if we don't resort to attacking, condemning the practice, and raising the tone, our argument will be weak.
That's because it's not enough to be right. It also needs to be memorable and resonant; otherwise people's eyes will just glaze over it.
That's why this post has 218 comments as of now, and why you were involved and will remember it better tomorrow than some purely technical explanation that probably wouldn't even have made it to the front page (or would have 0-10 comments, typical of such posts).
(a) Attacking the practice is not abuse. Abuse is when you attack the person.
(b) If you don't have a colorful tone and strongly condemn something, for most people the complaint won't even register.
Maybe the argument isn't actually that good if you need to resort to theatrics to make the point register?
There's a big difference between 'clarity' and confrontational, negative language.
I found it not particularly clear at all. The entire thing can be boiled down to 2 or 3 sentences, or maybe as many as 10 if you want to include more background information.
When I was reading it, about halfway through I was thinking "god, when is this mediocre rant going to get to the meat of how to fix the problem?"
This means the writer is choosing to write for audience entertainment instead of technical advancement & improving the status quo. It's not impossible to do both at the same time, without the insulting tone.
I don't think that it would've been hard to maintain the exact same level of clarity regarding the subject matter. Perhaps the read was more entertaining due to the abrasive tone, but was it actually easier to read?
Ultimately, design decisions are made by people - if someone made such a takedown of ideas of mine, I would probably be somewhat discouraged, at least as long as I know they are being sincere and not just doing a bit. I don't think people should be flinching in their criticisms of ideas, but being fair to nuance and history really would be welcome too. It's one thing to point out dysfunctions in things, but it's different to pull out a sort of Angry Video Game Nerd-esque personality and drag how horrible things are through the mud.
Maybe this post is more in jest than not and I(/we?) simply did not pick up on the tone being purely for entertainment value. But that's the thing. When people read things like this, I think a lot of people take it too seriously and start to embody this attitude, and it leads to the kind of thought processes where things are either good, or stupid/evil/whatever, with no room for things that are just "not perfect, but overall fine."
Hell, I feel kind of bad due to how unnecessarily personal my comments regarding this feel. How would the author feel reading this? The fact that I may be right doesn't matter, because I'm not some kind of uncaring asshole, and I think most people are not if they are in the right mind.
> But that said, I don’t think dynamic linking is in the ELF spec. I believe that’s a de facto OS + dev tools thing rather than an ELF spec de jure thing. His points are still valid.
Like I had said initially, I do not take any issue with the factual content of the post, and agree that most people should be using these compiler flags. And yes, however many years on from when this may not have been the norm, it is now clearly a good idea to make all of your symbols hidden by default. No disagreements from me. I just hope that people don't walk away with the idea that some morons from the past made some horribly stupid mistakes because they just had no idea what they were doing. I wasn't there, but it doesn't feel like that's what happened at all; it feels like as things panned out, some things worked out well and some things did not. Some ideas now look more clearly 'bad' than they once did. Even today, it would probably be unwise to assume we know 100% what we're doing. Personally, I think it's hard to ever be absolutely sure you are taking the right lessons away from things that don't work out well.
But I do agree, it felt too thick. Still a very interesting topic regardless.
I've actually made it something I won't compromise on: I refuse to hire people who I suspect will have this attitude. It's one thing to express frustration at previous decisions that make current work more difficult. However, when I see that morph into an arrogance of "how could people have made such a stupid decision" (especially when some of those "people" may still work at the company), without even trying to understand the context of why that decision was made in the first place, it shows to me that person is not an engineer I want to work with.
1. PEP 445 makes the use case of LD_PRELOAD irrelevant.
2. A change like this would go under obvious code review and testing to make it into a released version.
3. The risk of a regression would still exist but that can either be caught by #2 or the existing unit testing already in Python.
(Disclaimer: I have contributed to the Python codebase)
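To illustrate point 1: PEP 445's pluggable allocators can be exercised right from the command line. Both the PYTHONMALLOC environment variable and tracemalloc are built on that same hook API (PyMem_SetAllocator and friends), which is exactly why LD_PRELOAD-level malloc interposition is rarely needed for libpython:

```shell
# Swap Python's allocator for plain malloc (no pymalloc arenas) via PEP 445's
# machinery -- no LD_PRELOAD involved.
PYTHONMALLOC=malloc python3 -c 'print("allocator swapped, still works")'
# tracemalloc is implemented on the same allocator hooks.
python3 -X tracemalloc -c 'import tracemalloc; print(tracemalloc.is_tracing())'
```

The first command prints the message normally; the second prints `True`.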
Much of this is the cruft the author of TFA identifies.
But so much of it was just "nature of the beast" at the time.
Generosity toward these shortcomings is always in order.
> (Windows took a different approach and got it right. In Windows, it's okay for multiple DLLs to provide the same symbol, and there's no sad and desperate effort to pretend that a single namespace is still cool.)
There's no need for technical topics to be emotionally charged, it only detracts from communicating the importance, correctness or benefits.
Solaris offers a similar model called direct bindings.
It becomes a charged subject matter when one works at companies like Google and Facebook and gets used to navigating performance reviews.
The way things were at Amazon when I was there, posts like this would count against a Principal Engineer promotion.
One of the standards engineers are expected to meet is "Respect what has gone before". You don't know the full details of what was going on at the time, you don't know the trade-offs and why, you don't know what they did and didn't know about the situation and what couldn't have been foreseen at the time decisions were made.
Generally speaking people aren't idiots. They do the best they can with what they have, under the circumstances they're operating within, to meet the goals they have.
Almost no one sets out to make a monster impossible to maintain, or with diabolical performance.
Treat it with respect, even while you work to replace it.
Perhaps because the subject matter doesn't matter.
The really important takeaway message is more about the industry/community not paying attention to shitty defaults and winging it for decades, than about the potential speedup and/or this particular mechanism.
As it contains almost the same info without the rant and with better explanation.
For example Vim (if so configured), or Postgres PL/Python, or the old Apache mod_python.
A reference to the classic line, oft attributed to JWZ:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
The point being, switching from CPython to PyPy does not simply give you the performance boost and that's it. It comes with its own tradeoffs, including:
(a) trading speed for much more memory consumption
(b) slower C FFI - important for all kinds of Python workflows (e.g. Pandas, Numpy, and so on)
(c) behind mainline CPython releases, and with subtle incompatibilities
(d) slower startup times (due to the JITting involved)
(e) different garbage collection model (and less predictable)
(f) less support (from companies, distros, etc), fewer ports, less manpower to port quickly to new platforms (e.g. Apple's M1).
The issue is discussed here. I also attempted to backport the fix to the official Python Docker image, but wasn't able to get much traction. I wish this would land, since everyone using these images would get an instant speedup.
On Gentoo, the ebuild for Python and Ruby both have --enable-shared hardcoded.
Also, if you use prebuilt Ruby binaries from RVM, those were also built with --enable-shared. And if you build your own Ruby binaries with ruby-build, it also does --enable-shared.
So, there are still plenty of opportunities around for instant speedup.
They then proposed it for upstreaming here:
And that's the new default.
The link is to a facebook post by someone who neither discovered nor implemented this change.
> This article focuses on one specific performance improvement in the python38 package. As we'll explain, Python 3.8 is built with the GNU Compiler Collection (GCC)'s -fno-semantic-interposition flag. Enabling this flag disables semantic interposition, which can increase run speed by as much as 30%.
(not logged in to FB, so maybe TFA is a reference to this one?)
And having just read that yesterday, I was very confused to open this post and immediately see complaints about the author's tone.
The author is confused: -fno-semantic-interposition does not preclude unicity of symbols. There are things like -Wl,-Bsymbolic-functions to relinquish that (and lose strict conformance with the programming language you are using, by the way). When I read a virulent post, I would at least like the author to have some deep expertise in the subject, and not mix up everything they have read about (including related but different matters). I won't even talk about the "interposition is useless anyway because Clang has a historical bug" part.
As for elf shared libs providing a single symbol space, there are arguments for and against.
An argument for a single symbol space is that a process with a segregated symbol space per dynamic module is a beast the C and C++ standards know nothing about. And don't even get me started on processes with multiple different C or C++ runtimes loaded simultaneously (take a look at the list of CRTs loaded in an instance of explorer.exe; it is horrifying). Another is that you can easily move things around between dynamic libs, split them, etc. If various subsets are actually used by multiple executables, this can be quite useful (lacking that property, note that MS had to invent their own additional virtual/redirection layer to refactor the Win32 libs).
A practical argument against is that it is "hard" (read: virtually impossible) to dynamically link binary-only modules built with different-enough toolchains. Likewise for different versions of the same lib (e.g. via transitive deps). But then the question of whether this is even a good idea should be asked (well, if you want to load a plugin into proprietary software I understand this can be desirable -- and it is actually similar to using the libs of a proprietary platform; any other use case?)
Summary: Python is 1.3x faster when compiled in a way that re-examines shitty technical decisions from the 1990s.
ELF is the executable and shared library format on Linux and other Unixy systems. It comes to us from 1992's Solaris 2.0, from back before even the first season of the X-Files aired. ELF files (like X-Files) are full of barely-understood horrors described only in dusty old documents that nobody reads. If you don't know anything about symbol visibility, semantic interposition, relocations, the PLT, and the GOT, ELF will eat your program's performance. (Granted, that's better than being eaten by some monster from a secret underground government base.)
ELF kills performance because it tries too hard to make the new-in-1992 world of dynamic linking look and act like the old world of static linking. ELF goes to tremendous lengths to make sure that every reference to a function or a variable throughout a process refers to the same function or variable no matter what shared library contains each reference. Everything is consistent.
This approach is clean, elegant, and wrong: the cost of maintaining this ridiculous bijection between symbol name and symbol address is that each reference to a function or variable needs to go through a table of pointers that the dynamic linker maintains --- even when the reference is one function in a shared library calling another function in the same shared library. Yes, `mylibrary_foo()` in `libmylibrary.so` has to pay for the equivalent of a virtual function call every time it calls `mylibrary_bar()` just in case some other shared library loaded earlier happened to provide a different `mylibrary_bar()`. That basically never happens. (Weak symbols are an exception, but that's a subject for a different rant.)
(Windows took a different approach and got it right. In Windows, it's okay for multiple DLLs to provide the same symbol, and there's no sad and desperate effort to pretend that a single namespace is still cool.)
There's basically one case where anyone actually relies on this ELF table lookup stuff (called "interposition"): `LD_PRELOAD`. `LD_PRELOAD` lets you provide your own implementation of any function in a program by pre-loading a shared library containing that function before a program starts. If your `LD_PRELOAD`ed library provides a `mylibrary_bar()`, the ELF table lookup goo will make sure that `mylibrary_foo()` calls your `LD_PRELOAD`ed `mylibrary_bar()` instead of the one in your program. It's nice and dynamic, right? In exchange for every program on earth being massively slower than it has to be all the time, you, programmer, can replace `mylibrary_bar()` with `printf("XXX calling bar!!!")` by setting an environment variable. Good trade-off, right?
LOL. There is no trade-off. You don't get to choose between performance and flexibility. You don't get to choose one. You get to choose zero things. Interposition has been broken for years: a certain non-GNU upstart compiler starting with "c" has been committing the unforgivable sin of optimizing calls between functions in the same shared library. Clang will inline that call from `mylibrary_foo()` to `mylibrary_bar()`, ELF be damned, and it's right to do so, because interposition is ridiculous and stupid and optimizes for c00l l1inker tr1ckz over the things people buy computers to actually do --- like render 314341 layers of nested iframe.
Still, this Clang thing does mean that `LD_PRELOAD` interposition no longer affects all calls, because Clang, contra the specification, will inline some calls to functions not marked inline --- which breaks some people's c00l l1inker tr1ckz. But we're all still paying the cost of PLT calls and GOT lookups anyway, all to support a feature (`LD_PRELOAD`) that doesn't even work reliably anymore, because, well, why change the defaults?
Eventually, someone working on Python (ironically, of all things) noticed this waste of good performance. "Let's tell the compiler to do what Clang does accidentally, but all the time, and on purpose". Python got 30% faster without having to touch a single line of code in the Python interpreter.
(This state of affairs is clearly evidence in favor of the software industry's assessment of its own intellectual prowess and justifies software people randomly commenting on things outside their alleged expertise.)
All programs should be built with `-Bsymbolic` and `-fno-semantic-interposition`. All symbols should be hidden by default. `LD_PRELOAD` still works in this mode, but only for calls _between_ shared libraries, not calls _inside_ shared libraries. One day, I hope as a profession we learn to change the default settings on our tools.
This often allows loading of multiple versions of the same dependency in the same program without ugly hacks. Which is great if you have multiple dependencies that each have the same sub-dependency (each internal to its dependent) but need different versions.
It's kinda a nightmare if you run into this problem in languages which don't support it.
Honestly, how can someone into tech post something like this on FB?
Someone saw it and shared it to HN; enough read it to upvote it this much... maybe Facebook's more popular than I thought! (That sounds silly or sarcastic, but 'among HN users and similar' I'm serious.)
Is this about something I can do to speed up our 3.8 Python code, or about why Python 3.8 is faster than 3.7?
> Eventually, someone working on Python (ironically, of all things) noticed this waste of good performance
But it would be good to know when and what versions.
I'm also not sure why this is "ironic". Who else but the experts on python would be more likely to discover this & resolve the issue? Which basically makes the whole thing a non-issue:
Python creators made a choice when creating python. A while later they realized they could improve performance by revisiting that choice.
The tone of the article makes it sound like this was an embarrassing mistake of massive proportions.
The article is talking about a bad decision in ELF and dynamic linking, not in Python specifically. The Python people just discovered that disabling that default behavior was useful.
Though my servers are already overprovisioned, and latency is probably more due to memory / IO and... oh nevermind.
Kudos to the python devs, looking forward to more speed improvements.
Furthermore, these options still allow the things you need interposition for, for calls from/to externally dynamically linked libraries like libc.
But most importantly, gVisor is based around intercepting system calls (oversimplified), for which you don't need symbol interposition.
This might be OK, if two versions were provided.
I could not replicate the speed benefits except for heavy stack usage. Going by the totals below, Python 3.10 is about 7% slower than Python 3.8, and Python 3.10 with said optimization is about 4% slower overall than Python 3.10 without it, except for stack usage.
Python 3.8 baseline
python-speed v1.2 using python v3.8.5
string/mem: 1476.42 ms
pi calc/math: 1817.37 ms
regex: 1984.63 ms
fibonnaci/stack: 1085.79 ms
total: 6364.21 ms (lower is better)
Python 3.10 no optimization
python-speed v1.2 using python v3.10.0
string/mem: 1580.52 ms
pi calc/math: 1796.23 ms
regex: 2110.86 ms
fibonnaci/stack: 1337.18 ms
total: 6824.79 ms (lower is better)
Python 3.10 with optimization
python-speed v1.2 using python v3.10.0
string/mem: 1559.45 ms
pi calc/math: 1821.25 ms
regex: 2387.64 ms
fibonnaci/stack: 1299.8 ms
total: 7068.13 ms (lower is better)
Perl, Ruby, PHP, TCL, and Lua have definitely declined over the years. Python's biggest asset seems to be its feature-rich libraries, rather than the language itself.