60x speed-up of Linux “perf” (eighty-twenty.org)
477 points by tonyg 42 days ago | 215 comments



FINALLY AN ACTUAL ANSWER!! :D

I had done a bunch of research into this a while ago and filed an issue with the Ubuntu bug tracker, but was told it was due to ABI stability (which did not make any sense).

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894407

Understanding that there is actually an underlying license problem is like finally being told all of the reasons for everything that happened on LOST ;P.


So `perf` is super slow on Ubuntu as well?

I feel for distro maintainers, real thankless job trying to square the circle on a billion different things like this.


Perhaps you should add a link to this article as a comment in that ubuntu bug tracker. That way it's useful for others who have the same question.


> This bug affects 3 people (!)


I'm sure it affects more people than that. Not everyone who experiences this bug is going to create an account on the ubuntu launchpad platform and write a comment about it


I'm absolutely positively sure that the number of people using perf on Debian and derivatives (Ubuntu, Mint) is >>3


Definitely. I always assumed perf was just super slow (having only really used it in anger on Debian). Glad the next time I get sucked down a performance tuning rabbit hole, I can make the process a lot less painful!


It is a tough feeling to see someone else's design for a library that would be literally perfect for my needs, but I'm unable to use it because of the license, so I have to spend weeks implementing my own inferior version while carefully avoiding making the code too similar to what I happen to remember.

I'm past believing that I'm smart enough to always be able to come up with/reimplement a competent enough solution for all my needs by myself every time. But copyright still has to be respected.

When I'm forced into that situation from licensing issues, it really makes me feel like my skillset primarily revolves around gluing together other people's well-designed code, and being left to my own devices exposes my weakness in implementing something from scratch.


This is the comment that points out the issue.

perf is licensed under GPL v2 only and libbfd is a GNU tool that is licensed GPL v3 and higher.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911815

Really wish the Linux kernel would start mandating new patches to be GPL v2 and higher and get buy-in from the largest contributors. A decade later, new code would likely replace the smaller contributors' code and we could consider the software GPL v3 compatible.


You might as well blame GNU for updating from GPLv2+ to GPLv3+, creating the problem in the first place.

Since, realistically, neither Linux nor GNU will budge, maybe the practical solution would be to look into integrating with the equivalent library in the LLVM ecosystem (assuming it exists).


It was always clear that GNU would create newer versions of the GPL and start releasing under them. The blame is pretty clearly on the people who edited the license text to be GPL2 only.


I think it's acceptable to not want an organisation to be able to relicense your software under arbitrary terms, which the GPL does allow the FSF to do.

The GPL 3 is a pretty benign improvement on the GPL 2 (though I know Linus objects to the Tivo clause). I don't think anyone who was happy with their software being used under the terms of GPL 2 would be unhappy with it being used under the terms of the GPL 3 additionally, rather than exclusively.

But let's imagine the next version of GPL 4 was "Additionally the authors may at their discretion use it under the terms of the CC-BY-SA license". Nothing wrong with it as a license, but that's a bigger leap in terms of license changes.

It's not hypothetical that they could exercise that kind of power - they _did_ with the GFDL with the "wikipedia can relicense as CC-BY-SA" clause in GFDL 1.3.

I think most people would agree that that was a responsible use of that power, but I can understand those who don't want to extend the trust that all future uses will be responsible.


Would that >= license be enforceable in court? It seems lame to create a license that forces you to agree to whatever future license changes come about.


It doesn't force anyone to agree with any changes.

It says "the recipient can use this under GPL v2 or any later version as published by the FSF", which means if the recipient is happy with the rights they got under GPL v2 they can keep using it under that version for as long as they like.

The only person agreeing to future license changes is the publisher of the code, and they're the ones that chose to publish it under "GPLv2+" or whatever.


It could go several ways but I can't see a judge wanting to entertain complaints that a new GPL version isn't to your liking if the changes are minor, especially so since it's always going to remain valid in GPLv3.

If the FSF goes rogue and changes the GPL to be incredibly restrictive (i.e., allowing proprietary redistribution, and I realise this can be considered permissive), it might be possible to get it ruled invalid, defaulting to the more permissive licence, especially if you have deep pockets. Or, if the FSF changes the licence to be ridiculously permissive like 0BSD, then it's not going to be legal in countries like Germany. Either way, any major change is likely to result in an international enforcement nightmare.


If GPLv4 were more restrictive, people would still be able to use the software under the terms of gplv2 or gplv3.

The scenario where FSF goes "rogue" would be if gplv4 became more permissive, e.g. if it stopped being a copyleft licence.


Sure, but that still means that the fault of not being in the ecosystem falls on the person who specifically edited their license to not be a part of the full ecosystem going forward regardless of whatever reasons they had for doing that.


Calling that a fault shows bias. You could just as well call it a choice.


A choice that results in consequences is normally how you assign fault for those consequences.


For those consciously omitting “and later”, the effect of not allowing that probably isn’t a fault, but a desired outcome.

“You can do A, B, and C with the software I wrote, or whatever X may at some future date decide you can do with it” isn’t something everybody is happy with. It certainly requires some long-term trust in what X will or will not do.


The assignment of fault and it being a desired outcome are rarely mutually exclusive concepts.


I get that the point here is that perf could relicence to GPLv2+ to resolve this issue (although this works both ways: libbfd could dual licence as GPLv2+/GPLv3+) and it could be left at just that, but I have to nitpick this:

> The blame is pretty clearly on the people who edited the license text to be GPL2 only.

They edited the licence, yes, but the FSF explicitly wants you to do this to make your intention clear(1). When you licence software under the GPLv2 (or 3, etc) you have a choice of 'GPLv2 only' or 'GPLv2, or any later version', however since the licence text only states 'Version 2' with the old short labels being just 'GPL-2.0' there's some ambiguity on whether you mean GPL-2.0-only or GPL-2.0-or-later.

The default assumption should always be v2-only. However, at the time the FSF were still recommending the short label of GPL-2.0, and the choice between v2-only and v2-or-later wasn't really an issue, so you had a lot of v2 licenced software and patches using the default unedited licence with the FSF short label of GPL-2.0; this is purely the fault of the FSF. It wasn't until the GPLv3 came around, which some people didn't like (notably the Linux kernel, which is probably why perf is v2-only), that you got people editing their licences to make the intention clear. For many projects there was no choice in the matter, as changing to v2-or-later would require permission from every copyright holder that had contributed code to that project. Again, this is partially the fault of the FSF for not having enough foresight or not making the choice of -only or -or-later more explicit and clear.

P.S. if you already know the history and context here this post probably seems a little patronising and I apologise for that.

(1) https://www.gnu.org/licenses/identify-licenses-clearly.html


While so far FSF have behaved responsibly it makes perfect sense for the Linux developers to not place their trust in an external organization.


The code was always going to be possible to use under GPL2, even if it said GPL2+. It's not really placing their trust with anyone since they wouldn't lose anything. It's more of a flag waving op to keep corporate interest -- corporate are notoriously disinterested in GPL3 projects, so Linux devs basically said "ok let's not let anyone fork as GPL3 and make significant work we can't benefit from". Then Linux stays relevant to corporate, who keep on supplying developers to the project.


This isn't true at all - Linus made the licensing decision in version 0.12 of the kernel, back in 1992! There certainly wasn't any corporate interest in Linux that needed to be protected back then.

http://lkml.iu.edu/hypermail/linux/kernel/0009.1/0096.html


TIL, thanks. Guess cui bono isn't a reliable method of deduction.


So you're telling me Chicxulub impact wasn't caused by small mammals?


What some call a feature, others will call a bug.

It doesn't make sense to assign blame on those who bug-fixed the backdoor in the license that clearly would allow a third party to change licensing terms. Even(!) if that third party was rms.


No chance of that.

GPLv3 wasn't just an update, it was a major change from GPLv2. Importantly, it limited developers' own rights to use the code of projects they contribute to however they want (including in devices that are locked or secured in various ways).

Plenty of people even outside of Linux are not going to be going to GPLv3.

The GPLv3 split also damaged the more copyleft side of things as I think some kernel devs predicted.

I think the momentum is currently more MIT / Apache - not sure if that could be where folks could be encouraged to release under to keep at least the open source part alive even if the copyleft part kind of goes away.

Anyone doing any stats on this? The more true open source players are going MIT / Apache style, the proprietary relicense folks are doing the (A)GPLv3 thing to drive licensing revenue given the risk aversion to GPLv3 that is out there. A lot of the GPLv3 codebases require contributor agreements so they can license outside of GPLv3 so they tend not to be true multi-contributor / multi-copyright holder codebases.


> GPLv3 wasn't just an update, it was a major change from GPLv2. Importantly, it limited developers own rights to use code of projects they contribute to how they want (including in devices that are locked or secured in various ways).

This is FUD.

- The developer's own rights to their own code are never limited. What is limited is the rights they get to other people's code.

- GPL does not limit use in any way, it limits distribution.

The limit placed by the GPL on distribution is that the recipient must also be given source code and the same legal rights to the code. The GPL-3 fixes a technical loophole where the recipient is given rights to the code but prevented by technical means of using their modifications to that code on the device it is intended for. And yes, tivoization is absolutely a loophole in the GPL2, i.e. against the spirit of the license - the FSF has always been about empowering users to modify their software.


Would you be willing to indemnify a developer who is a contributor to a GPLv3 project so they could use that project code in a locked down device if necessary without release keys etc?

If so, great, the claim is FUD.

Reality is GPLv3 folks have lied about almost everything - Ubuntu had to get special private communication from FSF to be able to ship a GPLv3 bootloader etc.

And no, the license is pretty darn clear to most of us, even if you are a major contributor to a project, you CANNOT use that project code how you would like. This is not FUD, this is part of the license design. That is a major change from GPLv2 which is what we are discussing.

A reminder that developers, not users, pick the license of code. That is also fundamental to copyright law. You can write a license that makes developers pay users $1,000. Users might like that. Developers may not choose it. That is what is happening here in many cases. Developers are choosing to avoid GPL for other options.

Again - this conversation would be helped if someone had some data. Anecdotally I'm seeing lots more MIT / Apache stuff than GPLv3 stuff these days.


> carefully avoiding making the code too similar to what I happen to remember

Is this really necessary? The GPL allows you to read and study the software. Is it really copyright infringement if you take your understanding and make your own program? Are people really forced to come up with convoluted new ways to solve the same problem just to avoid any similarity to existing work?


The issue is that the only way to guarantee no liability is a clean room implementation. It's why most open source reimplementations require no mention or examination of the source/machine code. Google got dinged over a method they copied in their Java implementation, even though it was extremely trivial. You can't risk any chance of copying, regardless of how trivial the code.


Patent and copyright trolls have shown that even a clean room implementation is no guarantee.


True, but at least you as the employee won't get fired since you can't be blamed.


No, what you're describing is not itself copyright infringement. There are actually two legal standards for infringement in the US:

- Striking similarity, or what most of us think of as infringement, where there are literal copies of someone else's code in your own. This is what things like content ID systems try to detect.

- Access plus substantial similarity. This is where you've looked at someone else's code (source or disassembled, doesn't really matter); and then produced something that looks a lot like that code if you squint a little.

Substantial similarity does not have a hard-and-fast rule associated with it; it's usually something that juries or judges decide. It's basically the "yeah just change it a little so the teacher doesn't notice" meme, but in legal form. If you were to read another program's code, and then make another program that looked an awful lot like the first, then you'd be infringing. But this is bounded by other exceptions to and rules of copyright - notably, functionality itself can't be copyrighted, and copyright over interfaces is largely circumscribed by various fair use decisions (so emulation and re-implementation are largely still OK even under this standard).

If you read GCC, and then write LLVM, that's not substantial similarity. That's a different, legally distinct compiler design. The FSF could sue the pants off Apple if merely reading GCC source meant any compiler you wrote was infringing.

If you read GCC, and then write a compiler with the same internal representation as GCC; a parser that's structured the same way as GCC; and optimization passes that are organized the same way as GCC's; then you're closer to infringing.

Your responsibility isn't to create an entirely novel program by any means necessary. It's just to avoid doing the software equivalent of tracing over someone else's drawing.


It depends on what derivative work means, and finding out the specifics may mean spending time in court.

GPLv3 says 'To “modify” a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a “modified version” of the earlier work or a work “based on” the earlier work.'

If you read and study a work, and then write something new that does the same thing, that may be considered adapting or copying from (parts of) the original work.


"in a fashion requiring copyright permission" is basically the definition of "derivative" here, so the GPL is rightfully farming that question out to external law and not even trying to tell you the answer.


What I wasn't sure of is whether, as long as I never actually incorporate the original code, basing my fundamental design on something counts as infringement. I find it hard to see the difference between taking some class definitions and changing the variable names, and going off to use what you learned from other people only to realize you arrived at the same design as they did, because doing anything else would mean the implementation would be worse than what you now know it could be.

I don't have much of an understanding of copyright when it comes to cases like those, but my worry is that declaring that your open-source codebase is merely inspired by codebase X with this specific code in the case of these specific data structures would still count as infringement in a court of law. However, I don't intend to infringe on the licenses of the original, and do not intend to directly copy any code.


Dunno! For what it's worth I build a lot of MIT and MPL software professionally and the lawyers at several employers have suggested doing exactly what the GP does. My assumption has been that proving derivation in a court is an expensive process however it turns out, so why risk it.


Think about it this way. If a panel of jurors who only half understood the expert testimony saw your code side by side with other code, would it look like blatant copying or not?


Don't feel down about that. If someone's job is writing database engines or JSON parsers, you can't do a better job than them while also writing business logic. We should be glad some people chose to provide such code in a manner we can use, and when we can't, a patch job that does the trick is just as good.


For my personal work, I treat GPLv3 (and similarly licensed) projects as essentially the same as anything behind a commercial license -- they don't exist.

Rather than thinking about how to re-implement a thing that doesn't exist, I think about how to implement the tool I actually need. What I end up with may be less general, but its actual functionality is often simpler and easier for me to grok/remember.


Curious, what license(s) prevented you from using a library?

Or maybe rather, what does "use" mean here?


GPL, and in some cases source-available codebases produced by decompilation.

"Use" in this case means to have some feature that's implemented by those libraries reimplemented in an open-source project (MIT-licensed).


GPLv2, which is viral.


GPL, by some quite popular interpretations of what derivative work means. I do not personally agree, but those interpretations are common enough to be a real concern and I do not think this has been tested in court yet (GPL has been tested in court but I do not think this particular aspect has).


I haven't blogged about this yet, but we saw a 1000-fold speed-up from doing several things around symbolication. The better approach we found was to use the gimli crate [1] directly and carefully optimize it to read in the data structures for the executable(s) you are symbolicating upfront, then issue in-process queries. They also have a drop-in replacement for addr2line [2] that outperforms it (both in symbolication speed and memory usage).

I am curious about how that compares with libbfd since that wasn't under consideration for us as it uses GPLv3.

[1] https://github.com/gimli-rs/gimli

[2] https://github.com/gimli-rs/addr2line
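
The "load everything upfront, then query in-process" idea can be sketched without any DWARF parsing. This is not gimli's API; the `SymbolTable` class and its sample entries are hypothetical, standing in for data that a real tool would read from the executable's debug info once at startup.

```python
import bisect
from typing import List, Optional, Tuple


class SymbolTable:
    """Hypothetical in-memory table of (start_address, size, name) entries."""

    def __init__(self, entries: List[Tuple[int, int, str]]) -> None:
        # Sort once upfront so every later query is a binary search.
        self._entries = sorted(entries)
        self._starts = [start for start, _, _ in self._entries]

    def lookup(self, addr: int) -> Optional[str]:
        # Find the last entry whose start address is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, size, name = self._entries[i]
        # Reject addresses that fall in a gap after the symbol ends.
        return name if addr < start + size else None


# Made-up addresses, for illustration only.
table = SymbolTable([(0x1000, 0x40, "main"), (0x1040, 0x20, "helper")])
```

Each `table.lookup(addr)` call is then a binary search over data already in memory, with no process spawn and no re-parsing per address.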


Steinar Gunderson (who suggested [1] the talk-through-a-pipe approach) just now compared [2] the `libbfd` variant to the pipe-to-`addr2line` version, and found them to take similar amounts of time.

This agrees with what I saw in `top` while testing: with the patch, I see `perf` using ~95% CPU, with `addr2line` using the remaining ~5%.

So speeding up `addr2line` probably wouldn't result in very much of an overall improvement for this workload.

[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911815#28

[2]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911815#38
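
As a rough check on that conclusion: if the CPU shares seen in `top` are treated as fractions of total run time (an assumption; CPU share and wall-clock share need not match), Amdahl's law bounds the gain from a faster `addr2line`. The function name below is illustrative.

```python
def overall_speedup(fraction: float, factor: float) -> float:
    """Amdahl's law: speed-up of the whole job when `fraction` of the
    run time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumed split from the `top` observation: ~5% of time in addr2line.
addr2line_share = 0.05

for factor in (2, 10, 1000):
    gain = overall_speedup(addr2line_share, factor)
    print(f"addr2line {factor}x faster -> {gain:.3f}x overall")

# Upper bound if addr2line cost nothing at all.
limit = 1.0 / (1.0 - addr2line_share)
```

Under that assumption the ceiling is 1/0.95, a little over 1.05x, however fast the symbolizer becomes.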


The piping is a good chunk of the gain but the gimli library is still much faster than addr2line & uses less memory if used correctly. Since libbfd shows similar results, I'm thinking there might be a speed bump from switching to use it (some care has to be taken to preserve 0-copy semantics across FFI).


It'd definitely be interesting to find out!


Looks interesting. I've encountered some annoying bugs with latest addr2line when trying to get symbols for addresses, wonder how gimli performs in that sense. Also, it seems that addr2line 2.37 has some performance regressions compared to 2.36.1 which I didn't look at yet.


In my experience gimli also symbolicated things addr2line couldn't.


> Michał Sidor suggests building against libbfd, something that the Debian maintainers don’t want to do.

"don't want to" isn't quite correct - they can't.

Perf is licensed under the GPLv2, libbfd under the GPLv3. The licenses are incompatible, which makes the combination unredistributable - which is what Debian would be doing with it.

Debian is legally not allowed to do this.


Is it just me or is the incompatibility especially absurd given that perf is currently using libbfd, just 1 step removed via addr2line?

I wonder if a patch to addr2line would be accepted which allows the executable to stay running to accept multiple requests through stdin.


This is kind of the crux of a long history of these licensing arguments: dynamic linking is problematic because it potentially creates a derived work, and then you might distribute that derived work.

But shelling out to something or in some cases having an automated script create or download the derived work automatically is sometimes acceptable.

It’s a bit of a silly line in some ways. But it’s a long story.


I'm skeptical that there's a coherent legal distinction to be drawn between using a dynamic library with dlopen etc and running a similar executable with arguments. It's functionally identical usage of some public interface.


Agreed, I am very skeptical too. I have long believed that dynamic linking and running an executable with arguments will be regarded the same way in a court. So either both are derivative works or neither is.

My personal belief is that a court will only care about whether your code is a wrapper (e.g., libgpg) or whether it is independently useful and just uses the library (e.g., PostgreSQL using libreadline) for a small part of its operation. But I am not a lawyer, and this has not been tested in court as far as I know.


They don't seem to claim there is a distinction on the mechanics there? Rather their distinction seems to be "are these part of the same overall program, or two separate ones", however you measure that: https://www.gnu.org/licenses/gpl-faq.html#GPLPlugins


By claiming that using dlopen to load some code makes your program a derivative work of the library it's using, but shelling out to it doesn't, they're implicitly making a claim as to the distinction of the mechanics.


This has always been my perspective as well, but it goes against long-established norms—for example, it's for exactly this reason that the LGPL was created. I don't think either theory has been thoroughly tested in court, though. See https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/ for a countervailing position from the SFC and FSF.


The license holder grants you permission to do Foo, but not Bar. For example, it may be legal to show a film at site A (e.g. Netflix), but not at site B (e.g. YouTube), because the license holder said so. The functionality of both sites is almost identical.


There's also an interpretation that merely linking against a compatible interface would be infringing even without distributing the two parts together (curiously, used by Stallman): https://news.ycombinator.com/item?id=26606328


addr2line already does this, and it's exactly what TFA is making use of to get the speedup.
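
A minimal sketch of that talk-through-a-pipe pattern, assuming GNU `addr2line` is on `PATH` and relying on two of its documented behaviours: with no addresses on the command line it reads them from standard input, and without `-f` or `-i` it prints one `file:line` result per address. The `PipeResolver` class and its `cmd` parameter are hypothetical names for illustration; this is not the code from the perf patch.

```python
import subprocess
from typing import List, Optional


class PipeResolver:
    """Keep one child process alive and query it once per address."""

    def __init__(self, executable: str, cmd: Optional[List[str]] = None) -> None:
        # Default: GNU addr2line reading addresses from stdin.
        # `cmd` is a hypothetical override, handy for testing with a stand-in.
        self._proc = subprocess.Popen(
            cmd or ["addr2line", "-e", executable],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on our side of the pipe
        )

    def resolve(self, addr: int) -> str:
        # One request line in, one reply line out; no fork/exec per address.
        self._proc.stdin.write(f"{addr:#x}\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        # Closing stdin signals EOF, which lets the child exit.
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()
```

Usage would look like `r = PipeResolver("./a.out")`, then `r.resolve(0x401136)` for each address (a made-up value here), then `r.close()`. The pattern depends on the child flushing its output after every reply; a child that block-buffers its stdout would make `readline` hang.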


Oops. Literally the only line that I didn't read. I thought the author was providing a patch to link against libbfd for users to manually apply (which afaik wouldn't violate the licenses as long as no one redistributes the patched binary).


Kind of ironic that RMS campaigned against proprietary software to allow reuse of code. Now we have two islands of open source code...


RMS advises to use "GPLv2 or any later version", which is compatible with GPLv3, instead of "GPLv2 only" license.


One could also suggest that the BFD project adopt "GPLv2 or any later version" (like it had prior to 2008), which is compatible with the GPLv2, instead of the "GPLv3 or any later version" license.


"or any later" is controversial. It essentially puts your project under the agenda of the FSF, which not everyone wants.


Doesn't "GPLv2 or any later version" mean that you can use the software under the terms of GPLv2 or GPLv3 (or some hypothetical GPLv4)?

It can't possibly add any restrictions to the use of your project, because people can always just use it under the terms of GPLv2 if they want.

I guess it could remove restrictions, if the hypothetical GPLv4 was a total 180 and looked more like the MIT or Apache licenses, so I guess if that's a concern then you have a point.


If a new version of the GPL adds restrictions, that still can be undesirable for the writer of the original software.

If a third party forks a GPLv2+ project, adding features that are GPLv3+ licensed, the result is GPLv3+ licensed.

That means those wanting the GPLv2 licensed project won’t be able to use those new features.
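
One way to see this is to treat a licence grant as the set of GPL versions a recipient may pick from, with a combined work distributable only under a version that every component allows. This is a toy model for illustration, not legal analysis; the function names and the cap on "later versions" are assumptions.

```python
from typing import Set

LATEST = 3  # highest GPL version published so far; raise if a GPLv4 appears


def only(version: int) -> Set[int]:
    """Versions allowed by 'GPLvN only'."""
    return {version}


def or_later(version: int) -> Set[int]:
    """Versions allowed by 'GPLvN or any later version'."""
    return set(range(version, LATEST + 1))


def combined(*grants: Set[int]) -> Set[int]:
    """Versions under which a work combining all components could ship."""
    return set.intersection(*grants)


# A GPLv2+ project forked with GPLv3+ additions: only v3 remains.
fork = combined(or_later(2), or_later(3))

# perf (v2 only) linked with a modern libbfd (v3 or later): nothing remains.
perf_with_libbfd = combined(only(2), or_later(3))
```

In this model `fork` comes out as `{3}`, so users who want GPLv2 terms cannot take the new features, and `perf_with_libbfd` is empty, which matches the perf and libbfd situation discussed elsewhere in the thread.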


At least you can legally build it locally, just not distribute the result. That's still better than what's possible with proprietary software.

As drran mentioned, I'm sure RMS would say the problem here is that Linux is licensed under v2 only. Linus of course would disagree.


There is proprietary "source available" software that I can compile locally but not distribute the result either. The GPLv2/3 disaster is not an improvement.


Two? Are the AGPL licenses compatible? How about GPL licenses with exceptions for some APIs as used by OpenJDK?


The AGPL licenses are explicitly compatible with the GPL ones. FWIW, this issue is essentially entirely caused by Linus, who refuses to use or accept GPL3.


> FWIW, this issue is essentially entirely caused by Linus, who refuses to use or accept GPL3.

That's a bit disingenuous. Linus has said no to the GPLv3, yes, but his reasons for doing so have merit.


The very fact that the GPL v3 is incompatible with v2 indicates that v2 has something the v3 takes away, no matter how well-meaning that was.


What the GPL3 takes away is a loophole to violate the spirit of the GPL without violating the letter of the GPL2.


Linus's reasons for causing this issue having merit does not mean he wasn't entirely the cause of the issue: those are unrelated axes.


Unless perf has the standard clause allowing use of later versions of GPL. That would make it GPLv3 if linked against libbfd.

Unfortunately it does not.


That's pretty typical for things in the kernel ecosystem.


Debian could provide a tool which builds it for you, on your system, right? The resulting binary would not be redistributable but you could use it.

I've often wondered why more linux distros don't provide a similar tool to build ZFS into the kernel for you.


Debian nicely offers this with the zfs-dkms package.

Ubuntu just bundles ZFS into the kernel statically - not even as a loadable module - and damn the licensing implications(!).


So this means the distributor can fix this by building this package locally and using the result to link other binaries to? This isn’t uncommon, right?


You're right. I should clarify. Thank you!


I’ve wondered about similar situations.

Take Yosys, an open source synthesis platform that allows external commands by adding shared libraries. It’s MIT licensed.

There is the Yosys-GHDL plug-in that allows using VHDL instead of Verilog compilation. It’s an independent open source project that can’t be merged into the main Yosys tree because GHDL itself is GPL 2.0.

Is the author of that plug-in violating the GHDL license? The plug-in glue code is GPL2 as well.

Can the GHDL authors ask the plug-in authors to take down their code (which has been forked many times on GitHub, of course)?


The plugin is potentially a derivative work of both Yosys and GHDL, but that's fine - it's possible to comply with the requirements of both the MIT license and the GPLv2 at the same time. The resulting work is GPLv2-licensed. (To be precise, you also need to preserve any copyright notices from the MIT-licensed project and the text of the MIT license. The MIT license has only one condition, but you do need to follow it, same as if you were building proprietary software with an MIT-licensed component.) If anyone distributes Yosys with the plugin, the combined work must also be distributed under GPLv2, which is also fine.

In this case, perf is (like the Linux kernel, in whose git tree it lives) GPLv2-only, and modern versions of libbfd are GPLv3-or-later, and it's not possible to comply with both at the same time. The GPLv2 has a "no additional restrictions" clause, and the GPLv3 asks for things the GPLv2 does not. So a distributor of a combined system including perf built against libbfd (like a Linux distro) cannot comply with the licenses.

(I suppose perf was written against the libbfd API back when libbfd was GPLv2, avoiding the question of whether perf is an illegal derivative work of libbfd. Or the backtracing API is de minimis, which means the source is fine but a compiled binary as part of a distro that also includes a modern libbfd is not.)

One possible answer here is to ask the perf authors to relicense under GPLv2 or later. Many years ago, at a startup that no longer exists, I wanted to reuse code from the Linux kernel "dm-verity" module in GRUB, which had moved to GPLv3, and I got a pretty quick answer from Red Hat (who was the only copyright holder for those files) saying that would be fine.


Your answer and the one from another commenter mention distribution.

In this spirit, is there any legal issue with downloading perf and libbfd separately, compiling them, and using the result for yourself only? I assume not?

What about using it then as a tool within your company? That might be seen as distribution, and thus not allowed?


There's no legal issue with using it yourself. Debian's problem is that, as a distro, they are creating a single product - the Debian operating system.

Debian could even ship you something that compiles perf against libbfd on your own system. This is the approach they take with ZFS and Linux, which are under incompatible licenses (CDDL and GPLv2): https://bits.debian.org/2016/05/what-does-it-mean-that-zfs-i...

A company is generally a single legal person and so there's no "distribution", but https://www.jolts.world/index.php/jolts/article/view/66/125 is a law review article that discusses the complexities here. If you employ contractors, or if the company as a whole is purchased, the answer is apparently blurrier.


Thanks!


My understanding is that there’s only a problem if the distributor combines the two things: if the main application is one license and the plug-in a different one, it’s not a license violation for the end-user to combine the two (unless they turn around and become a distributor in some way).


I am pretty sure that depends on the intent of your distribution. If your software is useless with the plugin and you go "wink wink nudge nudge use this plugin to make this work" this would probably be infringement. If your software is functional without this, and you just happen to have the ability to load plugins of which this is one of them, then it's much better.


It's arguable when it comes to dynamic shared libraries (i.e. the ZFS argument).


How do other distributions handle this? Do they also have a slow perf? Or do they just link perf with libbfd regardless of the incompatible licenses?


DKMS style linking at install time might be an option, but I've never seen it used in practice


Is there not an exception in the license for "system libraries" which libbfd kind of is?


Is it? On my system the only things depending on binutils (which libbfd is part of) are gcc, clang and perf (seems archlinux doesn't care).

That makes it arguably not a system library, so it's fair that Debian would want to steer clear of that interpretation.


What is Clang using it for? I thought it had its own suite of that.


This... is some weirdness in Arch's clang package.

It depends on gcc, for some reason, and that depends on binutils?

(and I have no idea if anything here uses libbfd, since it's just a part of the binutils package and not split out)


Clang uses part of the host compiler toolchain; gcc is the default compiler on Arch included in the base package.

https://stackoverflow.com/a/38291698/4179075


Maybe it depends on gcc so it can use gcc's assembler for inline assembly?


No, it's for linking. Basically, unless it is configured in the source, clang does not know what link options to give. Even Rust actually uses gcc or clang to do the same thing.


Clang will also use libgcc by default under Linux to allow e.g. stack unwinding to work in mixed Clang + GCC applications.


If we are nitpicking, they can, there is nothing material stopping them. It is "just" at odds with copyright law.

In a society with different laws, there would be no problem. I wonder what would be greater - the productivity gain when we would not have to make workarounds due to licensing, and when we could just legally access the source code of everything. Or the loss, because people might have less incentive to innovate, or they would just keep the source hidden.


If only we were permitted to view the digital IP rights tyranny as another disruptable industry with latent energy that could be unlocked by a savvy founder...

The economic activity & value directly suppressed by IPR is unbelievable. Hard to even mentally scratch the surface. It's extremely economical (for society) to digitally copy and distribute useful information.

Most Western IP laws are draconic in the purest sense of the word.

Unfortunately there is a very influential sect of "corporate-law fearing" IP fanatics at the heart of many FOSS tech traditions (RMS, et al). Not all of it is pragmatic.


> Most Western IP laws are draconic in the purest sense of the word.

IP laws in most countries are also not decided by a democratic process but rather pushed in by trade deals, which also make it almost impossible to change those laws.


Yet another case of GPL virality causing more issues than it solves.

E: Downvoters, have you read the article? The program started a new process for each address lookup instead of using a library because that library is GPL.

Shall we have a discussion?


IMO it's not "GPL virality" that is at the root of the issue here. The authors' decision to license their works this way causes the problem. You can still ask both libraries' authors to relicense their work in a less restrictive / more compatible way.

If they don't want to do that, then being authors and owning their copyrights, that's their right to do. That's nothing I would blame on the GPL.

I mean we should be grateful that these authors made their works available free of cost in the first place.


As I understand it relicensing in many open source projects can be a very complicated matter, since in some cases it requires getting permission from all past contributors (Which could number in the hundreds)


Can someone ELI5 to me where the problem is because both perf and bfd seem to be GPL. Why would the former not be allowed to link to the latter?


GPLv2 and GPLv3 are different licences, despite sharing a name. The GNU people (FSF) intentionally made the GPL3 incompatible with GPL2, to enforce the virality of the GPL3 license (which they perceive to be superior to the older GPL2).


The purpose of the GPLv3 was to address the "tivo" clause, where a vendor sells you a device with some modified GPL preloaded software included. You can request the software under GPL, but if you can't load it on the device then the FSF feels that's not very good.

So GPLv3 bans that. That, by itself, is inherently an additional restriction not allowed by GPLv2. They don't need to go out of their way or have a conspiracy to deliberately make it incompatible. Especially since the GPL2, as distributed and suggested, includes the "or later" text which resolves this incompatibility to allow GPL3 software to use GPL2 code. If anything, this encourages GPL2+ as the default GPL license if maximum compatibility is your goal.

Others (e.g. Linus), are more focused on getting the code changes so if they want to use them in the original project they can, and don't feel being able to install it on the device it was built for is as high up on the priority list.


It's also interesting to note that none of the GPL licenses require giving code back, so what Linus wants from the license doesn't happen via the license, only via cultural standards. Indeed, there are many companies and persons violating those cultural standards by not sending patches back to Linus, as well as many companies violating the GPL licenses by not distributing source to their customers.


A couple of interesting posts from Software Freedom Conservancy make it clear that what Tivo did (breaking proprietary software when reinstalling Linux) is allowed, even by the GPLv3 and that allowing reinstall of libre software is a requirement of the GPLv2 license.

https://sfconservancy.org/blog/2021/jul/23/tivoization-and-t... https://sfconservancy.org/blog/2021/mar/25/install-gplv2/ https://events19.linuxfoundation.org/wp-content/uploads/2017...


Most GPL-licensed projects permit their code to be redistributed under the same GPL version or any later version of the GPL; Linux releases as GPLv2 or later for example.

Perf does not permit its code to be redistributed under anything except GPLv2, so it conflicts with GPLv3 code.
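
To make the version mismatch concrete, here is a toy sketch in Python. It is not legal advice and does not come from any of the linked sources. It models each license as the set of GPL versions a work may be distributed under, which is a deliberate oversimplification. The names `GPL2_ONLY`, `GPL2_OR_LATER`, `GPL3_OR_LATER` and `common_versions` are made up for illustration, and treating libbfd as "GPLv3 or later" is an assumption.

```python
# Toy model: a license is the set of GPL versions a work may be distributed under.
# Only versions 2 and 3 are modelled; a future version is ignored for simplicity.
GPL2_ONLY = frozenset({2})
GPL2_OR_LATER = frozenset({2, 3})
GPL3_OR_LATER = frozenset({3})


def common_versions(*works):
    """Return the GPL versions under which all the given works could be combined."""
    result = frozenset({2, 3})
    for versions in works:
        result &= versions
    return result


perf = GPL2_ONLY        # per the thread: GPLv2 with no "or later"
libbfd = GPL3_OR_LATER  # assumption: treated here as GPLv3 (or later)

# An empty result means there is no single version that satisfies both works.
print(sorted(common_versions(perf, libbfd)))
# With "or later" on the v2 side, version 3 is a version both works allow.
print(sorted(common_versions(GPL2_OR_LATER, libbfd)))
```

In this model the "or later" wording is simply a bigger set, which is why it removes the conflict without either side changing its code.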


“Linux releases as GPLv2 or later for example”

Linux doesn’t use “or later”. https://github.com/torvalds/linux/blob/master/COPYING:

The Linux Kernel is provided under:

  SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
Being under the terms of the GNU General Public License version 2 only, according with:

  LICENSES/preferred/GPL-2.0
With an explicit syscall exception, as stated at:

  LICENSES/exceptions/Linux-syscall-note
In addition, other licenses may also apply. Please see:

  Documentation/process/license-rules.rst
for more details.

All contributions to the Linux Kernel are subject to this COPYING file.


Sure, it's not the license itself. But this does read like a problem. It's difficult to imagine that the authors of libbfd licensed it under GPLv3 because they wanted to ensure that no one would link it against perf. Maybe they do have strong feelings about supporting that license.

But this is an example of the ugly side of open source licensing. A lot of people don't have strong feelings about ensuring the distribution terms of their code, and just don't care how it's used. And in those cases, it can be annoying to have e.g. opensource.org insist that GPLv3 is the best option.

Edit: To be more specific, I mean that this problem could be avoided if both pieces of software were released under a public-domain-equivalent license. But of course, that will never happen.


The difference in license only matters because of a broad interpretation of virality. The easy way to show this is to mentally replace GPL with LGPL and notice that the problem disappears.


> The authors' decision to license their works this way causes the problem.

I'm not sure how you can rationally make an argument for a framework that literally limits the use of improved or more performant code/technology/understanding simply because it does not share a philosophical view of IP/ownership/sharing.

This is the core of the problem with IP in the real world as well, and I find it mind-boggling that we don't look at the SYSTEM and see it as problematic.


Actual IP law and how it conducts itself in the real world has plenty of issues to be sorted out, but there are tons of potential "inefficiencies" in the real world that could be exploited and rectified across a vast array of things, but aren't, due to combinations of social politics, law, philosophy, etc preventing it. That some potential efficiency is on the table in theory, but can't be reaped due to those things, is nowhere remotely close to "mind boggling"


It doesn't limit it any more than if it were proprietary and you couldn't afford it.


It seems odd that the takeaway is "GPL sucks because it got in my way" and not "both authors' explicit wishes were respected." Assuming that this is in error and this isn't what the authors wanted to happen then it's fixable by someone changing their license.


It's fixable with a license change, but in practice that's impossible for large projects. Without a CLA, every individual who's contributed is a copyright owner. If you can't get every single one of them to agree to change it, then the license is stuck. And it doesn't matter if the reason you can't get them to agree is because they've become uncontactable or died.


If they're dead then they're not going to stop you from changing the license.

What are they gonna do? Sue you from beyond the grave?


I think copyright is inherited by the heirs (in most cases relatives of the deceased) and carries on quite a long time after death. So the relatives may actually sue you, and any discussion WRT relicensing would have to be addressed to them.

I actually consider putting something into my will that states that all my open-source contributions are relicensed to "public domain" (CC0 [1]) once I die.

[1] https://creativecommons.org/share-your-work/public-domain/cc...


It's also possible to transfer your copyrights before your death. For example to the FSF or to Software Freedom Conservancy. I hear there are tax implications for copyright bequeathment, which is why transferring copyrights is preferred.

https://sfconservancy.org/copyleft-compliance/


Their heirs could do precisely that.


Yep. It’s how the Marvin Gaye lawsuits these past few years have been able to happen (despite him being dead). His estate is suing.


IANAL but I believe a Qui Tam could take place by surviving family. Or inherited copyrights, etc.


Inheritance exists.


> more issues than it solves.

Can you explain how you reach this conclusion? Assuming the GPL did not exist and the author would have made stuff proprietary, would that be better ?

Some people like me would share code with GPL3 or later, why the f* should we give it to you as BSD? Are you running only BSD code or MIT code on your devices? Or are you running proprietary software, but for some reason making all stuff BSD will make your job easier since you could mindlessly copy paste shit into your proprietary stuff?


> Assuming the GPL did not exist and the author would have made stuff proprietary, would that be better ?

That's a false dichotomy – there are many more open source licenses which wouldn't have led to the problem described in the article.


But respect the author decision, maybe the "problem" is what the author desired to happen.


  > E: Downvoters, have you read the article? The program started a new process for each address lookup instead
  > of using a library because that library is GPL.
  >
  > Shall we have a discussion? 
Yeah no, they simply had a bad alternative implementation which could have been just as fast as the licence incompatible library call:

  >  non-bfd, without patch:  7m59s
  >  non-bfd, with patch:       15s
  >  bfd:                       15s 
-- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911815#38

The same could have happened due to other reasons for alternative implementation, that range from availability of more than one implementation with different advantages and disadvantages, different OS the code needs to run on, or simply preference.

Blaming the license in this case is just short-sighted and with the wording used it just shows a bias of yours against the GPL, but not actual will to participate in a meaningful discussion on the linked thematic.
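
For anyone wondering why the patched non-bfd path can catch up: the slow version started a new process for each address lookup, and the usual fix is to start one helper and keep feeding it requests over a pipe. Below is a minimal Python sketch of that pattern, not the actual perf patch. `LineWorker`, `ECHO_CHILD` and `resolve_all` are hypothetical names, and the child here is a stand-in that just echoes its input; a real resolver that reads addresses on stdin would take its place, and its reply format (possibly several lines per address) would need proper handling.

```python
import subprocess
import sys


class LineWorker:
    """Keep one child process alive and send it one query per line."""

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on our side
        )

    def query(self, line):
        # Write one request, then read exactly one reply line.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin signals end of input so the child can exit.
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()


# Stand-in child (an assumption for the demo): one reply line per input line.
ECHO_CHILD = [
    sys.executable, "-u", "-c",
    "import sys\nfor l in sys.stdin:\n    print('resolved ' + l.strip(), flush=True)",
]


def resolve_all(addresses):
    """Resolve every address using a single long-lived child process."""
    worker = LineWorker(ECHO_CHILD)
    try:
        return [worker.query(a) for a in addresses]
    finally:
        worker.close()


if __name__ == "__main__":
    print(resolve_all(["0x401000", "0x401010"]))
```

The process start-up and any per-binary parsing the helper does then happen once instead of once per address, which is the whole point regardless of which library sits behind it.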


I guess MIT and Apache would be better, so that company can make billions on the code and contributors would still be poor (see AAPL -> FreeBSD)


You understand there's nothing in the GPL that prevents companies from making billions on the code without compensating the contributors?


The thing is with GPL you have to contribute back, see how much Linux has grown in functionalities and how backwards are other kernels with MIT

--- You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange;


The GPL doesn't require “contributing back”.

You can't just give out binaries without source, but you can choose not to release anything. The thing is, as usage models shift to include a lot of what the FSF very reasonably relabels SaaSS (Service as a Software Substitute), you can release nothing but provide the functionality over the network and reasonably expect people to use it that way. Since you haven't distributed the software itself to anyone, the GPL doesn't put you out of compliance for not showing them your code or letting them distribute it further.

This is the situation the Affero GPL (AGPL) was intended to address—it requires (or tries to require) that source code also be accessible to anyone who is given access to the software over a network. (Imprecisely speaking, that is; do check the license text if you want the specifics.)


> very reasonably relabels SaaSS (Service as a Software Substitute)

Heck no. "as a" already means that. Rewriting the acronym is far more petty than reasonable.


They both contain both elements, but the point is to sharply alter the emphasis. “Software as a Service” is a reasonable description in the salience model of the mainstream software industry, because it is something that you choose and integrate and use like software, except it just so happens to be a service, coming with various costs like subscription fees and very direct vulnerability to upstream changes, and various benefits like installation and maintenance functions being centralized far away where you don't have to directly manage them.

“Service as a Software Substitute” pushes the ‘service’ part to be the most salient. It depicts something which is fundamentally a service, where ‘substitute’ once more emphasizes that you do not, in fact, have the software itself, even though it is taking the role of software. The FSF considers this very important, because they wish users to be able to copy and modify the software they use, and pseudo-distribution purely as a service does not naturally allow for this. If that is not something you care about, then the emphasis will seem strange, yes.


I do care about it. That doesn't make it less petty, the same way M$ is petty.


The most recent exploitation of opensource code comes from Amazon & friends making their own paid, hosted versions of redis, elastic search, mongodb, and so on. And not making any sort of proportionate contribution to the developers - whose free work their profits entirely rely on.

And in this case I’m not sure how gpl helps. With gpl2 you only need to distribute source code if you distribute binaries - so they have no legal obligations there. And Amazon isn’t really making meaningful changes to elasticsearch and friends anyway, so having the license require them to opensource their changes is a bit moot.


That’s a shortcoming of the GPL: it doesn’t consider interactions over the network “distribution”. It’s the reason the AGPL exists. IMHO, GPL software that could be expected to run in SaaS form should be AGPL.


It's also worth noting that GPL became more accepted in the corporate world because SaaSS defanged it. AGPL is toxic to companies now, but when everyone was releasing desktop software, GPL was treated the same way.


AGPL-like clauses classifying usage through network as distribution would help there and many platform capitalism[1] companies are against AGPL[2] for that reason.

[1]: https://theceme.org/richard-godden-platform-capitalism-nick-...

[2]: https://opensource.google/docs/using/agpl-policy/


Can someone with legalese knowledge make a license that states that you have to donate back 5% of the revenue you get from using a software?


Not really, or at least the license would be very unappealing.

You'd need a method of determining revenue and then apportioning revenue amongst the many parts of a system. Then there's transfer pricing issues. Not to mention audit requirements. See also Hollywood accounting.

One of the ways open source gets adoption is because using open source with an acceptable license is often much less hassle than paying for commercial software and complying with commercial software license requirements.


Yeah it's frustrating, I would never work on an open source project, as I don't really get why companies can make money from the unpaid labour of someone else, and why that's acceptable to our society.


I've done a few open source things, mostly bugfixes or minor enhancements. Most of those I was being paid by my employer for, and it was either something I wanted available more widely and it was worth going through the process or it was a pain to manage patches so it saved me time to get it accepted upstream.

Either way, I don't get paid a royalty for work for hire from my past employers, so I don't expect a royalty from anyone else. And I've not worked on a project basis either; so I'm getting paid for having my butt in the seat and anything that happens afterwards is a happy accident.

I've open sourced some personal stuff too, although I don't know that anyone has looked at it. That stuff is usually more like nobody should need to write this again. Not much commercial market for a PPPoE client that can handoff to a standby machine anyway, but maybe it will be useful for someone, some day.

I've got another project in the works, but it's mostly a bit of glue around other people's open source code. If I wasn't retired, I'd try to get an employer to pay me to write it (and it would get done faster!), but I can't see why anyone would pay for just the software. Consulting on the software, sure; but then again, if you were to rely on it, you'd probably want to cultivate in-house expertise to reduce dependency on outside help.


You would probably be more fond of the Anti-Capitalist Software License, CoopyLeft License, Prosperity Public License or other Copyfair or Copyfarleft licenses.

List and more references: https://github.com/LibreCybernetics/awesome-copyfarleft


> The thing is with GPL you have to contribute back

Not contribute, but share the sources if they distribute a binary which uses code derived from GPL-ed one. Wireless router vendors used to share modified sources as an archive on some obscure ftp without comments and documentation (so you'll have a hard time building a binary from these sources). It's better than nothing, but this is not a contribution.


But companies wouldn't use it anyway since it's GPL.

Enter MPLv2, with the best of both worlds:

- not viral copyleft

- but users DO have to contribute back


MPLv2 is a fine weak copyleft license, but I'm not really sure it prevents people from not contributing back if that's what they want: just put your proprietary code in a different file, and add minimal APIs to the MPL code so you can use it.

So in that sense it's only very marginally 'better' than a fully permissive license.


Right, and GPL is making all of us open source devs rich. I can't fathom getting into open source and then getting mad when people use your tools to great success.


GPL didn't make me rich but it allows me to use high quality open source free software.


There's nothing that prevents me from using GPL code commercially...


With the "viral" name-calling you have already demonstrated that you don't want a rational discussion.


No, using the normal term does not demonstrate that.


No. That is the case of Linus Torvalds not using the standard "or later" clause. He has his (bad) reasons.

The incompatibility is caused by "lack of restrictions" clause. Without that, GPL becomes essentially BSD with all the corporate thievery that entails.


The "or later" clause is a backdoor. Not wanting backdoors in your license doesn't seem like a bad reason to me.


So what's the solution? You can use a permissive licence which has a huge "backdoor" allowing derivatives to turn up under any licence, including GPLv3 (or v4). Or you can use GPLv2 with no "or later" clause which means you get situations like this.


This incompatibility is a unique feature of GPL. Wherever it goes, it's incompatible with something (in this case, with itself). You could use MPL or CDDL and, as long as GPL isn't involved, you won't have problems with license compatibility.

However, in the case of the Linux kernel (which "perf" is distributed with) changing the license is not an option --- no CLA and, even if it was practical to ask so many people for permission, many contributors are dead now. So here the only option is using dependencies which are compatible with GPLv2. So maybe we need a new library to replace libbfd, which would be more permissively licensed.


It's not a feature of GPL, it's an unfortunate side-effect of copyleft. If you have a better way to "disable" copyright than copyleft then I'm sure it would supersede the GPL.


> If you have a better way to "disable" copyright than copyleft then I'm sure it would supersede the GPL.

In the absence of copyright anyone could publish binaries built on GPL code without sharing the sources. So a central feature of the GPL would cease to be if copyright was in any way disabled. Saying the GPL is about disabling copyright is about as true as McDonalds being about healthy diets.


MPL and CDDL are copyleft licenses as well. It is a feature of GPL.


Well, I don't think I agree with him and would licence GPL code with the "or later" clause, but he has a very reasonable and convincing argument against it.

See this video of him explaining his take on GPLv3 and the "or later" clause at Deb Conf if you are interested: https://www.youtube.com/watch?v=PaKIZ7gJlRU


"or later" is a hack to get around the fact that the GPL is basically incompatible (I need a better word because this is overloaded) with itself if you change the name. If you want to use GPL and GPLRenamed libraries in the same project the only option is to relicence the project under both licences. This may be possible if you have few dependencies and few authors. But if you want to depend on a new library after some authors have become uncontactable you basically can't.


> That is the case of Linus Torvalds not using the standard "or later" clause. He has his (bad) reasons.

Linus does not think GPLv3 is a good or fitting license for the Linux kernel, and thinks that it changes too many things to be considered a new version of the same license as GPLv2.

So he refuses to add the “or later” backdoor which would effectively relicense “his” kernel with a license he does not approve of.

Is that not his right?


It is, but we can still be petty about it.


What is a good source to get an understanding of all the Licenses and their relation?


Here's a brief summary of major licenses:

MIT, X11, BSD (2- or 3-clause), and more similar ones I can't name off the top of my head: these are the basic do-what-you-want license, the only requirements are things that every (good) license already has, such as standard limitation-of-warranty clauses and retention of copyright notice requirements.

Apache (v2): This is the next stage up, which means that the text is lengthier and somewhat denser legalese, but also covers more topics such as trademarks and especially patents. The patent clause here includes a provision that any patent licenses are revoked if you sue the authors for patent infringement.

MPL (v2): This is a weak copyleft requirement, which means that you must provide any changes to the source code when you distribute the binary and additionally the resultant code must be licensed under the same terms, although it only applies on a per-file basis. EPL, CDDL are broadly similar to the MPL, with a few differences in the legal minutiae.

LGPL: Weak copyleft again, except now it's on a per-library basis. [Although, to be honest, the definition of per-library basis isn't entirely clear for non-C/C++ code.]

GPL: Strong copyleft, which means that you pretty much have to use GPL if you reuse the code.

AGPL: Even stronger than GPL, you have to distribute sources to anyone who uses your code over the network.

With the GPL family, there's a distinction between version 2 (2.1 for the LGPL) and version 3 that retains relevance, because some people objected to the changes in GPLv3 (notably the anti-Tivoization clause and patent clause changes) and refused to move to GPLv3, with the Linux kernel being the most notable project to refuse to do so.

As for relation, well, any license more complicated than Apache includes lots of legal minutiae that makes it somewhat hard to render judgement if two licenses are compatible or not. In general, though, you can usually use an earlier license in this list in a project that uses a later license, but usually only if both licenses are the latest version (as the most recent updates added some compatibility escape hatches).


One thing worth noting about AGPL, is that it also changes the definition of "derived work" to also include "accessing code over the network", so that if you operate a SaaS that includes an AGPL service in your network diagram, your entire SaaS has to be opened.

IMO, the gist of the GPL boils down to:

> If you "distribute" a "derived work" of this code, you have to release your code as well (under the same terms.)

AGPL not only interprets "distribute" to mean "offer up as a service over the network", but it also interprets "derived work" the same way. This is why anyone who uses (say) MongoDB in a company basically has to pony up for the commercial license, lest they be required to open source huge parts of their company.


> The patent clause here includes a provision that any patent licenses are revoked if you sue the authors for patent infringement.

I don't understand this part. Why is this useful?


Facebook uses your code. Facebook wants to sue you for patent infringement. Doing that makes them no longer have the right to use your code, so they're deterred from suing you.


Nice writeup, and for all intents and purposes seems accurate for anyone wondering.


It isn’t easy, but these can help:

- https://en.wikipedia.org/wiki/Comparison_of_free_and_open-so... has a feature matrix for about 40 licenses

- https://joinup.ec.europa.eu/collection/eupl/solution/joinup-... has a comprehensive set of features you might want a license to have, and shows matching licenses.

You still would have to read up on what terms such as “trademark”, “copyright” or “copyleft” mean.


I've used this in the past. https://choosealicense.com/



The root cause is copyright. If it were to be abolished, no one would ever have to think about licensing ever again. None of this pain would exist.

Think about all the time that would be saved when people no longer need to think about all this lawyer bullshit.


Not sure if this is a pisstake on my argument, but a world without copyright (or something similar) would be a world of endless exploitation and stifled innovation.


Copyright as it exists today does an awful lot of exploitation-enabling and innovation-stifling in itself. (And that's ignoring the rest of the IP suite.)


This is like saying "cops today do a lot of nasty stuff, let's get rid of all cops". The result is pure, unmitigated anarchy, looting, murder, and lawlessness.

I don't find either to be useful lines of thinking.


Not really. Things would be pretty simple. You see something? You can copy it. Not a single legalese document in sight.

Don't compare this stuff to police brutality, looting, murder. They're not even in the same realm. Intellectual property is just ideas and absurd notions of ownership. It's like trying to own numbers. The real brutality is sending people to jail over this.


> a world without copyright (or something similar) would be a world of endless exploitation and stifled innovation

Says who? The copyright monopolists?

There's plenty of historical evidence to the contrary:

https://news.ycombinator.com/item?id=28330810

It seems obvious in hindsight. Fewer monopolies lead to more competition and better products. Even the US engaged in such infringement.


In the absence of copyright things like the GPL would have no power - a company could take your work and bundle it into their product and never contribute anything back ever.


And then you could copy what they produced as a result without paying them. I expect the pirate sites would still exist without copyright.


In the absence of copyright, copying some company's work would not be a crime either.


Oh no, I hoped it was about speeding up perf record, which is actually a big thorn in my side, one I wrote a specific tool for... Depending on the number of probes you use, perf record can induce large latency hits or reduced throughput. Batching/buffering disk writes solves the problem for me. But I had to redevelop a perf parser to record/compress smarter, and for network streaming (not touching the disk is even better...).


Neat! Is your tool open source?

It'd be nice if perf record had a fundamentally faster way of working. I found a nice description of how it works in the README.md for cargo-trace: "perf relies on perf_event_open_sys to sample the stack. Every time a sample is taken, the entire stack is copied into user space. Stack unwinding is performed in user space as a post processing step. This wastes bandwidth and is a security concern as it may dump secrets like private keys."

cargo-trace is apparently dormant now, but I found it really interesting. It does the unwinding via eBPF instead, which should be quicker while recording, not generate as much (sensitive) data, and not require as much post-processing. (Symbolization would still happen in post-processing.)


I'll ask about opensourcing the tool. But just in case, the recipe is to use pipe mode and pre-parse all frames, stream them as messages, sometimes to several targets (pub/sub) with some streaming-zstd, and also splitting the pmu/probes/Intel-PT streams and treating them separately. Stack-traces are analysed (precomputed cfg optimised structure so unwinding is faster) before storing in adhoc in-house format with all other system traces. Only annoying thing is changing perf-record settings (pid changes, need event X, new probe) means restart and I ran out of interns before we had no-loss switchover...


Sounds more specialized than I was imagining but a cool system.

The idea of a more efficient compressed encoding seems generally applicable. I imagine just piping through zstd would be an improvement over plain perf record directly to a file, but it sounds like your tool's splitting makes zstd more effective. It'd be handy to be able to just do perf record ... | fancy-recompress > out, and even better to upstream the format improvement into perf itself. I feel you on "ran out of interns"; there's always more to do...


Well, it started working even better when I separated the streams and compressed pmc and Intel-PT, and syscalls/dynamic probes, separately.

But yes, in a pinch piping to zstd has far less overhead than writing directly to disk.
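
To make the "split the streams, compress each separately" idea concrete, here is a minimal sketch. It is not the commenter's tool: the record kinds ("pmc", "syscall") and the 4-byte length framing are hypothetical, and zlib from the Python standard library stands in for zstd. Whether splitting actually improves the ratio depends on the data.

```python
import zlib
from collections import defaultdict


def compress_split(records):
    """records: iterable of (kind, payload_bytes). Returns {kind: compressed_bytes}.

    Each kind gets its own compressor, so similar records share one history.
    zlib is a stand-in here; the thread talks about zstd.
    """
    compressors = {}
    chunks = defaultdict(list)
    for kind, payload in records:
        comp = compressors.get(kind)
        if comp is None:
            comp = compressors[kind] = zlib.compressobj()
        # Length-prefix each payload so the stream can be split again later.
        framed = len(payload).to_bytes(4, "little") + payload
        chunks[kind].append(comp.compress(framed))
    for kind, comp in compressors.items():
        chunks[kind].append(comp.flush())
    return {kind: b"".join(parts) for kind, parts in chunks.items()}


def decompress_stream(blob):
    """Inverse of one stream from compress_split: returns the list of payloads."""
    data = zlib.decompress(blob)
    out, pos = [], 0
    while pos < len(data):
        size = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
        out.append(data[pos:pos + size])
        pos += size
    return out
```

Note that this sketch drops the relative ordering between streams; a real format would presumably keep timestamps in each record so the streams can be merged again.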


Sorry! :-) It was very much just scratching a particular itch I had related to cargo-flamegraph...


Perf report is indeed slow AF, especially on large files, you're right in wanting to speed it up! Thanks for sharing! This is an interesting tidbit that has thrown me down a rabbit hole of 'profiling the profiler'...


Apparently llvm has a drop-in replacement for addr2line: https://llvm.org/docs/CommandGuide/llvm-addr2line.html

I suppose you could make a library out of that.


For those who're wondering what Linux perf can do, please check these excellent examples and descriptions:

https://www.brendangregg.com/perf.html


is it the fork/exec overhead or the overhead of re-parsing the dwarf data that is responsible for the slowdown? process spawn thrash is obviously bad, but i'm curious how much it contributes to the issue here? forks are pretty cheap these days as i understand, and i think an exec may also be pretty cheap since the program image is already going to be sitting in the buffer cache.

reading/parsing dwarf data, on the other hand, is likely to be slow. not totally sure, but maybe i/o could be sped up by mmap'ing in the dwarf data and maybe part of the parsing could be cached?

fun story, i once solved a similar performance regression in a machine learning context, where calling code would serialize an entire model and pass it to inference code, which would then deserialize it all, make one inference, and then tear it all down. if online/streaming is not required, a huge speedup can come from just batching the work.
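
The batching point can be shown with a toy sketch. Everything here is hypothetical: the "model" is a two-field dict and pickle stands in for whatever serialization the real system used.

```python
import pickle


def predict(model, x):
    # Toy linear model: hypothetical stand-in for real inference.
    return model["w"] * x + model["b"]


def infer_one_at_a_time(blob, inputs):
    # Deserializes the whole model again for every single input.
    return [predict(pickle.loads(blob), x) for x in inputs]


def infer_batched(blob, inputs):
    # Deserializes once, then reuses the model for the whole batch.
    model = pickle.loads(blob)
    return [predict(model, x) for x in inputs]
```

Both functions return the same answers; the second simply pays the setup cost once instead of once per input.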


Parsing the dwarf data has orders of magnitude more overhead than fork/exec. (source: I'm the one who opened the linked Debian bug)


I suspect it's both. But I didn't measure, because the patch does away with both sources of overhead at the same time :-)


I would like to see a more general approach to solving the problem of "short lived processes are expensive".

There needs to be a way to dynamically link a binary into your own address space, and call its logic repeatedly, without incurring all the process startup overhead all the time.

Pretty much, have all command line utilities be linkable like a library.


Afl-fuzz runs a program until main() and then forks it repeatedly. An empty main() can be called at least 10000 times per second this way.
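
The fork-after-startup trick can be sketched roughly as follows. This is only an analogy in Python, not AFL's implementation: expensive_startup and run_forked are hypothetical names, and it assumes a Unix system where os.fork is available.

```python
import os


def expensive_startup():
    # Stand-in for dynamic linking, config parsing, etc. (hypothetical workload).
    return {"table": list(range(1000))}


def run_forked(state, job):
    """Fork a child that already has `state` in memory, run one job in it,
    and return the child's exit code (0 if the job returned something truthy)."""
    pid = os.fork()
    if pid == 0:
        # Child: do the work and leave immediately, skipping interpreter cleanup.
        code = 1
        try:
            code = 0 if job(state) else 1
        finally:
            os._exit(code)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

The startup work runs once in the parent; each call to run_forked then only pays for a fork, since the child inherits the already-initialized state.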


With AFL++ you can even determine exactly where the fork happens:

https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...


I do know Boost.Stacktrace calls addr2line too. From the code(https://github.com/boostorg/stacktrace/blob/develop/include/...), it seems Boost.Stacktrace also shells out to addr2line for every address. But in practice, I found the overhead of boost::stacktrace::stacktrace() is not as horrendous as my crappy implementation, which calls addr2line, too. Is there something I missed in Boost's code?


Does anybody have working instructions to build a patched perf for Ubuntu 20.04? I built from kernel.org following https://michcioperz.com/post/slow-perf-script/, but somehow flamegraph pointed at the new build breaks totally -- after exiting the app, there's no "perf record: Woken up 181 times to write data" message; it just hangs.


Note that the patch does something other than what Michał's post ends up doing: instead of linking libbfd, it replaces lots of calls to addr2line with a single long-running call to addr2line.

If you want to try the patch on Ubuntu, I recommend using the Debianish technique of "apt source linux-perf-5.10" (or whichever version of the kernel you're running) and applying the patch. Then "make" in the tools/perf directory, and it should work...
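
For anyone curious what "a single long-running call" looks like in outline, here is a minimal sketch. It is not the patch itself (which is C inside perf): PersistentResolver is a hypothetical name, and it assumes the helper answers each input line with exactly one output line and flushes after each reply.

```python
import subprocess


class PersistentResolver:
    """Start one helper process and reuse it for every lookup."""

    def __init__(self, argv):
        # text=True gives str pipes; bufsize=1 requests line buffering on our side.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )

    def lookup(self, address):
        # One request line in, one reply line out (an assumption about the helper).
        self.proc.stdin.write(address + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin signals end of input; then wait for the helper to exit.
        self.proc.stdin.close()
        self.proc.wait()
```

With binutils installed, something like PersistentResolver(["addr2line", "-e", "./a.out"]) (the binary path is a placeholder) should print one file:line reply per address. Adding -f or -i produces more than one line per address, so a real implementation needs some way to detect the end of each reply.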


In a separate context I discovered the slow performance of addr2line and wrote my own faster one as a learning experiment:

http://neugierig.org/software/blog/2012/03/heap-profiling.ht...


Silly rants about the GPL aside, why can't you use (LGPL) libelf, following its version of addr2line? The binutils-static rpm doc even says to use elfutils instead in new code -- no BFD? A quick look suggests RHEL's perf does so, but I'd need to check.


I wonder how many other packages suffer from this problem... Maybe building everything from sources isn't that bad after all.


perf-report's slowness is very frustrating. This slowness started around 2017 or so. However, you can get around it by passing --no-inline and the graphs will be exactly the same.


Why not build from source and link against libbfd?


That would have solved my immediate problem, but the result is not distributable, so it wouldn't solve it generally, for others suffering the same thing.


So all I am understanding is, we could make linux 10-100 times faster, but licensing-quirks don't allow it? Physical electricity, time and whatnot is wasted because of legal foo?


No, you're not understanding it. Perf is a profiling tool. It's slow on Debian because reasons.


The GPL requires that software linking to GPL projects is also licensed under the GPL. This issue could be solved easily by releasing GPL software that respects users' freedom.


Both packages involved here are GPL, but unfortunately have mutually exclusive version ranges (=2, and >=3)



