Transmission torrent client ported to C++ (github.com/transmission)
402 points by beepog on Sept 12, 2021 | 212 comments

I think it's a sensible choice. I've seen way too many C codebases rewriting half of the STL or using clunky macro hacks (or worse, half-assed linked lists) when basically every single platform out there has a copy of the STL available, which includes well-tested containers and algorithms with excellent performance.

It's complicated but it's the only reasonable choice. You can then write your code C-style while compiling it as C++ and nobody will bat an eye.

STL has a lot of weird pitfalls. There was std::vector<bool>. Here you can see some pitfalls of std::unordered_map: <https://youtu.be/ncHmEUmJZf4?t=2220>. The whole talk is interesting to watch. In the beginning you can also see the puzzling design of std::unordered_map (puzzling because of the number of indirections).

I'd reach for abseil first: <https://abseil.io/>.

These pitfalls really aren't that bad. An extra copy here and there only matters in the tight loops. Some of the examples in the talk are really contrived. Am I going to insert the empty string many times as a string in a hash map? No, in most cases I am going to make sure that almost all of the keys that I generate are different. Sure, google can save on its electricity bill by doing micro-optimizations. For me trying that would be a bad trade.

Every language has its pitfalls and I tend to prefer C++ pitfalls to some other pitfalls. Lately I have been doing some Python. Turns out a python -m venv is different from a virtualenv. Turns out pytest some/specific/test/file.py is different from python -m pytest some/specific/test/file.py. I am wishing quite hard this code base consisted entirely of C++. And of course mypying this is still on the TODO list so the type annotations that are there might well lie.

> STL has a lot of weird pitfalls

everything that's old and has legacy is doomed to have several. It's just a fact of life.

The C++ committee doesn't and can't just throw away stuff like `std::vector<bool>`, despite it being a whole fractal of bad ideas, because people have already used that piece of dung in uncountable projects. It would be nice to have a way to get a _normal_ vector of booleans (even though vector<char> just suffices most of the time), but that's life I guess.
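For anyone who hasn't been bitten by it yet, here is a minimal sketch of the `std::vector<bool>` surprise being discussed. The function names are made up for illustration; the only library facts relied on are that `std::vector<bool>::reference` is a proxy class and that `std::vector<char>::reference` is a plain `char&`.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// std::vector<bool> is a packed specialization: operator[] hands back a proxy
// object instead of bool&, unlike every other std::vector<T>.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy type");
static_assert(std::is_same<std::vector<char>::reference, char&>::value,
              "vector<char>::reference is a real reference");

// Hypothetical helper: does writing to an `auto` copy of element 0 change
// the vector<bool> itself?
bool auto_copy_aliases_bool_vector() {
    std::vector<bool> flags{false, false};
    auto first = flags[0];  // a proxy object that still refers into `flags`
    first = true;           // writes through to flags[0]
    return flags[0];
}

// Same code shape with vector<char>: `auto` deduces a plain char copy.
bool auto_copy_aliases_char_vector() {
    std::vector<char> flags{0, 0};
    auto first = flags[0];  // independent copy
    first = 1;              // only the copy changes
    return flags[0] != 0;
}

void example_usage() {
    assert(auto_copy_aliases_bool_vector());   // the "copy" modified the vector
    assert(!auto_copy_aliases_char_vector());  // the copy really was a copy
}
```

The same two lines of code behave differently depending on the element type, which is why `vector<char>` is the usual workaround.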

People bring this point up every time C++ is criticized, but no, not all C++ shortcomings come from its age. In fact, most of the shortcomings it is criticized for today are new. The committee just keeps on adding new ones: initialization (mentioned by another reply to my comment), std::optional & std::variant (UB), concepts[1], std::thread (good luck debugging where the std::thread which crashes your program in a destructor came from), std::random_device()() being allowed to return 0 every time (which e.g. won't show up when you compile for your own machine, but will when you cross-compile) and probably many others which I don't remember off the top of my head.

[1]: <https://news.ycombinator.com/item?id=28098153>

But back to the point, my original comment wasn't about C++ being a bad choice. It was about STL. You can use C++ without STL. I pointed to one alternative: abseil.

It does indirectly come from age, because legacy and backward compatibility mean that you can't simply change some core parts of the language at all. For instance, std::optional uses UB because _all_ of C++ does the same, and making std::optional<T>::operator* different from std::unique_ptr<T>::operator* would have caused similar concepts to have a different behaviour - something that would have definitely translated into weird bugs.

Also, C++ doesn't have features like built-in move semantics and borrow checking, so it's very hard to enforce certain things that just come naturally in Rust. That's simply stating a fact. The real point here is that std::optional<T> is much less likely to cause UB than a plain, old C pointer, so it's a net improvement - even though it could still lead to UB in some circumstances.
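To make the std::optional point concrete, here is a rough sketch (C++17) of the access paths that don't involve UB. `parse_port` and the other function names are hypothetical, invented for this example. Dereferencing an empty optional with `operator*` is the UB case and is deliberately not exercised here.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical parser: returns an empty optional for anything that is not
// a decimal number in the range 0..65535.
std::optional<int> parse_port(const std::string& text) {
    if (text.empty() || text.size() > 5) return std::nullopt;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + (c - '0');
    }
    if (value > 65535) return std::nullopt;
    return value;
}

// value_or() never touches an empty optional, so no UB and no exception.
int port_or_default(const std::string& text, int fallback) {
    return parse_port(text).value_or(fallback);
}

// value() is the checked accessor: it throws std::bad_optional_access
// on an empty optional instead of invoking UB like operator* would.
bool checked_access_throws(const std::string& text) {
    try {
        (void)parse_port(text).value();
        return false;
    } catch (const std::bad_optional_access&) {
        return true;
    }
}

void example_usage() {
    assert(port_or_default("8080", 0) == 8080);
    assert(port_or_default("not-a-port", 9091) == 9091);
    assert(checked_access_throws("not-a-port"));
    assert(!checked_access_throws("80"));
}
```

So the UB is opt-in via `operator*`, the same way it is for `std::unique_ptr`, while `value()` and `value_or()` give you the checked behaviour.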

> https://youtu.be/ncHmEUmJZf4?t=2220

Some good points, but also "calling find with C strings is slow", you are using C strings, don't expect speed from code worshiping strlen of all things. Also the issue with integer hash being its identity? That is the case for every language I checked (Python, Java, C#).

I mean, if Python, Java or C# were good enough for me as a C++ user, I'd use them.

The mindset behind C++ is not a relative "thing must be good enough", but an absolute "there must not be something better possible".

Of course, this is often not realized in practice, but it's the goal.

> "there must not be something better possible"

It maps all integer values to distinct hashes which seems rather ideal, what it doesn't give you is perfect performance in an artificial benchmark written to exploit this implementation.

Tangent, but I remember reading not too long ago that there were some cases where perfect hashes weren't optimal, but can't remember where..

after watching that video, I never want to touch C++ again

Several CPPCON talks in recent years have reinforced that for me as well. https://www.youtube.com/watch?v=7DTlWPgX6zs, for example.

Edit: I originally linked to the wrong video (https://www.youtube.com/watch?v=TFMKjL38xAI), which was a different talk by the same person.

Then someone has to do it for you, because for the foreseeable future there will be plenty of critical libraries, and compiler infrastructure, that won't be available in any other implementation language.

Hence why I keep my C and C++ skills sharp, even though they aren't my main languages any longer.

or it is time to replace these critical libraries with something that has the same features and is far less convoluted and far harder to make serious mistakes with.

In addition to having experts accidentally make mistakes with `ref ref` like in that first video that resulted in performance degradation due to unwanted copies, C/C++ is a major security risk. Countless vulns are from simple mistakes and complex misunderstanding.

This is a pipe dream. There are systems still running COBOL. It’s a real hard sell to convince those with the money to let someone rewrite something that’s working and has been for some time.

That is a noble goal, first LLVM and GCC need to be rewritten in one of those languages, then CUDA, Metal Shaders, DirectX, Tensorflow, Qt, WinUI, ....

So even though managed languages and better systems programming languages are making their replacements, somewhere down the stack there will be C++.

And then to top it, as long as people insist on using UNIX clones, C will be there as well.

But if that's the case couldn't there just be a library that provides all the functionality without having to switch languages? Something like boost but for plain C?

The lack of generics for C makes it difficult to write those sorts of libraries in an efficient way.

C has had generics since C11. Can't fault you for not knowing that, as they're a pain to use and I think folks would rather forget that the language feature exists.

_Generic is basically a switch on typeof(x). Despite the name, it's really a specialization/overloading mechanism.

Those aren't generics, they're more like overloading.

I wrote this (slightly crazy) generic vector for C: https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/utils/... If you search for "string" in this page you can see how it is used: https://gitlab.com/nbdkit/nbdkit/-/blob/master/plugins/data/...

And, generally impossible to write C generics so they may confidently be used safely. There are reasons why C++ was necessary.

You’d use a code generator for this, similar to how you’d do it in golang.

C libraries can't be simultaneously as generic and as fast as equivalent C++ libraries without (or sometimes even with) the aforementioned "clunky macro hacks".

if "fast" was the only requirement we would be writing it in assembly

"Fast" is never the only requirement but often is a requirement. And the problem with a slow library is you don't know when rewriting the program to not use the library will be necessary to achieve the desired performance.

Nope, folks didn't invent higher-level programming languages (the first of which was arguably Fortran, see case in point [1]) because they wanted their code to run slower.

In fact, if “fast” is the most important requirement we'll probably write it in Fortran or C++. With C++ you just have to know which parts not to use.

[1] Fortran Web Framework https://news.ycombinator.com/item?id=28509333

If fast were the only requirement, we’d have vastly different processor and memory architectures and we’d probably write in higher level languages with compilers that would also generate the optimal ISA to be compiled into architecture-specific microcode along with the binary that runs on it with enough information the OS can pick which part of the heterogeneous machine is best suited to run the program.

No OS, it’s too slow. The binary needs to bootstrap the machine as well.

Yes, Gtk+ and Glib do that with the catch that C has no type polymorphism, so everything is done with casts and void pointers. It's even fully object-oriented.

The Glib approach is an immense monolith of nonsensical zealotry towards C. Reimplementing an Object-Oriented system in C using hacks and void pointers when GCC already had not one but two fully object oriented languages built-in was a completely pointless endeavor. There are a million different ways to shoot yourself in the foot with Glib, and it definitely doesn't feel like writing C at all.

I understand the historical context that led to the creation of Glib (no C++ standard in the mid-'90s, no decent open Objective-C libraries, Stallman and most of the FOSS movement at the time harbouring a deep disgust towards C++ for some reason, etc), but it can't be pointed at as a good example of how to code in C - on the contrary, it's IMHO exactly what your C project shouldn't do. You've picked C because it's simple and clear, don't butcher it up with macros and weird hacks just to get a worse Objective-C.

> understand the historical context that led to the creation of Glib

Well, yeah, that was 23 years ago. Back then there was no GNUstep and no free as in freedom C++ or Objective-C compiler. Now GCC's main development language is C++ and GNOME is slowly becoming somewhat language agnostic and also has Vala which at one point was supposed to cover up that mess.

I guess another point is that they seem happy with it[1], so why should we care anyway.

1. http://planet.gnome.org

I’d love to have a good excuse to program in Vala… it seems like a very nice language.

Both g++ and the Objective-C frontend already existed back then, so your argument isn't correct. I think the main reason back then was that Stallman and the FSF were very much against C++, and pushed very hard for everything under the GNU umbrella to be written preferably in C or Scheme. They only relaxed that after much insistence from the GCC developers, who wanted to be able to use templates and RAII.

I think that Glib was born from the same mentality which considered C as the perfect language, with the end goal of getting the same functionalities of modern languages without leaving it.

You're right, g++ was there by the end of the first year but as far as I can tell libstdc++ didn't show up until around 1998. I don't know when GCC started supporting Objective-C (another commenter says maybe by 1992) but GNUStep didn't reach 1.0 until 1998 (it was fairly incomplete before that.)

But yes, you're right, they could have developed Glib in either of those languages even if STL and *step didn't exist yet. I also found this e-mail[1] from Stallman that supports your claim. Do you know of more documentation? I'm interested in this history.

1. http://harmful.cat-v.org/software/c++/rms

The main reason why Gtk and GNOME exist was Qt's license and yes C above all.

Here is one of the original versions of the GNU Coding Standards from 1994,


> Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C.

I found an archived copy of the GNU guidelines from 2004 that well represents what was the GNU policy of the time:


As you can see, it basically consisted of "use C, unless you _really_ have to". It was also well known that C++ was heavily discouraged by Richard Stallman, so picking it could well end up in your project having a hard time being adopted by GNU.

The guidelines were relaxed afterwards, especially since GCC came to rely on C++ and every relevant C compiler was then written in C++, making the whole "we don't want to force people to install g++" even more nonsensical. In 2004 C++ was already way too ubiquitous to not have g++ installed.

> no free as in freedom C++ or Objective-C compiler

G++ was available in 1990. I don't recall when Objective C became available in GCC, but it was present in at least gcc-2.2.2 in 1992.

"Something like boost but for plain C" is, exactly, Standard C++.

In theory there could be such a library but in practice I don’t think there is. Maybe because maintaining something like the STL is a non-trivial task.

That’s pretty much what glib is. There are generic structures.

Does transmission use glib?

Yes, it's built on Gtk+ (which is built on Glib.)

thats (one of) the client(s)?

I wonder if Dlangs "betterC" would be a good fit, since you get real metaprogramming then.

Dlang's betterC would be a good option for just a better-than-C language but looks like it doesn't support D's standard library [1]. That'd mean being stuck with plain C container libraries. :/

Personally, I was impressed by an approach taken by the authors of Pixie (a fast 2d library like Cairo) [2]. They created a "library wrapper" generator for Nim code, which they call Genny [3], that supports creating libraries for Python, Node, C, and Nim itself. Haven't tried it, but being able to use Nim and its ref-based GC while still exporting nice APIs to other languages is fantastic. Pixie's aim is to be a Cairo alternative so it makes sense they'd need this. I hope the approach takes off. Usually writing any cross-language API is a lossy operation.

Here's a sample of the API definition:

    exportObject Matrix3:
         mul(Matrix3, Matrix3)

[1]: https://dlang.org/spec/betterc.html
[2]: https://github.com/treeform/pixie
[3]: https://github.com/treeform/genny

(edited formatting)

The consequences of betterC are listed out here. https://dlang.org/spec/betterc.html#consequences

Notice that betterC does not exclude templates or destructors, so you could still use c++-style containers based around RAII, etc.

The argument against it is a smaller ecosystem and you’d end up having to roll your own containers etc.

>I wonder if Dlangs "betterC" would be a good fit,

You mean "Das C" ? :)

>"You can then write your code C-style while compiling it as C++ and nobody will bat an eye."

Very practical and reasonable choice.

C-style code, plus you get some nice things like hashmaps without having to implement or find an implementation; you can use the standard library instead.

You can also statically link libc++. Not much of a binary size increase from static linking because most is template headers.

> Not much of a binary size increase from static linking because most is template headers.

Unless you use anything that pulls in locale code (e.g. any stream), in which case you will get code for things like date and monetary formatting in your library even if it is never used.

Linkers are pretty good at throwing out dead code, so if you don't reference these functions there's a decent chance they won't be linked in when linking statically.

Seems like exception handling and exception safety would be quite tough to achieve if you are just sprinkling STL throughout a mostly C codebase, no?

A mostly C codebase using a bit of STL will almost certainly not use exceptions. So there's no need to worry about exception safety.

The STL can throw exceptions, but it's easy to avoid that if you use the right patterns (i.e., find() instead of at(), iterators, and so on). If you build your C code as C++, exceptions will flow through it just fine. Nothing will throw anything unless you call throw (or you run out of memory).
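A small sketch of the find()-instead-of-at() pattern mentioned above, using `std::map`. The `Settings` alias and the function names are assumptions made up for this example, not anything from Transmission.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using Settings = std::map<std::string, int>;  // hypothetical example type

// Non-throwing lookup: find() signals a missing key by returning end().
int setting_or(const Settings& settings, const std::string& key, int fallback) {
    auto it = settings.find(key);
    return it == settings.end() ? fallback : it->second;
}

// Throwing lookup: at() raises std::out_of_range for a missing key.
bool at_throws(const Settings& settings, const std::string& key) {
    try {
        (void)settings.at(key);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

void example_usage() {
    const Settings settings{{"port", 1234}};
    assert(setting_or(settings, "port", 0) == 1234);
    assert(setting_or(settings, "missing", -1) == -1);  // no exception involved
    assert(at_throws(settings, "missing"));             // at() does throw
}
```

If the codebase sticks to the first style, allocation failure is about the only remaining way for these containers to throw.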

> If you build your C code as C++, exceptions will flow through it just fine.

Exceptions will propagate up the stack, but won't clean up resources since the C code expects to do that manually. Same is true for non-RAII C++ code that is not written to expect exceptions.

Your main point is correct though: Large parts of the C++ stdlib do not throw.
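The cleanup problem described here can be sketched as follows. Everything in it is hypothetical: `open_handles` is just a counter standing in for a real resource, and `step` stands in for any call that might throw.

```cpp
#include <cassert>
#include <stdexcept>

int open_handles = 0;  // stand-in for a real resource (file, socket, buffer)
void acquire() { ++open_handles; }
void release() { --open_handles; }

// Stand-in for any call that may throw.
void step(bool fail) {
    if (fail) throw std::runtime_error("step failed");
}

// C-style manual cleanup: if step() throws, release() is never reached.
void c_style(bool fail) {
    acquire();
    step(fail);
    release();
}

// RAII guard: the destructor runs during stack unwinding.
struct HandleGuard {
    HandleGuard() { acquire(); }
    ~HandleGuard() { release(); }
    HandleGuard(const HandleGuard&) = delete;
    HandleGuard& operator=(const HandleGuard&) = delete;
};

void raii_style(bool fail) {
    HandleGuard guard;
    step(fail);
}

// Runs `fn` with a forced failure and reports how many handles stayed open.
int leaked_after(void (*fn)(bool)) {
    open_handles = 0;
    try {
        fn(true);
    } catch (const std::runtime_error&) {
        // swallow: we only care about what was left behind
    }
    return open_handles;
}

void example_usage() {
    assert(leaked_after(c_style) == 1);     // exception skipped the cleanup
    assert(leaked_after(raii_style) == 0);  // destructor did the cleanup
}
```

The exception propagates fine in both cases; the difference is only in what is left behind afterwards.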

Well, you could compile everything with noexcept and not use exceptions, but that would certainly make large chunks of the STL useless.

What? Everything in the STL works fine without exceptions. Only wrinkle is if one wants to handle out of memory in a way besides crashing, although there is generally nothing better to do. Besides, the kernel will often kill processes rather than return malloc failures (see the OOM killer) anyway.

IIRC, STL constructors throw if they can’t run successfully. I suppose if you’re sure that OOM is the only thing that is going to cause constructors to fail, you can just live with allocation failures causing program termination.

> C++'s std:: tools are more useful than the bespoke tools in libtransmission's C code. Nearly every time I go in to fix or change something in libtransmission, I find myself missing some C++ tool that I've come to take for granted when working in other code, either to avoid reinventing the wheel (e.g. using std::partial_sort() instead of tr_quickfindFirstK(), or std::vector instead of tr_ptrArray) or the better typechecking (e.g. std::sort() instead of qsort()).
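As a rough illustration of the typechecking and "don't reinvent the wheel" points in that quote, here is a sketch comparing `qsort` with `std::sort` and using `std::partial_sort` for a first-k selection. The helper names are invented here, and I have not checked what tr_quickfindFirstK actually does, so `smallest_k` is only an assumption about the kind of job it replaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// qsort needs an untyped comparator: nothing stops you from casting the
// void pointers to the wrong element type.
int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

std::vector<int> sorted_with_qsort(std::vector<int> values) {
    std::qsort(values.data(), values.size(), sizeof(int), compare_ints);
    return values;
}

// std::sort is checked against the element type at compile time.
std::vector<int> sorted_with_std_sort(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// The k smallest elements in ascending order, via std::partial_sort.
std::vector<int> smallest_k(std::vector<int> values, std::size_t k) {
    assert(k <= values.size());
    std::partial_sort(values.begin(),
                      values.begin() + static_cast<std::ptrdiff_t>(k),
                      values.end());
    values.resize(k);
    return values;
}

void example_usage() {
    const std::vector<int> input{5, 1, 4, 2, 3};
    assert(sorted_with_qsort(input) == sorted_with_std_sort(input));
    assert((smallest_k(input, 2) == std::vector<int>{1, 2}));
}
```

Both sorts give the same result; the difference is that a mistake in the `std::sort` version is a compile error rather than a runtime surprise.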

I think it's more due to glib development entering an ice age than a language issue.

Glib had an impressive roadmap list around 8-10 years ago, copying much of STL features, but of course nothing much materialised after GNOME 3 caused an exodus of GTK developers.

Patching up C so it can safely play where the big kids do is both pointless and doomed to failure. It is way, way less work to switch to using a better language, instead.

Any C program can be converted to build with a C++ compiler with no more than a day's work. Then, you can start in modernizing the active parts, deleting local, generally poorly-tested, cobbled-together utilities as you go.

I have been sounding that trumpet since 1993, but there are lots of very hard listeners.

The title is a bit misleading. This PR is just getting the lib to compile with a C++ compiler. Seems like the bulk of the work remains.

> This PR is the first incremental step of getting libtransmission to compile with a C++ compiler.

Which I guess makes it C++? Is there a line you can draw between the two besides “it uses x to compile”? I dislike C++. But I will say it has come a longggggggg way in the past 20 years.

I think if it still compiles in a C compiler then it would still be C and we could think of C being a subset of C++ in this case (even if that's not strictly true in general.)

It clearly no longer compiles in a C compiler. With just 30 seconds' worth of looking I found "auto" (type deduction) and "static_cast" in the PR.

Transmission is one of those pieces of software that just works. I've been using it for around a decade, and can't see that changing anytime soon.

I do not know why but for me it always is slower than qBitTorrent.

It connects to fewer peers when downloading or uploading.

I don't know libtransmission very well… but isn't that getting closer to libtorrent? It seems to be the de facto C++ implementation: https://www.libtorrent.org/

Transmission has always sort of been the (or at least one of the) alternatives to libtorrent. I like transmission because they deliver a more cohesive approach be it cli, web based or GUI.

For me qbittorrent offers more bang for the buck and is just as stable and fast as transmission. The killer feature for me has always been the ability to bind it to a network interface that placed it a big step above transmission. I have used both quite a bit in the past.

Transmission supports binding as well, I think; it's just not exposed in the config GUI.

This would have been useful for me to know back when I was using transmission as a component to download content.

It only supports binding to an ip address and not to an interface like wlan0 or eth0 or wg0

Could people not spam github threads with "congrats on getting on HN"? This is so obnoxious.

There's one such comment on there, from someone who's made a donation to the project. I don't see the problem.

Gcc and Gdb went this same route, sometime back, without drama. Both programs have improved markedly since then. It is less work to achieve the same result, so more rewarding. People prepared to put in equal work get a better result.

As much as I dislike C++, std is invaluable in comparison to home-baked C libs.

Is there still no good semi-standard library for C?

That is why POSIX was born, the semi-standard library for C that ISO/ANSI did not want to have.

Yet, apparently it is not sufficient as people still have to migrate to C++?

Because C is still stuck in a PDP-11 mindset, without any proper strings or vector types, modules, namespaces,... there is very little help from POSIX there.

C++ has proper strings?

More than C will ever have, https://en.cppreference.com/w/cpp/string

u8string is meant to be UTF-8? So we now can use strings easily and safely, more or less the way we use them in Python (except still having to free the memory manually)?

Why do people still use C then? In what aspects is it better than C++? Is C++ less portable than C?

For CLI fans, aria2c supports BitTorrent protocol.

Transmission also does have transmission-cli

qbittorrent-nox also exists if you like qbittorrent

I use qbittorrent-nox and make torrents with transmission-create, i.e. I put peanut butter in your chocolate.

I think the point of the good ducktective's comment was to offer an alternative to people who don't like waiting hours for C++ to compile.

    apt install transmission-cli

Sorry it doesn't take hours for transmission or qbittorrent or their cli versions to compile. It takes minutes even on old hardware.

I'm aware I'm taking the bait here but I think it's worth noting Aria2 is C++ as well. Not because of the compile time comment but because most expect such tools to be C and it's interesting to note in its own right.

I've been using aria2c for some multithreaded HTTP downloading stuff recently. I surprisingly found it to get bottlenecked on a single core.

For torrents, I tend to just use rtorrent, which has worked perfectly for me for probably 15 years now. https://github.com/rakshasa/rtorrent. Nice, fast, doesn't seem to use many resources to get the job done.


A change of 84 files and not a single code review comment. Approved and merged! Wow!

Looks like he and the guy he had been talking to directly account for ~95% of the line changes over the last 15 years. The next 2 highest account for ~3.5% but haven't been active in years.


Users should read through the changes before updating

https://trac.transmissionbt.com/browser/branches for pre-2016

https://github.com/transmissionbt from 2016

When we untar the source, the "Changelog" file is empty

  diff -U0 -r transmission-2.94 transmission-3.00|less

Look for compelling reasons to update

When I statically compiled 2.94, the stripped binaries were 3.4M; not too bad

Still had to make fixes by hand to get it to compile with musl. These were reported by a musl user on Github years ago. Oh well

Need to include bsdqueue.h instead of sys/queue.h in these two files


Needed to fix some libcurl problem in libtransmission/web.c, too


The need for web.c is debatable; "webseed" is not peer-to-peer, who even uses it

Wonder if it's even worth the effort: https://trac.transmissionbt.com/log/trunk/libtransmission/we... "pulling my hair out"




Might have miscounted but looks something like

   Year Releases
   2006 6
   2007 8
   2008 19
   2009 50
   2010 21
   2011 10
   2012 11
   2013 7
   2014 2
   2015 0
   2016 1 +
   2017 0  
   2018 2  

   + In 2016 started using Github

Looks like the sys/queue.h problem got fixed in 3.00

Would be nice if there was a compile-time option to disable the BEP 17 BEP 19 feature

I agree with the positive comments in this thread. I too was a C++ hater, and I by no means understand it as well as C, but recompiling a small (~15KLOCs) project today with a C++ compiler caught a few issues.

And on another project, it was great to be able to use std::map instead of rolling my own equivalent. The only thing that scares me is exceptions.

As someone who is not a C programmer, but is aware of the fact that the Transmission codebase is 15 years in, can someone state whether it is normal that after 15 years of work on the code, that it is so terrible it has to be changed? I'd like to think that if I worked on something for 15 years that it'd be pretty stable by that point.

From the pull request, it was less that the C codebase is bad and more that the C++ STL has gotten better.

Not a C/C++ programmer either, but I got an example. InnoDB was first released in 2001, and was ported from C to C++ in 2011: https://github.com/mysql/mysql-server/compare/78f4351..3a455...

For a project that keeps being maintained, you want maintainable code that isn't a burden to work with. That's their rationale according to the PR. It's not that the code is bad, but it's more pedestrian than directly writing C++, so can lower the motivation of the involved people to actually work on the code and implement the things they want to implement.

Even if it is stable, it may have accreted many features over that time such that the code is now difficult to modify. Being able to leverage standard c++ stuff could be quite valuable, and could even result in less code (for them to maintain). They appear to be doing it very gradually, where the first step is probably little more than simply compiling the existing code as c++ instead of c.

Transmission probably should eventually be changed, so that support for BitTorrent V2 can be added. SHA1 collisions are only ever going to get easier, after all.

The question is too broad. It will depend on your situation and resources. If the code has been stable and secure for 15 years with a bit of maintenance, why would you replace it? If you are adding new features or the code has become hard to maintain, then sure, refactor and rewrite away. There is something to be said for using time-tested code, but you should never turn it into a religion; stay practical and open to revising it if good reasons come along.

I wonder how much the size of the compiled code increased --- it's been my experience that "C++-ifying" tends to bloat the binaries quite a bit.

"Zero cost abstractions" are, in practice, quite rare.

Just built both versions for you. Edit: Please note the C++ code change doesn't actually use the STL, it really just changes the compiler and code style in a few places. So I don't think this represents any argument for or against C vs C++.

  -rwxr-xr-x. 1 rjones rjones 4167912 Sep 12 22:09 build-c++/gtk/transmission-gtk
  -rwxr-xr-x. 1 rjones rjones 3997408 Sep 12 22:12 build-c/gtk/transmission-gtk

  -rwxr-xr-x. 1 rjones rjones 3130352 Sep 12 22:09 build-c++/daemon/transmission-daemon
  -rwxr-xr-x. 1 rjones rjones 2959640 Sep 12 22:12 build-c/daemon/transmission-daemon

  total 12156
  drwxr-xr-x. 6 rjones rjones    4096 Sep 12 22:08 CMakeFiles
  -rw-r--r--. 1 rjones rjones    5653 Sep 12 22:08 cmake_install.cmake
  -rw-r--r--. 1 rjones rjones     289 Sep 12 22:08 CTestTestfile.cmake
  -rw-r--r--. 1 rjones rjones   13793 Sep 12 22:08 Makefile
  -rwxr-xr-x. 1 rjones rjones 3082448 Sep 12 22:09 transmission-create
  -rwxr-xr-x. 1 rjones rjones 3066256 Sep 12 22:09 transmission-edit
  -rwxr-xr-x. 1 rjones rjones 3182928 Sep 12 22:09 transmission-remote
  -rwxr-xr-x. 1 rjones rjones 3073952 Sep 12 22:09 transmission-show

  total 11496
  drwxr-xr-x. 6 rjones rjones    4096 Sep 12 22:12 CMakeFiles
  -rw-r--r--. 1 rjones rjones    5645 Sep 12 22:12 cmake_install.cmake
  -rw-r--r--. 1 rjones rjones     287 Sep 12 22:12 CTestTestfile.cmake
  -rw-r--r--. 1 rjones rjones   13725 Sep 12 22:12 Makefile
  -rwxr-xr-x. 1 rjones rjones 2917136 Sep 12 22:12 transmission-create
  -rwxr-xr-x. 1 rjones rjones 2895128 Sep 12 22:12 transmission-edit
  -rwxr-xr-x. 1 rjones rjones 3014872 Sep 12 22:12 transmission-remote
  -rwxr-xr-x. 1 rjones rjones 2901832 Sep 12 22:12 transmission-show

I think you’d have to strip the binaries to account for debug symbol differences.

size(1) exists to provide the actual information. File sizes are only a rough indicator.

     text    data     bss     dec     hex filename
   920576   11508    4152  936236   e492c build-c++/gtk/transmission-gtk
   912575   11644    4120  928339   e2a53 build-c/gtk/transmission-gtk
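For what it's worth, the relative growth can be worked out directly from the numbers posted in this thread. The sketch below only does the arithmetic on the transmission-gtk figures from the ls and size(1) output above; `growth_percent` is a made-up helper, and the builds are the unstripped RelWithDebInfo ones, so this says nothing about a Release build.

```cpp
#include <cassert>
#include <cmath>

// Relative growth from `before` to `after`, in percent.
double growth_percent(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

void example_usage() {
    // On-disk size of transmission-gtk: build-c vs build-c++ (ls output).
    const double file_growth = growth_percent(3997408, 4167912);
    // text segment of transmission-gtk: build-c vs build-c++ (size output).
    const double text_growth = growth_percent(912575, 920576);

    assert(std::fabs(file_growth - 4.27) < 0.01);  // about 4.3% on disk
    assert(std::fabs(text_growth - 0.88) < 0.01);  // under 1% of actual code
}
```

In other words, most of the on-disk difference for that binary is not in the text segment.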

What compiler flags did you use? Perhaps -Os would even things out a little more.

Why would anyone want to optimize for space? On a halfway modern PC at least, some really embedded platform might be a different story.

Cache-friendliness; applies to any platform.

And it's "optimize for size", by the way.

Right, in certain cases. But here it's a networking application. So memory accesses won't be your bottleneck.

gcc-11.2.1-1.fc35.x86_64 with whatever defaults the upstream project chooses.

Did you run CMake with -DCMAKE_BUILD_TYPE=Release ?

No, with -DCMAKE_BUILD_TYPE=RelWithDebInfo

Right, and C++ is definitely going to have more debug info because of longer symbol names.

Test the two with Release instead. That will give you real results.

Thanks for the proof. Over 100k increase on average. Not surprising.

...and people wonder why software gets slower and bigger over time while doing the same thing. We even get HN articles about that semi-regularly.

Unless you're on a limited embedded platform, a 100k size increase in a binary is basically nothing these days, or on any PC/Mac built in the past 20 years. Also, equating bigger with slower for binaries is usually a fallacy.

Are you actually being serious? Less than 3% increase warrants such a reaction?

Will it ever decrease by 3%?

The amount of increase is irrelevant if it continues in the same direction.

A question that I'd like to ask a C++ expert: If you use any STL container, is it always the case that the whole thing is templated, therefore effectively compiled and statically linked into the binary? Or will part/all of it come from functions in the dynamically linked libstdc++.so?

libstdc++.so (and libc++.so if you're bent that way) contain the standard stream objects, some standard string functions specialized on `char`, some threading and synchronization support and important parts of the C++ language runtime (e.g. a default ::operator new, some of the exception unwinding and RTTI mechanisms). And that's it. It's actually fairly small and basic.

Pretty much everything else in the C++ standard library is template code in headers and gets instantiated when needed. You may end up having identical templated functions emitted into several different translation units and the linker will choose one at link time, which is one major reason why C++ is slower to build.

Only template functions that are actually used get instantiated. If you include a header with a gazillion templated functions (namespace level or member functions, it doesn't matter) you are most likely to end up with only one or two instantiated in your binary. Templates are like a cookbook, and just because Larousse has 10,000 recipes does not mean that's how much you're going to eat at every meal. Consider there is virtually an infinite number of possible instantiations for every template.
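A tiny sketch of the "only what you use gets instantiated" point. `Box` is a hypothetical class template written for this example. Its `length()` member would not compile for `int`, yet `Box<int>` is fine as long as nobody calls `length()` on it, because member function bodies of a class template are only instantiated when used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

template <typename T>
struct Box {
    T value;

    // Works for any T that supports operator+.
    T doubled() const { return value + value; }

    // Only valid for types with a size() member. For Box<int> this body is
    // never instantiated unless someone actually calls it.
    std::size_t length() const { return value.size(); }
};

int use_int_box() {
    Box<int> box{21};
    return box.doubled();  // instantiates Box<int>::doubled only
}

std::size_t use_string_box() {
    Box<std::string> box{"abc"};
    return box.length();   // instantiates Box<std::string>::length
}

void example_usage() {
    assert(use_int_box() == 42);
    assert(use_string_box() == 3);
}
```

Calling `length()` on a `Box<int>` would turn this into a compile error, which is a decent way to convince yourself that the unused body never made it into the binary.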

I guess it depends on your definition of small... VSCode recently mentioned a 10MB hit to statically link the libcxx library from Chromium to dodge issues running on older platforms.

Edit: libcxx from Chromium, not libstdc++


> The increase in bundle size is significantly small (~10MB).

That depends on how the standard library is implemented. I wouldn’t be surprised if libc++ or libstdc++ used explicit template instantiations for STL containers of primitive types, e.g. std::vector<int>.
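For reference, this is roughly what explicit instantiation looks like. `TinyStack` is a made-up stand-in, and nothing here establishes which containers libc++ or libstdc++ actually pre-instantiate; it only sketches the mechanism:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity container, standing in for a library template.
template <typename T>
class TinyStack {
public:
    void push(const T& v) {
        if (size_ < kCapacity) data_[size_++] = v;  // silently drops when full
    }
    std::size_t size() const { return size_; }
    const T& top() const {
        assert(size_ > 0);
        return data_[size_ - 1];
    }

private:
    static constexpr std::size_t kCapacity = 8;
    T data_[kCapacity] = {};
    std::size_t size_ = 0;
};

// Would live in the header: promises that TinyStack<int> is instantiated
// elsewhere, so including translation units need not emit their own copy.
extern template class TinyStack<int>;

// Would live in exactly one source file (for a standard library, inside the
// shared object): emits every member of TinyStack<int> once.
template class TinyStack<int>;
```

With this split, code using `TinyStack<int>` can link against the single emitted copy, while any other `TinyStack<T>` is still instantiated on demand from the header.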

Here, targeting Commodore 64 with C++17.


100k bytes = ~98 KB.

That's not material.

Most of that is probably in some symbol table used by the linker. C++ symbols include type names for parameters, whereas C symbols contain only the function name: a C linker literally can't tell the difference between a char main[10] and int main(int,char*){} because both are compiled to "main".
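A small sketch of the symbol-name difference. The function names are made up, and the mangled names in the comments are what the Itanium C++ ABI (used by GCC and Clang on Linux) would typically produce; other ABIs encode them differently:

```cpp
#include <cassert>

// Two C++ overloads: the parameter types are encoded in the symbol name,
// typically _Z5scalei and _Z5scaled under the Itanium ABI.
int scale(int x) { return x * 2; }
double scale(double x) { return x * 2.5; }

// C linkage: the symbol is just "scale_c" with no type information, which is
// why functions with C linkage cannot be overloaded on parameter types.
extern "C" int scale_c(int x) { return x * 2; }

void mangling_demo() {
    assert(scale(3) == 6);      // resolves to the int overload
    assert(scale(2.0) == 5.0);  // resolves to the double overload
    assert(scale_c(4) == 8);
}
```

Running `nm` on the object file, with and without `strip`, is one way to see how much of a binary's size is symbol names rather than code.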

This is important.

The size of the stripped binaries should be compared. ELF binaries contain both link-time sections and loadable segments, and the longer name of C++ symbols only affects disk space (for both the link-time symbol table and for debug information). The size of the runtime-loadable segments is really the only thing of interest, and stripping the binaries gives you a truer indication of this.

Probably a bit. Monomorphization is expensive in terms of code size. On the other hand - why would you care for a couple of 100kb for a use case like transmission?

uTorrent, a whole torrent client I remember using many years ago, was only ~100k in total.

uTorrent is on the same page as Transmission before this change, it achieves its small size by not using the STL at all.

All sources I've found indicate that uTorrent was written in C++, but perhaps its author was a bit more mindful of unnecessary abstraction and the like.

This is worth keeping in mind, but the real issue is the pressure on the optimizer and linker rather than size IMO.

Generalizing it beyond its original scope, which is performance only, is a common mistake. Any other aspect you might think of is not covered by the original zero-cost abstraction concept.

Zero cost abstractions are about performance, not code size.

Code size definitely affects performance, and has done so ever since caches existed.

I recommend a talk by Chandler Carruth where he shows that longer code can achieve higher performance due to the way computer architectures work.

Unfortunately I no longer remember in which conference he did it, maybe someone else can link it.

In microbenchmarks, on a long pipeline RISC, or similar microarchitectures like NetBurst (P4), I could see that being true. But we're long past that era now. It's the same misguided assumptions that lead to several-KB-long "optimised" memcpy() implementations whose real-world effects on performance are negligible to negative.

If you don't believe me, read what Linus Torvalds has to say about why the Linux kernel is built with -Os by default.

The longer code is typically generated because the compiler will generate vectorized code that provides enormous speedups in case of longer data sets. Take, for example, this code: https://godbolt.org/z/WEx3Gb5jr

At -O2 the assembly it generates is straightforward, and in line with what a human programmer would write. At -O3 it generates vector code that needs a lot more instructions (vector pipeline setup, code to deal with the remaining elements that don't entirely fill up a vector register, etc.) but the main loop takes 4 integers at a time instead of one, so that provides a nice 4x speedup. In order to achieve that it needs 25 instructions to set up the loop / finish the remaining elements, compared to 5 instructions for the -O2 code.

For very short loops the -O2 version will have superior performance, but for runs of data from around 8 integers (wild guess) the -O3 version will begin to pull ahead. So it really depends on the type of data your program is handling, whether it is better to optimize for speed or size.
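The linked snippet is not reproduced here; as an assumed stand-in, a plain integer reduction like the one below is the kind of loop where this trade-off shows up. Whether a given compiler vectorizes it at -O2 or only at -O3 depends on the compiler and version:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example loop: sums n ints. A scalar build handles one element
// per iteration; a vectorized build processes several per iteration and adds
// setup code plus a tail loop for the leftover elements.
int sum(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Compiling this at different optimization levels and comparing the assembly shows the size difference directly; measuring with realistic input lengths is what decides which version is actually faster for a given program.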

My recent tests with -Os resulted in distinctly negative effects on performance.

But the main problem with -Os is that it is poorly exercised. The best-exercised modes are -O0 and -O2, so those are the ones to use in production.

> My recent tests with -Os resulted in distinctly negative effects on performance.

Err... yeah, because -Os means "optimize size". Not "speed".

> But the main problem with -Os is that it is poorly exercised

No, the main problem is that you don't understand -Os :) It works as intended.

Evidently you are unaware that use of -Os has on earlier (but quite recent) generations of CPU architecture resulted in notably faster performance.

And, that any compiler feature that is little used will necessarily receive less attention than commonly used features, and be less stable and reliable. Before trying to fix any bug in a program built with -Os, reproducing it first in -O2 will reduce premature balding.

> Before trying to fix any bug in a program built with -Os, reproducing it first in -O2 will reduce premature balding.

I have no idea what you're talking about. Debug your programs in -O0, which means "no optimizations". -Os is, and has always been, optimizing for executable SIZE. It has no guarantees wrt performance.

He did it on x64.

EDIT: This is the talk, if I remember correctly.


I remember this being talked about in a talk about the Coz profiler. Maybe that was it?

I can easily name several use cases where increased code size leads to better performance (compile-time evaluation, architecture-specific optimizations).

I wonder how many C standards we would have to go through before the people who are writing C+ (C with some C++ but no classes, RAII, etc.) or straight C++ would convert back to plain old C.

Could be 1. Just adopt the C++ std as the C std.

This is good, and going over the code to port it will probably mean it gets modernised and a few small bugs will be found / fixed along the way.

" could've ported to Rust instead "

kthxbye ;)

That would have been a rewrite. This change is to get the existing C codebase to compile with a C++ compiler, which is orders of magnitude less work.

I wonder if Clang-tidy could be used to write a transformation for each pattern rather than converting by hand.

From my own experience writing transformation tools on top of libtooling: you can make it work but the sweet spot between the time it takes to write the transformation code and just manually doing it makes it uneconomical for most use cases unless you have a gigantic code base and an error prone transformation that you have a better chance of getting correct when a computer can process the whole syntax tree.

For 99% of cases, regex-replace is the better choice (as much fun as writing syntax tree transformations is).

On another note, I am glad people are still working on it, considering BitTorrent is way past its golden era and is not being used as much.

Hopefully the refactor ( which has been talked about and planned for many years ) will fix some performance issues. I constantly get 20-30% CPU usage just downloading.

Does anyone know where to get up-to-date windows builds?

Both the linked GitHub repo and the official site have up to date Windows builds. There are also builds of the master branch at any given moment provided by the CI.




I'm impressed, it's no insignificant feat.

Although, I'm more amazed this post has been up an hour without any rust comments..

> Although, I'm more amazed this post has been up an hour without any rust comments..

There are more comments on any given post about hypothetical obnoxious Rust comments than any actual comments of that nature.

Ah, this feels like a promising debate. Can't someone do a statistical analysis about it?

Empirically speaking, OP brought it up first...

It's almost certainly a much more tractable problem to get C code compiling with a C++ compiler, and then gradually transitioning to C++ and the STL over time. I would love to see greater Rust adoption (especially in things that deal with untrusted peers on a network!), but that project is likely orders of magnitude larger than what the author has chosen to take on.

Also it seems like the author is already well-versed in C++; learning Rust might not be a priority for them, and would certainly slow things down a lot.

And while there are certainly plenty of footguns and unsafe things you can do in C++ (even unintentionally), it's probably still safer than C, even if it's not as safe as Rust.

And just to be that overly pedantic nit-picker, when most people refer to the "STL" today, they are really talking about the C++ Standard Library[1], which is related to and based on the conventions of the old-school Standard Template Library (STL), but a different thing. When you #include <vector> on most modern development platforms, you are not using the actual historic STL implementation.

1: https://en.wikipedia.org/wiki/C%2B%2B_Standard_Library

I would go with a transpiler approach first (eg https://c2rust.com/) and then gradually transition pieces to use the Rust standard library and to safe code instead of the adhoc custom containers.

Still, that all would be predicated on having good Rust experience or using it as a learning experience and that may not be the motivation of the authors.

> gradually transition pieces to use the Rust standard library and to safe code

Is this realistically possible on a non-trivial project? I suppose that porting to safe Rust requires a radical rearchitecture of the code.

I have no experience converting a C or C++ project to Rust but have transitioned Objective-C projects to Swift and have found that even with Obj-C's newer additions to improve Swift interop, making the project as a whole safe is difficult until you've factored Obj-C out of all critical paths. It's just too easy for things to go wrong in the Obj-C half or in the Obj-C → Swift interop.

So almost unavoidably you end up with two refactors: first, the piecemeal rewrite in the safer language, and then a second when the original language has been rooted out and you can take advantage of all of the features of the new language.

>"And while there are certainly plenty of footguns and unsafe things you can do in C++"

If I want to shoot myself in the foot I should have full rights and the ability to do so.

In normal coding, however, modern C++ is a very reasonable language, and I do not recall stepping on any landmines in the last few years, even though my code runs high-load / high-performance business servers that also do a lot of calculations.

Unless you are a perfect programmer, you will sometimes make a mistake, and shoot yourself in the foot even though you didn't intend to.

Perfect programmers don't exist. Everyone will screw up sometimes.

With Rust, I know that my code will be free of memory errors and data races as long as I never use `unsafe`. No one can say the same about a C++ program, not with certainty.

If you don't care, that's fine (I guess), but please don't get all huffy with those of us who do. I think it's telling that you seemed to need to get all defensive even considering that my original post was in support of the libtransmission author using C++ instead of Rust.

Just because you can't rely on the compiler to tell you, doesn't mean memory errors exist. There's this weird idea Rust programmers have that since C++ doesn't force you to write less performant code by bowing down to a borrow checker that isn't smart enough to handle e.g. single threaded futures and locking properly, that it is IMPOSSIBLE to write memory safe C++ code.

A piece of code without memory errors is, by definition, memory safe. This whole idea of "every piece of code has bugs" is some new-wave tripe coming out of code camps and the web development scene.

" No one can say the same about a C++ program, not with certainty."

Formal proof is a thing, so I think you can.

"Hello world" would be easy to verify. Verifying transmission on the other hand, is a kind of big project on its own, though.

And I agree that there are no perfect programmers around, but I would also agree with the parent's point: if I want full power, I want to retain my right to footguns. It all remains a tradeoff. The ecosystem of C++ is just incredibly bigger than Rust's, so for a new game I would start in C++, but for something critical I would probably also choose Rust.


It is quite easy to steer clear of unsafe constructs, and is less work. The people who trigger footguns are mostly those trying to code as if they were still coding C. Their code is typically slower, too.

> especially in things that deal with untrusted peers on a network!

Are there any practical torrent clients that I can run in docker container?

Transmission itself. I highly recommend the image built by the linuxserver.io: https://hub.docker.com/r/linuxserver/transmission

> Although, I'm more amazed this post has been up an hour without any rust comments..

I've actually thought "WTF why porting a project to a memory unsafe language", but actually the PR title is somewhat excessive - it's more of a modernization, indeed described by the author as "the first incremental step of getting libtransmission to compile with a C++ compiler".

> WTF why porting a project to a memory unsafe language

Because proper handling of memory is actually quite easy to achieve in C++ assuming you're doing all the right things. We also have great tooling for this a la sanitizers and valgrind.

Also, C and C++ remain more performant than Rust in the general case, and don't require rewriting everything into a completely different paradigm.

I think there is a tool to automatically convert C to Rust, but that would probably be a terrible idea given what looks like a lot of custom data structures that would translate to unsafe Rust.

It would be interesting to try to make it recognize C++ STL usage but not much more of C++ next.

I confess that it crossed my mind. Torrents are obviously network-oriented and are often used to interact with, uh, untrusted content (the torrent files but also the trackers and peers that exchange data with the client). The surface of attack is rather large, using a memory-safe language doesn't seem like a luxury. I also can't imagine wanting to write C++ in this day and age, but that's my prejudice.

That being said I've been using transmission for years without complaints, so I trust them to do the right thing.

The PR doesn't do much if you look at the changes. It looks like it's mostly compiling as C++ and using things like 'auto' and C++-style casts.

Speaking as a C++ person - you're right that this doesn't _do_ much, but nevertheless this is still a significant effort. This kind of groundwork is _work_, and opens lots of doors for future improvements.

That said, there are definitely downsides here. Even if they manage to keep c++ entirely out of headers, consumers now have the headache of either building libtransmission themselves _or_ making sure they include a compatible runtime in their own projects.

As soon as c++ leaks out into the public interfaces, it's game over for precompiled binaries - everyone will have to build it whether they want to or not. Not the end of the world, but certainly it could be painful for downstream projects.

They should probably be like LLVM and just keep a C wrapper around new C++ interfaces for such purposes.
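A rough sketch of that wrapper pattern. All `demo_` names are invented for illustration and are not libtransmission or LLVM APIs. The C-visible surface is an opaque handle plus plain functions, and exceptions are caught before they can cross the boundary:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// C++ implementation detail; hypothetical, never visible to C consumers.
namespace detail {
class PeerList {
public:
    void add(std::string address) { peers_.push_back(std::move(address)); }
    std::size_t size() const { return peers_.size(); }

private:
    std::vector<std::string> peers_;
};
}  // namespace detail

// What the public header would declare: an opaque handle and C functions.
// (A real C header would spell the size type as size_t.)
extern "C" {
typedef struct demo_peer_list demo_peer_list;

demo_peer_list* demo_peer_list_new(void);
int demo_peer_list_add(demo_peer_list* list, const char* address);
std::size_t demo_peer_list_size(const demo_peer_list* list);
void demo_peer_list_free(demo_peer_list* list);
}

// The handle wraps the C++ object; only this source file knows its layout.
struct demo_peer_list {
    detail::PeerList impl;
};

// Exceptions must not escape into C callers, so they become return codes.
extern "C" demo_peer_list* demo_peer_list_new(void) {
    try {
        return new demo_peer_list{};
    } catch (...) {
        return nullptr;
    }
}

extern "C" int demo_peer_list_add(demo_peer_list* list, const char* address) {
    if (list == nullptr || address == nullptr) return 0;
    try {
        list->impl.add(address);
        return 1;
    } catch (...) {
        return 0;
    }
}

extern "C" std::size_t demo_peer_list_size(const demo_peer_list* list) {
    assert(list != nullptr);
    return list->impl.size();
}

extern "C" void demo_peer_list_free(demo_peer_list* list) {
    delete list;
}
```

Because no C++ type appears in the declared interface, consumers only depend on the C ABI; they still need a compatible C++ runtime at link or load time, which is the headache described above.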

But agreed this is the right first step.

Already that little does much, as C++'s type system is more strict.
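One concrete instance of the stricter type system, as a sketch with a made-up helper (not code from the PR): C allows `void*` to convert implicitly to any object pointer, C++ does not, so every `malloc` call site has to state its intent.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Valid C, rejected by a C++ compiler (no implicit conversion from void*):
//     char* buf = malloc(len + 1);

// Hypothetical helper: duplicates a C string using malloc.
char* duplicate_c_style(const char* text) {
    assert(text != nullptr);
    const std::size_t len = std::strlen(text);
    // C++ requires the conversion from void* to be spelled out.
    auto* buf = static_cast<char*>(std::malloc(len + 1));
    if (buf != nullptr) {
        std::memcpy(buf, text, len + 1);  // includes the terminating '\0'
    }
    return buf;  // caller releases it with std::free
}
```

The cast itself changes nothing at runtime; the gain is that each conversion is visible and checked, which tends to surface mismatched pointer types during a port like this one.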

The rest can be (hopefully) increasingly migrated.

This was both the plus (and now Achilles's heel regarding fixing C++) of migrating C code into C++.

But someone needs to do this before the interesting stuff can happen.

...but you did make a Rust comment :-O

Said the guy posting a Rust comment.

I'm sure the thread is monitored and strike force is being dispatched as we speak.

I bet you like to poke wasp nests with sticks too.

Rust is good but the GUI support is far from perfect, especially Qt-based interfaces.

they use rlang (don't confuse with erlang) as their C++ compiler (rlang is a C++ compiler written in rust, giving the benefits of ownership to C++ programs for free)

Rust has jumped the shark.

Same impression. I was wondering when those Rustamans will descend ;)

Not trying to be rude, but I don't believe this is a sensible choice if you care about performance (sounds like you were more concerned about missing familiar features and such, so this comment may be irrelevant). Just curious: were you able to measure the performance degradation after this change was made? In other words, what is the price paid for obtaining familiar constructs (it certainly won't be zero)?

What makes you think modern C++ produces slower code than C?

Do you have any real example where you have been bitten by such a conversion?

GP was overly broad IMHO but there definitely can be performance costs to C++ over C. Some are really subtle while others are blatant. I got real-world bit by try/catch exceptions. In a non performance-intensive app like Transmission I would guess any such things would be irrelevant, but I don't think GP deserved the downvotes for the speculation.

Exceptions are faster than error codes in the happy case where there isn't an error. The deadly sin with exceptions is throwing them as part of the normal control flow. Don't do that for any code where performance matters.


I'm guessing the author measured the overall performance and assumed that using error codes, rather than exceptions, is what made the sample code he provided slower. Turns out, the reason it is slower is because, in the case using error codes, a copy of a std::string is made. In the exceptions case, where the string is returned rather than passed as a parameter by reference, the copy can be elided because the compiler applies NRVO. Has nothing to do with exceptions.
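To illustrate only the mechanism (whether it applies to the benchmark in question is disputed in the replies), here is a sketch with a made-up copy-counting type. Returning a named local by value never copies it: the compiler either applies NRVO or falls back to a move. Filling an out-parameter from an existing object is a real copy:

```cpp
#include <cassert>

// Hypothetical instrumentation: counts copies of Tracked objects.
int tracked_copies = 0;

struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++tracked_copies; }
    Tracked& operator=(const Tracked&) {
        ++tracked_copies;
        return *this;
    }
    Tracked(Tracked&&) noexcept = default;
    Tracked& operator=(Tracked&&) noexcept = default;
};

// Return by value: the named local is built in the caller's storage (NRVO)
// or moved out; either way the copy constructor is not called.
Tracked make_by_value() {
    Tracked result;
    return result;
}

// Out-parameter: assigning from an existing object is a genuine copy.
void make_by_out_param(const Tracked& source, Tracked& out) {
    out = source;
}

void copy_elision_demo() {
    tracked_copies = 0;
    Tracked a = make_by_value();
    assert(tracked_copies == 0);
    Tracked b;
    make_by_out_param(a, b);
    assert(tracked_copies == 1);
}
```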

Maybe that author should use a profiler before making these types of claims? check the generated assembly? etc?

Far slower to return a value in RAX vs. generate a ton of code (blowing icache, etc.) is quite an interesting statement. I think it deserves a closer look, though.

Where do you see NRVO being relevant in the test code: https://gist.github.com/Dugy/2532c810bb232b8ff1603cfa679bdf2...

The only case where a string is returned in the exception-based code is Xml::getAttribute() where in both implementations there is exactly one copy (copy assignment in the error code case and copy construction in the exception case).

Being able to just return the type in the good case is however one of the main advantages of using exceptions, so it might even be valid to count such optimizations being possible in more code as a plus of using exceptions.

> Maybe that author should use a profiler before making these types of claims? check the generated assembly? etc?

Have you used a profiler before making your claims?

> Far slower to return a value in RAX vs. generate a ton of code (blowing icache, etc.) is quite an interesting statement. I think it deserves a closer look, though.

The cost for error codes is the branching at every point in the call stack vs. only at the error source with exceptions. And the generated exception code is marked as cold and not stored interspersed with the non-exception code, so it will not affect performance unless exceptions are thrown, which if you are using exceptions correctly should be exceptional cases.
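A sketch of the two styles side by side, using made-up parsing helpers (not the benchmark code from the gist). It shows where the branches sit in the source; it makes no claim about which is faster on any particular workload:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Error-code style: every level tests the result and passes failure upward.
bool parse_digit_ec(char c, int& out) {
    if (c < '0' || c > '9') return false;
    out = c - '0';
    return true;
}

bool parse_number_ec(const std::string& text, int& out) {
    int value = 0;
    for (char c : text) {
        int digit = 0;
        if (!parse_digit_ec(c, digit)) return false;  // extra branch per call
        value = value * 10 + digit;
    }
    out = value;
    return true;
}

// Exception style: only the failure site branches; callers use the value.
int parse_digit_ex(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}

int parse_number_ex(const std::string& text) {
    int value = 0;
    for (char c : text) value = value * 10 + parse_digit_ex(c);
    return value;
}

void error_style_demo() {
    int out = 0;
    assert(parse_number_ec("1234", out) && out == 1234);
    assert(!parse_number_ec("12x4", out));
    assert(parse_number_ex("1234") == 1234);
}
```

The exception version also returns the result directly instead of through an out-parameter, which is the side benefit mentioned a few comments up.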

YO. I measured it myself just to be sure and since you were such a dickhead. (Copying and pasting the output from console) With exceptions: Parsing took on average 135 us With error codes: Parsing took on average 0 us

Maybe the dude forgot to enable optimizations altogether (not present in the command line options in the comment at the top of that file). I added /O2 since it is a PERFORMANCE test, remember? Hilarious.

Ok, cool.

Exceptions can be avoided with the use of `noexcept` wherever we can avoid them, especially in sensitive areas where we cannot risk exceptions being thrown.

> but I don't think GP deserved the downvotes for the speculation.

That's why I asked @squid_demon for a real example that possibly got bitten by it; else, it's simply an emotional reaction for favoring one tool over another.

If aria2 [1] that is implemented in C++ is extremely fast, then I can almost guarantee that transmission's refactoring in C++ will get there too, sooner or later.

[1] https://github.com/aria2/aria2

I was attempting, perhaps poorly, to ask the authors if they measured the performance degradation or not with this change. That way we would have the data you are asking _me_, for some reason, to provide. The onus is not on me to do their work for them and prove to everyone on HN that, yes, I have seen massive performance issues when people "port" C code to C++ (in many many projects over many many years throughout my professional career). If the authors choose not to answer or to care at all (if they even see my post) that is perfectly acceptable as well. I'm just interested in the facts so all of us can better choose whether this decision was a good one or not!

Actually yes, in an open source ecosystem the onus is on you to show that a change is making things worse for you. You can't expect the developers to test on all systems.

I'm not asking to test on any system EXCEPT THEIR OWN. Did they test the performance or not? If not, fine. That's all I was asking. It's not a huge ask either unless they don't give a rat's ass about performance. Which, if they want to use C++, they most likely do not.

C++ is faster than C, and has been for a long time in practice. It also allows you to achieve that performance with many fewer lines of code.

The primary reason people use C today is extreme portability.

