
Comparing C and C++ usage and performance with a real world project - zingplex
http://nibblestew.blogspot.com/2017/09/comparing-c-and-c-usage-and-performance.html
======
userbinator
I'm going to be the first to point out the one major flaw in this comparison:
"plain C using GLib" is not comparable to "C++ standard library only" \---
what should be compared is "C++ standard library only" and "C standard library
only". Reimplementing pkg-config in pure C without GLib would be necessary for
that.

As for the "memory leaks" \--- I haven't looked at the source, but something
whose runtime is very short-lived, like pkg-config, may be very well justified
in allocating and never freeing, letting the process exit itself be the
"ultimate free". I've seen and done this many times myself.

I've seen projects that turned from simple and straightforward to buggy (and
harder to debug), slow, and bloated because someone decided they wanted to
"use C++" and would try to make use of as many "modern C++" features as they
could.

 _Converting an existing C program into C++ can yield programs that are as
fast, have fewer dependencies and consume less memory. The downsides include a
slightly bigger executable and slower compilation times._

My experience has been the complete opposite.

~~~
jgh
Reminds me of this anecdote I came across on the internets one time:
[https://groups.google.com/forum/message/raw?msg=comp.lang.ad...](https://groups.google.com/forum/message/raw?msg=comp.lang.ada/E9bNCvDQ12k/1tezW24ZxdAJ)

~~~
blub
I have heard of calculating a "memory budget" and pre-allocating that, but
calculating a "leak budget" and doubling that doesn't seem like hygienic
programming.

~~~
coldtea
Doesn't have to be hygienic, it's enough that fixing it doesn't justify the
time/money costs for the programmer.

~~~
blub
We wouldn't want the programmer to spend too much money/time on fixing errors.
After all, it's not like people would die if there's an error in the missile
software:
[http://www.gao.gov/mobile/products/IMTEC-92-26](http://www.gao.gov/mobile/products/IMTEC-92-26)

Not to mention they added even more HW to work around the leaks. No wonder
those projects always run over budget.

~~~
coldtea
> _We wouldn't want the programmer to spend too much money/time on fixing
> errors. After all, it's not like people would die if there's an error in the
> missile software_

When exactly did pkg_config become missile software?

As they say in meme-land, "that escalated quickly".

Here's a novel idea: how about the appropriate level of effort/time/YAGNI-
stuff based on the domain?

Or do you write one-time scripts with MISRA rules?

~~~
blub
The sub-thread I replied to was linking to an anecdote about missile software.

But I still think not freeing is pretty lame even in short-lived tools. If one
is using a no-op deallocator at least the code is designed properly and could
be repurposed.

------
fpgaminer
The minor discussion on the C version's memory leaks reminded me of a neat
trick. If you're developing a short lived application, like pkg-config, you
can opt to never deallocate. i.e. leak everything. In lightweight, short lived
applications there's usually not a lot of incentive to deallocate; your
application will never use much memory anyway and the deallocations waste
time.

You can think of it like treating C as a garbage collected language, except
the garbage collection cycle occurs only once at the end of the program :P

It really can be an effective trick. Deallocation isn't free, and under
certain loads can be quite expensive.

The 1000+ leaks in the C version might actually be what's giving it the slight
run-time advantage.

~~~
andreasgonewild
Or take that even further: allocate a slab that's big enough to last the
entire runtime and send malloc on vacation. It's worth repeating; with today's
focus on web frameworks, cloud providers and dogma, these ideas are
slipping into obscurity, which is a shame given how beneficial they can be if
your program fits the use case.

~~~
sebcat
And, instead of using pointers into said memory, use indices, so that the slab
can be reallocated, or just mmap more pages following the slab if you own the
process address space. Also, align properly. And having guard pages is always
nice. Free on exit too; one single deallocation is pretty cheap.

I've seen programs building ASTs with hundreds of millions of nodes, where all
the nodes were allocated by a separate malloc call, and ref-counted... More
than one-third of the startup time (which was counted in minutes) was calls to
malloc and free. Some optimizations were made, but in the end we ended up
reducing the size of the AST instead of fixing the allocations.

~~~
pjc50
Ouch.

Last time I did any serious parser work I used a pool allocator so I could
free all the nodes at once, so allocation was just a compare + increment
operation. Although that was forced on me by the difficulties of error
recovery in yacc.

------
gens
> .. which is written in plain C using GLib ..

That's about as "C" as C++ is. Why not Gneural or libpng or even GNU make?

[http://git.savannah.gnu.org/cgit/gneuralnetwork.git](http://git.savannah.gnu.org/cgit/gneuralnetwork.git)
[https://github.com/glennrp/libpng](https://github.com/glennrp/libpng)
[http://git.savannah.gnu.org/cgit/make.git/tree/](http://git.savannah.gnu.org/cgit/make.git/tree/)

------
rjzzleep
According to Dan Saks, who is apparently famous to some people, C++ is faster
than C (well, in the test setup he describes below):

[https://accu.org/content/conf2015/DanSaks-Embedded%20Programming%20Death%20Match.pdf](https://accu.org/content/conf2015/DanSaks-Embedded%20Programming%20Death%20Match.pdf)

    
    
    Language   Design      Implementation   Relative Performance
    either     any         inline           1 (fastest)
    C++        polystate   non-inline       1.56 x fastest
    C++        bundled     non-inline       1.65 x fastest
    C          polystate   non-inline       1.70 x fastest
    C          bundled     non-inline       1.79 x fastest
    C++        unbundled   non-inline       1.82 x fastest
    C          unbundled   non-inline       1.95 x fastest
    

He furthermore argued that the biggest mistake C++ developers made, one that
killed the adoption of C++ by C programmers, was to diverge from the previous
line of "C++ is a better C" to "if you're using C++ as a better C you're
doing it wrong".

[https://www.youtube.com/watch?v=D7Sd8A6_fYUI](https://www.youtube.com/watch?v=D7Sd8A6_fYUI)

(I have no skin in the game, I was just curious to see if it's worth looking
at rust for embedded when I came across that talk)

~~~
humanrebar
> "if you're using C++ as a better C you're doing it wrong"

As far as correctness and safety goes, this is still true. It's difficult to
scale systems-level programming to large teams. C++ gives the opportunity for
more explicit semantics and more aggressive compile-time checks. C _can_ scale
well and _can_ be used safely, but you need to do a lot more through
convention (always call xyz_Create and xyz_Destroy in pairs!) and through
runtime checks (calls to assert, unit testing).

D, Rust, OCaml, and a few other projects are interesting in this space since
they provide some of the same benefits as C++ with respect to correctness and
safety. Some are plausibly better in theory, though I'm not aware of huge,
say, Rust projects that approach the size of huge C++ ones.

~~~
pcwalton
> D, Rust, OCaml, and a few other projects are interesting in this space since
> they provide some of the same benefits as C++ with respect to correctness
> and safety.

Can you name a correctness and safety benefit that C++ has that these
programming languages do not?

~~~
humanrebar
I didn't mean to imply that C++ was safer somehow. I just meant that those
languages are also competing in that feature space in a way that C doesn't.

And, on a pedantic level, C++ competitors can't provide exactly the same
benefits as C++ because they took different approaches.

What's a key design difference among these languages? Well, C++ can mostly
just #include a C header file and go with it. The other languages provide FFI
mechanisms, but they each require declarations of the FFI to match the
compiled C code. So theoretically there's a little more room for errors in
that translation, though I doubt that's a big concern on the whole. Each of
those languages has a more mature module system, which should more than make
up for keeping FFI interfaces in sync with C headers.

------
sesutton
The aside about Rust is a total straw man. Obviously no one with even a
modicum of knowledge of programming languages would think Rust is the only
memory safe language. Googling the supposed quote also turns up no results but
this post.

------
wott
It is nonsensical to consider memory leaks as reported by Valgrind on a
program that uses GLib. GLib allocates and builds a whole context system and
never frees it before exit, which confuses Valgrind. I am pretty certain
almost all 'leaks' come from there. _libglib_ should be put in a Valgrind
_suppressions_ file.

It was a very bad choice to pick a program based on GLib for this kind of
experiment.

~~~
bjconlan
I think what you have said here sums up the problem quite concisely. They
might as well add core-foundation, qt-core and STL-based executables to the
test for 'c' vs 'c++' to give a better cross section.

------
khitchdee
FWIW, Donald Knuth was a proponent of using C over C++ at the time it first
came out. He equated C++ with the use of frameworks in writing programs which
he thought were a bad idea for the profession as it would dumb it down. C++
does make code reuse a lot easier.

~~~
overgard
> C++ does make code reuse a lot easier.

Not really. With ABI issues and compiler incompatibility, widely used C++
libs are either header-only or have an "extern C" version of the public API.
I'd say C++ makes reuse much harder.

~~~
aidenn0
1) ABI issues and compiler incompatibility haven't been a problem for 5-10
years (using two compilers for a single binary is relatively rare).

2) Being "header-only" is no impediment to code reuse.

~~~
elderK
Hey there Aidenn0,

I'm no expert on C++ and I've been considering using it for several projects.

An important thing for my needs is being able to define classes in one shared
object and create new subtypes of those classes in another, possibly defining
overrides on virtual methods and such.

A good friend of mine has said similar things as you - that the ABI issue has
not been a major obstacle for some time.

And yet, as much as I search, I still find the same-old advice: Don't use STL
types in your interfaces or throw exceptions across module boundaries.

If all the compilers used for a given platform follow the same ABI, would
using a separate and specific STL implementation (say, STLport) instead
alleviate that particular issue?

Sorry if this question seems a bit rambling, but I'd really love to find out
how to use C++ in the way I've mentioned.

~~~
aidenn0
If you use the same compiler, ABI is a non-issue.

If you want to distribute dynamic-link binaries for windows, use MSVC.

If you want to distribute dynamic-link binaries for OS X, use Xcode.

If you want to distribute dynamic-link binaries for linux, you are SOL
regardless of whether or not you are using C++, but if you use the same
compiler and flags that the latest LTS version of Ubuntu uses, then it will
work on Ubuntu, and will be made to work anywhere that Steam works.

It used to be that there were at least two C++ compilers for each *nix
(typically GNU and something cfront based), so ABI was a much bigger deal.

When "Modern C++ Design" came out, famously none of the compilers could
correctly compile all of the sample code. Since then things are much better;
not that all compilers are bug-free of course, but they are sufficiently good
enough that if you report a bug, you can expect it to be fixed.

[EDIT]

"Don't use STL Types in your interfaces" is not advice I've heard in like 15
years; I more often hear "If you're using a C array instead of a Vector,
you're doing it wrong"

"Don't throw exceptions across module boundaries" seems similarly odd. Unless
your constructors are inlined, no modern code-base will follow that rule
because RAII relies so strongly on exceptions.

There are coding styles that are opposed to exceptions as part of an external
interface, but that's due to exceptions not being checked as part of the type
system, and is not what I would call a majority opinion.

~~~
elderK
Thanks for the response.

To clarify "module boundaries", I mean "separate shared objects."

As for Linux, I'm not too concerned with creating a single binary that works
for all distributions.

I'm more concerned with someone being able to build a set of shared libraries
on their distribution of choice and those shared libraries being able to
interact naturally regardless of which compiler s/he uses to build each of
them.

Say, LibA is built using LLVM. LibB is built using G++ and LibC is built using
ICC.

LibA defines several classes. LibB creates some subtypes. LibC instantiates
types from both LibA and LibB.

All the functions present in LibA, LibB, LibC make use of STL types such as
std::string, std::vector, etc. Some may throw exceptions, whatever.

With respect to MSVC, I've read that compatibility between Debug and Release
builds is kind of suspect, especially if you're using STL types. Not to
mention differences in MSVC version. Is this still a concern?

~~~
aidenn0
> I'm more concerned with someone being able to build a set of shared
> libraries on their distribution of choice and those shared libraries being
> able to interact naturally regardless of which compiler s/he uses to build
> each of them.

Sorry, but this is an unreasonable standard. Literally no language, _including
C_, supports this. With C it only works inasmuch as the C compiler authors
work really hard to make it work, and even then it sometimes breaks. (If your
compiler inlines a call to malloc, and you free a pointer compiled with a
different C compiler that inlined a different malloc implementation, it can
break horribly. Yes, I've seen this happen.)

Some languages support cross-version linking (or whatever the language's
equivalent of "linking" is), but I'm not aware of any that specify a complete
ABI for unrelated implementations to support. IPC libraries do typically
support this though.

[edit]

I don't want to go on a shared-library rant, but I am fairly strongly opposed
to them (except perhaps in cases like how nixos manages it). You can take a
statically linked binary from 1997 and run it unmodified on your linux machine
today. It is a virtual guarantee that any dynamically-linked binary more than
2 years old will not work correctly. Linus puts a huge amount of effort into
backwards compatibility, and it is completely destroyed by dynamic linking.

------
humanrebar
Interesting project!

The C++ version uses many memory allocations. Using custom allocators in some
of the C++ program would certainly cut down on the number of allocations. It
would also be interesting to see whether doing so improved performance.

Similarly, it would be interesting to see if using the C++17 string_view (or
the gsl version if C++17 isn't available to you) instead of `const string &`
parameters affected performance.

Finally, I see that in most (all?) cases, objects are returned by value, not
returned through reference parameters or pointers. It's interesting to see
that that choice didn't compare poorly to a C implementation.

------
hedora
It is interesting that the C++ version does twice as many allocations. I
suspect this means there is some low hanging fruit for future optimizations.

------
dingo_bat
I knew C++ compilation was slow but 30x slower? I'm sure the compile-time
memory usage will also show a similar trend. It would be interesting if
somebody could explain the reason for this disparity.

~~~
astrodust
The problem, by and large, is that C++ is heavily dependent on header files to
implement the Standard Library. It's largely templated, which means there's no
way to make a pre-compiled version; the code generated varies wildly depending
on the types involved.

C has relatively simple header files; they usually contain structs, function
signatures, and a bunch of macros. They're easy to parse and apply by
comparison, and they don't tend to be as deeply nested.

If C++ ever adopts the Pascal-style "module" extensions that have been kicking
around in various proposals compile times could shrink by several orders of
magnitude.

~~~
aewnjfksd
> If C++ ever adopts the Pascal-style "module" extensions that have been
> kicking around in various proposals compile times could shrink by several
> orders of magnitude.

I'm skeptical. Modules don't avoid the need for template instantiation.

~~~
nly
Template instantiation surely only requires type substitution and re-running
some analysis, though. What makes C++ compilation slow is reparsing headers
again and again and again, because the C preprocessor means that every time
they are encountered they may have new semantics.

The motivation for modules in C++ is similar to that of developing a Binary
AST for Javascript, discussed on HN recently.

~~~
dozzie
> What makes C++ compilation slow is reparsing headers again and again and
> again, because the C preprocessor means that every time they are
> encountered they may have new semantics.

Really? And I thought that this is why C and C++ headers are typically wrapped
in #ifndef-#define-#endif block, so they only produce whitespace after
preprocessing on second inclusion.

~~~
Matthias247
Yes, this happens inside a single translation unit (.cpp file). However if you
have multiple .cpp files which include the same header file you have to
reparse it each time. This is because before the inclusion of that header
different #defines might have been set (e.g. through other headers), and
therefore the content of the header file might be different.

------
72deluxe
"The C++ version has no pointers but instead uses value types. This means that
all data is stored twice: once in the array and a second time in the hash
table."

This is interesting. Are they using modern C++ and making use of moves and
perfect forwarding? Or are they just throwing std::strings around and doing
millions of copies in the process (remember that std::vector elements must be
CopyConstructible, so copy constructors and operator= get invoked)? That
would perhaps explain the higher allocation count in C++, particularly if
they're using the "wrong" containers. Why not use unique_ptr or shared_ptr?

It is worth remembering that move constructors and move assignment operators
only get used in very specific places, and you have to ensure that any move
constructors you write yourself are explicitly noexcept.

------
milansuk
> Every manual resource deallocation call is a potential bug. This is
> confirmed by the number of memory leaks as reported by Valgrind. There are
> more than 1000 of them, several dozen of which are marked as "definitely
> lost".

You can't compare performance if one program doesn't free memory, which
obviously "saves" time. Valgrind can tell you where non-freed heap blocks
were allocated, and a fix should not be complicated.

~~~
mcguire
" _Valgrind can tell you where non-freed heap blocks have been allocated and a
fix should not be complicated._ "

In theory, theory and practice are the same. In practice, they aren't.

------
widdershins
This is quite a good comparison too, along with lots of advice on code
quality. The result was that the speed was almost exactly the same.

[https://www.youtube.com/watch?v=SIAAvv1O7Gg](https://www.youtube.com/watch?v=SIAAvv1O7Gg)

------
ausjke
In general there is no way a C program (apples to apples, compared to its
similar C++ version) will be larger than the C++ one, be it static, with
shared libraries included, or whatever.

~~~
astrodust
Today, sure, but there's no assurance that this will be true in the future. If
more compiler-friendly extensions are added to C++ to help it generate
tighter, more nimble machine code because it's given more leeway in
optimizations, then the C++ code could be substantially smaller. C doesn't
seem as interested in adopting some of the C++ paradigms that could make
optimization better, tools like formalized iterators and such.

There have been various attempts at pre-compiling headers over the years, but
the results have always been, for various reasons, less than perfect.

------
khitchdee
If you ignore the protection mechanisms and the class hierarchy built into
C++, then a C++ class is like a C struct that can contain function pointers.
For many programs this is all that's needed, and the amount of overhead
involved in using such an approach to creating objects is obviously lower. So
there's no question C will always be faster. It's only when you need
protection and class hierarchies that C++ benefits you. That benefit is mainly
one of better code organisation.

~~~
nly
> a C++ class is like a C struct that can contain function pointers.

No, it's not. Calling an ordinary class member function in C++ has exactly the
same overhead as calling a function in C. Even virtual functions in C++ are
not the same as putting function pointers in a C struct (they live in a
separate data structure called the vtable).

> the protection mechanisms and the class hierarchy

All C++ protection mechanisms occur at compile time and have no runtime
overhead. Non-virtual inheritance hierarchies have the same overhead as C
struct composition (because under the covers the memory layout is the same).

~~~
khitchdee
Doesn't a vtable imply an extra level of indirection? You have to find where
the vtable is in the object, then the function within the vtable, right? Is
that not slower?

~~~
mcguire
Yes, although (cache behavior notwithstanding) it's a single pointer
indirection that can frequently be optimized away by the compiler.

~~~
khitchdee
My point is simply this -- adding protection mechanisms and inheritance to
classes necessitates adding more complexity to the structure used to
represent them (such as a vtable), which does add performance overhead. If
you don't need those features, you can go leaner and faster with a C
structure that includes function pointers to give you the basic packaging of
data and functions that an object has.

