
Why should I have written ZeroMQ in C, not C++ (part II) - rumcajz
http://www.250bpm.com/blog:8
======
wheels
I stopped reading after the first example, which once again (as in the last
post in the series) demonstrated that the author doesn't really know what he's
talking about. The code would be written as:

    
    
      std::list<person>
    

Not:

    
    
      std::list<person *>
    

...which avoids the double malloc and presents a nicer API to boot. Not to
mention that in that case, a C++ programmer would either use a struct or would
need to write accessors.
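
A minimal sketch of the by-value version, assuming a hypothetical `person` struct with a name and an age (the article's actual type may differ). Each list node stores the `person` itself, so there is no separate allocation for the element:

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for the article's person type.
struct person {
    std::string name;
    int age;
};

// Builds the list by value: the person lives inside the list node, so the
// list makes one allocation per element (std::string may allocate on its own).
std::list<person> make_people() {
    std::list<person> people;
    people.push_back(person{"Alice", 30});
    people.push_back(person{"Bob", 25});
    assert(people.size() == 2);
    return people;
}

// Walks the list and sums the ages; elements are reached without an extra
// pointer hop from node to person.
int total_age(const std::list<person>& people) {
    int sum = 0;
    for (const person& p : people) sum += p.age;
    return sum;
}
```

Calling `total_age(make_people())` sums the two sample ages.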

C++ has a lot of warts, and the author seems to have written some non-trivial
software in it, but his critiques of it reveal that there are a lot of gaps in
his knowledge of the language and a lot of his criticisms are based on
falsehoods. (However, a totally valid criticism of C++ is the amount of
idiosyncrasies the language has and how relatively difficult it is to master.)

~~~
tptacek
I had to fight the urge to stop there too; it's also more idiomatic and
probably both more economic and performant to use a vector in this situation;
"lists" in C++, at least on the projects I worked on, tended to be a code
smell.

I think this is a very big problem with C++: most of the funnel of new
developers for C++ are C programmers, and transliterating C idioms to C++
(and, worse, the C++ standard library) usually produces pessimal C++ code.

~~~
itsboring
I've been putting myself through C++ "boot camp". I got a copy of Bjarne's
"The C++ Programming Language" and I'm doing every exercise. I'm up to 302 at
last count, and only a couple more chapters to go.

It's been grueling, but very well worth the time. It helps me avoid a lot of
mistakes I would have made had I just jumped in from C and C#.

Unfortunately, the 4th edition (which presumably covers C++11) doesn't come
out until Feb.

~~~
abhijat
I recently assigned myself a project to learn C++ too, mainly in the interest
of learning an entirely different kind of language (I only know python very
well) but also to scratch a long pending itch of writing a few simple games
and doing some graphics related work.

What I have found, while going through the C++ primer by Stanley Lippman and
Barbara Moo (and having read most of accelerated C++ earlier) is that I have
not so far seen anything that I really dislike.

Maybe this is because I am not really experienced and so cannot see the
obvious pitfalls, or maybe those things will come in later in the book. But so
far I see a language which I can use in many places.

Also, how is Stroustrup's book for someone who finishes the C++ Primer (which
goes over just the basics)?

~~~
itsboring
I recommend Stroustrup's book as it really leaves no area unexplored. It is
better to read it after having read some other material, as you have. I would
wait for the 4th edition, though, which covers C++11.

------
justinsb
In my opinion, this is just wrong:

1) He uses std::list<person*>, but std::list<person> is the equivalent of his
C example, and would produce identical results to C, with much less code. It
would also be safer e.g. in terms of memory leaks.

2) Templates are the C++ magic, which get round some of those theoretical
limitations of OO programming. The compiler can evaluate the composition of
the various objects, to see if any optimizations are possible. With C, this
would have to be done by hand. Yes, it's slower than compiling C, but it's
much faster than having a human do it.

3) Making use of #2, it is then easy to swap out the implementation. Want to
try a pooled memory allocator (because # of mallocs seems to be the axis we're
optimizing on): just change the declared type. Want to try a different data
structure (e.g. sort by age for fast lookups by age): just change the declared
type. The compiler "expands" the templates, substituting in the various types,
and produces code that is reasonably optimized. For example, it will use
inline integer comparison for sorting by age, rather than calling out to a
comparator function.
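
Points 2 and 3 can be sketched roughly as follows. The `person` struct, the `by_age` comparator and the `people_t` alias are all hypothetical names made up for illustration. The comparator is a function object, so its body is visible to the compiler when the container is instantiated and can be inlined, unlike a function pointer handed to something like `qsort`:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the article's person type.
struct person {
    std::string name;
    int age;
};

// Comparator as a function object: a plain integer comparison that the
// compiler can inline into the container's code.
struct by_age {
    bool operator()(const person& a, const person& b) const {
        return a.age < b.age;
    }
};

// The declared type. Trying another ordered container or a custom allocator
// (as an extra template argument) means editing this line.
using people_t = std::multiset<person, by_age>;

people_t sample_people() {
    people_t people;
    people.insert(person{"Alice", 30});
    people.insert(person{"Bob", 25});
    people.insert(person{"Carol", 41});
    return people;
}

// The container keeps elements ordered by age, so the youngest is first.
int youngest_age(const people_t& people) {
    assert(!people.empty());
    return people.begin()->age;
}
```

How much of the calling code survives a change of `people_t` depends on which member functions it uses, so "just change the declared type" holds best when the replacement has a similar interface.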

I'm sure there are valid reasons to prefer C to C++, but as a user of
ZeroMQ, I wish they would spend their efforts documenting their protocol
rather than bashing their tools.

~~~
VikingCoder
1) No, it does not produce identical results.

Most of the STD containers use copying, not pointers, as the article actually
points out.

That doesn't make his larger point valid (I disagree with it, actually), but
your argument is not technically correct.

~~~
justinsb
Things that look like they use copying in C++ often don't use copying in the
final output. Values are usually returned by reference, and many C++ compilers
can often avoid a copy even when that's not the case (e.g. Return Value
Optimization).

Even if the compiler doesn't optimize everything away, copying is sometimes
faster, always safer and always easier than passing around pointers.

It's a big lesson I learned for C++ - write correct code first; profile; fix
any real performance problems, rather than obsessing about my preconceived
notions of what would be slow.
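
A small way to check the copy-elision claim yourself, assuming a hypothetical `tracked` type that counts calls to its copy constructor. Under C++17 the copy in `make_tracked` is elided by rule; older standards merely permit it, though mainstream compilers do it by default:

```cpp
#include <cassert>

// Hypothetical type that records how often it is copy-constructed.
struct tracked {
    static int copies;
    int value;
    explicit tracked(int v) : value(v) {}
    tracked(const tracked& other) : value(other.value) { ++copies; }
};
int tracked::copies = 0;

// Returns by value; the temporary is constructed directly in the caller's
// object when the copy is elided.
tracked make_tracked(int v) {
    return tracked(v);
}

// Reports how many copies a "return by value" actually performed.
int copies_after_return() {
    tracked::copies = 0;
    tracked t = make_tracked(7);
    assert(t.value == 7);
    return tracked::copies;
}
```

This only measures one simple case; it says nothing about copies made when elements are inserted into a container.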

~~~
VikingCoder
Yeah, I pretty much agree with everything you're saying, but trust me,
std::list really does work by copying the items, so your point was not
technically correct.

~~~
ArbitraryLimits
Well, if we're striving for technical correctness, std::list really works by
invoking the assignment operator on the items. The original article is
strictly correct when it says:

> EDIT: Assume that the object to be contained in the list is non-Assignable
> as is the case with any non-trivial objects, for example those holding large
> memory buffers, file descriptors, handles etc. If the object is Assignable
> simple std::list<person> would do and there is no problem.

Except that normal C++ programmers would deal with that by overriding the
assignment operator on this class to copy buffer pointers or whatever as
appropriate.

~~~
rumcajz
Try copying a complex object with threads running inside it, open file
descriptors being used etc. What does it mean, for example, to copy a running
DB engine? Some objects are just non-copyable by principle.
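
For reference, a hedged sketch of how a non-copyable type can still live directly in a `std::list` since C++11: `emplace_back` constructs the element inside the node, so no copy or assignment is needed. The `engine` class is a hypothetical stand-in for an object that must not be copied:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical resource that is non-copyable by design.
class engine {
public:
    explicit engine(std::string name) : name_(std::move(name)) {}
    engine(const engine&) = delete;
    engine& operator=(const engine&) = delete;
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Elements are built in place and removed without ever being copied.
std::size_t count_engines() {
    std::list<engine> engines;
    engines.emplace_back("primary");
    engines.emplace_back("replica");
    assert(engines.front().name() == "primary");
    engines.pop_front();
    return engines.size();
}
```

This requires a C++11 compiler and does not help with operations that genuinely need a copy, such as copying the whole list.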

------
onedognight
> Rather it is a deficiency in the design of C++ language.

It's a deficiency of the C++ standard library, not the language itself.

You can find an implementation of intrusive lists in C++ in boost.
<http://www.boost.org/doc/libs/1_51_0/doc/html/intrusive.html>

~~~
icefox
100% agree. It almost feels insulting to say that a C++ programmer will only use
std::list and a C programmer will only use a hand made doubly linked list.
This almost feels like an interview question:

> Which is faster and uses less memory? A generic doubly linked list or a hand
> made doubly linked list?

This is a data structure, optimization, and software development problem. The
reason we use any 3rd party generic (in C OR C++) doubly linked list is
because it helps code the problem faster. Any time you use a component you
need to be aware of its overhead. If, after profiling my C++ code, I found that
std::list was taking up all my memory and all my CPU time, I would then
evaluate the algorithms I am using and pick the data structure that best fits
them, be that a custom linked list or something else such as the above mentioned
Boost library.

You even have to nitpick the example about erase being expensive. When using
std::list, why isn't the code passing around the iterators rather than the
person object? And as for the memory overhead of std::list, why didn't he use
std::list<person>?

There is no problem using a hand made C style doubly linked list in your C++
code for the core part of your application for performance reasons. Being a
C++ program doesn't mean you can't use C style code or even have asm snippets.
Following the same logic as the blog, ZeroMQ should have been written in ASM
and not C because it could have been faster, or, conversely, it should have
been written in Ruby because it would have been coded in a quarter of the time
(at the cost of a slower runtime).
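
As a rough illustration of a hand made intrusive list living inside C++ code, here is a sketch with hypothetical `person` and `people_list` types. The links are embedded in the element, as in the article's C version, so removal needs no search and no separate node allocation:

```cpp
#include <cassert>

// Hypothetical person with the list links embedded in the object itself.
struct person {
    int age;
    person* prev;
    person* next;
    explicit person(int a) : age(a), prev(nullptr), next(nullptr) {}
};

// Minimal intrusive doubly linked list; it never allocates or owns elements.
struct people_list {
    person* head = nullptr;

    void push_front(person& p) {
        p.prev = nullptr;
        p.next = head;
        if (head) head->prev = &p;
        head = &p;
    }

    // O(1): the element already knows its neighbours.
    void erase(person& p) {
        if (p.prev) p.prev->next = p.next; else head = p.next;
        if (p.next) p.next->prev = p.prev;
        p.prev = p.next = nullptr;
    }

    int size() const {
        int n = 0;
        for (const person* p = head; p; p = p->next) ++n;
        return n;
    }
};

int intrusive_demo() {
    person a(30), b(25), c(41);
    people_list people;
    people.push_front(a);
    people.push_front(b);
    people.push_front(c);  // order is now c, b, a
    people.erase(b);       // order is now c, a
    assert(people.head == &c && c.next == &a && a.prev == &c);
    return people.size();
}
```

As others note in this thread, a `person` laid out like this can be in only one such list at a time.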

As for the conclusion I would say that the inefficiency is still in ZeroMQ
code as the author doesn't fully understand C++.

~~~
rumcajz
Why not pass iterators? Consider the case where the object is contained in two
lists. You would have to pass a pair of iterators. What if it is in 3 lists?
Etc.

~~~
icefox
Agreed, but for the example the object only needs to live in one list (as
shown by the fact that it could have been a C struct that contained the
next/prev pointers) so passing iterator would have been sufficient and would
have solved the O(n) problem(s?) he was having.

~~~
dkersten
Exactly, the C example could not be contained in more than one list at a time,
so if this is a concern, the C solution wouldn't work either.

------
kingkilr
Ugh, the two solutions to this problem he proposes are _totally_ different.
The C version allows a person to only be a member of a single list.

~~~
DHowett
In addition to that, he's effectively comparing apples and steaks in the case
of std::list.erase() vs linked-list removal. If you want O(1) removal of a
random element, you use a data structure that guarantees O(1) removal of a
random element (caveat: not all lists are created equal, and not all lists
have O(1) removal.)

That said, it's not even removal that's O(n) here, though he may characterize
it this way: it's finding the element to remove, and list searches are nearly
always[1] O(n) - even with doubly-linked lists.

[1]: Except in the case of CFArray/NSArray, which is occasionally a non-list
masquerading as an array - this is, of course, not a true exception as it's
not a true list. (<http://ridiculousfish.com/blog/posts/array.html>)

~~~
ori_b
> _If you want O(1) removal of a random element, you use a data structure that
> guarantees O(1) removal of a random element_

Like std::list.

It's guaranteed that insertion and removal is done in constant time. It does
mean that you need an iterator pointing to the element you want to insert in
front of ahead of time.

If you want to be more specific, the guarantee is that, given a list of
elements to insert, the insertion and deletion will be linear in the number of
elements inserted or deleted (i.e., independent of the size of the list being
inserted into; the insertion or removal of each value passed to insert() or
erase() is required to be O(1)).

(That's one thing that I like about the C++ STL: It usually guarantees a
certain complexity for the data structures it provides)

------
maximilianburke
Anything you can do in C you can do in C++. It is not that the author couldn't
implement more memory efficient containers in C++ but that he chose not to
even though he was aware of the drawbacks.

------
agwa
To get constant-time erasure of list elements, the author should be referring
to the Person elements with a std::list<Person>::iterator instead of a
Person*. Passing an iterator to erase gives you constant-time erasure.

An iterator is equivalent to a pointer (* and -> get you a Person), but it
also lets the implementation access the linked list node.
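
A minimal sketch of this, assuming a hypothetical `Person` struct. The iterator returned by `insert` is kept as the handle and later passed to `erase`:

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for the article's person type.
struct Person {
    std::string name;
    int age;
};

using People = std::list<Person>;

int erase_by_iterator() {
    People people;
    people.push_back(Person{"Alice", 30});
    // insert() returns an iterator to the new element; keep it as the handle.
    People::iterator bob = people.insert(people.end(), Person{"Bob", 25});
    people.push_back(Person{"Carol", 41});

    assert(bob->name == "Bob");  // -> reaches the Person, like a pointer
    people.erase(bob);           // constant time, no walk of the list
    return static_cast<int>(people.size());
}
```

Iterators to other elements of a `std::list` stay valid across the erase; only the erased one is invalidated.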

------
vampirechicken
This is a well known trade-off - your data structures affect the design and
therefore run time profile of your algorithm.

A tightly coupled data structure is more efficient in time and memory, but
harder to reuse, and makes it harder to modify the code. A more general data
structure is easier to reuse and makes it easier to modify the code, at the
expense of time and memory at run time.

We use a combination of Moore's Law, and the cost of a programmer to justify
the less efficient general solution in the near term, and explain that we'll
optimize hot spots later. This mostly works. The author has found himself
lamenting this choice in a piece of high-performance software.

If he has few lists, then coding the doubly-linked list pointers directly into
his structs is the way to go. If he has many lists, I'd suggest using cpp
or m4 (or some other templating tool) to statically generate doubly-linked
lists at compile time.

~~~
rumcajz
My point was that the trade-off is not something unavoidable. A better
language could allow you to syntactically decouple the tightly coupled
objects. See the example at the end of the article.

~~~
dkersten
_A better language could allow you to..._

And yet C is not this language.

~~~
mpyne
And even if it was, C++ is nearly a superset of C. Write the part in C-style
code which works best in C and then move on with your life...

------
pmr_
What strikes me is the assumption that C++ is an object-oriented language.
Yes, there is some support for the paradigm but it is neither great nor is it
the main focus of the language. Developers have come up with reusable
intrusive data-structure solutions, well knowing that they are bad for
encapsulation and they are widely accepted. Most C++ developers wouldn't even
shrug when you write a C-style list (in absence of a good library) precisely
for the reasons stated in his article.

~~~
tjaerv
This brings to mind Alan Kay's quip that "I made up the term 'object-
oriented', and I can tell you I did not have C++ in mind."

~~~
pmr_
I was thinking the same after re-reading my post. I always thought it was a
negative statement but it simply confirms what Stroustrup has always been
saying about C++: "It's a multiparadigm programming language." Surely the
support for one or the other paradigm could be better but this just reflects
that things are shifting away a little from pure object-orientedness across
languages.

------
aidenn0
Okay, I hate C++ as much as the next guy, but this is ridiculous. There are
dozens of ways to get similar code to what the author mentions.

So let's say a decent C++ programmer writes it with std::list<person *> then
does some profiling and determines that heap fragmentation is becoming an
issue. They can then refactor the code to use a more intrusive list type.

A crappy C++ programmer won't even know it's a problem, but presumably we are
comparing programmers of similar expertise?

~~~
qdog
A relatively new C programmer would probably come up with the C code. From the
discussion of this topic it appears the C++ solution needs quite a bit more
expertise to get right (I would not have known what exact C++ method to use,
but I don't pretend to be a C++ expert at this point).

Hmm, actually I would expect most of the C++ methods not to have a problem
with heap fragmentation, if they are copying the list when manipulating. The C
one doesn't know when it will alloc a new bit of memory, so you might
potentially have a cache miss for every single node in the C list, making
traversal the worst case scenario.

------
greesil
Funny, I was going to say the reason to not use C++ was to make sure no one
starts bloating up your code size by using boost.

~~~
lttlrck
It might bloat up a code base, and the resulting binary (if you are using it
just for the sake of it), and increase the build-time (if you aren't
careful/don't know what you are doing), but it definitely doesn't bloat up the
size of the code you are writing...

------
ch
Having no experience with C++ the next statement will just further show my
ignorance of the language.

I was under the impression that an optimizing C++ compiler would be able to
inline the container object and the contained object into one when working
with template code, so that you would end up with something exactly like the C
version but without the manual bookkeeping.

At least I thought it could do this for certain types of classes, much like an
optimizing C compiler can selectively inline functions based on heuristics.

~~~
megrimlock
You are on the right track here, but consider what the contained object is:

    
    
        std::list <person*> people;
    

You're right that the instantiated list entry (what the article calls
"helper") will directly include a value, but the value here is a pointer to a
person, not a person.
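
To make the layout point concrete, here is a deliberately simplified node template. `list_node` is a hypothetical illustration, not the layout of any particular standard library, and `person` is a made-up struct:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the article's person type.
struct person {
    char name[32];
    int age;
};

// Simplified picture of a doubly linked list node: the links plus one value
// of whatever type the list was instantiated with.
template <typename T>
struct list_node {
    list_node* prev;
    list_node* next;
    T value;
};

// With T = person the whole person is embedded in the node.
static_assert(sizeof(list_node<person>) >= sizeof(person) + 2 * sizeof(void*),
              "node embeds the person next to the links");

// With T = person* the node embeds only a pointer; the person itself lives
// in a second allocation somewhere else.
static_assert(sizeof(list_node<person*>) == 3 * sizeof(void*),
              "node holds two links and one pointer");

std::size_t pointer_node_size() {
    return sizeof(list_node<person*>);
}
```

So the template expansion does fuse node and value; what it cannot do is fuse the node with an object the value merely points to.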

~~~
zanny
There is nothing stopping him from having a list of person and avoiding the
memory overhead of references though.

------
mkhalil
For a person with OCD, this title really bothers me :/ I want it to say "Why I
should have written ZeroMQ in C, not C++ (part II)".

------
jbert
If you always reference the person in its people list (i.e. the object is
"owned" by the list), you never need to pass around a ptr to the object of
type:

    
    
        person *
    

Instead you can pass a list iterator which references the "person in the
list" of type

    
    
        std::list<person *>::iterator
    

Then you don't need to do an O(n) walk of the list to find the entry for
erasure, you can just people.erase(it) directly.

~~~
rumcajz
I've modified the text to address this comment:

EDIT: A lot of people point out that iterator should be used instead of
pointer. However, imagine the object is contained in 10 different lists. You
would have to pass structure containing 10 iterators around instead of the
pointer. Moreover, it doesn't solve the encapsulation problem, just moves it
elsewhere. Instead of modifying "person" object every time you would want to
add it to a new type of container you would have to modify the iterator tuple
structure.

~~~
jbert
If the object is contained in 10 different lists, wouldn't the C version have
20 ptrs (10x prev and next)? That seems to me to be roughly equivalent
complexity to your proposed "10x iterator" in C++, if I've understood
correctly.

Basically, something needs to track your references if you have multiple
lists. With C prev/next ptrs, the tracking is explicit in the object (well,
the adjacent objects in the lists, to be more precise). With C++ containers,
the tracking is iterator based.
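
A sketch of what iterator-based tracking across two lists might look like. The `membership` struct and the `add`/`drop` helpers are hypothetical names, and the code assumes the lists store `person *`:

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for the article's person type.
struct person {
    int age;
};

using person_list = std::list<person*>;

// Hypothetical handle: one iterator per list the person belongs to. This is
// the counterpart of one prev/next pair per list in the C version.
struct membership {
    person* who;
    person_list::iterator in_a;
    person_list::iterator in_b;
};

membership add(person& p, person_list& a, person_list& b) {
    membership m;
    m.who = &p;
    m.in_a = a.insert(a.end(), &p);
    m.in_b = b.insert(b.end(), &p);
    return m;
}

// Constant-time removal from both lists, no searching.
void drop(const membership& m, person_list& a, person_list& b) {
    a.erase(m.in_a);
    b.erase(m.in_b);
}

int two_lists_demo() {
    person alice{30}, bob{25};
    person_list staff, oncall;
    membership ma = add(alice, staff, oncall);
    membership mb = add(bob, staff, oncall);
    drop(ma, staff, oncall);
    assert(staff.front() == mb.who && oncall.front() == mb.who);
    return static_cast<int>(staff.size() + oncall.size());
}
```

Adding a third list means adding a third iterator to `membership`, which is the maintenance cost the article's EDIT complains about.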

With C++ you also have the option of using a smart pointer to do your usage
tracking, which is probably simpler than either of the two approaches.

~~~
rumcajz
Yep. That's the way it is done in C. See Linux kernel for example. No problem
with that as there's no expectation of well-encapsulated objects in C.

~~~
dkersten
So you're saying one should use C over C++ because C++ doesn't provide an easy
way to do X and yet C doesn't even try? You seem to be complaining about a number
of shortcomings in C++ and then preach that C is better for this even though
the solutions you write in C have all the same shortcomings and more (not
well-encapsulated). You're comparing apples to oranges. I don't quite
understand the logic here at all.

The beauty of C++ is that you can use the features that you want and ignore
the rest (and only pay for what you use). There is nothing stopping you from
writing certain parts of your codebase in C-like ways where it makes sense and
still benefit from other C++ language features.

~~~
rumcajz
Yes. For me the biggest added value of an OO model is the encapsulation. If a
language is not capable of delivering it why use it at all?

------
kombine
Incapsulation is a very questionable concept as well. However what C++ does
provide is the generic programming: in C you will have to copy and paste
implementation of List for every data structure as well as the algorithms. As
suggested in the comment there are no inherent deficiencies in C++ and there
is an implementation of intrusive lists in Boost. You don't have to use
incapsulation if you don't want to.

~~~
bunderbunder
_Incapsulation is a very questionable concept as well._

One can live without it on a smaller scale. But for anything that will have
any lifetime whatsoever it's absolutely essential. Distinguishing between a
module's intended interface and its implementation details is what makes it
possible to amend the implementation should the need arise. Without proper
encapsulation every change must be assumed to be a breaking change, software
maintenance costs go through the roof, module interactions become
unpredictable, and that upstart kid who wears Chucks to work but knows how to
write testable code eats your Wheaties.

~~~
kombine
Well, incapsulating algorithms is great. But incapsulating of the data is not
always. Languages like Haskell get by entirely without it.

~~~
heretohelp
Incapsulating, in this context, isn't a word.

You want encapsulate. Encapsulation. Encapsulating.

The "en" prefix means put into or on something. You're saying you're putting
something into a capsule or "isolating it".

The "in" prefix in English usually is the negative, or "not".

Intolerable: not tolerable.

~~~
kombine
Thanks for correction. I am not a native English speaker and in Russian(my
native language) the word Encapsulation starts with Cyrillic "и" to which the
closest analogue in Latin is "i" - hence the confusion!

~~~
heretohelp
> I am not a native English speaker and in Russian(my native language) the
> word Encapsulation starts with Cyrillic "и" to which the closest analogue in
> Latin is "i" - hence the confusion!

I figured, I'm a native English speaker learning Russian, as it happens.

Будем здоровы

------
fein
While I am an advocate of straight C programming, I wonder if his issues with
his person list in c++ vs a person dlink list implementation was an incorrect
comparison.

stl vector would have been the proper "optimized" way of implementing that
list of people in c++, as vectors don't deal with the heap. If your vector's
memory space can fit entirely in L2, we're looking at IMMENSE performance
increases.

~~~
tptacek
vectors don't deal with the heap? Huh? Yes they do.

~~~
astrodust
I think the comment was a bad phrasing of "a vector is a contiguous memory
structure, whereas a list is a series of independent allocations at various
points in the heap".
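
The contiguity claim is easy to check. A minimal sketch, with a hypothetical `person` struct and helper names chosen for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the article's person type.
struct person {
    int age;
};

// Builds a vector of n people; reserve() asks for the whole buffer in one
// heap allocation, so the push_backs below do not reallocate.
std::vector<person> make_people(std::size_t n) {
    std::vector<person> people;
    people.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        people.push_back(person{static_cast<int>(i)});
    }
    assert(people.capacity() >= n);
    return people;
}

// True when every element sits immediately after its predecessor in memory.
bool is_contiguous(const std::vector<person>& people) {
    for (std::size_t i = 0; i + 1 < people.size(); ++i) {
        if (&people[i] + 1 != &people[i + 1]) return false;
    }
    return true;
}
```

The buffer still comes from the heap; the difference from a list is one block instead of one allocation per element.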

~~~
tptacek
Sure, but also remember that while vectors are better at locality, both
vectors and lists can trash the heap; both are making variable sized requests
on demand from the allocator.

~~~
tedunangst
Every time I've had to deal with fragmentation, it was due to a small number
of _large_ objects. Never large numbers of small objects. ymmv.

~~~
tptacek
Very definitely not my experience.

~~~
tedunangst
Didn't mean to imply it was always the case. Just a counterpoint to the
conventional wisdom that fragmentation is only a problem with lots of tiny
objects.

~~~
tptacek
Sorry, we always sound snippy on message boards. I'm not That Guy in real
life. Wait no I totally am.

------
X-Istence
What happened to using:

    
    
      mylist->remove(object);
    

Let the underlying algorithm take care of it. No need to write out the loop.
Also, this makes it easy to then switch to std::vector or a hash based list or
something else where walking the list may not be required.
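
A hedged sketch of that call on a `std::list<person *>`, with a hypothetical `person` struct. Note that `std::list::remove` compares against every element, so it is still a linear scan; it just hides the loop:

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for the article's person type.
struct person {
    int age;
};

int remove_demo() {
    person alice{30}, bob{25};
    std::list<person*> people;
    people.push_back(&alice);
    people.push_back(&bob);

    // Removes every element equal to the given pointer value.
    people.remove(&bob);

    assert(people.front() == &alice);
    return static_cast<int>(people.size());
}
```

Other containers spell this differently (for example erase by key on a hash container), so switching container types would still mean touching the call site.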

------
YZF
std::list (used correctly) is IMHO a great example of why you should write in
C++, not in C.

Most large C projects I've seen end up with multiple implementations of lists.
You will have some combination of: void* for a generic list, performance
issues because everything is a function call to another C file which won't get
inlined, memory leaks, subtle bugs, thread safety issues because it's often
unclear what guarantees the homebrew version provides.

Do you want to spend your time on reinventing the list or on things that
provide value?

(Edited for formatting)

~~~
mangamadaiyan
Hrm. Check out sys/queue.h in any BSD (Linux too - though Linux kernel code
IIRC uses a different list implementation). You don't necessarily need void *
for a generic list, and you most certainly don't have to reinvent the wheel.

~~~
YZF
queue.h is an improvement. You still need to somehow know this is in BSD and
you'll need to copy it to build on different systems.

The thing is that in C, most people do reinvent the wheel, including the
person in the original article and as you say, the Linux kernel. That's been
my experience.

std::list is standard and cleaner IMO.

------
ChristianMarks
The level of disagreement and misunderstanding surrounding C++, its typing and
implementation is a sign that the language and its community openly welcome
extraneous cognitive load. Or it's a sign of my own confirmation bias or both
(or neither). Other languages have their quirks as well, but C++ stands out
for me. I haven't done a study--perhaps there are studies on this, so my
opinion amounts to so much line noise.

------
jbp
At least for the 2nd problem mentioned, wouldn't

    
    
      std::unordered_map<person *, person *> 
    

make sense?

------
akkartik
I'm amazed nobody is talking about his 'private in' idea.

~~~
tedunangst
Perhaps someone could explain how it differs and compares to the friend
keyword.

~~~
akkartik
It has more to do with layout than with access control. Fields belonging to
one class get stored and laid out with another object. They're accessed as
part of the other object, but the locks they acquire are for the original
object.

I see faint echoes of ruby's open classes if I squint..

