Ask HN: Good C++ code bases to read?

artagnon · on Oct 28, 2020

Here are a couple of expertly-written C++17/C++20 repositories:

https://github.com/hanickadot/compile-time-regular-expressio...

If you've not written C++ code before, it can take a while to catch up with the latest developments in C++20. Start with C, and learn these, in approximately the specified order:

1. lvalue references.

2. Constructors, destructors, and inheritance.

3. Annotations such as const and noexcept on members.

4. Simple type templates, and value templates.

5. constexpr, std::move, and rvalue references.

6. Type traits and std::enable_if.

7. Concepts.

Once you learn the core language features, learning the various data structures/algorithms in `std` should just be a matter of looking them up in cppreference, and using them over and over again.
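As a rough illustration of how items 6 and 7 relate, here is a minimal sketch of the same constraint written twice. The names `twice_sfinae`, `twice_concept`, and `Arithmetic` are made up for this example, and the concepts half is guarded so the snippet still compiles without C++20 support.

```cpp
#include <type_traits>

// Item 6: constrain a template with a type trait and std::enable_if
// (written here with the C++17 _t / _v helpers).
template <typename T,
          typename = std::enable_if_t<std::is_arithmetic_v<T>>>
constexpr T twice_sfinae(T x) noexcept {
    return x + x;
}

#if __cpp_concepts >= 201907L
// Item 7: the same constraint expressed as a C++20 concept.
template <typename T>
concept Arithmetic = std::is_arithmetic_v<T>;

template <Arithmetic T>
constexpr T twice_concept(T x) noexcept {
    return x + x;
}
#endif

// Both forms are usable in constant expressions (item 5).
static_assert(twice_sfinae(21) == 42);
```

Calling either function with a non-arithmetic type such as `std::string` is rejected at compile time; the concept version usually produces the shorter diagnostic.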

Good luck.

ChuckMcM · on Oct 28, 2020

This is excellent. FWIW I bought and read "C++ Crash Course" by Lospinoso. I found that tremendously helpful with his approach of "assuming you really know C [I do], here is C++" and can easily recommend his book for anyone on this path.

That said, I like your ordering too.

The other confounding factor is that I have also written a lot of Java code (both during the development of the language and afterward for my column in Javaworld) which I really enjoyed, but it too "fixes" some things which James disliked about C++ (and as a long time C programmer I understood quite well).

As a learning experience, this journey has been confounding, enlightening, and painful in equal measures :-)

ncmncm · on Oct 28, 2020

People who come to C++ from Java frequently bring with them Java Disease.

If you find yourself reaching for std::shared_ptr and virtual functions more often than once in a blue moon, you are suffering from Java Disease, and are writing bad code.

Anything that seems to be something Java "fixed" is a thing they failed to understand. You will have to unlearn those, along with myriad bad C habits, many of which you don't know you have. It takes sharp observation to catch them.

ChuckMcM · on Oct 28, 2020

Okay, if somewhat harsh :-).

As someone with north of 15 programming languages that I consider myself "fluent" in, albeit some latency involved in ones I've not used for a while, I always look for the idiomatic expression of computation. It is the difference between "writing FORTRAN in <x>, and writing an algorithm in <x>."

One of the main reasons I posted the question is that I found that design patterns I expected to be there were not there, or were not idiomatic, and for whatever reason I am reasonably good at figuring out when something is "not quite right."

That said, I think it is unfair to characterize the choices of any language designer as either ignorant or wrong. As someone who participated in the design of Java I can assure you that the team "understood" why C++ did what it did. During that development, if there was any confusion, Guy Steele, whose office was three down from mine, would eloquently explain it so that everyone could understand it. That is not to say there weren't different opinions about "better" ways and "worse" ways to do things; Bill Joy was never a big fan of the Java choices and often took the opposing view across from Gosling. The language that Bill was designing was called 'Self' and it embodied his design choices, just as Java embodied James's design choices.

Me, as a systems guy, could really care less if Boolean was a first class type or just a uint1. I just want to write code that works.

ncmncm · on Oct 28, 2020

I can assure you that anybody who makes members virtual by default does not understand the feature, even if that body is named Steele.

What was much worse was copying in wholesale the C crap that C++ had been obliged to adopt to maintain backward compatibility with C, when Java faced no such constraint.

ChuckMcM · on Oct 28, 2020

I appreciate that you are consistent.

Have you considered Self? https://selflanguage.org/

ncmncm · on Oct 29, 2020

I visited Craig Chambers at Stanford in the 80s... I never heard of Bill Joy having anything to do with Self.

ChuckMcM · on Oct 29, 2020

Cool, from the self web page -- "The project continued at Sun Microsystems Laboratories until 1995, where it benefited from the efforts of Randall B. Smith, Mario Wolczko, John Maloney, and Lars Bak."

Java was released in March of 1995; about 1/3 of the original Java group left after the 1.0 release (in no small part because of Sun's attitude). I made the mistake when I went to work at Google of disparaging Self (which Urs Hölzle, my boss's boss, had worked on, oops!). But much of the original Java team had ended up there so it was kind of fun to catch up.

So when you visited Craig at Stanford did you get along with him?

ncmncm · on Oct 29, 2020

Seems like 30 years ago. (Because it was?) He was busy, I was an interruption. But he was gracious about it. I recall I was a lot more interested in tail-call elimination, at the time, than he was.

Self was seminal in two ways: prototypes instead of classes influenced Javascript, and code splitting presaged perhaps the most important JIT technique.

We might have been better off today if Eich had just picked up Self instead of throwing together Javascript. But honestly I don't know either well enough to say.

nocman · on Oct 29, 2020

> I can assure you that anybody who makes members virtual by default does not understand the feature, even if that body is named Steele.

You may revile that design decision, and the ramifications thereof, but I'd put my money on Guy Steele having a sharp grasp of the feature, the alternative design decisions, and the ramifications of most possibilities. If I were designing a language, and could pick any of a handful of people to help with the design, Guy would be near the top of the list, if for nothing more than his ability to explain things clearly and succinctly.

ncmncm · on Oct 29, 2020

I can imagine Steele advising against it, and Gosling ignoring him and doing it anyway. I earnestly hope that is what happened.

I cannot account for Java having retained so many misfeatures of C despite having no backward-compatibility constraint, other than that nobody was paying any attention.

jhgb · on Oct 29, 2020

> anybody who makes members virtual by default does not understand the feature

Which is hilarious coming from C++ people who blindingly obviously didn't understand the features of Smalltalk.

ncmncm · on Oct 29, 2020

C++ is not cribbed from Smalltalk. Smalltalk is cribbed from Bjarne's thesis adviser's language, Simula. So, any differences you find are direct results of Smalltalk departing from Simula.

gautamdivgi · on Oct 28, 2020

As a sibling comment mentioned - somewhat harsh. I can relate to it. I've seen c++ code where they created an entire class hierarchy from "Object".

lillecarl · on Oct 29, 2020

Well this is what Qt does as far as my understanding goes. If you write Qt applications everything to be used in Qt must derive from QObject. I wouldn't call Qt bad because of that. Sure there's overhead with the vtables but that isn't a problem for everyone.

majewsky · on Oct 29, 2020

QObject is not a base class for everything like in those codebases GP probably thinks about. It's a base class for a very specific set of functionality that can be very roughly summarized as "the thing wants to participate in the signal-slot communication scheme". For example, QFile is a QObject because it has ready-to-read signals etc.

But there are also tons of classes in Qt that are not QObject, esp. all the value classes like QList or QPoint or QString.

If anything, the biggest sin of QObject might be its name. If it had been named QActor or something like that, people wouldn't mind that much.

ncmncm · on Oct 29, 2020

Qt was first released when C++ had no more organizational features than Java, so it gets a free pass.

QObject is not Qt's greatest sin. That would be its lower-case macros: "slots" etc. There can be no excuse for that. Even C coders knew better, even back then.

greysphere · on Oct 28, 2020

Welcome to modern c++, where everything is a header file and compile times don't matter.

higerordermap · on Oct 29, 2020

It's templates all the way down, until you meet specializations.

cpdean · on Oct 29, 2020

BearOso · on Oct 29, 2020

They are expertly-written, but I don't think these header-only libraries are reflective of most C++ projects. There are too many tricks and templates, which make them hard to read and make it hard to get a big picture of how they work.

giancarlostoro · on Oct 28, 2020

The JSON library is impressive:

https://github.com/nlohmann/json#examples

jeffreygoesto · on Oct 28, 2020

I agree with your list. However, I don't find the first library you referenced readable (i.e. https://github.com/hanickadot/compile-time-regular-expressio...), and I have been a C++ user since the nineties.

artagnon · on Oct 28, 2020

What seems to be the problem with CTRE?

jeffreygoesto · on Nov 1, 2020

Expressing an algorithm at compile time is so very different from runtime when you do it in template metaprogramming. There is so much visual noise (for my personal feeling at least), writing such code is easier than reading and understanding. Nothing wrong with that special case of regular expressions here, more a general observation.

constexpr seems to try and step in, but there still is a large codebase around which is hard to maintain and extend. IMHO D and zig show that it can be better to have compile- and runtime-syntax much closer together.
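To make the contrast concrete, here is a minimal sketch of one computation written both ways. `FactorialTmp` and `factorial` are names invented for this example, not taken from CTRE or any other library.

```cpp
#include <cstdint>

// Template-metaprogramming style: recursion through class template
// specializations, with the result read from a static member.
template <std::uint64_t N>
struct FactorialTmp {
    static constexpr std::uint64_t value = N * FactorialTmp<N - 1>::value;
};

// Explicit specialization terminates the recursion.
template <>
struct FactorialTmp<0> {
    static constexpr std::uint64_t value = 1;
};

// constexpr style: an ordinary loop that also works at run time.
constexpr std::uint64_t factorial(std::uint64_t n) {
    std::uint64_t result = 1;
    for (std::uint64_t i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// Both are evaluated by the compiler here.
static_assert(FactorialTmp<10>::value == 3628800);
static_assert(factorial(10) == 3628800);
```

The second form reads like the run-time algorithm, which is the property the comment attributes to D and Zig.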

ncmncm · on Oct 28, 2020

That they are regular expressions.

choppaface · on Oct 28, 2020

If you're new to C++11, Cap'n Proto (particularly the kj library embedded within it-- src/kj ) is a great read: https://github.com/capnproto/capnproto kj semi-re-implements several core C++11 features like Own and Maybe (actually Maybe / std::optional is still pretty new!) https://github.com/capnproto/capnproto/blob/master/c%2B%2B/s... Why did Kenton do this? He can speak for himself, but the core of Cap'n Proto is /roughly/ like a serializable / portable memory arena, so it was necessary for the design. Reading through kj and _comparing_ it with C++11 will give you some great initial insight into why both are implemented the ways they are. I'm not really advocating you use kj directly or adopt things like capnp's unique comment style, but the codebase is nevertheless very well organized and clear.

Some of the older glog code is pretty nice with regards to a very vanilla and portable treatment of macros https://github.com/google/glog/tree/master/src/glog

While I wouldn't necessarily recommend Boost as a model project / repo ( https://github.com/boostorg ), it's worth checking out to help understand why modern decisions were made the way they were.

kentonv · on Oct 28, 2020

Oh hello. Thanks for the kind words.

> Why did Kenton do this? He can speak for himself,

An incomplete list of reasons:

1) At the time I started the project, a lot of things that KJ replaces, like std::optional, didn't actually exist in the C++ standard yet.

2) A lot of the stuff in the standard library is just badly designed. Take std::optional, for instance. You'd think that the whole point of using std::optional instead of a pointer would be to force you to check for null. Unfortunately, std::optional implements operator* and operator-> which are UB if the optional is null -- that's even worse than the situation with pointers, where at least a null pointer dereference will reliably crash. KJ's Maybe is designed to force you to use the KJ_IF_MAYBE() macro to unwrap it, which forces you to think about the null case.

3) A lot of older stuff in the C++ standard library hasn't aged well with the introduction of C++11, or was just awful in the first place (iostream). C++11 really changed the language, and KJ was designed entirely with those changes in mind.

4) The KJ style guide (https://github.com/capnproto/capnproto/blob/master/style-gui...) adopts some particular rules around the use of const to help enforce thread safety, the specific philosophy around exceptions, and some other things, which differ from the C++ standard library's design philosophies. KJ's rules have worked out pretty well in my experience, but they break down when building on an underlying toolkit that doesn't follow them.

5) This is a silly matter of taste, but I just can't stand the fact that type names are indistinguishable from variable names in C++ standard style.

6) Because it was fun.

Do these reasons add up to a good argument for reinventing the wheel? I dunno. I think it has worked well for me but smart people can certainly disagree.
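For readers who have not used std::optional, here is a minimal sketch of the access paths mentioned in reason 2. The function names are hypothetical; only the std::optional members are real.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical lookup that may or may not produce a value.
inline std::optional<std::string> find_nickname(int user_id) {
    if (user_id == 1) {
        return std::string("kj");
    }
    return std::nullopt;
}

// Checked access: value() throws std::bad_optional_access when empty.
inline bool value_throws_when_empty(const std::optional<std::string>& o) {
    try {
        (void)o.value();
        return false;
    } catch (const std::bad_optional_access&) {
        return true;
    }
}

// Unchecked access: *o and o-> compile for any optional, but are only
// valid after an explicit test such as the one below.
inline std::size_t nickname_length(const std::optional<std::string>& o) {
    if (o) {
        return o->size();  // fine: the optional holds a value here
    }
    return 0;  // writing o->size() on this path would be undefined behavior
}
```

Nothing in the type system distinguishes the checked path from the unchecked one, which is the objection being raised.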

_dh54 · on Oct 28, 2020

> Unfortunately, std::optional implements operator* and operator-> which are UB if the optional is null -- that's even worse than the situation with pointers,

Dereferencing null optionals is UB for consistency with dereferencing pointers. All uses of operator* should have the same semantics and the C++ standards committee did the right thing by ensuring that with optionals. Checking for null in operator* would break consistency.

If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator*.

If you're wondering why you would use an optional over a pointer: the idea is that optionals allow you to pass optional data by value. Previously, if you wanted to pass optional data, you'd have to do it by reference with a pointer. This is part of C++'s push towards a value-based style, which is more amenable to optimization and more efficient in general for small structs (avoiding the heap, direct access of data). Move semantics are a part of that same push.

kentonv · on Oct 28, 2020

> The idea is that optionals allow you to pass optional data by value.

Yes, and kj::Maybe was doing the same before std::optional was standardized.

It's disappointing that the committee chose only to solve this problem while not also solving the problem of forgetting to check for null -- often called "the billion-dollar mistake".

> Dereferencing null optionals is UB for consistency with dereferencing pointers. All uses of operator* should have the same semantics

My argument is that `std::optional` should not have an operator* at all. `kj::Maybe` does not have operator* nor `operator->`.

> If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator*.

This is putting a lot of cognitive load on the developer. They must remember which of their variables are optionals, in order to remember whether they need to check for nullness. The fact that they use the same syntax to dereference makes it very easy to get wrong. This is especially true in modern IDEs where developers may be relying on auto-complete. If I don't remember the type of `foo`, I'm likely to write `foo->` and look at the list of auto-complete options, then choose one, without ever realizing that `foo` is an optional that needs to be checked for null.

In KJ, you MUST write:

    KJ_IF_MAYBE(value, maybeValue) {
      use(*value);
    } else {
      handleNull();
    }

Or if you're really sure the maybe is non-null, you can write:

    use(KJ_ASSERT_NONNULL(maybeValue));

This does a runtime check and throws an exception if the value is null. But more importantly, it makes it really clear to both the writer and the reader that an assumption is being made.
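The same design can be sketched in standard C++ without macros. The `Maybe` class below is a hypothetical stand-in written for this example; it is not kj::Maybe and does not reproduce KJ's API. It only shows what "no operator*" means for callers.

```cpp
#include <optional>
#include <utility>

// NOT kj::Maybe: a hypothetical stand-in to illustrate the design.
// There is deliberately no operator*, operator->, or value().
template <typename T>
class Maybe {
public:
    Maybe() = default;
    Maybe(T value) : storage_(std::move(value)) {}

    // The only way to reach the value: supply a handler for each case.
    // Both handlers must return the same type.
    template <typename OnValue, typename OnNull>
    auto match(OnValue&& on_value, OnNull&& on_null) const {
        if (storage_.has_value()) {
            return std::forward<OnValue>(on_value)(*storage_);
        }
        return std::forward<OnNull>(on_null)();
    }

private:
    std::optional<T> storage_;
};

// Usage: the null branch has to be written before this compiles.
inline int value_or_minus_one(const Maybe<int>& m) {
    return m.match([](const int& v) { return v; },
                   [] { return -1; });
}
```

The single check lives inside `match`, so the handler receives a plain reference and performs no further tests.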

jcelerier · on Oct 28, 2020

> It's disappointing that the committee chose only to solve this problem while not also solving the problem of forgetting to check for null -- often called "the billion-dollar mistake".

it's likely ~30 loc to wrap std::optional in your own type that checks for null. if std::optional checked for null and these `if` branches showed up as taking the couple nanoseconds that make you go past your time budget in your real-time system (especially if people were doing things like if(b) { foo(*b); bar(*b); baz(*b); }) then you have to reimplement the whole of it instead.

Don't forget that you can still use C++ on 8 MHz microcontrollers.
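A minimal sketch of such a wrapper, to give a sense of the size involved. `checked_optional` is a hypothetical name, and throwing std::logic_error is just one possible policy (an assert or abort would do equally well).

```cpp
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper: same storage as std::optional, but every
// dereference is checked and throws instead of invoking UB.
template <typename T>
class checked_optional {
public:
    checked_optional() = default;
    checked_optional(std::nullopt_t) {}
    checked_optional(T value) : inner_(std::move(value)) {}

    explicit operator bool() const noexcept { return inner_.has_value(); }

    T& operator*() { return checked(); }
    const T& operator*() const { return checked(); }
    T* operator->() { return &checked(); }
    const T* operator->() const { return &checked(); }

private:
    T& checked() {
        if (!inner_) throw std::logic_error("empty checked_optional dereferenced");
        return *inner_;
    }
    const T& checked() const {
        if (!inner_) throw std::logic_error("empty checked_optional dereferenced");
        return *inner_;
    }

    std::optional<T> inner_;
};

// Helper for demonstration: reports whether dereferencing throws.
inline bool deref_throws(const checked_optional<int>& o) {
    try {
        (void)*o;
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

Note that this repeats the check on every dereference, which is exactly the cost the comment wants to keep out of the standard type.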

kentonv · on Oct 28, 2020

Again, I'm not arguing that operator* should check for nullness, I'm arguing that it shouldn't exist.

With `kj::Maybe` and `KJ_IF_MAYBE`, using the syntax I demonstrated above, you check for nullness once and as a result of the check you get a direct pointer to the underlying value (if it is non-null), which you then use for subsequent access, so you don't end up repeating the check. So, you get the best of both worlds.

> it's likely ~30 loc to wrap std::optional

It's even easier to replace std::optional rather than wrap it. The value of std::optional is that it's a standard that one would hope would be used across libraries. But because it's flawed, people like me end up writing their own instead.

jcelerier · on Oct 29, 2020

> So, you get the best of both worlds.

I would really not call code that looks like this "best of both worlds"

    KJ_IF_MAYBE(value, maybeValue) {
      use(*value);
    } else {
      handleNull();
    }

when compared to

    if(!value) 
      handleNull();
    use(*value);

kentonv · on Oct 30, 2020

Kind of unfair that you didn't include the `else` or the braces in your version, just to make it look shorter.

monocasa · on Oct 28, 2020

The ideal situation is then to not use std::optional for those cases rather than to make std::optional next to useless for its stated case.

If it gets in the way of your goal on an 8 MHz controller, take the optionality check out of your tight loop and convert to a null pointer safely where it doesn't matter.

Deeply embedded is already used to picking and choosing features, or explicitly running with the bumper rails off in the subset of cases where it matters. We like the normal primitives not being neutered for us because we still use a lot of them outside of our tight loops.

_dh54 · on Oct 28, 2020

> than to make std::optional next to useless for its stated case.

This is such a ridiculous and obviously false assertion that it’s indistinguishable from satire. Optional is widely used and was modeled from a pre-existing boost class which was itself widely used. Do you actually write C++ professionally?

monocasa · on Oct 28, 2020

Yes.

And I think (hope?) that std::optional is going to make its way into the dust bins of history like auto_ptr.

When the answer is "just don't dereference it if it's null", then why not just use a pointer in the first place?

_dh54 · on Oct 29, 2020

Again, optional is different from pointer because it offers value semantics.

monocasa · on Oct 29, 2020

I know they're different (and the construct unfortunately named optional has some occasional uses); it's just that the semantics of optional don't help it be used as a classic optional type.

jcelerier · on Oct 29, 2020

> it's just that the semantics of optional don't help it be used as a classic optional type.

what do you mean "classic optional type"? boost.optional has worked like that for something like 20 years - it's been in C++ for longer than Maybe has been in Haskell.

tome · on Oct 29, 2020

> it's been in C++ for longer than Maybe has been in Haskell.

Tangentially, how did you conclude that? Haskell has been around since 1990 but Boost only since 1999, as far as I can tell.

https://en.wikipedia.org/wiki/Haskell_(programming_language)

https://en.wikipedia.org/wiki/Boost_(C%2B%2B_libraries)

monocasa · on Oct 29, 2020

It's been in Standard ML since the 80s.

_dh54 · on Oct 28, 2020

> My argument is that `std::optional` should not have an operator* at all. `kj::Maybe` does not have operator* nor `operator->`.

That’s fair.

>> If you want to dereference an optional that may be null, use the .value_or() method. For the times when you absolutely know the optional has a value use operator*.

> This is putting a lot of cognitive load on the developer. They must remember which of their variables are optionals,

Not really. Teams with programmers that are bad at keeping track of the state of their variables can simply have a policy to always use .value_or()/.value()

C++ doesn’t impose this on its users because it generally assumes its users are responsible enough to make their own policy.

> The fact that they use the same syntax to dereference makes it very easy to get wrong.

I disagree; operator* has the same semantics as it did for pointers, making it no more semantically hazardous. There exist other methods on optional that have the behavior you want.

kentonv · on Oct 28, 2020

> always use .value_or()/.value()

But neither of these solve the problem either. Neither one forces the programmer to really confront the possibility of nullness and write the appropriate if/else block. Throwing an exception rather than crashing is only a slight improvement IMO.

> I disagree, the operator* has the same semantics as pointers did, making it no more semantically hazardous.

It was already severely hazardous with pointers, that's the problem.

pierrebai · on Oct 28, 2020

Both problems could have been solved looooooong ago by introducing a type modifier akin to const that records whether a value is verified (or safe or non-null or other. Pick your synonym).

   int * p; // maybe null!
   int * verified p; // guaranteed non-null!

A looooong time ago (circa... 1994-1995) I designed a hierarchy of smart pointers and had a variety for non-null so that you could declare a function like:

   void foo(non_null_ptr<T>& p);

And know that you don't have to verify for null. All enforced at compile-time (via a function on ptr<T> returning a non_null_ptr<T>).

With language support around if() and others, C++ could have made it even more convenient. Even C could have introduced such a type modifier. Whenever I read about pointers being unsafe and how optionals and maybes are the solution, I roll my eyes, because non-null-ptr do the exact same thing.

The funny thing is C++ has a non-null ptr (with no language support guarantee though): references. Unfortunately, the language made them not resettable, which makes them unusable in many scenarios where you'd want them to change value over time, like in most class members.
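A minimal sketch of that pointer pair, reconstructed from the description only. The member name `if_verified` and the helper functions are guesses made for illustration, not the original 1990s design.

```cpp
// Hypothetical reconstruction: a non_null_ptr<T> can only be created
// by ptr<T>, and only after ptr<T> has checked for null.
template <typename T>
class non_null_ptr {
public:
    T& operator*() const { return *raw_; }
    T* operator->() const { return raw_; }

private:
    explicit non_null_ptr(T* raw) : raw_(raw) {}
    T* raw_;

    template <typename U> friend class ptr;
};

template <typename T>
class ptr {
public:
    ptr() = default;
    explicit ptr(T* raw) : raw_(raw) {}

    // The single null check. The callback runs only when non-null and
    // receives a non_null_ptr<T>; returns whether it ran.
    template <typename F>
    bool if_verified(F&& f) const {
        if (raw_ == nullptr) return false;
        f(non_null_ptr<T>(raw_));
        return true;
    }

private:
    T* raw_ = nullptr;
};

// Callee needs no null check: the parameter type rules null out.
inline int read(non_null_ptr<int> p) { return *p; }

inline int read_or(ptr<int> p, int fallback) {
    int result = fallback;
    p.if_verified([&](non_null_ptr<int> nn) { result = read(nn); });
    return result;
}
```

Passing a plain `ptr<int>` to `read` does not compile, which is the compile-time enforcement the comment describes.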

_slyo · on Oct 28, 2020

Are you aware of std::reference_wrapper? https://en.cppreference.com/w/cpp/utility/functional/referen...
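A minimal sketch of how std::reference_wrapper gives a class member that is never null yet can be re-pointed. The `Cursor` class is hypothetical.

```cpp
#include <functional>

// Hypothetical class holding a never-null, rebindable reference.
class Cursor {
public:
    explicit Cursor(int& target) : target_(target) {}

    // Rebinds the wrapper to another object; no int is copied.
    void retarget(int& other) { target_ = std::ref(other); }

    int& get() const { return target_.get(); }

private:
    std::reference_wrapper<int> target_;
};
```

Unlike a class with an `int&` member, `Cursor` also remains copy-assignable.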

_dh54 · on Oct 28, 2020

Isn’t “non null optional” the same as just passing the base type?

pierrebai · on Oct 28, 2020

By reference? Yes.

But the idea of a verified type can be extended by using the verified modifier on your own type. For example, you could have a verified matrix type, where the matrix is guaranteed to be valid, non-degenerate. You can apply it to:

   - matrix
   - vector
   - input data of any sort

And if the compiler allowed the programmer to declare their own type modifiers, the world is your oyster: you could for example tag that a matrix is a world matrix while another a local matrix and provide a function that converts from one to the other...
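Lacking such a modifier, today's C++ can approximate the world/local distinction with a tag template parameter. This is a sketch of that approximation, not the language feature being proposed; `Matrix2`, `WorldSpace`, `LocalSpace`, and `to_world` are made-up names.

```cpp
#include <array>

// Empty tag types: they exist only to make the two kinds of matrix
// different types.
struct WorldSpace {};
struct LocalSpace {};

template <typename Space>
struct Matrix2 {
    std::array<double, 4> m{};  // row-major 2x2
};

// The only way to turn a local matrix into a world matrix:
// multiply by the parent's world matrix.
inline Matrix2<WorldSpace> to_world(const Matrix2<WorldSpace>& parent,
                                    const Matrix2<LocalSpace>& local) {
    Matrix2<WorldSpace> out;
    out.m = {parent.m[0] * local.m[0] + parent.m[1] * local.m[2],
             parent.m[0] * local.m[1] + parent.m[1] * local.m[3],
             parent.m[2] * local.m[0] + parent.m[3] * local.m[2],
             parent.m[2] * local.m[1] + parent.m[3] * local.m[3]};
    return out;
}

// Accepts world matrices only; passing a Matrix2<LocalSpace> is a
// compile-time error.
inline double determinant(const Matrix2<WorldSpace>& w) {
    return w.m[0] * w.m[3] - w.m[1] * w.m[2];
}
```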

I wrote a small blog post about the idea:

https://www.spiria.com/en/blog/desktop-software/hypothetical...

_dh54 · on Oct 28, 2020

>> always use .value_or()/.value()

> But neither of these solve the problem either. Neither one forces the programmer to really confront the possibility of nullness and write the appropriate if/else block.

.value_or() actually does and you can certainly add a lint check against dereferencing optional or using .value() if you'd like. C++ does not yet provide case-style syntax for handling variants like Rust does, outside of macros, and the standard library will certainly not define macros.

I think what you have done for your codebase makes sense based on your preferences but I think the standard optional works pretty well given the variety of code bases and styles it's intended to support.

> It was already severely hazardous with pointers, that's the problem.

So then don’t use the dereference operator.

kentonv · on Oct 28, 2020

kj::Maybe has an `orDefault()` method that is like `.value_or()` but I find that it is almost never useful. You almost always want to execute different logic in the null case, rather than treat it as some default value.

_dh54 · on Oct 29, 2020

Then you can make a quick helper subroutine that adds a monadic interface to optional and you can lint away all non-conforming uses of optional.
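A minimal sketch of such a helper for C++17, where std::optional has no monadic members. `and_then` here is a free function written for this example, and the two small functions it chains are hypothetical.

```cpp
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical helper: calls f on the contained value, or propagates
// "empty". f must itself return some std::optional<U>.
template <typename T, typename F>
auto and_then(const std::optional<T>& opt, F&& f)
    -> std::invoke_result_t<F, const T&> {
    if (opt.has_value()) {
        return std::forward<F>(f)(*opt);
    }
    return std::nullopt;
}

inline std::optional<int> parse_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::nullopt;
}

inline std::optional<int> percent_share(int parts) {
    if (parts == 0) return std::nullopt;  // avoid dividing by zero
    return 100 / parts;
}

// Chained use: no explicit dereference appears at the call site.
inline std::optional<int> digit_to_percent(std::optional<char> c) {
    return and_then(and_then(c, parse_digit), percent_share);
}
```

The one dereference lives inside the helper, so a lint rule banning `operator*` elsewhere stays practical.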

ori_b · on Oct 28, 2020

> Dereferencing null optionals is UB for consistency with dereferencing pointers

The point of optional is to avoid being consistent with the bad parts of pointers. And making it undefined rather than a guaranteed crash is even crazier.

Ford doesn't sell cars that burst into flames for consistency with the Pinto.

_dh54 · on Oct 28, 2020

And usage of the dereference operator isn’t intended for uses that would cause things to burst into flames. If you don’t know the state of your variables or you don’t trust your coworkers to know the state of their variables, you can enforce the use of value_or() in your own projects. You don’t get to force superfluous branch stalls on the general C++ user base.

emtel · on Oct 28, 2020

I think your replies in this thread show a complete misunderstanding of what std::optional is for (or at least, what it should be for, in my opinion).

std::optional is for modeling a value that may be null. If the value may be null then you must check if it is null before you dereference it. There is no "forcing of branch stalls", because if used correctly (and designed correctly, which std::optional is not, sadly) it is merely a way for the programmer to use the type system to enforce the use of null checks that are necessary for the correctness of the program anyway.

If you and your coworkers find yourself in a situation in which you know that the value of a pointer cannot be null, then you should not model that value with an optional type, and then you will not only not be required to add a null check, it will be immediately obvious that you don't need one.

_dh54 · on Oct 29, 2020

> If you and your coworkers find yourself in a situation in which you know that the value of a pointer cannot be null, then you should not model that value with an optional type, and then you will not only not be required to add a null check, it will be immediately obvious that you don't need one.

Hmm I think you’re suffering from a lack of imagination and real world experience with efficiently storing data in C++.

There are certainly cases where it makes the most sense to instantiate your value within an optional wrapper while at the same time there being instances within your codebase where that location is known to be non-null. I’m surprised I even have to say that.

An obvious case is when using optional as a global. Other cases are when you’ve checked the optional once at the top of a basic block to be used multiple times after.

emtel · on Oct 30, 2020

> An obvious case is when using optional as a global.

Well, ok, although I think we were doing fine storing such values in unique_ptr. Now you're going to come back and say that you can't ever afford a double indirection when accessing globals, and if so, fine. But you still could have very easily written your own wrapper that suits your needs without demanding that std::optional be relaxed to the point where it cannot provide compile-time safety guarantees.

> Other cases are when you’ve checked the optional once at the top of a basic block to be used multiple times after.

Disagree. The way optional types are supposed to work (and the way I have used them in real code) is that you check it once, and in doing so, you obtain a reference to the stored value (assuming the check passes). Further accesses to the value thus do not require checks. The type system is thus used to model whether the check has been done or not, and helps you write code that does the minimal number of checks required.
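With std::optional the check-once pattern can be approximated by a small helper. This is a sketch only; `get_if` is a hypothetical name, and a stricter optional type would make this the sole access path rather than a convention.

```cpp
#include <memory>
#include <optional>

// Hypothetical helper: one check, then a plain pointer to the stored value.
template <typename T>
T* get_if(std::optional<T>& opt) noexcept {
    return opt.has_value() ? std::addressof(*opt) : nullptr;
}

inline int sum_three_uses(std::optional<int>& maybe) {
    if (int* v = get_if(maybe)) {
        // No further checks: *v is an ordinary int from here on.
        return *v + *v + *v;
    }
    return 0;
}
```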

You seem to think everyone else in this thread is an idiot, but I promise you I have written real code with very strict optional types (similar to kj::Maybe) without introducing unnecessary branches.

beached_whale · on Oct 28, 2020

optional has a bunch of reasons why it is better than a pointer too (it cannot be incremented, it's not a failure to initialize, it doesn't implicitly convert to things readily...). Unfortunately we don't have a monadic optional or an optional with a reference member. Both would be very useful. value_or suffers from having to evaluate the "or" part in all cases; having .transform/.and_then/.or_else members would be really nice. Optional of a reference would allow for a whole swath of try_ methods instead of the idiomatic at( ) interface for checking bounds and retrieving in one call. at( ) suffers in that it forces a choice between an exception and checking that the index is in range outside the call, at which point operator[] is what you want.
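Both wishes can be sketched with C++17 as it stands. The two functions below are hypothetical, and `try_at` has to fall back on std::reference_wrapper precisely because std::optional cannot hold a reference directly.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical try_ counterpart to at(): bounds check and retrieval
// in one call, with no exception.
template <typename T>
std::optional<std::reference_wrapper<const T>> try_at(const std::vector<T>& v,
                                                      std::size_t i) {
    if (i < v.size()) {
        return std::cref(v[i]);
    }
    return std::nullopt;
}

// Hypothetical lazy alternative to value_or(): the fallback is only
// computed when the optional is empty.
template <typename T, typename F>
T value_or_else(const std::optional<T>& opt, F&& make_fallback) {
    if (opt.has_value()) {
        return *opt;
    }
    return std::forward<F>(make_fallback)();
}
```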

_dh54 · on Oct 28, 2020

Some monadic method would be nice but as you probably know, it’s trivial to implement one yourself.

You can store a reference in optional using reference_wrapper https://en.cppreference.com/w/cpp/utility/functional/referen...

beached_whale · on Oct 29, 2020

reference wrapper breaks in generic contexts.

_dh54 · on Oct 29, 2020

Could you provide an example?

clappski · on Oct 29, 2020

You have to add special handling for the case that T == std::reference_wrapper<T> so you can call .get() on it to expose the underlying value. In the case of std::optional vs a pointer type (raw or smart) you can consistently use operator* to get to the underlying value. I think this is what was meant.

beached_whale · on Oct 29, 2020

also if you are doing something like decltype( *val ) to get at the underlying type.
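A minimal sketch of both complaints. `deref_t` and `unwrap` are names invented for this example.

```cpp
#include <functional>
#include <optional>
#include <type_traits>
#include <utility>

// Generic code that expects *x to be "the value".
template <typename Opt>
using deref_t = std::remove_reference_t<decltype(*std::declval<Opt&>())>;

static_assert(std::is_same_v<deref_t<std::optional<int>>, int>);
static_assert(std::is_same_v<deref_t<int*>, int>);
// With reference_wrapper the wrapper leaks through instead of int:
static_assert(std::is_same_v<deref_t<std::optional<std::reference_wrapper<int>>>,
                             std::reference_wrapper<int>>);

// Hypothetical fix-up that every generic function would need:
// pass ordinary values through, peel reference_wrapper via get().
template <typename T>
T& unwrap(T& x) { return x; }

template <typename T>
T& unwrap(std::reference_wrapper<T>& x) { return x.get(); }
```

Code written against `*opt` alone keeps compiling with the wrapper in place, but operates on the wrong type until something like `unwrap` is threaded through it.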

gpderetta · on Oct 28, 2020

well, representing an optional as a pointer, while nice syntactic sugar, is itself a design decision and adding UB there is not nice.

In practice I believe that all standard libraries have a lightweight check mode that can assert even in release mode.

choppaface · on Oct 29, 2020

Wow thanks for the extra context here! My read of capnp was that you probably couldn't write capnp with std::unique_ptr and STL streams as-is (or relying on STL would make it really hard), and thus the capnp design necessitates the core of kj, and once you start there you have to add most of the other stuff in kj. I do very firmly agree that C++11 had holes in either features or support when capnp first rolled out, though C++ has caught up over the years.

I still think if you wanted to re-write a capnp library today, you'd still need kj, or at least most of it, simply for the memory control. The added benefit of kj is that you don't have to deal with C++ STL bugs and quirks. E.g. I believe C++ spec didn't require std::optional to use in-place new until recently ...

Also curious if you have any comments on this read of kj from a software management perspective. I imagine trying to sell the investment of writing something like kj at a company and it being a tough sell, even if capnp was approved. You clearly knew what you were doing from the outset and certainly nobody could have done it better. But I believe capnp sits close to the decision boundary of where many companies decide to invest in greatness or not, and reflection might shed light on why some managers make the wrong choice on something like this.

kentonv · on Oct 30, 2020

No, Cap'n Proto does not rely on memory layout of KJ types or anything like that. You could build it on the standard library approximately just as easily.

In 2013, the C++ standard library was sorely missing a string representation that allowed pointing into an existing buffer, which was important for Cap'n Proto to claim zero-copy. C++17 introduced std::string_view, which would fit the bill, but that wasn't there in 2013, so I wrote my own StringPtr. I added Maybe because I needed it all over the place (and std::optional didn't exist), and then for RPC I needed to create the Promise framework (std::future and such didn't exist yet). At that point I looked at these things and said "these are generally useful outside of Cap'n Proto, I should turn them into a library", and that's how KJ got started.
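A minimal sketch of what "pointing into an existing buffer" looks like with std::string_view. This illustrates the general idea only, not Cap'n Proto's or KJ's implementation; `field_at` and `points_into` are hypothetical helpers.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical "parser" that returns a field without copying bytes.
// substr() throws std::out_of_range if offset > buffer.size().
inline std::string_view field_at(std::string_view buffer,
                                 std::size_t offset, std::size_t length) {
    return buffer.substr(offset, length);
}

// True when the view's bytes lie inside the buffer's storage,
// i.e. nothing was copied.
inline bool points_into(std::string_view field, std::string_view buffer) {
    return field.data() >= buffer.data() &&
           field.data() + field.size() <= buffer.data() + buffer.size();
}
```

The view is only valid while the underlying buffer is alive, which is the trade-off any zero-copy string type makes.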

> I imagine trying to sell the investment of writing something like kj at a company and it being a tough sell

Well, many companies have things "like KJ". Google has Abseil, Facebook has Folly. But just like KJ, these things didn't start with someone saying "Hey I want to make a new C++ toolkit", they started with people writing utility code that they needed for other things, and then splitting that code out to make it reusable. Eventually the utilities accumulate into their own ecosystem. I generally don't add anything to KJ unless I explicitly need it for something else I'm working on. I actually would argue that it would be a bad business decision to spin up a project to create something like KJ or Abseil or Folly from scratch; such projects are likely to spend too much time solving the wrong problems. The best toolkits and platforms come from projects that were really trying to build something on top of that platform, and let their own real-world problems drive the design.

That said, arguably, Cap'n Proto itself is a bit of a counterpoint. I started Cap'n Proto after quitting Google. I did not have any real business purpose, I just wanted to play around with zero-copy, which I'd thought about a lot while working on Protobuf. That said, I did have the previous experience of maintaining Protobuf at Google for several years, which meant I already had a pretty good idea of what the real-world problems looked like, and I stuck pretty closely to Protobuf's design decisions in most ways. And then starting in 2014, I started working on Sandstorm, built on top of Cap'n Proto, and further development was driven by needs there. (And since 2017, Cloudflare Workers has been the main driver.)

I am not sure if the time I spent starting Cap'n Proto in 2013 would have made sense from a business perspective. If I'd wanted to start Sandstorm immediately, building on Protobuf would probably have been the right answer.

I would say that low-level developer tooling in general is pretty tough to make a business out of, because everyone expects it to be free and open source. It's also pretty tough to build as part of another business, because usually creating something new from scratch doesn't justify the cost, vs. using something off the shelf. I feel like the only people who can create new fundamental tools from scratch (especially things like programming languages) are giant companies like Google, and random hackers who are lucky enough to be able to mess around without funding.

Sorry, that probably isn't the answer you were looking for. I don't like it either. :/

choppaface · on Oct 30, 2020

No this is super helpful thank you!

Agree with you about StringPtr and string_view, and also std::future. std::optional was not there at the start, and I think it also lacked in-place new for a while. Lastly, I'm pretty sure unique_ptr would have been a headache compared to Own. I didn't really mean to suggest capnp relied on the memory layout of kj types (and agree it doesn't), but rather I believe even today you'd be very hard pressed to get 100% zero-copy out of the STL.

Abseil and Folly are a lot lot bigger than KJ (folly is more of a playground), and I totally agree they are an amalgamation of utility code at team scale. KJ, though, had mainly one author, and it seems I got it right that capnp wouldn't be possible with the STL (at least when it started).

Wasn't so much trying to poke at the question of "does the business say KJ/capnp is necessary?" -- I agree with you that posed that way it can be hard to get a good answer.

More like: how is it best to scope out something on the scale of capnp/kj in the context of a bigger company? Do you just give a team a year and let them run?

I'm excited about capnp in the long run as more and more storage moves to NVMe. Zero copy and related tricks are already big parts of Apache Arrow / Parquet; it's an important area to explore.

kentonv · on Oct 31, 2020

> how is it best to scope out something on the scale of capnp/kj in the context of a bigger company? Do you just give a team a year and let them run?

No, frankly, I think that would be a recipe for disaster.

The team needs to instead be tasked with a specific use case, and they need to build the platform to solve the specific problems they face while working on that use case. If you tell people to develop a platform without a specific use case, they will almost certainly build something that solves the wrong problems. Programmers (like humans in general) are mostly really bad at guessing which features are truly needed, vs. what sounds neat but really won't ever be used.

So, sadly, I think that businesses should not directly engage in such projects. But, they should be on the lookout for useful tools that their developers have built in the service of other projects, and be willing to factor those out into a separate project later.

Unfortunately, this all makes it very hard to effect revolutionary change in infrastructure. When the infrastructure isn't your main project, you probably aren't going to make risky bets on new ideas there -- you're going to mostly stick to what is known to work.

So how do we get revolutionary changes? That's tough. I suppose that does require letting a team run wild, but you have to acknowledge that 90% of the time they will fail and produce something that is worthless. If the business is big enough that they can make such bets (Google), then great. But for most tech companies I don't think it's justifiable.

choppaface · on Oct 31, 2020

Totally agree. It's a really hard balancing act. Open source really helps us learn though.

typon · on Oct 28, 2020

This style guide basically forces you to write Rust in C++

kentonv · on Oct 28, 2020

I agree with this summary.

Interestingly, though, at the time I wrote the guide, Rust was in its infancy, and I didn't know anything about it. :)

typon · on Oct 28, 2020

Tbh, I actually prefer C++ with your style guide to Rust. Now if C++ had a package manager and a crates.io equivalent, I wouldn't look back at Rust at all. Unfortunately, C++ is just too far behind.

(Btw, I gave a presentation about your style guide at my company two years ago, trying to convince people that we should be doing this stuff ;)

kentonv · on Oct 28, 2020

I wish C++ had borrow checking, or something like it.

pjmlp · on Oct 28, 2020

Everything I care about is on vcpkg.

nyanpasu64 · on Oct 28, 2020

There are some differences in the details between KJ C++, and both Rust and my Rust-inspired C++ guidelines:

> Value types always have move constructors (and sometimes copy constructors). Resource types are not movable; if ownership transfer is needed, the resource must be allocated on the heap.

In Rust, all types (including resources) are movable.

> Value types almost always have implicit destructors. Resource types may have an explicit destructor.

What's an explicit destructor? Rust's File type closes upon destruction, and one criticism of the design is that it ignores all errors. The only way to know what errors occurred is to call sync_all() beforehand.

However, "Ownership" and "Reference Counting" (and "Exceptions" to an extent) feel very Rust-like.

> If a class's copy constructor would require memory allocation, consider providing a clone() method instead and deleting the copy constructor. Allocation in implicit copies is a common source of death-by-1000-cuts performance problems. kj::String, for example, is movable but not copyable.

When you include such a class in a larger structure, it breaks the ability for the outer class to derive a copy constructor automatically (even an explicit one, or a private one used by a clone() method). What's the best way to approach this?

kentonv · on Oct 30, 2020

> In Rust, all types (including resources) are movable.

Presumably not when pointers are pointing at them or their members.

In Rust, that is enforced by the compiler, but in C++ it is not. The rule that resource types are not movable is intended to provide some sanity here: this means a resource type can hand out pointers to itself or its members without worrying that it'll be moved at some point, invalidating those pointers.
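A minimal sketch of that rule, with a hypothetical `Connection` type that is not taken from KJ: the resource deletes its copy and move operations, and ownership is transferred by moving a std::unique_ptr to it instead.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical resource type: it hands out a pointer to one of its own
// members, so it forbids copying and moving to keep that pointer valid.
class Connection {
public:
  Connection() = default;
  Connection(const Connection&) = delete;
  Connection& operator=(const Connection&) = delete;
  Connection(Connection&&) = delete;
  Connection& operator=(Connection&&) = delete;

  int* counter() { return &bytesSent_; }  // pointer into this object

private:
  int bytesSent_ = 0;
};

// Ownership transfer moves the owning pointer, not the object itself,
// so addresses handed out earlier remain valid.
inline std::unique_ptr<Connection> handOff(std::unique_ptr<Connection> c) {
  return c;
}
```

Because the object never changes address, a pointer obtained from counter() before the hand-off still refers to the same member afterwards.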

> What's an explicit destructor? Rust's File type closes upon destruction, and one criticism of the design is that it ignores all errors. The only way to know what errors occurred is to call sync_all() beforehand.

I believe destructors should be allowed to throw, which solves that problem.

Obviously, this opinion is rather controversial. AFAICT, though, the main reason that people argue against throwing destructors is because throw-during-unwind leads to program termination. That, though, was an arbitrary decision that, in my opinion, the C++ committee got disastrously wrong. An exception thrown during the unwind of another exception is usually a side-effect of the first exception and could safely be thrown away, or perhaps merged into the main exception somehow. Terminating is the worst possible answer and I would argue is the single biggest design mistake in the whole language (which, with C++, is a high bar).

In KJ we let destructors throw, while making a best effort attempt to avoid throwing during unwind.
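One way to approximate "throw from the destructor, but not during unwind" in standard C++17 is std::uncaught_exceptions(). The sketch below is an assumption about the general technique, not KJ's actual implementation; the `Flusher` class is hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical resource whose cleanup can fail.
class Flusher {
public:
  explicit Flusher(bool failOnClose)
      : failOnClose_(failOnClose),
        exceptionsAtStart_(std::uncaught_exceptions()) {}

  // Destructors are noexcept by default, so opt out explicitly.
  ~Flusher() noexcept(false) {
    if (!failOnClose_) return;
    // More exceptions in flight than at construction means this object is
    // being destroyed by stack unwinding: drop the error rather than
    // trigger std::terminate.
    if (std::uncaught_exceptions() > exceptionsAtStart_) return;
    throw std::runtime_error("flush failed");
  }

private:
  bool failOnClose_;
  int exceptionsAtStart_;
};
```

On a normal scope exit the cleanup error reaches the caller; when another exception is already propagating, the original exception wins and the cleanup error is discarded.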

> When you include such a class in a larger structure, it breaks the ability for the outer class to derive a copy constructor automatically (even an explicit one, or a private one used by a clone() method). What's the best way to approach this?

In practice I find that this almost never comes up. Complex data structures rarely need to be copied/cloned. I have written very few clone() methods in practice.
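For the rare case where it does come up, one option is to write clone() by hand on the outer type. This is only a sketch: `Text` is a hypothetical stand-in for a movable-but-not-copyable string and does not reproduce kj::String's interface.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical movable-but-not-copyable string wrapper.
class Text {
public:
  explicit Text(std::string s) : data_(std::move(s)) {}
  Text(Text&&) = default;
  Text& operator=(Text&&) = default;
  Text(const Text&) = delete;
  Text& operator=(const Text&) = delete;

  Text clone() const { return Text(data_); }  // the allocation is explicit
  const std::string& str() const { return data_; }

private:
  std::string data_;
};

// The outer type gets no compiler-generated copy constructor, so it spells
// out clone() member by member.
struct Person {
  Text name;
  int age;

  Person clone() const { return Person{name.clone(), age}; }
};
```

The cost is that every new member has to be added to clone() manually, which is the trade-off the question is pointing at.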

_pmf_ · on Oct 28, 2020

I also absolutely love the kj style, but it's so different from basically everything else that I have a hard time incorporating anything in my daily work in the cesspit.

bertr4nd · on Oct 28, 2020

One of the reasons it’s hard to talk about clean C++ codebases is that there’s a huge range of possible complexity. The gap between “Solve a particular problem” and “provide a generic library” is very big. Template metaprogramming deservedly gets a lot of the blame, but implicit heap management via constructors/destructors is also pretty hard to follow.

The upshot is that “professional grade” C++ is often impenetrable (I tried to understand the implementation of boost::intrusive_list once. Yikes.)

LLVM is often cited as a very clean codebase and I think that holds up. Facebook’s folly is another good one (although it occasionally dives into template metaprogramming madness).

Hopefully I’ll be forgiven for also plugging a few projects I used to work on, that might hit a sweet spot between utility and code complexity:

- Glow, a compiler for neural networks: https://github.com/pytorch/glow

- ReDex, a bytecode optimizer for Android: https://github.com/facebook/redex

There was a nice blog post a few years back called “c++11 is a scripting language” that is unfortunately offline now; it did some task like reading lines from a file and sorting them, and it wasn’t dramatically more complicated than the same thing in Python. It’s worth doing a few of those kinds of exercises to get a feeling for the language.
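The original post is offline, so the following is not its code, only a sketch of the same kind of exercise: read lines from a stream and return them sorted. The function name `sortedLines` is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads every line from `in` and returns the lines in sorted order.
inline std::vector<std::string> sortedLines(std::istream& in) {
  std::vector<std::string> lines;
  for (std::string line; std::getline(in, line);) {
    lines.push_back(std::move(line));  // getline refills `line` next pass
  }
  std::sort(lines.begin(), lines.end());
  return lines;
}
```

Passing a std::ifstream opened on a file works the same way as the std::istringstream used for testing, since both are std::istream.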

gumby · on Oct 28, 2020

> I tried to understand the implementation of boost::intrusive_list once. Yikes.

To be fair library code, especially library code that's supposed to run on a huge number of compilers and platforms, must include support for a ton of weirdo special cases which ultimately render it unreadable to someone not steeped in it. It also has to handle weird corner cases most developers might not think of, either because it's just a weird corner of the standard or because it might commonly be, say, embedded in some other structure or used in an unusual template expansion which would imply non-obvious constraints. This BTW is true of any language.

By contrast: I wrote a small lock-protected container template used in a local code base. It's short, fast and easy to use. Also easy to understand if you bother to read the code (but it's so easy to use, why bother). But there's a huge tradeoff: it doesn't act completely like a regular container; it only handles cases we care about in our code base and only works on the three compilers we care about. There is zero interest in implementing a more general solution. When this object doesn't work, sometimes we extend it and sometimes we change the caller.
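The container described here is not public, so this is a guess at the general shape of such a thing: a hypothetical `LockedVector` that supports only a few operations and deliberately does not behave like a full standard container.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical lock-protected container supporting only a few operations.
template <typename T>
class LockedVector {
public:
  void push(T value) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(std::move(value));
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return items_.size();
  }

  // Runs `fn` on the underlying vector while holding the lock, instead of
  // exposing iterators that could outlive it.
  template <typename Fn>
  auto withLock(Fn&& fn) {
    std::lock_guard<std::mutex> lock(mutex_);
    return fn(items_);
  }

private:
  mutable std::mutex mutex_;
  std::vector<T> items_;
};
```

There are no iterators, no operator[], and no allocator support, which is the trade-off described: short and easy to read, but only for the cases one code base cares about.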

So which would be a better example to read? I'd argue: neither.

plibither8 · on Oct 28, 2020

For a simple command line program, I created 2048.cpp: https://github.com/plibither8/2048.cpp. It's gained quite some popularity (GitHub tweeted about it [1]), and the codebase has improved and grown all thanks to a new maintainer.

[1]: https://twitter.com/github/status/1017094930991370240

Also on HN: https://news.ycombinator.com/item?id=17897283

PS: A fork of 2048.cpp was even used as a demo at CppCon 2020 (https://www.youtube.com/watch?v=HrOEyJVU5As)

arximboldi · on Oct 28, 2020

That's a tricky question. Large C++ codebases often have a lot of "legacy" that does not necessarily reflect the best practices.

A couple of years ago I wrote a text-editor to have a realistic example of what I consider a nice modern architecture using immutable data-structures. In the README there is also a link to a talk where I cover some of the design and structure:

https://github.com/arximboldi/ewig

Libraries that it uses:

https://github.com/arximboldi/immer

https://github.com/arximboldi/lager

stagger87 · on Oct 28, 2020

I suggest the Doom3 source code.

https://github.com/TTimo/doom3.gpl

I find it to be extremely readable, and it has a C with classes approach that I tend to gravitate towards when I'm developing in C++. It is not an example of "modern C++".

kartayyar · on Oct 28, 2020

The problem with this codebase is it's not modern C++.

Modern C++ and its use of std::unique_ptr / std::move is so much nicer vs. manual memory allocation.
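A small sketch of what that looks like in practice. `Texture` and `TextureCache` are hypothetical names, not types from the Doom 3 source.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Texture {  // hypothetical type for illustration
  int width = 0;
  int height = 0;
};

// The return type says the caller owns the result; no delete is needed.
inline std::unique_ptr<Texture> makeTexture(int w, int h) {
  auto t = std::make_unique<Texture>();
  t->width = w;
  t->height = h;
  return t;
}

// Taking unique_ptr by value means the cache takes over ownership.
class TextureCache {
public:
  void add(std::unique_ptr<Texture> t) { textures_.push_back(std::move(t)); }
  std::size_t size() const { return textures_.size(); }

private:
  std::vector<std::unique_ptr<Texture>> textures_;
};
```

After `cache.add(std::move(tex))` the caller's pointer is null and the cache frees the texture when it is destroyed, so ownership is visible in the signatures rather than in comments.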

jstimpfle · on Oct 28, 2020

Second that, it's beautiful.

tayistay · on Oct 28, 2020

Not terribly impressed. This wouldn't pass my code review: https://github.com/TTimo/doom3.gpl/blob/aaa855815ab484d5bd09...

Function is 200 lines long, with 8 or so levels of nesting. Also "goto breakout"

To address the objections below, the function reads from a file into a pixel buffer. It's not some tricky in-place update. That's a great candidate for a more functional style.

Here's more ick:

https://github.com/TTimo/doom3.gpl/blob/aaa855815ab484d5bd09...

Surely that could be factored better.

Down-voters go read Carmack's own article:

https://gamasutra.com/view/news/169296/Indepth_Functional_pr...

I think it's funny that people disagree here. This is exactly the stuff a modern linter would flag in an automated code review.

Sure enough, here's a linter. I think this is essentially the same codebase:

https://lgtm.com/projects/g/Edgarins29/Doom3/context:cpp

"Code quality: D" (on an A-F scale)

And by the way, I have the utmost respect for Carmack. I just wouldn't hold up this codebase as great.

stagger87 · on Oct 28, 2020

It's funny that your drive-by swipe uses an example that I might use to show why this code base is so great!

The code is readable and self documenting. The file format is practically documented by reading this code. The function has a single obvious purpose.

The function length and nesting is more a result of the file format itself. Seems like a waste of time breaking this function up. It would serve no other purpose than delaying the ship date and making this function less readable, a function that may likely never need to be visited again.

tayistay · on Oct 28, 2020

Well, this is about nice code to look at, not avoiding delaying the ship date.

Moreover, here's approximately how I'd write that TGA loading function:

1. Write a function to load just a header.

2. Unit test the function with a header.

3. Write a function to do the RLE decoding.

4. Unit test the function with some data.

5. etc.

6. Start assembling the pieces.

7. Unit test the whole thing.

Meanwhile, you tried to write it all in one monolithic function, and so now you're testing the whole thing (hopefully with a unit test) and you're staring at the debugger (or worse, some printf output) wondering what little mistake you made. Maybe if you're brilliant, like Carmack, you beat me to the finish line. But most of us are mere mortals.
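Steps 1 and 3 of that list might look roughly like the sketch below. It is not derived from the Doom 3 loader; the function names are hypothetical and it assumes C++17 for std::optional. The header offsets and the RLE packet layout follow the TGA file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Only the fields this sketch needs from the 18-byte TGA header.
struct TgaHeader {
  std::uint8_t imageType;  // 2 = uncompressed true-color, 10 = RLE true-color
  std::uint16_t width;
  std::uint16_t height;
  std::uint8_t bitsPerPixel;
};

// Step 1: parse just the header. Returns nullopt if the input is too short.
inline std::optional<TgaHeader> parseTgaHeader(
    const std::vector<std::uint8_t>& bytes) {
  if (bytes.size() < 18) return std::nullopt;
  TgaHeader h;
  h.imageType = bytes[2];
  // Width and height are stored little-endian at offsets 12 and 14.
  h.width = static_cast<std::uint16_t>(bytes[12] | (bytes[13] << 8));
  h.height = static_cast<std::uint16_t>(bytes[14] | (bytes[15] << 8));
  h.bitsPerPixel = bytes[16];
  return h;
}

// Step 3: decode RLE packets. Each packet starts with a header byte: the
// high bit marks a run-length packet, the low 7 bits hold (pixel count - 1).
// Returns nullopt on truncated input.
inline std::optional<std::vector<std::uint8_t>> decodeRle(
    const std::vector<std::uint8_t>& in, std::size_t bytesPerPixel,
    std::size_t pixelCount) {
  const std::size_t wanted = pixelCount * bytesPerPixel;
  std::vector<std::uint8_t> out;
  out.reserve(wanted);
  std::size_t pos = 0;
  while (out.size() < wanted) {
    if (pos >= in.size()) return std::nullopt;
    const std::uint8_t packet = in[pos++];
    const std::size_t count = (packet & 0x7Fu) + 1u;
    if (packet & 0x80u) {
      // Run-length packet: one pixel value repeated `count` times.
      if (pos + bytesPerPixel > in.size()) return std::nullopt;
      for (std::size_t i = 0; i < count; ++i) {
        out.insert(out.end(), in.begin() + pos,
                   in.begin() + pos + bytesPerPixel);
      }
      pos += bytesPerPixel;
    } else {
      // Raw packet: `count` literal pixels follow.
      const std::size_t n = count * bytesPerPixel;
      if (pos + n > in.size()) return std::nullopt;
      out.insert(out.end(), in.begin() + pos, in.begin() + pos + n);
      pos += n;
    }
  }
  out.resize(wanted);  // drop any pixels past the requested count
  return out;
}
```

Each function takes bytes in and returns a value, so steps 2 and 4 reduce to feeding in a few hand-built byte vectors, including truncated ones.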

arriu · on Oct 28, 2020

I've always encouraged people to check out Jonathan Blow's talk on this topic ( https://youtu.be/JjDsP5n2kSM?t=817 ). He points out many reasons why this type of code would actually be the optimal version when weighing other factors.

Simple code is easy to write, maintain, and optimize later. Despite the code looking messy, I find it easy to understand and navigate.

A counter example of truly confusing code, which kind of does similar things, would be something like this: https://github.com/ImageMagick/dcraw/blob/master/dcraw.c

tayistay · on Oct 28, 2020

I think I've watched that talk before. The starting point you linked to talks about when to do linear searches. Anyone with some experience knows that stuff. It's a good thing to tell undergrads in case they missed it, but Blow always acts like he's some rebel and conventional wisdom has it all wrong, which isn't really the case. Conventional wisdom says there's a place for linear searches (after all, lsearch()).

Anyway, he's just wrong about long functions being a-ok. I suspect he doesn't write many unit tests. This is the main benefit of separating the code into smaller functions. You can test each. Even if it's just parsing an image file header.

That some of the folks here think smaller functions are useless because they would only be called from one function just shows they don't write enough tests.

unionpivo · on Oct 28, 2020

That kind of function would be tested with a couple of correct files and several wrong ones.

Splitting everything up and testing separately makes sense if you are building a library or general user program. For a program where you control the whole toolchain, it's overkill.

You need to be able to read a correct image and hopefully not segfault on a bad one. And this function does that.

Guzba · on Oct 28, 2020

Counter argument: http://number-none.com/blow/john_carmack_on_inlined_code.htm...

leetcrew · on Oct 28, 2020

I think about that email a lot at work (where we have some very long functions). I particularly like this excerpt, which is applicable outside of performance-critical situations:

> Besides awareness of the actual code being executed, inlining functions also has the benefit of not making it possible to call the function from other places. That sounds ridiculous, but there is a point to it. As a codebase grows over years of use, there will be lots of opportunities to take a shortcut and just call a function that does only the work you think needs to be done. There might be a FullUpdate() function that calls PartialUpdateA(), and PartialUpdateB(), but in some particular case you may realize (or think) that you only need to do PartialUpdateB(), and you are being efficient by avoiding the other work. Lots and lots of bugs stem from this. Most bugs are a result of the execution state not being exactly what you think it is.

in general I don't think it's worthwhile to split a long function into several static helpers just to get under an arbitrary maximum function length target. I don't think it leads to a net improvement in readability, since I now have to go back and forth between the helpers and the main function to see in what order the helpers get called (what if someone swaps the order of helperA and helperB but not the order of their definitions?). imo this is only worth doing if you're also willing to think long and hard about what happens if someone uses your helpers in a different context.

Jtsummers · on Oct 28, 2020

This exposes one of the weaknesses of C and C++. Nesting functions inside functions is actually a very useful thing to avoid precisely what's described, while still allowing code to be more readable. The inner functions can always be extracted later if deemed appropriate. So if C permitted it you could do something like:

  void FullUpdate(...) {
    void PartialUpdateA(...) {
      ...
    };
    ...
  }

So that function is only visible in this one scope, but where it's used it can have the effect of greatly improving readability (especially if it's called multiple times).

Now, C++ can halfway get there with private methods in classes. So anyone outside the class has to really try to break that encapsulation and access the function. C and C++ can get there by not exposing the functions in headers, so they remain file local.

But that doesn't prevent something like (within a file):

  // should only be called from FullUpdate
  void PartialUpdateA() {...}

  void AnotherFunc() {
    ...
    PartialUpdateA();
    ...
  }

tayistay · on Oct 28, 2020

You can do that fairly easily with C++ lambdas. I think the primary deficiency, though, is that the functions aren't exposed for unit testing.

Jtsummers · on Oct 28, 2020

C++ lambdas do solve the problem, I'm not sure why I didn't include that.
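For reference, a minimal sketch of the lambda version of the FullUpdate example. The `World` struct and what the helpers do are made up for illustration.

```cpp
#include <cassert>
#include <vector>

struct World {  // hypothetical state for illustration
  std::vector<int> log;
};

inline void FullUpdate(World& world) {
  // Local helpers: nothing outside FullUpdate can name or call these.
  auto partialUpdateA = [&world] { world.log.push_back(1); };
  auto partialUpdateB = [&world] { world.log.push_back(2); };

  partialUpdateA();
  partialUpdateB();
}
```

Since the lambdas exist only inside FullUpdate, the AnotherFunc shortcut from the earlier comment cannot be written; the cost is that they can only be tested through FullUpdate.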

However, unit testing them is not the only concern. There are reasons for using nested functions, class methods, file global, or program global functions.

If you make them global or methods, you lose control of how and when they're called. This can break invariants. So some functions can be hoisted up, but others oughtn't be (in particular, any pure function can be made global without any concern other than occupying a name, side effecting functions should be more carefully considered).

The interface to the functions may change if they capture any variables. If they capture nothing in the local scope, then hoisting them doesn't impact their interface. If they do capture something, hoisting them means adding parameters (complicating the interface) or making them observe variables either in the class (complicating the class) or file/program globals (bad practice).

Regarding unit testing. Nesting functions (or lambdas) are essentially a wash. You were, hopefully, testing the host function to begin with so nothing is changed if you use nesting functions as your first pass refactoring approach. You can then examine those functions and consider which should be moved out and why, and then add tests to any that have been pulled out of the host function.

tayistay · on Oct 28, 2020

C++ gives you enough access control so you can expose things just for testing if you want (make the test a friend, encapsulate the functions into another class). You can do this without breaking encapsulation.

If the functions are capturing a lot of variables, then I would try to reconsider the design. Usually things aren't irreducibly complex.

I don't exactly follow your last paragraph, but tests aren't just something you throw away when they pass. So if I were to write tests for the helper functions, I wouldn't delete those tests on a refactoring pass in order to use nested functions.

Jtsummers · on Oct 28, 2020

> C++ gives you enough access control so you can expose things just for testing if you want (make the test a friend, encapsulate the functions into another class). You can do this without breaking encapsulation.

If tests can do this, then so can anyone else. Consequently encapsulation is broken and your invariants aren't invariant anymore.

> I don't exactly follow your last paragraph, but tests aren't just something you throw away when they pass. So if I were to write tests for the helper functions, I wouldn't delete those tests on a refactoring pass in order to use nested functions.

I didn't write clearly because I didn't re-present the context of that paragraph.

I'm not talking about throwing away tests after they're run. Keep in mind my original post's context: manually inlined functions for access control to that functionality. You already can't test those separately because they aren't exposed. By moving to nested functions you regain some semblance of reasonability (versus 1k+ line functions with who knows how many levels of nested blocks) and the compiler can do the inlining (for performance). But it has zero net effect on testing, it's a wash. Because the public interface is the same (only the primary function interface is accessible to a tester).

If nested functions are available (and with C++ they are with lambdas) the refactoring would (or could) be something like: 1k line function => 500 line function with several lambdas => 3-8 functions totaling ~500 with some lambdas remaining.

Only those that make sense to move out for separate testing would be, and only if you also wanted to expose them for others to call.

tayistay · on Oct 28, 2020

> If tests can do this, then so can anyone else.

If I had an image loader class, I can make a test function a friend, which would allow it to call private functions on the class. This only grants access to the test function (or test class). And there are stronger ways to hide things, like the PIMPL idiom, private headers, opaque pointers.
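A sketch of the friend approach, with hypothetical names (`ImageLoader`, `ImageLoaderTest`, `parseSize`) and a toy two-byte "header" instead of a real image format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class ImageLoader {
public:
  // Public entry point: returns the pixel count, or 0 on a malformed header.
  std::size_t load(const std::vector<std::uint8_t>& bytes) const {
    int w = 0, h = 0;
    if (!parseSize(bytes, w, h)) return 0;
    return static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
  }

private:
  // Only the named test type may call the private helper directly.
  friend struct ImageLoaderTest;

  static bool parseSize(const std::vector<std::uint8_t>& bytes, int& w,
                        int& h) {
    if (bytes.size() < 2) return false;
    w = bytes[0];
    h = bytes[1];
    return true;
  }
};

// Hypothetical test fixture; in a real project it would live in the test
// binary rather than next to the class.
struct ImageLoaderTest {
  static bool headerParses(const std::vector<std::uint8_t>& bytes, int& w,
                           int& h) {
    return ImageLoader::parseSize(bytes, w, h);
  }
};
```

One caveat that bears on the disagreement: any code that defines a type with that exact name gets the same access, so this is a convention enforced by naming, not a hard barrier.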

I find it interesting we're in such different schools of thought here.

> Only those that make sense to move out for separate testing would be, and only if you also wanted to expose them for others to call.

There are so many things that you might want to hide from an interface, yet still test. Imagine if you took that to an extreme and only tested the public interface of a library. I'm all for trying to independently test any bit of code that fills a screen.

guenthert · on Oct 28, 2020

I suppose you're not implementing safety-critical code then, as AV rule 1 [1] demands that "Any one function (or method) will contain no more than 200 logical source lines of code (LSLOCs)."

Personally, I just find it frustrating, if functions don't fit on the screen anymore (and I do use portrait mode already). Further, sub functions, when named appropriately (definitely not like helperA and helperB) can aid readability (as would comments about code blocks do, but who writes those and who maintains those?).

[1] https://stroustrup.com/JSF-AV-rules.pdf

leetcrew · on Oct 28, 2020

you are correct; I do not work on safety-critical code. I have worked with a lot of legacy code where functions depend on and mutate global state, often in fairly subtle ways. as much as possible, I want to see exactly which globals are being read/written and in what order. from this perspective, the "whole function" doesn't fit on one screen, whether or not sub-functions are factored out, and the name of a function cannot possibly tell me all I need to know about it.

typon · on Oct 28, 2020

I hope you read his 2014 update.

free_rms · on Oct 28, 2020

The whole function is operating on a giant shared buffer representing a texture, though. So you can't break it into pure functions without introducing unacceptable copies of the buffer, right?

typon · on Oct 28, 2020

It depends on what you're doing. If parts of the giant function are only reading the buffer, then you can just pass it as a const T& to your smaller inner function. If the inner function is modifying it, then I'd still pass it in, just as a T&. Or maybe I'd define an immediately invoked lambda that captures the buffer by ref. Functions are a tool for defining interfaces to pieces of logic - maybe I'm not as smart as Carmack and I don't have such a huge working memory, but I just can't keep more than 3-4 concepts in my head at the same time. Functions let you isolate those concepts and only think about their interface, rather than their internals.
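The three options can be sketched as follows. The `Buffer` alias and the function names are hypothetical; none of them copy the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

using Buffer = std::vector<std::uint8_t>;  // hypothetical pixel buffer

// Read-only step: const reference, no copy, cannot modify the buffer.
inline unsigned sumPixels(const Buffer& buf) {
  return std::accumulate(buf.begin(), buf.end(), 0u);
}

// Mutating step: the non-const reference makes the write visible in the
// signature.
inline void invert(Buffer& buf) {
  for (auto& px : buf) px = static_cast<std::uint8_t>(255 - px);
}

inline unsigned process(Buffer& buf) {
  invert(buf);
  // Immediately invoked lambda capturing the buffer by reference.
  const unsigned brightest = [&buf] {
    unsigned best = 0;
    for (auto px : buf) {
      if (px > best) best = px;
    }
    return best;
  }();
  return brightest + sumPixels(buf);
}
```

The signatures show which steps can write to the buffer, which is the interface information a single long function does not expose.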

free_rms · on Oct 28, 2020

I think the principle behind Carmack's letter is that factoring into multiple functions that all touch the same state is an illusion of reducing the burden on your memory. They still interact with each other via that state.

Pure functions, or your const T& situation, actually create smaller arenas of state. Splitting out multiple functions operating on the same object does not.

reactordev · on Oct 28, 2020

Common image decoding code from early 2000s C++. This pattern was actually preferred because it kept the nuance of the image file structure visible within the function. Yes, it's heavy handed now... but back then it was useful to know the "blocks" and "strides" of images when you had to make pixel-perfect changes and didn't have higher-level supporting code.

Could they have abstracted MakeMegaTexture_f into something that built up leaves of tga structs they interweave in that function? Or just chunk through them with the tga data structure. Our standards for what is good code have changed with our understanding of code.

tayistay · on Oct 28, 2020

I was coding C++ (for graphics) in the early 2000s, and I (and I think most others I knew) would have considered it bad back then. But regardless, the topic isn't "C++ code that was considered good when it was written".

Nuzzerino · on Oct 28, 2020

I have written similar code but with more optimizations. It's much faster to allocate chunks of memory onto the stack, then do small amounts of decoding into this stack memory before committing the memory back into the heap. I don't think it is good code from a performance standpoint.

raverbashing · on Oct 28, 2020

That code from the 1st line was most likely converted from existing graphics code in C

Can you rewrite it in a way that's readable, performant and understandable/maintainable to someone with an understanding of the knowledge domain?

I agree, the code is ugly. I disagree that it would be better if converted to some OOP hierarchy; it would probably be less understandable even if the code were tidier

Code like

    SnarfleBlaster *blaster = new SnarfleBlaster();
    SnuggleNerfer *nerfer = new SnuggleNerfer();
    Guffles guffles = blaster->Blast();
    nerfer->Tumbles(guffles);

Is tidy but absolutely dense

dataflow · on Oct 28, 2020

You're in for a wild ride. C++ takes forever to learn (basically, it never ends) and people have VERY different senses of what "clean" C++ code is like.

And for that matter, even the same person might write in different styles depending on the situation. And the same exact style might be terrible for one situation but awesome for another.

With all that said, one "good" kind of codebase to at least know (even if you can't emulate it) is the kind of codebase that is modeled after the C++ standard library. Some Boost libraries (not all!) are great examples of this. Probably the best example I can think of off the top of my head is Boost.Container:

https://www.boost.org/doc/libs/1_74_0/boost/container/vector...

Note that this does not mean you should write code like this for your applications, though. Boost is heavyweight with the templates/headers/macros and these classes are meant to be extremely generic. Your application-level code does not necessarily need to meet the same types of constraints, and it's just not worth the effort in most cases (as well as being slower to compile). But if you can write this kind of code when it's warranted, it ends up being very high-quality. (Some parts of this involve a lot of difficult work, like paying attention to exception-safety, that is often unnecessary. Other parts are quite simple and well worth picking up, e.g. liberal use of typedefs. And everything else in between.)

A more typical example of a codebase might be some of wxWidgets, e.g.:

https://github.com/wxWidgets/wxWidgets/blob/v3.1.4/src/gener...

It's not in the style of the standard library (so you won't find it to be as generic, exception-safe, etc.) but it's pretty decent.

MaxBarraclough · on Oct 28, 2020

> C++ takes forever to learn (basically, it never ends)

It's not just that it has a huge and growing feature-set, it also has an outsize wealth of dark corners. As a slight silver lining, there's an excellent community on StackOverflow where these quirks are explained well.

You can spend hours just reading about initialization in C++. I have, and I can only recall a small fraction of it (but I rarely use C++). It's nightmarish in ways you'd never imagine. [0][1][2]

[0] https://stackoverflow.com/a/54350350/

[1] https://stackoverflow.com/a/620402/

[2] https://news.ycombinator.com/item?id=18832311

jstimpfle · on Oct 28, 2020

> Probably the best example I can think of off the top of my head is Boost.Container:

> https://www.boost.org/doc/libs/1_74_0/boost/container/vector...

Which I find absolutely hilarious. Thousands of lines of boilerplate for something as simple as a dynamically resizable array. And that's only the _header_, which has to be included at each location that uses a vector either directly or indirectly. These are the kinds of things that prevent me from touching C++.

gpderetta · on Oct 28, 2020

This is quite unfair.

Boost.container::vector implements the full standard vector interface (that's already a large interface), plus quite a few non-standard extensions (significantly it has built-in support for default initialization).

Boost.container also has support for stateful allocators and custom pointer types so that it can be used, for example, on shared memory (in fact boost container was originally a sub-library of boost.interprocess). That alone will increase the complexity significantly.

It also, I believe, still supports pre-C++11 (and substandard post-C++11) compilers, so there are a lot of compiler workarounds and emulation of C++11 features.

Also, being a template, the whole implementation is just in the _header_.

jstimpfle · on Oct 28, 2020

And you don't notice how you're only making my point?

As for being a template, that doesn't preclude from having most of the implementation in a .cpp file. In fact that's highly desirable (separate compilation).

gpderetta · on Oct 28, 2020

My point is that you would use boost.container if you need its additional features otherwise just use the equivalent standard containers. The complexity is there for a reason.

A separate .cpp would require explicit instantiation. I do not see how that would work for a third party library that has to work for any T.

Normally you can at least abstract away the memory management code, but in the case of boost container the use of custom allocators is expected, so you can't even do that.
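To make the explicit-instantiation point concrete, here is a minimal single-file sketch. The `Stack` template and the stack.h / stack.cpp split marked in the comments are hypothetical, not taken from Boost. It shows why the approach only works when the library author knows the element types in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- stack.h (hypothetical): declarations only, no member bodies ---
template <typename T>
class Stack {
public:
    void push(const T& value);
    T pop();
    std::size_t size() const;

private:
    std::vector<T> items_;
};

// --- stack.cpp (hypothetical): member definitions, compiled once ---
template <typename T>
void Stack<T>::push(const T& value) {
    items_.push_back(value);
}

template <typename T>
T Stack<T>::pop() {
    assert(!items_.empty());
    T top = items_.back();
    items_.pop_back();
    return top;
}

template <typename T>
std::size_t Stack<T>::size() const {
    return items_.size();
}

// Explicit instantiation definitions. With a real header/.cpp split these
// would be the only element types that users could link against.
template class Stack<int>;
template class Stack<double>;
```

With a real split, `Stack<std::string>` would compile against the header and then fail at link time, which is the problem for a library that has to work for any T.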

kllrnohj · on Oct 28, 2020

It's desirable for compilation speed, but not for performance. The blessing & curse of templates is that while you pay for them in compilation speed, you're also getting every scrap of localized optimizations possible without needing to run a much slower, and less capable, LTO pass.

There isn't a "strictly better" here, it's a trade-off. If you want a split implementation, make one. That's the beauty of C++ - there's nothing special the standard library can do that you can't. Android, for example, has a split implementation: https://cs.android.com/android/platform/superproject/+/maste... (I wouldn't use it, though, it's pretty outdated & shitty, but you can still do what you're talking about).

dataflow · on Oct 28, 2020

This was not intended to be an advertisement for C++. Nobody is begging you to use it. If you hate it, or otherwise don't understand its values and trade-offs, please don't use it.

fsloth · on Oct 28, 2020

"These are the kinds of things that prevent me from touching C++"

Please don't let boost stop you.

At my team we avoid boost like the plague and prefer terse and pragmatic code. IMO C++ is best approached like C but with convenient data structures in the standard library - and tons of patterns that _may_ be taken into use if they simplify the code.

That said, unless you are working in a specific setting where C++ is obviously the best tool, you probably should not use C++. Even at best of times C++ is complex and unproductive. But it offers an unbeatable combination of ecosystem maturity, robustness, close to the metal performance and high level concepts to fit few slots better than anything else.

ncmncm · on Oct 28, 2020

If you are unproductive when coding C++, you are Doing It Wrong. Good C++ code flows like water. If you are finding it complex, you have gone down a side alley, and need to get back to the wide, sunny boulevard.

I frequently sit down and write 2000 lines of C++ code, and when it compiles, it works. Aim for that.

fsloth · on Oct 29, 2020

My 2000 lines of C++ work fine just as well. C++ is unproductive when compared to other languages.

C++ can be found to be the best language for a specific class of problems. But.

Those 2000 lines of C++? If that code had been Python or F# or even C# and the problem had been 'generic' enough, it would likely have fewer lines of code and be done faster.

If that 2k lines happened to implement some numerics stuff with hard performance requirements then C++ might come up on top.

But my complaint overall is not about a short 2k line program. It's about the whole program lifetime and software complexity of a complex system.

For non-trivial programs C++ opens a whole can of worms concerning compatibility, deployment, locales, memory handling, weird bugs due to sometimes obscure lifetimes, integer sizes on different platforms... etc etc.

And the C++ spec is infinite for most mortals.

huhtenberg · on Oct 28, 2020

Indeed. To each his own.

I've been using C++ as a daily workhorse since the early 90s and Boost and its standardized derivatives are easily the worst thing that ever happened to the language. It may all make logical sense, but the end result looks, basically, like butt. It is not a refined, thoughtfully evolved language that is a pleasure to use. It has so many features bolted on at so many different angles that it has no distinct shape anymore. Design by committee at its finest.

lentil_soup · on Oct 28, 2020

> Other parts are quite simple and well worth picking up, e.g. liberal use of typedefs

honest question, why do you think liberal use of typedefs is a good habit to pick up?

dataflow · on Oct 28, 2020

Lots of reasons. It's just something I've learned based on experience. They're essentially free abstractions that confine a source of truth (the type of a variable) to a single location. They codify the abstractions as well. They make it easier to understand the semantic purpose of each type and can make meaningful distinctions when concrete types are the same (e.g. std::vector<size_t> is quite unilluminating compared to, say, 'NameIndices' and 'ValueIndices'). They make it very easy to change types down the road (as the concrete type is now declared in a single location). They give you control over the type inference procedure. And they give you the majority of the benefits of 'auto' without the same pitfalls.

That said, I'm not suggesting you use them indiscriminately. 'this_type' and 'value_type' are often useful; 'difference_type' and 'size_type' generally won't gain you anything...
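A minimal sketch of the 'NameIndices' / 'ValueIndices' idea, using `using` aliases (the modern spelling of typedef). The two helper functions are hypothetical and only there to show the aliases in signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same concrete type today, but each alias documents a different purpose,
// and the concrete type is declared in exactly one place.
using NameIndices  = std::vector<std::size_t>;
using ValueIndices = std::vector<std::size_t>;

// Hypothetical helper: pick out the names at the given positions.
std::vector<std::string> select_names(const std::vector<std::string>& names,
                                      const NameIndices& picks) {
    std::vector<std::string> out;
    out.reserve(picks.size());
    for (NameIndices::value_type i : picks) {
        assert(i < names.size());
        out.push_back(names[i]);
    }
    return out;
}

// Hypothetical helper: add up the values at the given positions.
double sum_values(const std::vector<double>& values, const ValueIndices& picks) {
    double total = 0.0;
    for (ValueIndices::value_type i : picks) {
        assert(i < values.size());
        total += values[i];
    }
    return total;
}
```

Note that aliases are not distinct types: the compiler will still accept a `ValueIndices` where a `NameIndices` is expected. The gain is readability and a single place to change the type.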

gtirloni · on Oct 28, 2020

https://github.com/scylladb/seastar

https://github.com/scylladb/scylla

https://github.com/catchorg/Catch2

https://github.com/facebook/folly

whimsicalism · on Oct 28, 2020

Haven't looked through the scylla codebases, but can definitely recommend the latter two.

cltby · on Oct 28, 2020

Strongly seconded. Beyond the language aspects, the scylladb projects are full of instructive architectural ideas and design choices.

PeterCorless · on Oct 28, 2020

A good blog I wrote up based on Avi Kivity's talk at Core C++ 2019. Also our blog on using io_uring and eBPF:

• https://www.scylladb.com/2020/03/26/avi-kivity-at-core-c-201...

• https://www.scylladb.com/2020/05/05/how-io_uring-and-ebpf-wi...

fsloth · on Oct 28, 2020

This codebase won an Oscar:

https://pbrt.org/

https://github.com/mmp/pbrt-v3

That said, IMHO, there is no "clean" C++ code. There are C++ codebases that use different styles, and their "quality" more or less is context sensitive.

Personally I felt the best tutorial to C++ were actually two other programming languages - Scheme and F#.

Scheme drove in the concept that it's totally fine to model things based on the shape of the data and that you don't need to create type based abstractions around every thing.

F# then demonstrated how a language with type system is supposed to work.

The problem with C++ is that the language is so verbose that unless you have an abbreviated model in your head how a type based language can be used to solve problems in best way, you will get lost in trivial C++ minutiae.

So, personally, I generally think "Am I solving a Scheme like problem or Standard ML like problem" and then try to apply C++ as simply as possible.

Several academics have created a career of how to patch C++/Java/C# with concepts that make them less fragile in a collaborative context:

https://en.wikipedia.org/wiki/Design_Patterns

https://www.amazon.com/Design-Patterns-Elements-Reusable-Obj...

In my opinion design patterns are not a fundamental concept, but rather provide common syntax for collaboration purposes for various patterns that are parts of language and more or less invisible in e.g. Scheme or F#. But if one is diving into C++ it's probably convenient to be familiar with these concepts.

ncmncm · on Oct 28, 2020

Each design pattern identifies a weakness in the language where it appears. If the language were stronger, the pattern would be either just a core language feature, or (better) a library component. Languages get stronger by being able to capture more patterns in libraries.

If your C++ code is "so verbose", you are Doing It Wrong, most likely by doing it the old way. C++98 was verbose.

fsloth · on Oct 29, 2020

C++ code as written can be succinct but the spec to describe what the code does can be really verbose.

So - verbose in terms of the spec which describes what the code actually does and mental chatter I have to do with myself when writing it "should I return a ref or a smart_ptr or a value or should I use move semantics..." whereas in F# it's just a ref and garbage collector will deal with it. So I can skip the mental chatter part.

CyberDildonics · on Oct 28, 2020

PBRT won a technical achievement academy award because it was a reasonably complete renderer from start to finish while being not just open source but extensively documented and explained. The source though is not a great example of modern C++ at this point. It is based heavily on inheritance and uses lots of memory allocation and virtual function calls.

fsloth · on Oct 28, 2020

The source is aged a bit but

a) displays a way to put together a non-trivial C++ application

b) displays several extremely useful graphics related algorithms and patterns

c) Is documented at a level and quality few public code bases reach - so get the book first and then start reading the sources

daniel-thompson · on Oct 28, 2020

> It is based heavily on inheritance and uses lots of memory allocation and virtual function calls.

Interesting - can you talk a little bit about why those specific design choices are questionable from a modern C++ POV? How would you do it differently, given the problem & domain PBRT covers?

I ask because my mental "sweet spot" in terms of understanding and writing C++ is approximately C++11. It's pretty easy for me to go into symbol shock when I see some of the latest stuff in C++17 and C++20... and I suspect I'm not the only person that happens to.

CyberDildonics · on Oct 28, 2020

The main points are not C++ features, but the way classes and inheritance are how everything is broken down.

A much better approach is to allocate large chunks of memory at one time and run through the arrays of floats directly. Instead of tracing and shading one path ray all the way through, finding out where it hits, running the brdf, looking up textures, casting another ray, etc. it is much better to do each stage in large chunks.

Going from data structures holding lots of virtual pointers to something like I described above can be a substantial improvement in speed and give a lot of clarity at the same time.

Many times programs with a lot of inheritance end up with their execution dictated by types calling other types' functions, which makes it more implicit and buried rather than linear in a top level function. It becomes similar in a way to callback hell where you are trying to track down how the program gets to a certain place. Many times it takes adding break points and walking back through the call stack to reverse engineer it, rather than looking at a single top level function that has a lot of steps.
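A rough sketch of the "each stage in large chunks" style, under heavy simplification. `RayBatch`, the single-plane intersection, and the shading formula are all made up for illustration and have nothing to do with PBRT's actual code. The point is only the shape: flat float arrays allocated once, and a top-level function that lists the stages in order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays batch: one contiguous float array per field.
struct RayBatch {
    std::vector<float> origin_z;
    std::vector<float> dir_z;
    std::vector<float> hit_t;     // distance to the plane z = 0, or -1 for a miss
    std::vector<float> radiance;
};

// Allocate every array once, up front.
RayBatch make_batch(std::size_t n) {
    RayBatch b;
    b.origin_z.assign(n, 0.0f);
    b.dir_z.assign(n, 0.0f);
    b.hit_t.assign(n, -1.0f);
    b.radiance.assign(n, 0.0f);
    return b;
}

// Stage 1: intersect every ray with the plane z = 0.
void intersect_stage(RayBatch& b) {
    assert(b.dir_z.size() == b.origin_z.size());
    for (std::size_t i = 0; i < b.origin_z.size(); ++i) {
        const float dz = b.dir_z[i];
        const float t = (dz != 0.0f) ? -b.origin_z[i] / dz : -1.0f;
        b.hit_t[i] = (t > 0.0f) ? t : -1.0f;
    }
}

// Stage 2: shade every hit with a toy distance falloff.
void shade_stage(RayBatch& b) {
    assert(b.radiance.size() == b.hit_t.size());
    for (std::size_t i = 0; i < b.hit_t.size(); ++i) {
        b.radiance[i] = (b.hit_t[i] > 0.0f) ? 1.0f / (1.0f + b.hit_t[i]) : 0.0f;
    }
}

// The control flow is readable from one place, with no virtual dispatch.
void render(RayBatch& b) {
    intersect_stage(b);
    shade_stage(b);
}
```

Each loop touches a couple of arrays sequentially, which is also the form compilers tend to vectorize well.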

daniel-thompson · on Oct 28, 2020

Also note that PBRT v4 is out in early release: https://github.com/mmp/pbrt-v4

tony · on Oct 28, 2020

C++ build systems will be your main blocker, as they're languages in themselves. Autotools, CMake, configuring Visual Studio/etc.

Easy tasks:

- Challenge yourself by cloning projects, reading their README, and compiling/running them [1]

- Port projects to other platforms (e.g. Windows-only game examples to Linux/Mac/FreeBSD) [2]

Codebases I've studied [3]:

sdl/graphic widgets: openttd, aseprite, nanogui(-sdl)

runtime / abstraction: v8, node, protobuf, skia

templates / language integration: pybind11, boost-python

wrapping C: libsdl2pp (SDL2 wrapper)

small: tmux-mem-cpu-load, uMario_Jakowski

huge: blender

Architecture / theory: https://aosabook.org/en/wesnoth.html, https://aosabook.org/en/audacity.html, https://aosabook.org/en/cmake.html, https://www.aosabook.org/en/llvm.html

Books: Scott Meyers C++ books

[1] It's C, but to push myself to grok build systems I tried to port tmux's autotools build system to CMake: https://github.com/tony/tmux/compare/master...tony:cmake

[2] https://github.com/jakowskidev/uMario_Jakowski/pull/1

[3] https://github.com/tony/.dot-config/blob/5c59f46/.vcspull.ya...

fsloth · on Oct 28, 2020

Oh, yes, the C++ ecosystem is horribly fragmented.

But I would hardly call the build system the main blocker unless the software module under development is near trivial in complexity.

thechao · on Oct 28, 2020

C++ was my first language ('94?) and the one I am very passionate about; I worked on the Indiana concepts proposal & variadics for part of my dissertation, and my advisor & his colleagues designed large chunks of what would be C++11.

In my opinion, the very best C++ is plain-old-C that uses modern C++ standard libraries, while carefully controlling the definition of the memory allocator for those libraries.

Olipro · on Oct 28, 2020

What use-case are you envisioning where one needs to "carefully control" the allocator for STL types?

Your idea of heavenly C++ sounds like my idea of hell.

hexane360 · on Oct 28, 2020

>Your idea of heavenly C++ sounds like my idea of hell.

I'm concerned this may be the definitive C++ experience...

ska · on Oct 28, 2020

True of pretty much all "big tent" languages, I suspect.

ajoseps · on Oct 28, 2020

You can use custom allocators to minimize dynamic allocation. STL now provides some of these allocators out of the box, but it still can be useful to define your own. Arena or memory pool allocators are the first that come to mind.
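For the "out of the box" part, I'm assuming this means the C++17 `std::pmr` facilities. A minimal sketch of an arena backed by a stack buffer; the function name and the 4096-byte size are arbitrary choices for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Sums 0..count-1 using a vector whose storage comes from a stack buffer.
int sum_with_arena(int count) {
    assert(count >= 0);
    std::array<std::byte, 4096> buffer;

    // Monotonic arena: allocating bumps a pointer, and everything is released
    // at once when the resource is destroyed. Using null_memory_resource() as
    // the upstream makes an overflow throw std::bad_alloc instead of quietly
    // falling back to the heap.
    std::pmr::monotonic_buffer_resource arena(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());

    std::pmr::vector<int> values(&arena);
    values.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        values.push_back(i);
    }

    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

No call to the global `operator new` happens here as long as the data fits in the buffer.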

spacechild1 · on Oct 28, 2020

While I work in a field where such things are necessary (real-time audio programming), I doubt that such things are relevant for someone just starting out with C++...

ajoseps · on Oct 28, 2020

Oh yeah, I definitely agree. I don't think it's relevant for someone just starting out. Just wanted to put it out as an example

jcelerier · on Oct 28, 2020

> I doubt that such things are relevant for someone just starting out with C++...

you'd be amazed at the amount of people who start learning C++ because they want to write VSTs

spacechild1 · on Oct 28, 2020

True, but they would probably use a framework like JUCE which does a great job at hiding the complexity. Writing a VST3 plugin from scratch is not for the faint of heart.

jasode · on Oct 28, 2020

>In my opinion, the very best C++ is plain-old-C [...]

Not to start a debate but just genuinely curious...

Do you avoid writing ~destructors() in C++ to adhere to "plain C" principles?

Imo, destructors (RAII) are one of the killer C++ features. There's a lot of buggy plain C code that has a malloc() but is missing the associated free() and/or an openXyz() but is missing the closeXyz(). In plain C you have to manually write cleanup() functions or code generators to simulate dtors(), which is a hassle.

I guess the related larger question is if the "best C++" means avoid writing any classes at all? (E.g. Linux source with plain C uses structs and function parameters to simulate OOP: https://lwn.net/Articles/444910/)
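To illustrate the openXyz()/closeXyz() pairing, here is a minimal sketch with a stand-in API. `openXyz`, `closeXyz`, and the global counter are hypothetical and exist only so the cleanup can be observed.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical C-style API; the counter lets us observe leaks.
int g_open_handles = 0;

int openXyz() {
    return ++g_open_handles;
}

void closeXyz(int /*handle*/) {
    assert(g_open_handles > 0);
    --g_open_handles;
}

// RAII wrapper: acquire in the constructor, release in the destructor.
class XyzHandle {
public:
    XyzHandle() : id_(openXyz()) {}
    ~XyzHandle() { closeXyz(id_); }

    // Non-copyable, so the handle is closed exactly once.
    XyzHandle(const XyzHandle&) = delete;
    XyzHandle& operator=(const XyzHandle&) = delete;

    int id() const { return id_; }

private:
    int id_;
};

// Returns the number of open handles while one is alive. When fail is true
// the function exits early by throwing, and the destructor still runs
// during stack unwinding.
int use_handle(bool fail) {
    XyzHandle h;
    if (fail) {
        throw std::runtime_error("simulated failure");
    }
    return g_open_handles;
}
```

There is no cleanup code on either exit path; the compiler inserts the destructor call for both.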

thechao · on Oct 28, 2020

I use RAII via `unique_ptr<>` and friends. I would never have a "bare malloc": all allocations occur as side-effects of using containers, or are stuffed into `unique_ptr<>`s.

gumby · on Oct 28, 2020

what if you allocate on the stack?

    {
      something a_thing {};
      ...;
    }

Is a perfectly reasonable way to allocate something on the stack and have it unwound at the end of the block. No pointers needed. Perhaps I'm misunderstanding your argument.

`new`, much less `malloc`, hardly ever needs to appear. I'm working in a reasonably large code base that has four uses of new: three hidden in internal graph node construction and one in a high-performance deserializer (actually that last is in the process of being replaced by code that simply requests pages from the OS). There are very few cases where user code might allocate memory with `new` these days.

pooya13's argument stands as well.

thechao · on Oct 28, 2020

unique_ptr<> is a mechanism to give automatic (this is what you mean by 'on the stack') life time to non-automatic objects, like FILE*s, mmap()s, defined results from malloc(), etc. It's not like I'm heap-allocating an int for local use.

gumby · on Oct 28, 2020

IMHO that's a painful level of abstraction to work at.

Yes, 'automatic' variables in C or C++ are just stack-allocated objects; there's nothing about a FILE* that makes it automatic or static. But if you use, for example, streams, you can just create one for a file and when it goes out of scope it closes the file and deallocates for you.

Likewise I would make a tiny class for my mmap'ed memory that did whatever cleanup (perhaps serializing data or zeroing pointers into it) before returning the pages to the OS pool. Sure you can write that code into a unique_ptr deleter and pass that in when you make them, or you could just let the compiler do it for you.

I don't use C++ in what would typically be called an "object-oriented" way, but this is a perfect example of where I would create a small class as a way of telling the compiler to do some bookkeeping on my behalf.

pooya13 · on Oct 28, 2020

What if the resource is not memory?

thechao · on Oct 28, 2020

unique_ptr<> lets me override its destructor behavior with a function object that's called on the resource. I mostly use unique_ptr<> for 'wild' mmap()s, FILE*s, etc.
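A minimal sketch of that function-object deleter applied to a FILE*. `FileCloser`, `FilePtr`, and `round_trip` are illustrative names, and `std::tmpfile()` is used only so the example needs no file path.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Function object used as the deleter. unique_ptr only invokes it when the
// stored pointer is non-null.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};

using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Writes text to an anonymous temporary file, reads it back, and returns the
// number of characters read, or -1 if the temporary file could not be created.
long round_trip(const char* text) {
    assert(text != nullptr);
    FilePtr file(std::tmpfile());
    if (!file) {
        return -1;
    }
    std::fputs(text, file.get());
    std::rewind(file.get());

    long count = 0;
    while (std::fgetc(file.get()) != EOF) {
        ++count;
    }
    return count;  // fclose runs here, and on every other exit path
}
```

Because the deleter is a stateless type, `FilePtr` is typically the same size as a raw `FILE*`.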

ncmncm · on Oct 28, 2020

You can make a world of pain in any language.

But you don't have to. If you are trafficking in unique_ptr customizations, you are coding The Hard Way. std::unique_ptr is a convenient mechanism, not an organizing principle. Make a named class or class template with a member that is a unique_ptr, and traffic in instances of that class, by value.
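One way to read that advice, as a sketch: a hypothetical `Samples` class that keeps the unique_ptr as a private detail and is passed around by value. The names and the float-buffer use case are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Named type: callers see Samples, never the unique_ptr inside it.
class Samples {
public:
    explicit Samples(std::size_t n)
        : data_(std::make_unique<float[]>(n)), size_(n) {}

    // Movable, not copyable. The moved-from object is left empty so that
    // size() stays consistent with the buffer it no longer owns.
    Samples(Samples&& other) noexcept
        : data_(std::move(other.data_)), size_(std::exchange(other.size_, 0)) {}
    Samples& operator=(Samples&& other) noexcept {
        data_ = std::move(other.data_);
        size_ = std::exchange(other.size_, 0);
        return *this;
    }

    std::size_t size() const { return size_; }
    float& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }
    const float* raw() const { return data_.get(); }

private:
    std::unique_ptr<float[]> data_;
    std::size_t size_;
};

// Taken and returned by value: each hand-off moves the buffer, never copies it.
Samples scaled(Samples s, float factor) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] *= factor;
    }
    return s;
}
```

A call such as `Samples out = scaled(std::move(in), 2.0f);` transfers the same heap buffer from `in` to `out` without a copy.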

If you are working with FILE pointers, you are doing things the hard way. Do you have some library that demands you give it a FILE pointer? Otherwise you probably want a std::filebuf. Likely on the stack, or a member of class that provides a more useful abstraction than a raw file.

rramadass · on Oct 28, 2020

If you mean using C++ as a "better C" but using the powerful C++ standard library features (Algorithms, Containers etc.) then i very much agree with you. All OO/Generic/Metaprogramming techniques are to be tightly regulated in production code so as not to complicate and obfuscate the codebase. This is one of the reasons that i am cautious and wary of all the "modern" additions in C++11/beyond. Probably my viewpoint may change with time and experience but for now i feel that things have become quite complicated.

ncmncm · on Oct 28, 2020

This is only the second-worst way to use C++.

The worst way is to emulate Java, putting everything in derived classes with virtual functions, and trafficking in shared_ptr. We call that Java Disease.

Using C++ as a "better C" leaves almost all the potential benefit on the table, leaving you hardly more productive than you were in C. Coding C++ as C++, you can be worlds more productive, making the compiler do most of your work for you, and leaving the bugs behind in the code you didn't need to write.

rramadass · on Oct 29, 2020

Not quite what i meant. By "better C" i meant using a "safe subset" of the language without going crazy with all its features which is what i see happening often. Having used C++ since the early 90's i have seen too much incomprehensible and unmaintainable code just because the programmers had learnt a "shiny new technique" from some book and understood it as everything must be done that way(i must admit i too had fallen prey to this disease for a while :-) Thus for example, i have seen code with needlessly deep class hierarchies, design patterns madness, no free standing functions(everything had to be in a class!), complete ban on macros, template and metaprogramming incomprehensibility etc. C++ is powerful but that must be wielded judiciously and responsibly by the programmer which is only feasible with experience and maturity. Thus it is better to err on the side of caution and restrict usage to a well-defined subset of language features.

ncmncm · on Oct 29, 2020

It is true that you need really good reasons to make virtual functions, or to do metaprogramming. I, also, usually avoid them. I have not made a more than two-level inheritance hierarchy in this millennium. There is no substitute for taste, design sense, or insistence on simplicity.

Stepanov referred to OO as "gook", and has publicly regretted making member functions like size().

But putting types to work for you is a great force multiplier.

carapace · on Oct 28, 2020

Ah, a breath of fresh air. Hats off to you.

I was tempted to say something like "Sure, look at the best C code bases." but that would have been too snarky.

ncmncm · on Oct 28, 2020

Good C is bad C++.

dmacvicar · on Oct 28, 2020

You may enjoy browsing and reading the code of:

- http://www.serenityos.org/ (Andreas Kling)

- https://godotengine.org/ (Juan Linietsky)

Those are among my favorites because of simplicity and readability vs the complexity of the domain they implement.

markpapadakis · on Oct 28, 2020

https://medium.com/@markpapadakis/interesting-codebases-159f...

In addition to those I highly recommend studying ClickHouse’s codebase. There are brilliant design and engineering bits everywhere. I learned more from studying this codebase than most other I can think of, especially with regards to templates meta-programming ( I learned about “template template parameters” from coming across extensive use of those there ). It’s actually somewhat challenging to grok what’s going on — but it is worth pushing through until you “get it”.
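For anyone who hasn't met "template template parameters" before, here is a tiny sketch of the feature itself. This is not ClickHouse code, and `repeat` is a made-up function.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Container is itself a template, not a type. The variadic form lets it match
// standard containers, which also take an allocator parameter.
template <template <typename...> class Container, typename T>
Container<T> repeat(const T& value, std::size_t n) {
    Container<T> out;
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(value);
    }
    assert(out.size() == n);
    return out;
}
```

The caller names only the container template and the element type is deduced: `repeat<std::vector>(7, 3)` yields a `std::vector<int>`, and `repeat<std::deque>(std::string("x"), 2)` yields a `std::deque<std::string>`.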

shepik · on Oct 28, 2020

Yep, i second the clickhouse suggestion. Much easier to understand than, say, mongodb or mysql codebases.

muststopmyths · on Oct 28, 2020

I would pick an area you are already familiar with and then look for public codebases in that field. Otherwise it can be hard to just read C++ code and figure out the organization/abstractions.

To generalize:

- Google software like protobufs and chrome are good examples of accepted C++ practice in the general software industry. It goes a bit overboard in the C++-iness for my taste, but that's just a personal thing.

- For game engines, Unreal Engine 4 is a good example of a pretty well-organized large C++ codebase. For the size and complexity of the beast, it is not hard to read the code. Understanding the overall structure depends on how deep you are already into games.

- There are some games over the years that have released their source code. I consider Monolith's games (F.E.A.R, No One Lives Forever 2) to be pretty well-organized C++ code. They're very out of date now, but you can apply newer C++ features to their basic structure and still come away with decent code.

To learn C++, I would start with your own C code and then morph it in the following ways:

- Use C++ objects to represent functionality boundaries and data abstractions. You probably have a bunch of structs and functions that operate on them. Convert those to C++.

- Start using destructors to automatically clean up objects. Read up on RAII patterns.

- Use smart pointers instead of dynamically allocating/freeing memory with malloc/new/free/delete.
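The three steps above can be sketched on a toy example. The C-style "before" in the comment and the `Log` class are both hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Before (C style, for comparison only):
//   struct log { char** lines; size_t count; };
//   struct log* log_create(void);
//   void log_append(struct log*, const char* line);
//   void log_destroy(struct log*);   /* easy to forget */

// After: the struct and its functions become one class. std::vector and
// std::string release their memory in their own destructors, so Log needs no
// hand-written cleanup at all.
class Log {
public:
    void append(std::string line) { lines_.push_back(std::move(line)); }
    std::size_t count() const { return lines_.size(); }
    const std::string& last() const {
        assert(!lines_.empty());
        return lines_.back();
    }

private:
    std::vector<std::string> lines_;
};

// When a heap allocation is really needed, a smart pointer replaces the
// create/destroy pair.
std::unique_ptr<Log> make_log_with_banner() {
    auto log = std::make_unique<Log>();
    log->append("log started");
    return log;
}
```

In many cases a plain `Log` local or member is enough and the `unique_ptr` isn't needed at all.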

After these basics, you can branch out into exploring more in depth with templates, fancy C++17/20 features etc. Don't try to use all the features and capabilities at the outset. You can end up with a codebase you won't want to read in a year, let alone other people on your project. I've seen massive, unreadable templated code because someone got a shiny new template hammer and banged away at every problem as a nail.

tcbawo · on Oct 28, 2020

Sound advice. I would advise designing with unique pointers first and minimizing shared pointers where possible. Shared pointers have a smell. Shared pointers have reference counting overhead. Unique pointers communicate ownership, while shared pointers communicate lack of a single owner.
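A small sketch of the difference in what the two pointer types say at a call site. `Texture` and both functions are made-up names.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Texture {
    int id;
};

// Sink: the signature says "I take ownership". Callers must std::move into it.
int consume(std::unique_ptr<Texture> t) {
    assert(t != nullptr);
    return t->id;  // the Texture is destroyed when t goes out of scope
}

// Reports the owner count while a second owner exists. Copying a shared_ptr
// bumps a reference count, which is the overhead mentioned above.
long owners_while_shared(const std::shared_ptr<Texture>& t) {
    std::shared_ptr<Texture> copy = t;
    return t.use_count();
}
```

After `consume(std::move(u))` the caller's `u` is null, so the transfer is visible in the code. With `shared_ptr`, nothing in the signature tells you who is responsible for the object's lifetime.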

I would also recommend using an IDE with decent tooling. C++ is insanely hard to parse. If you go so template/macro crazy that you confuse your IDE context parser, you will most certainly confuse other people reading your code.

rramadass · on Oct 28, 2020

Since C++ is a "Multi-Paradigm" language your study of C++ code should include the various different expressions of the language. Also it is best if you can have a book explaining and walking you through the codebase. To that end i can suggest the following from my own experience;

* To study OO expression of C++ via Class libraries/Frameworks i recommend a study of "Microsoft Foundation Classes (MFC) library" paired with the book MFC Internals. The code is available with the installation of Visual Studio.

* P.J.Plauger's The Standard C Library and The C++ Standard Template Library(with Alexander Stepanov, Meng Lee and David Musser) will teach you excellent C and C++ Generic programming skills. Code available with the standard C++ toolchain installation.

* The Boost libraries paired with the book C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond.

* The FXT library of Algorithms by Jorg Arndt (https://www.jjj.de/fxt/fxtpage.html) along with its book Matters Computational for writing optimized C++ code.

I would also suggest reading the following two classics to learn designing in C++;

* Multi-Paradigm Design for C++ by James Coplien.

* Scientific and Engineering C++: An Introduction with Advanced Techniques and Examples by Barton and Nackman.

hartem_ · on Oct 28, 2020

Apache Mesos has a very solid, clean and well maintained code base (https://github.com/apache/mesos). It’s a massively scalable distributed system that has been running in production for many years at Twitter, Apple, Netflix and other places (more recently it’s being replaced by Kubernetes, but that’s beside the point of this thread).

Mesos is interesting as a whole but it also has some parts that are notoriously hard to get right (like their Paxos implementation).

It also contains an RPC library that implements an actor model and uses futures and promises (https://github.com/apache/mesos/tree/master/3rdparty/libproc...), as well as a header-only utility library called Stout (https://github.com/apache/mesos/tree/master/3rdparty/stout) with a lot of useful functions.

jeffbee · on Oct 28, 2020

I recommend Abseil. They have a good style that uses some modern C++ features but not the kitchen sink. Example you could start right here:

https://github.com/abseil/abseil-cpp/blob/master/absl/string...

LevelDB is good reading, too.

ausjke · on Oct 28, 2020

Considering the new C++11/14/17/20 these days you might want to focus on the new way of coding.

some code for study are listed here: https://github.com/rigtorp/awesome-modern-cpp

ncmncm · on Oct 28, 2020

Most code you will run across is written in an ancient version of the language. Most of the code suggested in this thread is that.

Modern C++ looks very, very different from C, and even more different from Java. So, if you see code that looks like one of those, you know it is not a thing to emulate.

Certain companies, notably Mozilla and Google, have extremely peculiar coding standards that badly warp how code has to be written for them. You do not want to emulate that code. If you see classes with "Init" and "Deinit" members, and constructors that don't do anything, you know you are in a dark back alley headed for a world of suffering.

If you see code that looks like it might as well be C, it's probably bad code. Modern C++ is a very, very different language from C. Most of your C habits, if you have any, are bad C++ habits that you will need to work hard to unlearn. In particular, if you handle naked pointers much, you are Doing It Wrong.

If you see code organized like Java, full of shared_ptr and virtual functions, it's certainly bad code. (I think Dropbox just published a big library that was virtual functions from top to bottom. Shiver.) We call that Java Disease. Inheritance and virtual functions are occasionally useful, as pure mechanism, but not as an organizing principle.

Modern C++ code works with values more than pointers or references, and prefers to pass objects by move over by reference. If you actually see allocations, and particularly deallocations, in top-level code, it is very likely to be bad code.
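A minimal sketch of "values, passed by move". `with_header` is a hypothetical function; note there is no `new`, `delete`, or pointer anywhere in it.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Takes both arguments by value. A caller that passes a temporary or uses
// std::move hands over the existing buffers instead of copying them.
std::vector<std::string> with_header(std::vector<std::string> lines,
                                     std::string header) {
    assert(!header.empty());
    lines.insert(lines.begin(), std::move(header));
    return lines;  // a by-value parameter is moved out implicitly
}
```

Typical use is `auto out = with_header(std::move(body), "title");`. A caller that still needs `body` afterwards simply omits the `std::move` and pays for one copy.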

kartayyar · on Oct 28, 2020

I suggest looking at some of Google's open sourced projects.

- Abseil: https://github.com/abseil/abseil-cpp https://cs.opensource.google/abseil

- Tensorflow: https://github.com/tensorflow/tensorflow/tree/master/tensorf... https://cs.opensource.google/tensorflow

- Chrome https://chromium.googlesource.com/chromium/src.git/+/refs/he... https://source.chromium.org/chromium

There is a very high bar for C++ code quality at Google, and also a published style / best practices guide as well as a code search engine to browse the code more easily.

ncmncm · on Oct 28, 2020

But be careful about picking up Googlisms.

Google's allergy to exceptions pollutes their designs. Any class with default constructor, Init, and Deinit is badly polluted.
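To show the contrast being drawn, here is a sketch of both styles side by side. Both classes are invented for illustration and are not taken from any Google code base.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Two-phase style: the object exists in an unusable state until Init succeeds,
// and only discipline stops callers from using it too early.
class TwoPhaseConfig {
public:
    TwoPhaseConfig() = default;

    bool Init(const std::string& path) {
        if (path.empty()) {
            return false;
        }
        path_ = path;
        ready_ = true;
        return true;
    }

    bool ready() const { return ready_; }
    const std::string& path() const {
        assert(ready_);  // every accessor has to re-check
        return path_;
    }

private:
    std::string path_;
    bool ready_ = false;
};

// Constructor style: if construction succeeds the invariant holds, and a
// failure is reported by throwing, so no half-built object can be observed.
class Config {
public:
    explicit Config(std::string path) : path_(std::move(path)) {
        if (path_.empty()) {
            throw std::invalid_argument("empty path");
        }
    }

    const std::string& path() const { return path_; }

private:
    std::string path_;
};
```

The second form is only available when exceptions are allowed (or construction goes through a factory function), which is presumably why exception-free code bases end up with the first.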

ahartmetz · on Oct 29, 2020

I have worked on many C++ codebases and none of them used exceptions pervasively. It works fine and I don't miss surprise return values.

xfs · on Oct 28, 2020

Why though? Realtime systems, games would love to disable exceptions.

ncmncm · on Oct 29, 2020

I code realtime systems with submicrosecond deadlines. And exceptions.

It is necessary to know what you are doing.

p0nce · on Oct 29, 2020

Exceptions are much more needed than the current counterhype makes it appear.

bitigchi · on Oct 28, 2020

Haiku: https://git.haiku-os.org/haiku/tree/

It's written in C++, has strict coding guidelines, and is mostly free of bloat.

palotasb · on Oct 28, 2020

I recommend the LLVM source code. I've worked a bit with it implementing toy compilers and using LLVM as a backend for generating x86/SPARC object code. Just compiling it is a nice exercise if you're new to C++ and CMake (as I was, then).

I would start by reading the LLVM Programmer's Manual [prog man] and the LLVM Coding Standards [code stand] and looking up how the code they reference and the examples are implemented and how they interact with each other. You can browse the code in the LLVM Mirror Git repository [llvm-git] or the cross-referenced Doxygen [llvm-doxy] (has very useful links!). If you're willing to spend the time it might be worth setting up the codebase in your IDE of choice with the proper integrations to be able to browse and cross-reference the source code there.

There is a lot to learn especially from the Important and useful LLVM APIs and Picking the Right Data Structure for a Task (and the corresponding code) in the Programmer's Manual. The data structures under llvm/ADT could be especially useful to look at. They are good generic data structures that you may even want to directly reuse in your own project, but they are much easier on the eyes and closer to what you and I would want to see in our own codebases than the source code of the standard library or the Boost data structures.

[prog man]: https://releases.llvm.org/4.0.1/docs/ProgrammersManual.html

[code stand]: https://releases.llvm.org/4.0.1/docs/CodingStandards.html

[llvm-git]: https://github.com/llvm-mirror/llvm

[llvm-doxy]: http://llvm.org/doxygen/modules.html

PS. The Coding Standards will contain a lot of subjective stuff that makes sense inside LLVM for consistency but you don't want to adopt verbatim into an existing (maybe not even a green field) project, like naming variables with Capital letters. You can skim these parts, but it might be useful to know about the existence of these types of issues.

oandrew · on Oct 28, 2020

ClickHouse has a very clean and modern C++ codebase. https://github.com/ClickHouse/ClickHouse

darknoon · on Oct 28, 2020

Personally, I think web browsers are good examples. You see integration with various platforms, abstractions on top of them, and general algorithms, generally with a consistent code style.

- WebKit

- Chrome

xfs · on Oct 28, 2020

Chromium's base, which is like STL but better and more feature rich, also with an excellent code browser https://source.chromium.org/chromium/chromium/src/+/master:b....

Also Google's Abseil, very modern C++ patterns and containing some unexpectedly deep designs for apparently simple things. https://source.chromium.org/chromium/chromium/src/+/master:t...

saagarjha · on Oct 28, 2020

WebKit's equivalent, WTF: https://trac.webkit.org/browser/webkit/trunk/Source/WTF/wtf

cpeterso · on Oct 28, 2020

You can browse and search through the Firefox code online here: https://searchfox.org/mozilla-central/source/

aloukissas · on Oct 28, 2020

Haven't been in cpp land for a while, but the chromium source was always my favorite good example.

MorganGallant · on Oct 28, 2020

This may have been mentioned already, but the open-sourced LevelDB codebase is great. Original authors are Jeff Dean and Sanjay Ghemawat, so you know it's going to be really well written.

https://github.com/google/leveldb

mattivc · on Oct 28, 2020

One of the C++ projects that has influenced my programming the most recently, pushing me toward a style somewhere between C and C++, is Dear ImGui: https://github.com/ocornut/imgui

ozychhi · on Oct 28, 2020

Yeah, as others mentioned, this is between C and C++, arguably taking the best of each language. But this library is built around the idea of a performant UI: in the most basic case it redraws the whole screen on each frame (there are ways to avoid that), so it has to be quick, and generic containers do not really cut it. Using exceptions is also more or less impossible with the way you have to structure ImGui code. So it's more of a codebase that illustrates how to write performant code than a modern C++ guideline.

iorrus · on Oct 28, 2020

Do they have a coding style or similar in which this is outlined?

I had a quick look at the code but I would be interested in what "a style that is something between C and C++" actually entails.

Avoiding templates but using classes/namespaces? Minimizing use of the stl?

s9w · on Oct 28, 2020

Kind of. Almost no standard library, but it does use namespaces. (Almost) no templates. Lots of pointers and arrays. Lots of explicit begin/end or push/pop functions instead of more idiomatic solutions like RAII/scope guards etc. va_ macros instead of variadic templates or fold expressions.

Also, while ImGUI is great and the guy follows good coding practices, it's not at all what I would recommend for learning C++. Not only is it technically not C++, but the problem it solves (a GUI framework, and an immediate-mode one on top of that) leads to some pretty hard-to-follow code with a lot of hidden global state.
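For anyone unsure what the "more idiomatic" alternatives look like, here is a rough sketch. The `push_id`/`pop_id` pair is a hypothetical stand-in for a C-style paired API, not the actual Dear ImGui functions, and `ScopedId`, `concat`, and `depth_inside_scope` are names invented for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical C-style paired API: every push_id() needs a matching pop_id().
inline std::vector<int>& id_stack() {
    static std::vector<int> stack;
    return stack;
}
inline void push_id(int id) { id_stack().push_back(id); }
inline void pop_id() { id_stack().pop_back(); }

// RAII scope guard: the destructor issues the matching pop, so the pair
// stays balanced on early returns and when an exception unwinds the stack.
class ScopedId {
public:
    explicit ScopedId(int id) { push_id(id); }
    ~ScopedId() { pop_id(); }
    ScopedId(const ScopedId&) = delete;
    ScopedId& operator=(const ScopedId&) = delete;
};

// C++17 fold expression: a type-checked alternative to a va_list-based
// formatter. Each argument is streamed in order into the output.
template <typename... Args>
std::string concat(const Args&... args) {
    std::ostringstream out;
    (out << ... << args);
    return out.str();
}

// Returns the stack depth observed inside the guarded scope; the guard
// pops the id again when this function returns.
inline std::size_t depth_inside_scope(int id) {
    ScopedId guard(id);
    return id_stack().size();
}
```

The explicit push/pop style is easier to bind from C or other languages, which is presumably part of why a library would choose it; the guard is just a thin layer you can add on top in your own code.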

gpderetta · on Oct 28, 2020

> Lots of explicit begin/end or push/pop functions instead of more idiomatic solutions like RAII/scope guards etc. va_ macros instead of variadic templates or fold expressions.

I'm going to have nightmares tonight.

s9w · on Oct 28, 2020

It's a simple API, that's for sure. But it's pretty consistent in itself and obviously is trivial to build abstractions upon. Also makes it easy to port to other languages.

ncmncm · on Oct 28, 2020

That sounds like uniformly bad code.

iorrus · on Oct 28, 2020

Thanks, that’s a great explanation.

spacechild1 · on Oct 28, 2020

I started with C++ a couple of years ago, and openFrameworks was my entry ticket: https://github.com/openframeworks/openFrameworks.

The code base is quite simple and readable, because it consists of several self-contained modules, many of which are just wrappers around existing libraries.

For a "professional" code base, you could have a look at JUCE, which is also very modular: https://github.com/juce-framework/JUCE

Both projects are written in "modern C++" (C++11 and above).

arriu · on Oct 28, 2020

Games might be an interesting one to look at because they have a little bit of everything going on (physics, math, rendering, cross platform abstractions, window management, system calls, file management, memory management, etc...)

Take a look at these:

https://github.com/FrictionalGames/AmnesiaTheDarkDescent

https://github.com/FrictionalGames/PenumbraOverture

crocodiletears · on Oct 28, 2020

I can't speak to Frictional's code quality, but historically, released titles tend to be a tangle of hacks and short-term decisions that hammer things into working or being performant, at the expense of future developers and correctness.

I do agree that the field is full of great code to reference, though.

I'm a huge fan of Godot's codebase. It's well written, and easily understood both architecturally, and in-method. https://github.com/godotengine/godot

Ace17 · on Oct 28, 2020

You can't have a unified opinion here.

There are some pretty good public game codebases (Penumbra/HPL, Quake I/II/III, Doom I/III), there are some okay ones (RedAlert), there are also pretty awful ones (Duke3D, Descent).

Note that this seems to be completely independent from whether the game is good or not!

ncmncm · on Oct 28, 2020

When I looked at Godot I saw all pre-C++11, and everything done the hard way. Did I look in the wrong place?

crocodiletears · on Oct 29, 2020

I wouldn't call it the hard way, but it's certainly pre-C++11. My understanding is that the project's lead programmer insists on using older standards for the sake of portability.

sneeuwpopsneeuw · on Oct 28, 2020

One thing that helped me get some insight into C++ was the old Command and Conquer source code from the '90s: https://github.com/electronicarts/CnC_Remastered_Collection. Jason Turner has multiple videos where he takes this old code and transforms it into more modern C++: https://www.youtube.com/watch?v=Oee7gje-XRc

UncleOxidant · on Oct 28, 2020

Narrowing things down a bit, I'm wondering if there are any good C++17+ code bases to read? I see a lot of recommendations for C++11 codebases here, but a lot has changed since then.

jlrubin · on Oct 28, 2020

https://github.com/bitcoin/bitcoin could be fun to explore some modules (email me if you want some guidance).

I wouldn't say it's particularly "clean", given the constraints on the project, but it does do a lot of things that are interesting from a security/DoS perspective. There is even some SDR-related work being done recently for alternative relays!