I was surprised to see `fmt/format.h` on that list, but I do have to admit that the objections seem reasonable. Perhaps because he(?) mentioned wanting to use -O0. Template code is almost useless without optimization. If -O0 is needed then I am surprised that all of the STL doesn't get pitched.
Ok, I was also surprised to see co-routines on the nice list, but I don't have direct experience there. I normally see complaints about them. I would like them to be good because some code is easier to express that way.
> I was surprised to see `fmt/format.h` on that list, but I do have to admit that the objections seem reasonable
The author talks about the code bloat because of "an API that encourages custom formatter specification to live in a template". But at the end he mentions the standard solution to this problem:
> A preferable interface (I use, but also others AFAIK) is to check the type in a template (no choice there), and dispatch the formatting routine to somewhere that lives in a single translation unit.
So what prevents you from doing this with <format>? As I understand, the implementations of parse() and format() of std::formatter don't depend on the template parameters and can delegate to non-template functions residing in one CPP file. You can also provide additional wformat_parse_context/wformat_context overloads if you need wchar_t support.
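A minimal sketch of that approach, assuming a hypothetical Point type and a format_point helper split across point.h/point.cpp:

    // point.h
    #include <format>
    #include <string>

    struct Point { double x, y; };

    // Declared here, defined once in point.cpp, so the real formatting
    // logic lives in a single translation unit.
    std::string format_point(const Point& p);

    template <>
    struct std::formatter<Point> : std::formatter<std::string> {
        auto format(const Point& p, std::format_context& ctx) const {
            return std::formatter<std::string>::format(format_point(p), ctx);
        }
    };

    // point.cpp
    #include "point.h"
    std::string format_point(const Point& p) {
        return std::format("({}, {})", p.x, p.y);
    }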
I guess there’s some legitimate complaint about compile time, but the code bloat issue is simply crazy if you use a linker written since ~1995. And the “fix” is simply to move the common code into a .cc file which the author even mentions.
The alternatives are worse: un-type-checked printf or the horrible stream interpreter system (std::cout << “foo”) which was a cute but bad idea in 1985.
The author talked about its effect on compile-times and I have to say, I agree with him. It's also why I dislike header-only libraries, they also bloat the compile times unnecessarily.
Don't get me wrong, the fmt library is very nice, but you can't deny its effect on compile times.
fmt/core.h has been heavily optimized for build speed and is usually faster to compile than equivalent iostream code: https://github.com/fmtlib/fmt?tab=readme-ov-file#compile-tim.... Once modularized std is available we might be able to compete with printf.
Wondering the same. Seems like you could provide your own implementation or use a third-party implementation. I'd be curious to see a write-up on the bloat and what exactly it looks like.
{fmt} doesn't encourage "custom formatter specification to live in a template". On the contrary, if you look at the docs in https://fmt.dev/latest/api.html#formatting-user-defined-type..., none of the examples is parameterized. One even demonstrates how to define your formatting code in a source file. And if your formatters are so big that they meaningfully impact build speed you are doing something wrong. fmt/core.h is heavily optimized for build speed so you can just use it as a type-safe replacement for *printf. That said, implementations of std::format (especially Microsoft's) may not be as optimized for build speed yet. This will likely improve now that the ABI can be stabilized.
Unfortunately, if you're using the standard library, you get this just by switching to the C++20 mode. For example, the committee decided to put tons of std::ranges-related stuff right in <algorithm>.
It isn’t just a nicer API but type safe and much faster at runtime.
Since I rarely compile all my code at once (usually just a single file followed by a re-link), compile time doesn’t matter much. And that’s even though, while editing or writing code, I don’t have the slowdown bloat of an IDE, so compile time is more noticeable.
In my experience if you're doing perf critical stuff with string formatting you won't be using anything like fmt or printf, and for everything else the runtime difference is almost entirely unimportant.
it's partly because of the engine code. there's even bigger stuff, especially if it's a company with any legacy codebase that's 10-20 years old or whatever (e.g. EA / Frostbite.) one i worked on took hours to compile the first time on a machine with 128gb of ram and a threadripper. the onboarding doc suggests getting some coffee at that point haha
a big part of working on them as a generalist ends up being the ability to know how to even navigate something like that (especially since they're often haphazardly documented)
(part of it is that most of the games "fork" the engine rather than using it as a standalone thing)
it's probably not everyone on the team building that whole thing each time, but yea. hundreds of solutions and millions of LOC isn't unusual
*i just did a quick check with unreal's source, it's ~20 million LoC (assuming I didn't mess up the filtering somehow)
The main time killer in day-to-day work on such big projects is usually the linker step, which is terribly slow with the MSVC linker and doesn't benefit from incremental or distributed compilation (not sure though how much the MSVC linker has improved in the last 5 years or so).
Is it necessary to build and link windows games with msvc? Pretend I know nothing about the subject.
Large code bases built and linked with open source toolchains have solved this with, for example, thinlto. And by "large" I mean orders of magnitude larger than the mentioned game.
Not necessary, but definitely the path of least resistance for developing Windows games, and AFAIK MSVC is required for developing Xbox games. Commercial closed source middleware is sometimes distributed as compiled C++ static link libraries plus headers without being able to recompile yourself.
Genuine question: what code base is 100 million lines of code (I think that’s the meaning of an order of magnitude bigger than 11 million LOC)? I cannot comprehend how that much code is anything but dead weight saddled on ‘generations’ of a business’s employees.
If it helps, even by 10 million LOC (assumedly most of which isn't unused legacy cruft) you're starting to dig deeply into templates and codegen instead of actual manual coding. I'm sure with all this AI stuff that will only get larger and larger.
But yeah, outside of some of the largest of FAANG software, there aren't a lot of codebases that can truly justify 100m lines of code. I'm sure GTA VI is over 100m lines but can be cut down to 10m if that was a priority (it never is)
You can use clang for windows builds. It’s what I use for my games. However, I assume that most popular engines might not compile on clang+windows for random reasons.
It's a lot more common than you'd think. I'm not in gamedev but something similarly weird (multiple supported userspaces and OSes for an embedded device line); our "full build" is probably approaching 50M+ lines and only quite recently do people do incrementals from build server snapshots. No bazel or distributed ccache or anything.
Especially for games or OS development, you might have shifting toolchains and SDKs. Different teams may move out of sync because different teams want different things at a given time.
I just cleaned out my build folders and did a full build. 40M+ lines, 3 mins and 35 seconds. If you're not getting similar speed then maybe you should look into adding more machines. Last time I was on game dev the best you could do was share other programmers machines via incredibuild. No thought about adding more machines just for building or using cloud infra
We've been around 1mloc for the Drakensang single player and MMO games, and that was 15 to 5 years ago, with a relatively small team (up to 20 programmers), and budget-wise far away from what's considered an AAA production.
While I wouldn't want to work on a 10mloc C++ code base either, it sounds totally realistic to me.
realistic, but probably far from "optimal" (not that LoC is a good target for anything other than complexity). There simply isn't much incentive to properly refactor legacy code into something more manageable. Rather eat the cost with devs ramping up and working around the cruft.
most of the builds are incremental, but even incremental builds still take a while when it's that massive (linking, modifying a header / template code, etc)
True, but linker times still suck and don't parallelize well.
Also, sometimes you need to iterate on some very core .h file and touching any of those brings the whole house of cards down and triggers a full or nearly full rebuild.
I'm a junior in uni, and I hate it when I say "Yeah we learned this technique in the C class, but it's UB in C++ so please rewrite that" when reviewing friends' code that does type-punning with unions.
So I'm also very happy with the 'std::bit_cast' in general.
BTW how about std::is_constant_evaluated()?
I assumed it would help folks who do heavy physics simulations, but it looks like it's not listed in the article.
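For reference, the usual pattern looks roughly like this (my_sqrt is a made-up helper): one constexpr function with a portable compile-time path and the fast library call at run time.

    #include <cmath>
    #include <type_traits>

    constexpr double my_sqrt(double x) {
        if (std::is_constant_evaluated()) {
            // compile-time path: plain Newton iteration, no libm needed
            double guess = x > 0 ? x : 0.0;
            for (int i = 0; i < 64 && guess > 0; ++i)
                guess = 0.5 * (guess + x / guess);
            return guess;
        }
        return std::sqrt(x); // run-time path: hardware/libm sqrt
    }

    static_assert(my_sqrt(4.0) > 1.99 && my_sqrt(4.0) < 2.01);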
TBF, I have yet to see a C++ compiler where the union type punning trick doesn't work; there would be a lot of broken code if real-world compilers changed the current behaviour, no matter what the standard says.
Of course now that std::bit_cast exists it's the safe thing to do (but then there's still C code that's compiled in C++ mode, which was even recommended by Microsoft because the Visual Studio team couldn't be bothered to keep their C compiler in shape until a little while ago).
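For the record, a small sketch of the two approaches (float_bits is a made-up helper):

    #include <bit>
    #include <cstdint>
    #include <cstring>

    std::uint32_t float_bits(float f) {
        // Union type punning: works on every major C++ compiler in practice,
        // but reading the inactive member is UB according to the standard:
        //   union { float in; std::uint32_t out; } pun{f};
        //   return pun.out;

        // Well-defined since C++20:
        return std::bit_cast<std::uint32_t>(f);

        // Pre-C++20, the blessed alternative is memcpy:
        //   std::uint32_t u;
        //   std::memcpy(&u, &f, sizeof u);
        //   return u;
    }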
> I have yet to see a C++ compiler where the union type punning trick doesn't work
The problem isn’t that compilers won’t implement the feature (that would take more work); the problem is that it’s processor-specific.
The spec doesn’t mandate many specific bit-ordering layouts (some things are mandated, such as two’s complement representation, which was just added, &obj == &base, I think nullptr has to be 0, etc) rather than trying to make everything a PDP-11.
In C99, the designators can appear in an order that is not related to the order the fields appear in the struct (that order has to be maintained for ABI reasons), and not all fields need to appear (the others are initialized with 0/NULL).
...all those limitations taken together, and the C++ designated initialization feature is pretty much useless except for the most trivial structs - while in C99, designated initialization really shines with complex, nested structs.
The funny thing is that none of those limitations would be required. Clang had supported full C99 designated init in C++ mode just fine for many years before C++20 appeared.
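Roughly where the C++20 rules draw the line, with a made-up Rect/Point example:

    struct Point { int x = 0; int y = 0; };
    struct Rect  { Point min; Point max; };

    // OK in C++20: designators follow declaration order, aggregates only.
    Rect r1 { .min = {0, 0}, .max = {10, 10} };

    // Fine in C99 (or as a compiler extension), ill-formed in standard C++20:
    // Rect r2 { .max = {10, 10}, .min = {0, 0} };  // out of declaration order
    // Rect r3 { .min.x = 1 };                      // nested designators
    // int  a[4] = { [2] = 7 };                     // array designators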
Oh man, I've desperately wanted both of these features in basically every language I've used.
I think you can do a similar sort of array initialization in C#, but definitely not chained initializers. Those are both so useful, but I can see why they aren't included in C++/++
C++ isn’t C and has different structure semantics. Members are initialized in the order defined, which means you can write
struct foo {
    int a = 0;
    int b = a+1;
};
If the compiler just did the initialization in the order of declaration, regardless of the order in the initialization list, this would not do what you expect:
struct obj {
    int a;
    int b;
};
int ival = 0;
auto o = obj {.b = ++ival, .a = ival};
o.a would not equal o.b.
I would like to have the initialization syntax of C because then one could reorder elements (say for packing reasons) and the designated initialization would “just work”…except it wouldn’t.
C++ designated initialization does buy you two things: 1- documentation, but more importantly 2- if you do reorder a struct or class data members the compiler will warn you that your initialization lists are now invalid rather than silently failing. I don’t know how to even find them all in a large code base any other way!
I think an exception might be made for a plain "C-like" struct that doesn't initialize members or contain anything except basic types. In the specific example[0] the code is actually surrounded by extern "C" { ... } so I suppose that the compiler "knows" this is a plain C struct? (Does extern "C" change parsing rules? I will need to look at what GCC does)
extern "C" { ... } unfortunately only turns off C++ name mangling for symbols, it doesn't switch the C++ compiler into "C language mode". Such a feature would be incredibly useful though for mixing C and C++ code in the same source file.
Destructors will execute in the reverse of declaration order, so if initialization order doesn't match declaration order, and some members depend on each other somehow, things will break. At the very least, it could be surprising. Not a problem in C where destructors don't exist.
I think, as usual, this was the compromise that the committee was able to agree on above all objections. There is still the possibility that the rules are relaxed if there is agreement. But somebody has to do the work to push it through standardization.
I also thought that the behaviour as standardized was useless, but recently I started writing more minimalist code eschewing constructors where aggregate initialisation would suffice, and I haven't really missed the ability to reorder initializers or skip them.
Initialization in C++ is already a mess. Making one of the core behaviours (members are initialized in declaration order) work subtly different for this case would make it even more difficult for the programmer to build a correct mental model.
From what I can tell, the snippet you posted would compile fine in C++20 mode.
This would be a big footgun though. That already happens for initialising class members in constructors and it's enough of a footgun that compilers have a warning to tell you if your initialisation order is different to the declaration order.
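For illustration, a made-up example of the constructor case that GCC and Clang flag with -Wreorder (part of -Wall):

    struct Widget {
        int size;
        int offset;
        // Members are initialized in declaration order (size, then offset),
        // not in the order written here, so 'size' reads 'offset' before it
        // has been initialized. -Wreorder flags the mismatched order.
        Widget(int n) : offset(n), size(offset * 2) {}
    };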
Doesn't the same footgun apply to function call arguments which are evaluated in an unspecified order?
f(a++, b++)
That footgun doesn't seem to have very much impact in the real world that I've seen. Largely because people do not write complex expressions like that anymore. Since we already mostly avoid such expressions, we may as well take the benefit for designated initializers.
Yeah that is a footgun too. Enough that Rust defines argument evaluation order.
It is a much smaller footgun though and I don't recall ever being bitten by it. I have definitely been bitten by member initialisation order though multiple times. Generally because of questionable designs where one member is passed as a parameter of another member and it hasn't actually been initialised yet.
Really I think the answer is to initialise in the order that initialisation is written (this is what Rust does) rather than declaration order. But that would be a breaking change so I guess they opted for the conservative choice.
> Personally, I find code that leverages ranges harder to read, not easier, because lambdas inlined in functions introduce new scopes that have a strong non-linearizing effect on the code. This isn’t a criticism of ranges per se, but certainly is a stylistic preference.
Does anyone know what “non-linearizing” means here?
I assume “code outside the lambda runs first, then code inside the lambda maybe runs later, maybe runs multiple times, maybe doesn’t run at all”.
It can especially create problems when the lambda captures a variable by reference which gets mutated and/or deallocated before the lambda runs, and the developer didn’t plan for mutation or deallocation.
Or (a problem with lambdas, but not “non-linearizing”), if the lambda captures a variable by value (copies the value) and mutates it, and the developer expected the mutation to persist outside the lambda.
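A tiny made-up example of the by-reference case:

    #include <cstdio>
    #include <functional>
    #include <string>

    std::function<void()> make_callback() {
        std::string msg = "hello";
        // Captures msg by reference, but msg is destroyed when make_callback
        // returns, so invoking the callback later reads a dangling reference.
        return [&msg] { std::puts(msg.c_str()); };
    }

    int main() {
        auto cb = make_callback();
        cb(); // "later" - the captured variable is already gone (UB)
    }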
This was my first encounter with the three-way comparison operator (<=>). Can someone give a practical use case? There must be one for it to be included in the spec, but I'm not seeing it.
But the short answer is that all the other comparison operators are automatically generated from that one if it is defined (== and != too, when you default it). So it makes the code simpler. And for many types <=> isn't much more complicated than the others.
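A small sketch of the typical use (Version is a made-up type):

    #include <compare>

    struct Version {
        int major_v, minor_v, patch;
        // One defaulted operator gives you <, <=, >, >= (and, because it is
        // defaulted, == and != as well), comparing members in declaration order.
        auto operator<=>(const Version&) const = default;
    };

    static_assert(Version{1, 2, 0} < Version{1, 10, 0});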
> Signed overflow/underflow remain UB (and it’s understandable that changing this behavior would have dramatic consequences)
I think that the dramatic consequences are only understandable if you succumb to mimetic contagion.
The consequences are real but not dramatic and possibly not even measurable in many workloads.
It just means that you’ll have an extra sign extension (one of the cheapest ops the CPU has) in a subset of your loops, namely the ones that had a 32 bit signed induction variable and the compiler could reason about that variable but only if it also could assume no wrapping. That’s a lot of caveats.
Most loops will be unaffected by making signed integer overflow defined. Anything that’s not in a loop will almost certainly be unaffected by this change. If you use size_t as your indices then you’ll definitely be unaffected.
So yeah. “Dramatic consequences”. I wish folks stopped exaggerating. There’s nothing dramatic here. It’s a fraction of a percent of perf maybe.
> a 32 bit signed induction variable and the compiler could reason about that variable but only if it also could assume no wrapping.
(Amateur C programmer silly question) I think I understand it as if we increment the variable (i+10) and use it in an if condition. With UB the compiler could skip that code altogether and assume it will never be reached?
The compiler has to assume that i+10 won’t overflow by virtue of i never being big enough. So, it’ll emit all of the code and UB won’t come into play.
It’s more like this. If you say A[i] where i is a 32-bit signed int and you’re on a 64-bit target, then this lowers to:
- sign extend I to get a 64-bit value
- multiply it by the size of A’s element type
- add that to A
- then do the access
The last three steps will be just one instruction in the common case on arm and x86. The first step will require a separate instruction on x86.
The compiler can kill the sign extend if it’s sure that the integer value cannot be negative. That’s hard to prove. But you can almost prove it if you see code like:
for (int i = 0; something; ++i)
It looks like i starts out as zero and only grows! So it has to be positive! So if you say A[i] then no sign extend needed!
But wait, what if ++i overflows?
With signed int UB, the compiler can just assume it won’t overflow. And then it can prove that i is nonnegative. And then it can kill the sign extend on those CPUs where it’s not free, like x86.
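Roughly, with a made-up loop where the bound isn't known to the compiler:

    // With signed-overflow UB, the compiler may assume ++i never wraps,
    // conclude that i stays non-negative, and drop the 32-to-64-bit sign
    // extension before computing &a[i] on a 64-bit target like x86-64.
    long long sum(const long long* a, const bool* done) {
        long long s = 0;
        for (int i = 0; !done[i]; ++i)  // trip count unknown to the compiler
            s += a[i]; // 64-bit address computed from a 32-bit signed index
        return s;
    }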
I’m a compiler writer. I know how valuable this optimization is. Namely, it’s the tiniest of benefits on some program/CPU combos. Modern languages like Java or Swift just give this well defined semantics and call it a day because this isn’t a good hill to die on. Fucking up the language isn’t worth 0.3% on some stupid benchmark, period.
> Modern languages like Java or Swift just give this well defined semantics and call it a day because this isn’t a good hill to die on. Fucking up the language isn’t worth 0.3% on some stupid benchmark, period.
I saw some projects just opt to use -fwrapv as a gcc option. clang I think has the same one too now. The GCC docs mention [1] "This flag enables some optimizations and disables others"
I agree with this, and I will take it a step further: people who want the compiler to know the ranges of their variables to enable better optimizations should use the various "assume" attributes/builtins which have existed for a long time. The compiler can do way more if it knows your loop count is max 5 than if it knows it is max 2 billion.
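For example, a sketch using the C++23 [[assume]] attribute (Clang also has __builtin_assume, MSVC __assume, and on older GCC you can get the same effect with an if around __builtin_unreachable()):

    int sum_first(const int* a, int n) {
        [[assume(n >= 0 && n <= 5)]]; // the optimizer may now fully unroll
        int s = 0;
        for (int i = 0; i < n; ++i)
            s += a[i];
        return s;
    }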
Is it just me, or is the worst part of coroutines the lack of tooling around them? Whenever I get a crash in a coroutine, the "stacktrace" is totally useless and doesn't actually show where the crash happened, just some boilerplate code around executing some continuation which doesn't refer to real code that you wrote.
more or less agree, although this issue isn't even really unique to C++. in practice it's still worth it imo, since debugging callback heavy stuff isn't exactly fun either
You'd still have lines referring to a callback you've written. I found that with coroutines, not even a single stack frame refers to my code, other than the one starting the loop.
code that uses coroutines is often a lot more straightforward than callbacks. with callbacks, it can also be non-trivial to even find out where / when some delegate was set. it's also often difficult to even navigate / read a single chain of actions without having to jump back and forth across several files, while also needing to follow around a trail of random variables. i'd still make the trade over to coroutines every single time. i'm definitely biased, though. i really really hate working with callback heavy code
(what i've usually done for hunting down coroutine errors is use a tracing profiler, or i guess just printf debugging. it is probably the worst part still, tho)
Lately I've been under the extreme temptation to rewrite my game engine in Rust.
I crave the ergonomics of Rust development. I use Rust at my job (not game dev) and it sucks to switch back to C++ for my side projects.
But I resist for the moment, because I fear it won't be as easy as I predict and it would delay my projects.
I already started using this list of features and refactored most of my code for C++20. I hope C++ will continue on that path and catch up with Rust. But there are still so many things missing.
In the meantime I refactor my C++ projects little by little to be "rust ready": hierarchical ownership, data oriented with minimalist OOP. So the day I can't resist any more I will be able to quickly rewrite it in Rust.
Rust doesn't allow dynamic libraries in general, so it isn't going to work where (right or wrong) the code is based on plugins. You can work around this with C API interfaces, but that limits you if both sides are Rust (unsafe for what should be safe, as I understand it).
Instead of literally loading the plugin into the same address space as the host process, you could launch it as a separate process and use sockets or shared memory to communicate. With a slight shift in API design this might actually be better than traditional dynamically loaded plugins. It will tend to level the performance playing field across languages (e.g. no JNI penalty for plugins written in a JVM language). And it will prevent plugins crashing the host.
You're right, but that's also not the entire workload. You have to also wake up the remote process in some way and, depending on the expected duration, probably wake up for the response. On pretty much all major platforms that work is measured in 10s of microseconds, which puts you easily into the 1000x slower than a non-inlineable function call category.
One common use case for DLLs in gamedev is code hotloading. You can just recompile your game DLL and unload / reload the library, patch up some globals and vtable pointers and voila, your game logic has been updated without restarting the game. And you get all that just writing normal C++.
This is only for development. Shipping builds will usually statically link everything.
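A rough sketch of that pattern on Windows (the GameCode struct and game_update export are made up; error handling omitted):

    #include <windows.h>

    using UpdateFn = void (*)(void* game_state);

    struct GameCode {
        HMODULE  dll    = nullptr;
        UpdateFn update = nullptr;
    };

    GameCode load_game_code(const char* dll_path) {
        GameCode code;
        code.dll = LoadLibraryA(dll_path);
        if (code.dll)
            code.update = reinterpret_cast<UpdateFn>(
                GetProcAddress(code.dll, "game_update"));
        return code;
    }

    void unload_game_code(GameCode& code) {
        if (code.dll) FreeLibrary(code.dll);
        code = {};
    }

    // On each rebuild: unload_game_code(), copy the new DLL over, then
    // load_game_code() again and keep calling code.update(game_state).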
But, as far as I understand, the boundary layer still has to be C (the side that loads DLLs and stuff), because of the natural limitations of templated languages and linkers.
And as soon as you change any interface you'd need to recompile more parts of the code. The same can be done with Rust using dylib. In the end, the glue code always ends up being C.
On any given compiler you can make C++ have a stable ABI. You can commonly even do this in practice across compiler versions; the standard library typically tries to achieve this.
It's easier to have a long-term, cross-version, cross-compiler stable C ABI, but if you're talking about a single toolchain, that simplifies the problem tremendously, and you can absolutely do it with C++ in practice at that point.
If you don't care about the ABI being stable then you can use a Rust-based ABI, but you're essentially just static linking everything then. Not sure how that works out for the game developers.