
> So you will need to know C++ if we want to be the cutting edge and need extreme performance.

I would suggest a better phrasing: you should learn C++ if you want to cling to existing software in the state it is today, certain that history ended yesterday and that the future will just be more of the same from here on.

You won't get "extreme performance" from C++ because it is buried under the weight of decades of compatibility hacks.



I don't even know what you are trying to say here. C++ is great to program in and has a fantastic ecosystem of tools and libraries. It is something you can safely base a business around.

> You won't get "extreme performance" from C++ because it is buried under the weight of decades of compatibility hacks.

This doesn't even make sense. What is an example of something that can't be fast because it is done in C++? What "compatibility hacks" are slowing programs down?

This does not sound like something someone with experience in C++ would say.


> This doesn't even make sense. What is an example of something that can't be fast because it is done in C++? What "compatibility hacks" are slowing programs down?

Let's look at two very different perf leaks in ISO C++. First, move assignment, which is right at the heart of the language. C++ originally had no move semantics at all, which is a problem because in a bunch of cases they're key to "extreme performance". So in the "lost decade" between C++ 03 and C++ 11 considerable work was done to figure out a way to unlock this, but of course C++ insisted on backwards compatibility with code written for C++ 98, which had no such semantics.

The result is that the move semantic people actually wanted couldn't quite be delivered. It's now called "destructive move", and proponents of the compatible option that C++ 11 and later shipped claim it's equivalent to the move they delivered plus a destroy. But that's... disingenuous. The C++ 11 move is actually (move + create), so to build "destructive move", when that's what you need, you must write (move + create) + destroy. Sometimes the compiler can see what's happening here and emit the same machine code you'd get in a language with native destructive move, but sometimes it has no way to figure this out, and you're emitting move + create + destroy, which is markedly slower as well as, of course, more error prone.
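To make that (move + create) + destroy shape concrete, here's a minimal sketch with a hypothetical Buffer type (my example, not anything from the standard):

    #include <cstddef>
    #include <cstdlib>
    #include <utility>

    struct Buffer {
        char* data;
        explicit Buffer(std::size_t n) : data(static_cast<char*>(std::malloc(n))) {}
        // C++ 11 move: transfer the pointer, but also "create" a valid empty
        // state in the source, because the source's destructor still runs.
        Buffer(Buffer&& other) noexcept : data(other.data) { other.data = nullptr; }
        ~Buffer() { std::free(data); }  // executes for moved-from objects too
    };

    int main() {
        Buffer a(1024);
        Buffer b(std::move(a));  // move + create...
    }                            // ...+ destroy: both destructors run here

A destructive move would end a's lifetime at the std::move, with no nulling write and no second destructor call; the compiler can sometimes optimise the C++ version down to that, but nothing guarantees it.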

Secondly, right up at the surface where we can feel the rain, and appropriately something Bjarne Stroustrup (author of the book we're talking about here) has noticed himself: the provided growable array, std::vector, lacks a bifurcated reservation API. Vec::reserve_exact isn't enough; you need Vec::reserve as well for good performance with a growable array API. But C++ provides only Vec::reserve_exact, under the name "reserve", and there's no room to offer both in the compatible API, which means that for a user/application programmer this fundamental type is a small perf leak.

Bjarne shrugs this off: oh well, just don't use the reserve API for performance. But everybody else can offer both, so this choice means C++ is leaving performance on the table for somebody else with a better growable array type.

Finally, though, I want to look at a beacon over the horizon, where "extreme performance" really is something they're thinking about. Iterator Loops (sometimes "Chunk Loops") are a technique where the language lets you directly express in your definition of a loop how it can be performed in a SIMD fashion. So, for example, you write the code to search an N byte buffer for the byte you're looking for, then revisit that loop and also write code for searching N = M * 16 bytes, 16 bytes at a time. This lets you write, in a high level language, code which is easy for a compiler to SIMD-accelerate without fragile idiom recognition. C++ of course does not provide Iterator Loops, and instead you'd reach for manual inline assembler in these cases today.
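To illustrate, here's the manual version of that technique in C++ for the byte-search case (a sketch; both functions are hypothetical):

    #include <cstddef>
    #include <cstdint>

    // Scalar loop: one byte per iteration.
    std::ptrdiff_t find_byte(const std::uint8_t* p, std::size_t n, std::uint8_t needle) {
        for (std::size_t i = 0; i < n; ++i)
            if (p[i] == needle) return static_cast<std::ptrdiff_t>(i);
        return -1;
    }

    // Hand-chunked version: 16 bytes per outer iteration, scalar tail.
    // A compiler may vectorise the fixed-width inner loop, but only via the
    // fragile idiom recognition described above; nothing in the language
    // ties the two definitions together the way Iterator Loops would.
    std::ptrdiff_t find_byte_chunked(const std::uint8_t* p, std::size_t n, std::uint8_t needle) {
        std::size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            bool hit = false;
            for (std::size_t j = 0; j < 16; ++j)
                hit |= (p[i + j] == needle);
            if (hit) break;  // rescan the hit chunk with the scalar tail below
        }
        for (; i < n; ++i)
            if (p[i] == needle) return static_cast<std::ptrdiff_t>(i);
        return -1;
    }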


Who is filling your head with all this? I have never seen any program slow down over a move, because you should be moving enough data that copying a pointer doesn't matter anyway. This is trivial stuff. Show me an actual program somewhere that demonstrates what you're talking about.

> Vec::reserve_exact isn't enough; you need Vec::reserve

Reserve reserves the amount that you give it. This isn't hard or complicated. What exactly do you think it should do differently, and how hard would it be to write your own? Vector is a simple data structure. Anything you don't like about it isn't a language limitation.

> this fundamental type is a small perf leak

Show me a program that illustrates what you are talking about.

> I want to look at a beacon over the horizon,

You can do this with intrinsics or libraries that use intrinsics.
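For example, here's a sketch of that byte search written with SSE2 intrinsics (assumes x86-64, and GCC/Clang for __builtin_ctz):

    #include <emmintrin.h>  // SSE2
    #include <cstddef>
    #include <cstdint>

    std::ptrdiff_t find_byte_sse2(const std::uint8_t* p, std::size_t n, std::uint8_t needle) {
        const __m128i pattern = _mm_set1_epi8(static_cast<char>(needle));
        std::size_t i = 0;
        for (; i + 16 <= n; i += 16) {
            // Compare 16 bytes at once; movemask packs the per-byte results
            // into an int with one bit per byte.
            __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
            int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, pattern));
            if (mask != 0) return static_cast<std::ptrdiff_t>(i) + __builtin_ctz(mask);
        }
        for (; i < n; ++i)  // scalar tail for the last n % 16 bytes
            if (p[i] == needle) return static_cast<std::ptrdiff_t>(i);
        return -1;
    }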

> the language lets you directly express in your definition of a loop

This isn't a performance limitation, it's a convenience you want integrated into the language. There is a language that does this; it's called ISPC. Is that what you use?

This seems to me like you've gone down some sort of anti-C++ rabbit hole where people who don't know what they're doing get worked up about things that don't matter.

Meanwhile, in the standard library there actually is the unordered_map design, which is stifled by algorithmic complexity requirements, but people use flat maps to get around that.
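(For reference, a flat map is just a sorted contiguous array with binary-search lookup; a minimal sketch, not any particular library's design:)

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    struct FlatMap {
        // Kept sorted by key; contiguous storage is cache-friendly, but
        // insertion is O(n), which the standard's requirements on
        // unordered_map would never permit.
        std::vector<std::pair<std::string, int>> items;

        void insert(std::string key, int value) {
            auto it = std::lower_bound(items.begin(), items.end(), key,
                [](const auto& p, const std::string& k) { return p.first < k; });
            if (it != items.end() && it->first == key) it->second = value;
            else items.insert(it, {std::move(key), value});
        }

        const int* find(const std::string& key) const {
            auto it = std::lower_bound(items.begin(), items.end(), key,
                [](const auto& p, const std::string& k) { return p.first < k; });
            return (it != items.end() && it->first == key) ? &it->second : nullptr;
        }
    };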


> I have never seen any program slow down over a move

I'm sure. And yet the people who care have noticed that C++ is slower; that's what P1144 and P2786 and so on are trying to solve, without quite saying "destructive move" out loud.

Here's the current iteration of P1144: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p11... Like pointer provenance, this is something WG21 would prefer not to think about because it's embarrassing, so it has hung around like a bad smell for quite a few years, but I suspect (unlike the provenance work) it'll get into C++ 26 in some form.

> Reserve reserves the amount that you give it. This isn't hard or complicated. What exactly do you think it should do differently

It's OK, after all Bjarne Stroustrup didn't see this either, but you've missed something quite important.

The whole idea of this type is amortized constant time growth, typically via doubling, although any fixed ratio works, and in principle there could be a benefit to choosing other ratios (Folly has a long document about this). For this purpose it's crucial that capacity doesn't grow linearly. But if our only reservation mechanism is Vec::reserve_exact (aka C++ reserve), we end up with linear growth. Bjarne's solution is to abandon the use of reserve for performance, but we can do much better by just bifurcating the API as Rust did (and as some other languages do).

I started writing a program to illustrate this, but it's probably easier to just spell it out with an example. Suppose we receive Doodads over the network. They arrive in groups of up to 20 Doodads at a time, and we don't know in advance how many there will be, until the last group indicates it's the end of the Doodads. Typically there are maybe 40-50 Doodads in total, but there can be as few as five (one group of just five) or as many as a thousand. We're going to put all the Doodads in a growable array as we receive them. Let's walk through receiving groups of 19, 14, 18, 12 and finally 6 Doodads.

Bjarne says don't bother with reservation, as in C++ this doesn't help performance, so we just use the "natural" doubling. Our std::vector allocates space for 1, 2, 4, 8, 16, 32, 64 and finally 128 Doodads (eight allocations) and performs a total of 127 Doodad copy operations. Surely we can do better knowing what's coming?

If we ignore Bjarne's advice and use C++ std::vector reserve to reserve for each group, we allocate space for 19, 33, 51, 63 and finally 69 Doodads (five allocations), and we perform 19 + 33 + 51 + 63 = 166 copy operations. Fewer allocations, more copies.

If we have the bifurcated API, we allocate space for 19, 38 and 76 Doodads (three allocations) and we do 19 + 33 = 52 copy operations. Significantly better.
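If you want to check that arithmetic, here's a small C++ simulation of the three policies. The Best branch models a hypothetical amortized reserve, growing to at least double the old capacity whenever a reallocation is needed; that growth rule is my assumption about the bifurcated API, not anything std::vector offers.

    #include <algorithm>
    #include <cstdio>

    enum class Policy { Bjarne, Cpp, Best };

    void simulate(Policy policy) {
        const int groups[] = {19, 14, 18, 12, 6};
        long size = 0, capacity = 0, allocations = 0, copies = 0;

        auto grow_to = [&](long wanted) {
            ++allocations;
            copies += size;  // existing elements are copied to the new block
            capacity = wanted;
        };

        for (int g : groups) {
            if (policy == Policy::Bjarne) {
                // No reservation: append one Doodad at a time, doubling on demand.
                for (int i = 0; i < g; ++i) {
                    if (size == capacity) grow_to(capacity == 0 ? 1 : capacity * 2);
                    ++size;
                }
            } else if (policy == Policy::Cpp) {
                // std::vector::reserve semantics: exactly what was asked for.
                if (size + g > capacity) grow_to(size + g);
                size += g;
            } else {
                // Hypothetical amortized reserve: never grow by less than 2x.
                if (size + g > capacity) grow_to(std::max(size + g, capacity * 2));
                size += g;
            }
        }
        std::printf("allocations=%ld copies=%ld\n", allocations, copies);
    }

    int main() {
        simulate(Policy::Bjarne);  // allocations=8 copies=127
        simulate(Policy::Cpp);     // allocations=5 copies=166
        simulate(Policy::Best);    // allocations=3 copies=52
    }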

> This isn't a performance limitation it's a convenience you want integrated into the language

I disagree and the results speak for themselves.

> There is one that does this, it's called ISPC. Is that what you use?

I've never used ISPC. It's somewhat interesting although since it's Intel focused of course it's not actually portable.

> This seems to me like you've gone down some sort of anti C++ rabbit hole where people who don't know what they're doing get worked up about things that don't matter.

It's always funniest to read about what "doesn't matter" right before it gets fixed and suddenly it's important. The growable array API probably won't get fixed; education means Bjarne's "don't use reserve" advice taints a whole population, so even if you fixed this today it would be years before the C++ programming community used reserve where it's appropriate. But I expect "relocation" in some form will land, and chances are in a few years you'll be telling people it's why C++ has "extreme performance"...


You said:

> You won't get "extreme performance" from C++ because it is buried under the weight of decades of compatibility hacks.

Now your whole comment is about vector behavior. You haven't talked about what 'decades of compatibility hacks' are holding back performance. Whatever behavior you want from a vector is not a language limitation.

You could write your own vector and be done with it, although I'm still not sure what you mean, since once you reserve capacity a vector still doubles capacity when you overrun it. The reason this is never a performance obstacle is that if you're going to use more memory anyway, you reserve more up front. This is what any normal programmer does and they move on.

Show what you mean here:

https://godbolt.org/

> I've never used ISPC. It's somewhat interesting although since it's Intel focused of course it's not actually portable.

I guess now the goalposts are shifting. First it was "C++ as a language has performance limitations", now it's "Rust has a vector that has a function I want, and also I want SIMD stuff that doesn't exist. It does exist? Not like that!"

Try to stay on track. You said there were "decades of compatibility hacks" holding back C++ performance then you went down a rabbit hole that has nothing to do with supporting that.


Actually, my comment first talks explicitly about move, but you just decided you don't care that C++ move has a perf leak: you claimed you didn't notice, so therefore it doesn't count. That reasoning could equally be used by somebody who wants to claim Python has extreme performance, or Visual Basic.

I deliberately picked two examples from opposite ends of the spectrum: a core language feature, and then a pure library type. Both are, in their own ways, of equal practical necessity for performance.

> The reason this is never a performance obstacle is that if you're going to use more memory anyway, you reserve more up front.

We often don't know how big a growable container will finally be while we're adding things to it, so without travelling back from the future to tell ourselves how big it will grow, an exact reservation is useless even though we know how much we're adding right now. This is the essence of the defect in std::vector.

Here's the demonstration I wrote about in my previous post. No, I am not going to build a replacement for std::vector with the correct API in C++, so this is Rust, where the appropriate API already exists. It provides a policy knob, so you can pick Bjarne, Cpp or Best in the main function to see what happens for yourself.

https://rust.godbolt.org/z/16qooGo69


> you just decided you don't care that C++ move has a perf leak: you claimed you didn't notice, so therefore it doesn't count. That reasoning could equally be used by somebody who wants to claim Python has extreme performance, or Visual Basic.

Are you seriously implying copying a pointer makes C++ like Python or Visual Basic in speed? Where did you even get these ideas? Show me any program anywhere, any GitHub ticket, any performance profile where a C++ move is somehow a performance problem.

I looked at your link and I still have no idea how doubling the size of std::vector "destroys" amortization. The open-std link you had before wasn't even about this; it was about memcpy.

Please link the origins of where you are getting this stuff. It sounds like there is some niche Rust forum where people are looking for anything, no matter how far-fetched, to pretend C++ has a performance problem.

Meanwhile literally everything that needs performance is being written in C++. Why aren't codecs and browsers and games written in something else?


> Are you seriously implying copying a pointer makes C++ like python or visual basic in speed?

No, but I get the feeling you really do think that "copying a pointer" is somehow what's at stake here, which suggests you've badly misunderstood how move works in C++ in general.

> Show me any program anywhere, any github ticket any performance profile where a C++ move is somehow a performance problem.

There's a CppNow talk from 2018 or so in which Arthur demonstrates a 3x perf difference. That's obviously an extreme case, you're not magically going to save most of your runtime by fixing this in most software, but it shows this is a real issue.

Since you're the one who believes in "extreme performance", you're probably surprised fixing this wasn't a priority. But P2137 gives a better indication of the status quo; or rather, what happened to the paper does, more than what's inside it. You can read the paper if you want (if you don't recognise at least most of the authors, that means you're way out of your depth, for whatever that's worth): https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p21... WG21 gave a firm "No" to that paper, which is what it was for. C++ is not a language for "extreme performance". It's a language which prizes backwards compatibility. The authors, or rather their employers, needed to be explicitly told that: no, if you want perf you are in the wrong place, just as surely as if you want safety; use a different language. C++ is for backwards compatibility with yet more C++.

> It sounds like there is some niche Rust forum where people are looking for anything, no matter how far-fetched, to pretend C++ has a performance problem.

Indeed, this isn't the product of a "niche rust forum". It's WG21, the C++ Standards Committee.

I think the problem is on your end. C++ demands a very high price in terms of safety, ergonomics and learning curve, and to some people that means it ought to be really good. Otherwise why such a high cost? Those are the same people who figure that if they paid $500 for a T-shirt it must be a good T-shirt. Nah, it just means you're a sucker.

> Why aren't codecs [...] written in something else?

This is especially frustrating because C++ is an incredibly bad choice for this work, as you'll have seen with the incidents at Apple. Nobody should be choosing the unsafe language with less-than-stellar performance to write codecs in 2024, and yet here we are.

And yet, most of the time when somebody does realise they shouldn't write codecs (and file compression and various other similar technologies) in C++, their next guess is Rust, which, while obviously an improvement over C++, is hardly a good choice for this work either.

The sad truth is that often it's inertia. We used this crap C++ code in 2008, and we re-used it when we refreshed this product in 2018, so it's still C++ today because nobody changed that.


All these rants are a classic case of some sort of emotional investment that isn't about evidence. Your links and evidence either don't apply to what you're talking about or they are vague references to "find it yourself" in something large.

First, you are now ignoring your own claim that "amortization is broken" when your own link just showed normal doubling, and I asked what was supposed to be wrong. Your earlier link was about a trivial copying attribute, not the doubling of a vector's size.

> There's a CppNow talk from 2018 or so in which Arthur demonstrates a 3x perf difference.

A "3x perf difference" compared to what? Prove it and link a timestamp. This barely makes sense. It's a vague claim with vague evidence.

> It's WG21, the C++ Standards Committee.

Now your evidence that "fixing this" (no specifics of what "this" means) is linking an entire 2020 ISO C++ paper? This is one step removed from the classic "I'm not going to do your homework, google it yourself".

> I think the problem is on your end. C++ demands a very high price in terms of safety, ergonomics and learning curve, and to some people that means it ought to be really good. Otherwise why such a high cost? Those are the same people who figure that if they paid $500 for a T-shirt it must be a good T-shirt. Nah, it just means you're a sucker.

This seems like some personal frustration. When I write C++ it's very simple and direct. Small classes, value semantics, vectors, hash maps and loops.

Then your 'answer' is that nothing is good enough and nothing works. You say C++ is a bad choice for codecs, yet half the planet uses video codecs written in C++ all day, every day. You ignore browsers and games being written in C++ too. You have no solutions; it's just that everything sucks, but you can't be bothered to even write your own vector.

This is just your frustrations wrapped in rants with some hand waving non evidence to act like it's based on something other than emotion.


> And yet, most of the time when somebody even realises they shouldn't write codecs (and file compression and various other similar technologies) in C++ their next guess is Rust which while obviously an improvement over C++ is hardly a good choice for this work either.

Neither C++ nor Rust?! What then, TIA? C? Ada??


It's certainly a conundrum, isn't it? What programming language should we use for Wrangling Untrusted File Formats Safely?

https://github.com/google/wuffs


In addition to what CyberDildonics already said, C++ is also an (almost) superset of C. You can also inline assembly. If you find particular hot loops that need to be optimized at any point while profiling, it’s trivial to drop down levels of the stack to get the extreme performance you may need. There is no FFI needed in C++ to drop down a level. There is no barrier at all. You can write your code directly for the CPU, inline it into your code, and fine tune it as much as you want. I don’t know how that wouldn’t qualify as extreme performance.
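For example (a minimal sketch using GCC/Clang extended asm on x86-64; MSVC would use an intrinsic instead), you can read the CPU's timestamp counter inline, with no FFI boundary at all:

    #include <cstdint>

    // rdtsc leaves the 64-bit timestamp counter in EDX:EAX; the "=a"/"=d"
    // constraints bind those registers directly to C++ variables.
    inline std::uint64_t rdtsc() {
        std::uint32_t lo, hi;
        __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
        return (static_cast<std::uint64_t>(hi) << 32) | lo;
    }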


> You can also inline assembly.

That's not C++ any more, you can inline assembly into several other languages for whatever that's worth, which isn't much in this context. "But I could use assembly language" is no more C++ having "extreme performance" than "But I could book a minicab" would give the London Underground "24/7 service".


A more extreme version of this would be COBOL, right? Even if people stopped writing C++ today, the amount of code that's already written will last many lifetimes.


Sure, I do not advise people to go learn COBOL either. In fact, I never learned COBOL, they were writing COBOL at the first place I "worked" as a teenager† and it was clearly not the future.

† In the UK there was a discrepancy between what happened to the sort of teenager who exhibited talent and interest in an area like writing software, who was sent to just watch adults doing that and mostly did no actual work themselves ("Work Shadowing"), versus those whose direction seemed more... manual, who were expected to actually go and do stuff in the same period, supervised by adults of course, but still very much doing the actual work ("Work Experience"). This seems very obviously unfair, although of course as the teenager who wasn't expected to actually do much I wasn't complaining at the time...


What is the future?


What is the future now? Or what was the future when I was watching grown-ups programming in COBOL over thirty years ago?

Thirty years ago the future was C++. How did that work out? Seems like it was pretty popular; this thread is about the third edition of a book about it.



