My comment for too many years is that C/C++ fails to deal with three issues: "How big is it", "Who owns it", and "Who locks it". C++ has, with difficulty, made progress on "How big is it" through templates, but raw pointers keep leaking out. "Who owns it" has been tough to deal with, although "owned pointers" at least try. "Who locks it" has yet to be addressed at the language level, although at least there's now some agreement on multiprocessor semantics at the language level. The old line was that locking is an operating system problem.
After years of C++ and six months of Rust, I don't think C++ can catch up with modern languages with template gimmicks. Getting pointers right needs global analysis. Getting decent error messages about inconsistencies between point A here and point B way over there requires global analysis. Trying to build a borrow checker with the C++ template and type system is like trying to pound in a screw.
Rust has its own problems. Figuring out how to do something safely can be quite difficult. It can involve solving puzzles, and often involves rewriting things at several levels, especially when threads are involved. As a result, "unsafe" is too often used as an escape hatch when someone can't spend the time to get it right. Those who slave under the whips of "agile" may be forced to such hacks.
> three issues: "How big is it", "Who owns it", and "Who locks it"
The issue with this kind of thinking is the belief that there is an "it", rather than a "they".
Say you are writing an HTTP server [1]. A beginner mindset (what seems to be demonstrated in this post) will start allocating left and right: for every HTTP header, for every piece of string, for every piece of metadata record, etc. Thus, "how big is it" becomes "no bigger than it needs to be". The lifetime of these allocations and deallocations will be made as "narrow" as possible, meaning that the programmer will try to deallocate as soon as they stop needing the stored data; "who owns it" becomes the user of the data itself. And so on.
However, think of an alternative strategy, something which game and embedded developers have been using for decades. You allocate a memory arena when the HTTP request comes in. From that point on, every "allocation" is equivalent to bumping a pointer in that arena. If the arena gets full, we allocate another and chain it to the previous one, as a linked list. No deallocations are performed (or could possibly be performed) until the Request is parsed, relevant processing is done, and the Response is constructed and sent. At this point, the entire arena chain associated with that particular request-response cycle is deallocated in one go.
How big is it? We don't care, smaller than the arena.
Who owns it? That particular request-response cycle.
Who locks it? No one, if different request-response cycles are parallelised with respect to each other; otherwise, it depends on the nature of the concurrency/parallelism.
[1] Usually, I would have given a gamedev example, and then people would have said that it only works in gamedev. That's why I have tried to give a webdev example, considering the majority demographic on this website.
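To make the idea concrete, here is a rough C++ sketch of such a chained bump arena (the names, the block size, and the malloc-based backing are all illustrative, not taken from any particular codebase):

    #include <cstddef>
    #include <cstdlib>
    #include <new>

    // One link in the chain: a header followed by a raw payload.
    struct alignas(std::max_align_t) Block {
        Block*      prev;  // previously filled block, if any
        std::size_t used;  // bytes handed out so far
        std::size_t cap;   // payload capacity
    };

    class Arena {
    public:
        // "Allocation" is just bumping a pointer; when the current block is
        // full, a new one is malloc'ed and chained onto the previous one.
        void* alloc(std::size_t n) {
            n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
            if (!head || head->used + n > head->cap)
                grow(n > kBlockSize ? n : kBlockSize);
            void* p = payload(head) + head->used;
            head->used += n;
            return p;
        }
        // The whole request-response cycle's memory goes away in one go.
        ~Arena() {
            while (head) { Block* prev = head->prev; std::free(head); head = prev; }
        }
    private:
        static constexpr std::size_t kBlockSize = 64 * 1024;
        static unsigned char* payload(Block* b) {
            return reinterpret_cast<unsigned char*>(b) + sizeof(Block);
        }
        void grow(std::size_t cap) {
            auto* b = static_cast<Block*>(std::malloc(sizeof(Block) + cap));
            if (!b) throw std::bad_alloc{};
            b->prev = head; b->used = 0; b->cap = cap;
            head = b;
        }
        Block* head = nullptr;
    };

A real server would hold one of these per request-response cycle; the point is the single deallocation at the end, not this specific implementation.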
> You allocate a memory arena when the HTTP request comes.
The per-request-arena concept is one of the most powerful and IMO underappreciated tools out there for dealing with memory-lifetime issues. I've used it or advocated for its use at multiple jobs and projects, quite successfully, even for kernel code. Windows NT uses it in IRPs. Network code in both BSD and SysV did something pretty similar, at least in the long-before-Linux days. It's tried and true. Having a clear moment when things should be freed and an equally clear description of what those things are is wonderful.
That said, even games and network servers and storage drivers often have other complex data structures not related to individual requests. It's usually not hard to come up with rules for each of these, but the rules forced on you by something like STL might not be as convenient, performant, and/or verifiable as those you come up with yourself.
I hope that some day programmers will stop using memory-unsafe languages except when they really need to, and that when it is necessary those languages will give them the tools to solve those problems themselves instead of pretending they're already solved. They're not. Rust's borrow checker plus Zig's arena concept plus a couple more pieces might actually get us there. C++'s feature salad never will.
Maybe I didn't express myself properly (English is not my native language), but the point wasn't that everyone should use Memory Arena™ instead of Smart Pointers™ or Borrow Checker™. I am not trying to sell a conference here :)
The point was, to put it in Mike Acton's words, "when there is one, there are many". Usually, things are grouped, perhaps semantically, and it makes sense to think of them as a group rather than breaking them into pieces. Arenas are one rather simple grouping mechanism, but they are not the only one. Of course, explaining complex grouping would require explaining the associated problem-solution pair, which is obviously out of scope for a forum reply.
All I would say is: if one steps out of one-at-a-time thinking, statically plans the dynamically occurring data transformation, and thereby exploits the patterns and relationships inherent in that data flow, programming in low-level languages becomes much easier.
But you need to know the problem (really really know it), and you need to know the solution. Temet Nosce.
Um, yeah, but there are other issues that make it not so nice. And it's an inoperative concept for most kernel/embedded work. I'd say in most contexts where forking would be OK you should be using a GC language anyway.
> And it's an inoperative concept for most kernel/embedded work
I recently integrated mini_httpd (a small forking server) into an embedded system as part of a solution for distributing updates across a cluster of these systems.
You must be talking about 15 cent microcontrollers (that still have a TCP/IP stack with SSL) or something, not ARM cores with MMU's running Linux.
I said most kernel or embedded. It's lovely that there are all sorts of embedded devices nowadays capable of supporting a style of programming indistinguishable from a general-purpose system, but that's not the only or even most relevant case. There are also billions of devices out there - not just 15-cent microcontrollers either - that don't lend themselves to that style because of real-time or other requirements. Even within a conventional system, there's a ton of stuff in the kernel - pre-init, board support, most storage and networking - that can still use a per-request-arena approach to good effect but can't fork a process.
As I said, if you can fork all the time you probably shouldn't be using a memory-unsafe language anyway. Nothing you've said so far has suggested otherwise. The basic techniques for implementing your own memory safety are still valuable and worth discussing for people who aren't in Easy Mode all the time.
This subthread is strictly about web serving. Requirement combinations like "HTTPS serving in pre-init or board-support code" or "real-time HTTPS serving on an under-powered embedded board" are not practical or relevant, unless we replace "HTTPS serving" with something else.
With regard to the other point, if we are using a memory-safe language, then we can use fork to get an instant arena, regardless of how that language performs resource management under the hood. Whatever allocations happen in the request, of memory or file descriptors, are reliably gone when that exits. If there is any problem in the implementation of the memory safety, the failure is contained to a process. Thus we have an additional reason to use process containment: not having control over the memory management, and not trusting it 100%.
A meta comment, not really related to your point, but I found it amusing in context.
You're writing about how doing lots of little things in an HTTP server is inefficient; and you're replying to John Nagle, who contributed the Nagle TCP optimization that debounces tiny sends with a little bit of delay, in the hope of doing more work in one go!
(On topic of your HTTP example: this is where generational GC can shine. Ideally one of your younger generations covers the full allocation cycle for a request & response, so that nothing survives the collection and it's ultra-cheap to collect since tracing the roots doesn't find anything. The nice thing about GC instead of an explicit arena is that it's global, so you don't need to contort all your library calls to ensure they're using the right allocator.)
Game dev here. Try reading the Godot code base. It's tiny allocations all the way through.
It's really quite bad.
BUT
I'm using it professionally and it's useful. It's hard for me to mentally reconcile how good and bad the creator of Godot simultaneously was (is?).
Casey also goes way too far in his take (as usual).
RAII is a method to guarantee safety and correctness. It's useful in many many scenarios.
Game dev and c++ is a great area to hone your performance skills. And it's awesome to really squeeze the power out of a CPU, but doing that is only a small small piece of the game development journey. Ultimately, you must make a product that is entertaining.
PS: Mike Acton's talk is really only tangentially related. He's just talking about uber perf in general.
As productive and useful as Godot is (it is an awesome engine), its design resembles game engines of the late 90s to early 2010s when OOP was all the rage. Back then, most game engines were written that way (lots of tiny heap-allocated objects, connected by shared pointers). It's really just since the 2010s that the CPU/memory gap is the main driving force of game engine design (which wasn't much of an issue in the late 90's).
On the other hand, many games don't need to juggle more than a few dozen to a few hundred dynamic instances of one "thing" (outside of the particle- and animation-systems at least), so for most games a modern ECS design is definitely overkill (except for some specific parts of the game). Providing a good "game building workflow" in the editor is definitely more important.
Hey Floh, love your work! It's validating to hear you say these things. For me, the biggest annoyance with Godot's code base is the lack of clear ownership semantics (no use of smart pointers), and the use of fully hand-rolled collection types for everything. It makes it hard to inspect variables in a debugger! Also, the performance of the collection types is bad because of all the allocations. Seems like those problems will never get solved, as it would be too expensive to rectify.
A fairly common way is to first implement experimental gameplay logic in a scripting- or visual-language, and once it works, extract the performance-critical parts into "proper" C++. Basically, the common lower level gameplay building blocks are written in C++, glued together by some higher level mechanism (like noodle graphs or a scripting language, and common features move into C++ as needed). Unless it's Unity, in that case, replace C++ with C#.
Wow, Casey doesn't hold back. Without hesitating, he clearly says that 100% of code written in RAII* style sucks. Interesting, considering Stroustrup considers RAII to be "the basis of some of the most effective modern C++ design techniques."
I'd like to hear Casey's comments on TDD.
*RAII style is lumped together with try/catch, smart pointers, tons of malloc/free new/delete
Casey hasn't a clue. He's only written games, and he hasn't actually had to solve the issues that are being solved in other domains. If he actually did have that experience he'd see his ideas don't scale. Games are, in general, vastly different from other apps (word processors, video editors, browsers, etc.) in that, for the most part, they get to choose all of their data upfront. If you're making "The Last of Us 2" there is no user data. There's no "some people will use this to write a letter to grandma, and yet some other people will write an 800-page book on physics with mathematical diagrams", and yet another will write a report on the market with linked live data.
Consider "games" like Minecraft, Roblox or Dreams, those are entirely user-data-driven. Different types of games are at least as different to each other as to other types of applications (or rather, they are not less diverse than other applications, they usually just have a higher focus on performance).
How are they different? You have some primitives like a block, and meaning is given by users to a group of them. The program has to care only about the primitives.
(Of course making it performant, not rendering/“simulating” everything and the like is exceedingly hard, but it is true that it is more of a “closed world” as opposed to some other areas of software development.)
This type of "creative game" lets users combine basic building blocks in the same way a word processor application allows writing entire books by combining a limited set of characters. The limitations of strictly linear games like Last of Us are not because of technological restrictions but because it's hard to tell a cinematic story while still giving the user complete freedom.
Minecraft was created in Java with duct tape and hacks and a lot of ugliness and perf characteristics which wouldn't fly past a pedantic programmer like Casey.
Is your argument that arenas don't scale because user-provided data is variable in size?
Although arena memory is casually described as "allocate one huge chunk of memory up front," you are not literally only allocating one block ever and praying it never runs out. If you run out, you allocate another block. The point is that you don't call malloc for every string, object, list, etc. Adhering to this largely eliminates the need for RAII. What about this doesn't scale?
Personal anecdote: I'm building an IDE, where literally all of my data is provided by the user, and arenas have worked perfectly. I don't think I have a single destructor except for dealing with things like file descriptors, etc.
Bjarne Stroustrup is a Director at Morgan Stanley, a major investment bank in New York where software difficulties may cost millions of dollars per minute. He has described his workday as people coming to him with a difficult software engineering problem, about which he asks increasingly detailed questions until light dawns, and they go away ready to re-write the badly designed subsystem causing the trouble.
When using a "rubber duck" I think you just force yourself to turn loose thoughts into coherent ideas by having to express them clearly. This can reveal some problems that were not apparent before the idea was expressed. It's a "rubber duck" because it's really just a monologue and an inanimate object could do the job of the listener.
What's described above seems more like Socratic questioning, where a person asks questions that reveal facets of an idea that the person being asked may not have considered, thereby prompting the asked to rethink their assumptions and draw new conclusions.
I don't see grouped resources as an alternative to the way we use RAII. It doesn't mean that the approach doesn't have its own merits, but it doesn't cover the things we do with RAII. The architecture of a DAW isn't much like a server that responds to requests, even if there are elements of it that do correspond to that pattern.
Alternatively, you are referring to something I'm not aware of.
Does TDD here mean ‘Type Driven Development’ a la Idris or, as I assume (and seems more likely) does it mean Test Driven Development?
If you mean test driven, I don’t think I have heard Casey discuss anything remotely close to the standard presentation of TDD strategies. I would imagine, based on watching a large amount of Casey streaming, both alone and with Blow, that he would view testing (as in TDD) as unproductive for his particular style of development and his goals as a programmer. But, since it’s in my brain now, I will ask him next time I catch a live stream, just to satisfy my curiosity.
I can’t speak to any discussions specifically about TypeDD. But I would bet Casey would be wholly uninterested and consider it theoretical academic fluff that pulls away from the fundamental data transformation work of development.
Your comment reads like you disagreed in some way with the parent, but then you showed a scenario and defined it in a way you can answer the important questions. Just drop the "we don't care" part and they're all good answers that can be used to model the ownership and usage.
The lack of locks in this scenario is important in itself and can be expressed in types. (It shows for example that as long as the whole process is migrated to another thread, you can have safe N:M scheduling) The process ownership of the arena can be well defined as well.
The "alternative strategy" as you described it doesn't have different questions - just different answers.
No, the point wasn't a scenario. The point was to stop focussing on individual allocation-deallocation and start thinking about the data transformation pipeline. And every program is a data transformer — because data transformation is all a computer does or can possibly do.
When one starts thinking in terms of a data transformation pipeline, and starts to relate the lifetime of data to the lifetime of the various phases of that pipeline, then one stops worrying about individual "objects" and starts thinking in terms of aggregates. Suddenly, there is no need to track the ownership or size of each object, because those properties are now shared with the same (or isomorphic) properties of the phases of the pipeline itself. As long as the position in the pipeline is tracked (e.g., by the call stack), all other lifetimes will automatically be tracked.
I get the idea you're describing and use it for various purposes. But I don't agree "there is no need to track the ownership or size for each object". You can offload some of that thinking to the arenas, the same way you'd do it with GC, sure. But you can't just ignore ownership - sure request context is owned by the arena - but do you need to copy the values you pass to logging? do you need to wait for log flush before destroying request context?
What about the more common cached data / process state?
Arenas simplify the processing just like GC but they're not magic and don't solve everything.
EDIT: And about data sharing (with logging, etc.), that's part of the data pipeline too. So yes, your aggregation mechanism will take it into account. These are not two separate problems, irritatingly coupled due to reality; these are two parts of the same problem.
> From that point on, every "allocation" is equivalent to bumping a pointer in that arena.
I'm not a C++ developer, so I have a hard time imagining how would this be implemented in real code. I know how to allocate a blob of memory, but how do I redirect all the later allocations (that use `new` or that are hidden in std::string and other containers) to use parts of that blob? How do I know when the blob is going to overflow so that I need to allocate another one? Is it possible to give 5-10 lines example showing the basics of the technique?
It’s not an either-or thing; there is probably no need to allocate strings in the arena.
Some cpp structures do allow for custom allocators, but you are more likely to have an arena instance and call some specific function on it and it will do the allocation for you. It usually operates on only a few types, not meant for arbitrary allocations.
> Some cpp structures do allow for custom allocators,
Yeah, that's what I was thinking about, but my only knowledge of allocators comes from gcc error messages which include all the template parameters - I have no idea how they work or how to switch to another one :(
I get the idea, though, it's basically what Erlang does for its processes - each one has a "private heap" just for it. If the process exits, the whole chunk of memory can be reclaimed and there's no need to run GC on it anymore (there's no sharing of memory between processes in Erlang, at all, so it's safe).
You either pass the allocator around to every stl container, and everything that uses an stl container. Or you override new/delete and have a sidechannel that defines what arena to allocate in.
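For what it's worth, since C++17 the standard library ships plumbing for exactly this in <memory_resource>. A rough sketch of the "pass the allocator to every container" route (the buffer size and names are only illustrative):

    #include <cstddef>
    #include <cstdio>
    #include <memory_resource>
    #include <string>
    #include <vector>

    int main() {
        // A stack buffer backs the first allocations; once it runs out,
        // monotonic_buffer_resource chains further blocks from the default
        // upstream resource. Nothing is released until the resource dies.
        std::byte buffer[4096];
        std::pmr::monotonic_buffer_resource arena(buffer, sizeof(buffer));

        // pmr containers and strings route their allocations through the arena.
        std::pmr::vector<std::pmr::string> headers(&arena);
        headers.emplace_back("Host: example.com");
        headers.emplace_back("Accept: */*");

        for (const auto& h : headers)
            std::printf("%s\n", h.c_str());
    }   // arena goes out of scope: every per-"request" allocation gone in one go

The other route, overriding new/delete with a side channel, avoids threading the allocator through every type, at the cost of global state.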
Another optimization would be to keep the request buffer around and use it as backing storage for all the string objects that result. The weakness of this strategy is that it doesn't make sense to keep it around if you only need a small part of it. It's a common source of memory leaks in Java, where `String.substring()` just creates a new object with different start and end indexes for the same backing buffer.
A possible disadvantage of arena allocators is that everything is now tied to the lifetime of the arena, even though it might not be required that long. You'd need to practice a strategy like subarenas, RAII, garbage collection or reference counting within the request as well.
The occurrence and advantages of all these strategies unfortunately are heavily workload-dependent. Most strategies work well for 80% of the workload. Different workloads of course.
This is why I'm a strong advocate of hiring former game developers - they know what matters when it matters, not before, and not after. How? Pain and tears of being there, trying the obvious wrong way first, and then fixing it during production while kiddies shout at you. Former game devs are the Marines of software.
Muratori has demonstrated to me one time too many that he hasn't got a clue about the things he's talking about.
To be precise: He may or may not have a clue about some things he's talking about, but I don't know enough about them to form an opinion. But on more than one occasion he went on to talk (sometimes at great length) about things where I know he's mostly or completely wrong in what he said.
I don't have a way to know the scope of these things, the things he doesn't know or understand yet talks about. Maybe it's just when it comes to "systems programming". Maybe it's everything that isn't game design and programming. And maybe he's wrong about games too. As I said, I don't know enough to judge everything he says, but the things I was able to judge convinced me not to trust him and his opinions.
And since I can't set the scope safely, I have to not trust him regarding everything.
For example, while his post[1][2] on ETW being "the worst API ever made" is slightly amusing and has _some_ good points, it mostly demonstrates:
(a) Inability or unwillingness to read the documentation. It would have saved most of his problems. But let's say that to criticize the _design_ of an API we can disregard that for a moment.
(b) Inability or unwillingness to pick the right tool for the task or pure trolling.
(c) Worst of all: Complete lack of understanding of the goals, purposes, limitations, design goals and tradeoffs of the system.
You can't criticize the design as being too complicated, while offering an irrelevant alternative for a "glorified memcpy()" (his words), when you don't understand what it does or what it's supposed to do.
(I emphasize again: I'm not saying the API is a paragon API design and that all is perfect there. I am saying the criticism is both grossly exaggerated and demonstrates misunderstanding of what the API actually does.)
After a couple of those I don't trust him and you shouldn't either but you're welcome to.
> You can't criticize the design as being too complicated...
I hadn't finished reading the ETW rant; but I don't see anything wrong with it. All he seems to be saying is that for communicating with OS, ioctl-like all-in-one APIs are bad while epoll like APIs are better. Not really a controversial statement.
> After a couple of those I don't trust him
Who said anything about trust? I have made enough mistakes and learnt enough lessons, and it's nice to see other well-respected programmers (like Casey and Mike) come to the same conclusions like mine. To paraphrase your post: "You shouldn't repeat those mistakes but you're welcome to".
EDIT: Actually, after finishing reading the blog post, wow! I thought Linux's perf_event API was shit, but this ETW crap takes the cake. Well done, Microsoft; you had one job.
While I agree with the sentiment, my impression is that the GP's point is about memory safety rather than performance. So yes, this applies for common patterns like per-frame memory in games, in which case the "it" is the arena. Otherwise, as a general rule, profile first, then optimize.
I made a special-purpose HTTP server in C++. I had a function called request() and just allocated everything on the stack of that function. Only a few KB, e.g. a vector of the headers etc. Of course, after the function was done the memory was freed.
> You allocate a memory arena when the HTTP request comes. From that point on, every "allocation" is equivalent to bumping a pointer in that arena. If the arena gets full, we allocate another and chain it to the previous one, as a linked list.
Didn't you just describe regular memory management as applied by the OS at process scope? At the completion of the process, all memory allocated by the process is reclaimed.
On some systems this approach is combined with granular quotas to allow multi-process coexistence.
Shifting this sort of memory management over to application scope may yield benefits now, but ultimately it's an OS-level concern.
Asking the OS for memory is a low-level operation, which means it is not portable and should be done by a library or framework. Also, since it's a syscall (involving a context switch), you don't want to do it in performance-sensitive code.
Some applications might benefit from allocating huge pages. By default, the page size is on the order of 4 KB or 16 KB, and large numbers of pages result in significant overhead. Allocating fewer, larger pages on the order of MBs can help.
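If it helps, here is a Linux-specific sketch of asking for explicit huge pages (it assumes huge pages have been reserved on the system, e.g. via /proc/sys/vm/nr_hugepages; transparent huge pages via madvise are the less explicit alternative):

    #include <cstddef>
    #include <cstdio>
    #include <sys/mman.h>

    int main() {
        // 2 MB is a common huge-page size on x86-64.
        const std::size_t len = 2 * 1024 * 1024;
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            std::perror("mmap(MAP_HUGETLB)");  // fails if no huge pages are reserved
            return 1;
        }
        // ... hand the region to an arena / bump allocator ...
        munmap(p, len);
    }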
Interestingly, it's exactly what the PHP memory model does: each request is a shared-nothing VM space, and nothing survives the end of the request, even if allocated memory objects don't get freed.
It took me a while to understand what C++ was: a quest for the highest level semantics possible with the lowest performance overhead. It is a quest started a long time ago with a lot of dead ends and circumvolutions.
The quest is still valid but really, the complexity that C++ has given birth to is not worth it anymore. Other languages restarted from a blank state and are probably better bets. I almost switched to D years ago, for a while it looked like Go was going to take over but finally Rust seems to be it. My current work forced me to get into rust, and I don't think I'll go back to C++ anytime soon.
It was a fun ride. So long and thanks for the fish.
That is an interesting take, I don’t really disagree (regarding the high-level semantics/lowest overhead). I do question the use of ‘highest’ as the qualifier for the semantic level, but certainly would support a claim that the goal was some indeterminate high with regards to the semantic level. Even at the inception of C++ as something different than C with classes, there were at least a few languages with a higher semantic level than where C++ is today. Although I do not think there was any then current strategy for making those languages as low overhead as C in practice. So as time has progressed and the cruft and complexity have built up I think they have now decided to find the highest level they can reach, while maintaining backwards compatibility and the low overhead requirements.
I am inclined to agree that a green field language is probably a better bet for achieving the goal, but I am not sold on any of the current contenders. So I still plod along doing my performance sensitive development in stripped down C++ and plug away at my pet language project.
I think there's something important here, potentially obscured by imprecise terminology.
The Rust safety invariants are, at heart, global properties. The central one is that mutable references are unique. Another way of saying that is: if you hold a mutable reference, then all other references held by all other objects in the system do not conflict with it (either because it is included in the "stacked borrow" or because it doesn't reference the data at all). That kind of property is ordinarily quite difficult to prove.
Rust does it though, by encoding these invariants in the types of functions. Most importantly, these types compose, meaning if you've got one function that respects these invariants calling into another, then the whole thing will also be sound. You can compile them separately and still be confident.
By contrast, achieving similar goals in the C family of languages does require a global analysis. The classic example is alias analysis. It's implemented in many compilers, tons of PhD ink has been spilled, but long story short it doesn't work. You get okay results sometimes on small programs[1], but as systems scale up, basically there always becomes a way for one pointer to alias another.
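A tiny illustration: compiled in isolation, nothing tells the compiler whether dst and src below can overlap, so it must either assume they might or emit runtime overlap checks before vectorizing; whether they actually overlap is a whole-program question.

    // Without whole-program knowledge, the compiler cannot prove that dst and
    // src never alias, so it cannot freely reorder these loads and stores.
    void scale(float* dst, const float* src, int n, float k) {
        for (int i = 0; i < n; ++i)
            dst[i] = src[i] * k;
    }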
Thus, the claim I would make is the following: it will not be practical to retrofit Rust-style borrow checking onto an existing unsafe language, because the type system has to be rich enough to express the types of invariants required. The Rust type system has been carefully crafted to be powerful enough, at considerable cost: the complexity of the type system is one of the biggest complaints about the language, and slow compile times, one of its correlates, is another.
I would also claim that trying to scale up static analysis (based on alias analysis and other similar global analysis techniques), while perhaps somewhat helpful, is not going to give results comparable to Rust's. In order to get anywhere near the same level of confidence that all possible safety problems have been caught, the false positive rate would be unacceptably high.
I think this is one of the enduring achievements of Rust, and one likely to be carried forward in future programming language designs, but extremely unlikely to be successfully retrofitted to existing languages.
The point of the borrow checker is to replace the global property 'all memory has an owner' with the (approximate, conservative) equivalent 'all code passes the borrow checker'.
By tracing ownership information through each component, and forbidding (or ignoring via `unsafe`) situations which cannot be traced in this way, the latter property can be decomposed and solved locally.
In other languages, like C++, we may still want this global property, but we can't break it down into local reasoning. Even if we decide on an approximate, conservative equivalent like 'all code passes STATIC ANALYSER X', getting local reasoning to work would require enough changes that it's arguable whether we're actually programming in the same language anymore. For example, we may need to add extra annotations to our code (which aren't present in existing codebases); we may need to extract extra information from sources (preventing the normal use of separately-compiled libraries); we (and our dependencies!) may need to avoid certain valid/legal patterns of code which the analyser can't handle; etc.
Writing the above, I'm reminded of trying to use the mypy type checker in Python!
> In other languages, like C++, we may still want this global property, but we can't break it down into local reasoning.
That's exactly it. You want some set of machine-checkable constraints which add up to the desired global properties. Rust managed to do that. Attempts to fix this in C++ yield a set of slightly leaky constraints which sort of almost do that. Fixing this requires taking things out of the language, which is unpopular.
It's embarrassing that the code below still compiles with default gcc options, in either C or C++ mode. Yes, it's terrible C++. The compiler allows it.
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
    char buf[20] = "\0";
    char* s = buf;
    for (int i = 0; i < argc; i++) {
        s = strcat(buf, argv[i]);  /* unchecked concatenation: overflows buf once the arguments exceed 19 bytes */
    }
    printf("%s\n", s);
}
(Even Microsoft has "strcat" deprecated by default.)
D's ownership/borrowing system does data flow analysis within functions, i.e. it's intra-function. Inter-function (global) is handled via the function signature.
I think the idea is more like "the entire program is covered by lifetime analysis / borrow checking, so that nothing gets missed." Something like that?
Data-flow analysis is what I assume the author meant. Fancy word for "reasoning about behavior at a level higher than just a syntax node" (usually within a function or module, possibly within an entire program though)
Both D's and Rust's compilers do it, and I'm uncertain whether Zig's does, but it may too.
That's not true. Global analysis is a synonym for interprocedural analysis, which crosses function boundaries. An intraprocedural analysis that crosses many basic blocks is still "local" rather than "global".
This slide deck is weird. I've taught 143 before. Where did this come from? Perhaps in really old stodgy terminology before there was any interprocedural anything people used the words this way? I really don't know any static analysis or compilers person who would consider "global analysis" to mean "intraprocedural analysis that considers more than one BB".
It's funny because I also remembered it as cross-function analysis, but when I went back to grab a link I saw this and thought maybe I'm mistaken somewhere, so I just used this definition. Oh well. Main thing I was just trying to convey was that it's technical terminology.
I don't know if it's actually wrong though, given that a procedure is just a block of code with one entry point (and let's say one return, for the sake of discussion). I'd have to jog my memory, but I'm thinking: if you can already optimize inside a function (but outside basic blocks), then I don't recall what would be so drastically different across functions. The fundamentally hard part does seem to be going from one basic block to multiple. I might be forgetting something though... do you recall?
The only explanation I can think of is that this is old terminology that has stuck around in course material based off older textbooks that aren't bothering with any sort of interprocedural anything.
Function boundaries do make things fundamentally more difficult than basic block boundaries, for a large number of reasons. You can no longer have single definitions of values and updates to those values are not global updates for the entire program. This is why you need stuff like context/object sensitivity for interprocedural analysis but it doesn't matter for local analysis. Graph structures also become way more chaotic, preventing the nice efficient lattice movement you see in classical local fixed point computation.
Like, static analysis and compilers is my job and none of my colleagues would use this term this way.
But clearly there is some material using it that way. Weird. I've been wrong before and I'll be wrong again in the future. So I'm happy to be wrong here. Clearly there are some situations where "global analysis" is used to describe whole-function analysis.
Interesting... now I'm thinking maybe I was confusing it with interprocedural analysis? I've definitely seen the distinction made between them before, though I'm not sure if I've seen them mentioned alongside local analysis in the same text. I guess if you want to have fun this week, go ask your colleagues what the difference between local, interprocedural, and global analysis is. See if they say the last two are synonymous. :-)
Regarding the difference for interprocedural analysis: I'm not entirely sure I follow it unfortunately. Let's say you can do everything within a function. Can't you just inline all of its callees (at whatever depth you want) and do your analysis/optimizations based off the result of that? Inlining itself is a rather trivial mechanical transformation, and after that everything is inside one function again, which you can already handle. The only real obstacle here seems to be recursion, but if you just treat that as any other opaque function call that you can't optimize across, you should otherwise get the rest of the way there, right? What am I missing?
As a preface, it is difficult to speak about true fundamental limitations of things like abstract interpretation because "print Top" is a valid algorithm that will work for all programs. It is just completely useless. So you can model function calls in trivial ways (just treat calls and returns as giant phi nodes and widen whenever you hit mutual recursion). But this tends to produce pretty poor results for real programs and your fixed point computation tends to take longer due to the shape of real program call graphs. Add in dynamic dispatch and you've got all sorts of fun (watch all your pointers in a java program get merged through the receiver to Object.equals(), for example).
In practice, these structural differences require new approaches. The field has done a really good job at intraprocedural analysis, solving a lot of really important problems many decades ago. How to do fixed point computation over SSAed CFGs is well understood, even when you've got heap relationships to think about. Interprocedural analysis still largely sucks. Even modern approaches like CFL-reachability for dataflow analysis produce a ton of garbage. This is one reason why I say it is just harder.
As you mention, recursion prevents you from inlining everything into one giant function. You do get "the rest of the way there" by inlining to some depth limit in the sense that inlining is a way of achieving context sensitive analysis (though it is generally not preferred). But you've still got big problems at the points of mutual recursion (either you need to widen badly or you need to actually do interprocedural analysis) and your program is also exponentially larger in pathological cases.
>My comment for too many years is that C/C++ fails to deal with three issues: "How big is it", "Who owns it", and "Who locks it".
Ada is pretty good at dealing with all three of these, TBH.
"How big is it" — Given by the 'Size attribute, and representation-clauses explicitly control record-layout.
"Who owns it" — The declaring entity, which is why you can have dynamically-sized arrays without heap-allocation and "allow the scope to clean things up".
"Who locks it" — a bit more convoluted than the above, but generally one of several options: (1) the Task / protected-object via entries; (2) the object itself, via controlled/limited_controlled inheritance; OR (3) the subprogram/compilation-unit via parameter-passing and/or interface-control (i.e. the only way to alter the interior-value is by some exported interface).
I think the biggest problem with C++ is the lack of explicit safety, which requires you to understand the internal mechanisms of how everything works in order to make use of almost anything, combined with extremely high levels of abstraction.
The high levels of abstraction are powerful, but quickly become footguns to the uninitiated (and the initiated alike at times).
> Figuring out how to do something safely can be quite difficult. It can involve solving puzzles, and often involves rewriting things at several levels, especially when threads are involved.
I think this gets to the core of really closing the gap. While the "insider" knowledge of C++ is really understanding the internal workings of these abstractions so you don't do something stupid with memory, the "insider" knowledge of Rust is knowing the specific patterns and routes that will work with the safety features of the language rather than against it for solving various problems.
I can still say for me I spend a lot of time 'exploring' Rust, having the compiler bitch at me, and tweaking things about until they work.
Sometimes it can be extremely frustrating, and given I have the insider knowledge of C++ with 20+ years experience, it's easier for me to just drop in and do exactly what I want to do (because I know it's safe... or at least I think it is ;) )
That being said, I'm very excited for languages like Rust, but there is still a gap of cost/benefit between it and C++ for the time being, at least for my professional use.
Well, that depends on the uninitiated. Those who come from a more "user-friendly" language, e.g. Java, are pretty much safe - they can continue with C++ and its Standard Library as if it was Java (as long as they respect the RAII principle). The C people, on the other hand, should expect to be bitten on the ass many times (try, for example, to use the pointer to an element of an std::vector after you have pushed some more elements into it!), and so you are right, those people are, unfortunately, better off knowing the internals.
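A minimal version of that std::vector pitfall, for anyone who hasn't hit it:

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v;
        v.push_back(1);
        int* first = &v[0];           // fine for now
        for (int i = 2; i <= 100; ++i)
            v.push_back(i);           // growth reallocates the buffer...
        std::printf("%d\n", *first);  // ...so this reads freed memory: undefined behaviour
    }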
Every time somebody got lazy and used "unsafe" in Rust that's unavoidably annotated in the code. If the compiler can't prove it's safe and you don't label it "unsafe" it won't compile at all.
Even coming in years later, a maintenance programmer can identify this code is suspect, whereas these other parts are safe.
Whereas if you get lazy in C++ you can do whatever unsafe things you want, anywhere, and it leaves no trace.
Same here; lots of large projects I worked on didn't have any unsafe at all. It's a very niche feature, mostly for people doing very low-level stuff.
But I suspect this is survivorship bias. I only ever worked with very experienced developers when doing Rust. I'm pretty sure as soon as I start working with more inexperienced devs I'll start seeing a multitude of clever ways to bypass the borrow checker, similar to the cleverness I currently see them doing in other languages.
It's a feature that's necessary when dealing with FFI or when implementing containers. It's a tool, rather than a crutch to work around the borrow checker. In fact, unless one is using raw pointers everywhere, it's very impractical to use unsafe blocks to circumvent the borrow checker.
I was with you until the last sentence. I write Rust code at scale, with schedule and performance constraints. I’ve literally never had to use the word unsafe in my code.
If you can’t figure out how to do it without fucking up the borrow checker, that’s because you can’t figure it out, and you need to learn how to do your job. You’re never “forced” to hack anything.
Blaming “a whip of agile” is just bad taste. Don’t be bad. Like, wtf, you had a decent comment and then shat the bed right at the end.
Here's the point of view from someone who has only written 1 piece of production software with both C++ and Rust.
Trying to write C++ I was constantly fighting accidental memory copies. With Rust all of this was trivial and everything works as I expect. This is pretty much the reason I chose Rust over C++.
As a beginner, I have no idea how I would even begin to elegantly write immutable-data, parallel code in C++. With Rust it's just .iter() to .par_iter() (everything is immutable by default).
C++ package management is awful (it has none). C++ headers are annoying and pointless.
I know these are strong statements and of course only my opinion but I just see people claiming C++ is fine just having a strong case of Stockholm Syndrome.
Rust is nowhere near perfect. It can often look like line noise, the project folder takes multiple gigabytes even for small-ish projects, compilation can be slow, and the tooling is not always there. This is still multiple orders of magnitude better than trying to Google how CMake works step by step.
On the other hand, I love rust-analyzer + VS Code. It was super easy to get going with and just works. Visual Studio C++ seems to have a much steeper learning curve.
While I agree with most of what you wrote, CMake feels supercharged compared to the build system part of Cargo.
Cargo is failing hard on the integration front, both on integrating things, and being integrated. build.rs is a minimal substitute of a build system: "deal with it yourself". The way Cargo wants to have total control, on the other hand, makes it hard to integrate with existing projects: those which don't use Rust at the top level.
The only great thing about Cargo's build system is that it works well on the happy path of Rust-only, crates.io-only software.
I was a C++ developer in a past life and CMake was a big reason I moved on to greener pastures. It’s the only language that is more toilsome to use than C++ itself, and that by an enormous margin. It’s stringly typed (no, that’s not a typo, everything is a string), its syntax is obscure, it extends so poorly that it just bakes in support for building popular libraries (e.g., Google’s test framework, Qt, etc., iirc), imports are implicit so it’s tedious to track down the definition for a particular symbol, and it completely punts on package management—not only were builds not reproducible, but you couldn’t even get it to download and install dependencies for you from a declarative list. CMake isn’t a build system, it’s a toolkit for scripting your own bespoke build system (and a crummy one at that), which basically means that every project is a unique snowflake with its own distinct quirks which are tedious to learn and maintain—even though 99% of projects would be covered by something like cargo. Those are some of the things I remember off the top of my head ten years later (it was also dog slow, and things would break across minor releases, but I’m told those things have improved).
Cargo is imperfect, but it’s the right tool for the job 99% of the time.
CMake may be toilsome, but it is acceptable for simple projects due to its builtin dependency resolution, and powerful enough to do anything you want on the complex side.
Cargo is perfect 80% of the time in my usage, but when it's not perfect, it's almost actively harmful, and much worse than CMake. And I say that as no fan of CMake.
I would not be bothered by Cargo not being a build system, except it's the one build system underpinning the entire Rust ecosystem via crates.io, and its "rules" are not interoperable with other build systems.
As a result, you have to deal with Cargo's terrible build system whenever you want to use an external crate, whether you want it or not.
I don't disagree, I find cargo insufficient for projects of any meaningful complexity (it doesn't even support post build steps...). But it's really good at one thing: compiling crates.
But I haven't had a ton of trouble integrating it into CMake projects. It falls into the category of "know your tools." Not everything can "just work" all the time.
> But it's really good at one thing: compiling crates.
Unless you're using "compiling" to mean strictly compiling, and not "building", I don't agree either.
It falls flat on its face if you want a build time choice between dependency versions, for example. And build.rs means that Cargo washes its hands from compiling parts of crates that are not written in Rust, so it's arguably not good at compiling (of anything but pure Rust).
The way Cargo is integrating several concerns also makes it hard to create better build systems for Rust, because they would have to pull in the same kitchen sink in order to support Cargo.toml. So that's being bad at letting others compile as a bonus.
EDIT: Actually, that wouldn't be a problem if Cargo the decent package manager didn't mandate Cargo the awful build system.
As much as I agree that Rust needs a better story for builds and interacting with other languages, it sounds like you have a misunderstanding over what Cargo primarily does and what crates are. A crate is a single compilation unit of Rust. Cargo is a tool for compiling crates and pulling in other crates that it references.
build.rs is a half measure to include foreign symbols in compilation artifacts like static/shared libraries and executables. I'd go so far as to advise against using it for anything but specifying linker flags.
I don't think I've ever had a use case for specifying dependency versions at build time. That seems insane, and I do insane things in cmake with regularity. There's a reason versions are pinned to a config file committed to repos in almost every contemporary language.
fwiw, Cargo is a crate itself and you can use it as a library. You can even compile it with C language bindings to call through FFI in other build systems if you felt like it. The lang tools team has done a great job with keeping the scope of Cargo manageable and putting in the ground work to make better tooling around it.
For complex Rust builds, check out cargo-make. It does most of what you'd need in a predominantly Rust codebase. For polyglot environments, cmake with custom targets is the least bad way I've found to do it - and it's not hard to do that by shelling out to Cargo.
A crate may be a compilation unit, but it's irrelevant. Within the Rust space, a crate is a library. something like 95% of crates are using Cargo, and Cargo requires that dependencies are also using Cargo. Today it's impossible to ditch Cargo, and publish your crate with e.g. Bazel as the build system.
There's no misunderstanding that what Cargo does it build Cargo crates. The problem is that it doesn't allow for sanely built (so not using Cargo) crates.
> specifying dependency versions at build time
Packaging for different distributions, where different versions of a dependency are provided, is quite a common thing, and has justifications beyond technical reasons.
> the project folder takes multiple gigabytes even for small-ish projects
This is a workaround, but you can specify a global directory for all those artifacts with the env var CARGO_TARGET_DIR. This will deduplicate common dependency versions and lets you stick it on a temp disk / exclude it from backups.
I think a lot of those issues are more related to how bad C++ is rather than how amazing Rust is. I abandoned C++ several years ago for D and I can't imagine ever going back to C++. Similar issues: header files, no package management.
I don't know Modula-3 and Eiffel, but I do know some Ada. From my experience Rust, with the borrow checker, still brings a lot to the table compared to Ada. Although Ada has things that Rust doesn't have, too, like delta types, which are immensely useful in embedded programming, and SPARK.
Ideally Rust would adopt some of these, or Ada would adopt Rust's in its next standard.
Ada is adding borrow-checker-like capabilities to SPARK.
In fact this is what I consider Rust's biggest contribution to the computing world.
Even if Rust dies tomorrow and eventually fades away, it has brought Cyclone and ATS ideas to the masses, to the point that many languages have done, are in the process of doing, design decisions to integrate affine or linear types to some extent with their type systems.
I love C++, and I have to admit that it will need a subset language soon, but the examples given in this article wouldn’t possibly exist in well-checked code-bases, because you can immediately see the usage stinks just by looking at it.
I think every C++ is bad article can be summarised like this:
1. C++ has lots of features.
2. Let’s nonsensically combine these features to shoot ourselves in the foot.
3. Uh oh, C++ didn’t help us write good code. Hence, C++ sucks.
People have to accept that in real life there is a thing called code-reviews and senior engineers are supposed to prevent such badly written code.
> People have to accept that in real life there is a thing called code-reviews and senior engineers are supposed to prevent such badly written code.
Ah, yes, the "you're holding it wrong" argument, just like the article predicted. The problem with this is that your version of "real life" is significantly different from the real "real life", in which often no code-review is held, or the reviewer misses bugs.
> People have to accept that in real life there is a thing called code-reviews and senior engineers are supposed to prevent such badly written code.
Why should people accept that? There is overwhelming evidence that such reviews do not reliably prevent such problems from making it into production. In some cases, other languages exist that do reliably prevent the problem from ever making it into production because the problem is impossible by design.
We can and should consider from time to time whether the advantages offered by an old but established language are now outweighed by the advantages offered by a newer but better designed language. If that doesn't happen with increasing frequency as time passes, we have a serious problem as an industry.
I think it is fair to point out that it requires a comparatively high degree of skill and experience to use C++ well relative to many other languages. Most people do not use C++ well because doing so is quite difficult and requires a large investment of time. This is not helped by the fact that it has an enormous amount of legacy baggage that is technically valid code that no one should ever use — there is an entire anti-language you have to learn too. Its standard library has many flaws such that many experienced programmers eventually write and use their own alternatives to many parts of it. These are all legitimate hurdles and criticisms, it is neither a pretty nor easy language.
The major benefit of C++ is that with sufficient mastery you can do complex things strictly, safely, and concisely that are difficult-to-impossible in any other systems language due to its flexibility and metaprogramming facilities. Its flaws are very real, but so are its strengths.
As someone not very acquainted with C++ I was under the impression that the vast standard library was one of the great selling points of the language. What are some examples of defects that force programmers to rewrite parts of it?
depends where you come from. If you come from Python, Java, you'll find that there's a lack of support for networking, etc. and that it is not vast at all. After all we lack a W3CEndpointReferenceBuilder (https://docs.oracle.com/javase/10/docs/api/javax/xml/ws/wsad...).... not sure how it's possible to be productive without that.
If you come from C you'll think that there are allocations everywhere and judge it unfit for purpose (even if in my experience, C libraries and software tend to allocate much more than C++, and use worse datastructures like linked lists, etc. just because it's easier to code in C).
Finally, the C++ committee and standard library implementers refuse to break ABI / API, which means that:
* the C++ standard library is not able to follow the state of the art for e.g. hash maps, because unordered_map has strict requirements that the fastest hash maps (if you only care about speed of insertion / retrieval, which is, let's be honest, 99.5% of the use cases out there) do not satisfy.
* things like regex stay broken
The standard library is quite good, but is designed to be fairly general. Sometimes this means that you can do better with something you write yourself.
Let's not confuse this with skill, though. I think ego is the reason why this has been allowed to go on for this long...
The "skill" is mainly pointless memorization of a bunch of idioms. I've attended C++ training with committee members that have made C++ their life... and it was quite discouraging to see that even they do: Let's try this... oh, that didn't work. Let's try this... hmm, right. I know what this is, I've seen it before. Now, it works.
C++ was my passion after Turbo Pascal, I fought to use it instead of C on university assignments, was a TA in C++, did research at CERN in C++, and used it at several multinationals before migrating into managed languages.
Well-checked code-bases are something that I have hardly seen in real life, outside of conference talks about best practices.
Bugs in the article are trivially obvious, because they're in 3-line code examples, explicitly pointed out.
The problem is, the same bugs happen in actual large codebases, without the priming to look for this particular issue out of hundreds of possible issues.
It's a difference between an article saying "This is Waldo" (duuh!) and a "Where's Waldo?" game, where you don't even know how many Waldos are there.
These Rustic people write very dishonest articles. They write up all the flaws of C++ and then compare how bad C++ is. It really harms the Rust language itself, not C++. Professionals will dislike Rust because of its community, not the language. Rust people should put more effort into writing a formal specification, or else many people will consider Rust undefined. My comment is getting flagged and downvoted, and therefore I am posting it here.
To add, C++ has survived for many years as the de facto system-level programming language (C has too). It has survived good and bad shifts in software engineering.
The old "C++ is not going anywhere" argument applies.
C++ is not a great choice for small or transient projects. There are other languages that are a better investment for those projects.
But if you're writing an infrastructure-level application, that is expected to have a shelf life in decades, C++ (or C) is a pragmatic and rational choice.
UB-invoking dereference in std::optional is such a baffling design choice.
The whole point of an optional type is to prevent accidental unchecked access to the value. Sure, sometimes it's useful for performance to skip the check when it's already known to be safe from context, but such a dangerous optimization should have been hidden behind a method like `beware_of_the_nasal_demons()`, not an innocent-looking convenience syntax for flirting with UB, in a language that was supposed to be cleaning up the unsafety.
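For readers less familiar with the API, a minimal sketch of the distinction being complained about (behaviour as documented on cppreference): the terse operator* does no check at all, while value() and value_or() do.

    #include <iostream>
    #include <optional>

    int main() {
        std::optional<int> maybe;  // empty

        // int a = *maybe;         // compiles fine; undefined behaviour at runtime

        std::cout << maybe.value_or(0) << '\n';  // checked access with a fallback

        try {
            std::cout << maybe.value() << '\n';  // checked access: throws when empty
        } catch (const std::bad_optional_access&) {
            std::cout << "empty optional\n";
        }
    }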
> The whole point of an optional type is to prevent accidental unchecked access to the value.
You might argue it should be that, but it wasn't my impression that it is that. My impression was std::optional is just there to allow for representing the absence of a value. Ideally that should be zero-overhead, which means dereferencing it should degenerate into a normal object access. Hence the current design.
Do you mean that the purpose of std::optional is to be faster than a pointer (which could represent absence as nullptr)? I would have thought it would be ML-style safety, but it seems like you're right. I guess a tagged union (or std::pair<T, bool>) would be faster—if the std::pair is a local variable, fields of the wrapped object will be at fixed offsets from your stack pointer or frame pointer, so you save yourself an indexed register load the first time you access a field of the object. That seems to have been the intended use.
No, the purpose wasn't to be faster than a pointer. You can't even necessarily use a pointer where you can use std::optional; the value of std::optional is embedded in itself.
One purpose would be type safety (and things revolving around that). Another would be to let you pass around an optional object just as easily as the underlying object.
As to whether it's a good idea to have it at all, I'm not particularly fond of it personally either. I don't have specific issues with its dereferencing safety though.
Don't confuse type safety with memory safety though. Type safety != memory safety != thread safety != ...
Optionals are not zero-overhead in any other language. If you aren't going to bother checking the optional, why not just return any old pointer type? The type system should be used to enforce a contract, and bypassing checks on that feels like you are just spitting in the face of the type checker.
> Optionals are not zero-overhead in any other language.
That's been true for so many other features in C++ too.
> If you aren't going to bother checking the optional, why not just return any old pointer type?
Pointers require something to point to. Optionals can embed the value.
> The type system should be used to enforce a contract, and bypassing checks on that feels like you are just spitting in the face of the type checker.
It's bypassing something alright, but I wouldn't say that's the type checker. And I mean, you could use this logic for everything else. Iterators, arrays, etc. should all have bounds checking too, right? If you want that, there's C#...
> That's been true for so many other features in C++ too.
Plenty of C++ features are zero-cost, but not necessarily zero-overhead. To me, having an optional without a check is like obtaining a shared_ptr without incrementing its use_count. It's like arguing that I should increment use_count manually so that copies are "zero-overhead". Sure it's "faster", but the design is still broken and will lead to all sorts of issues.
> Pointers require something to point to. Optionals can embed the value.
So? Just because it embeds a value rather than a pointer doesn't make your program any more correct. If the API designer returned an empty optional and you access it anyways you are still dealing with garbage data.
> And I mean, you could use this logic for everything else. Iterators, arrays, etc. should all have bounds checking too, right? If you want that, there's C#...
Ok? Then let's remove all overhead from the language. Why does shared_ptr increment its refcount behind my back? No more shared_ptr. This kind of argument without nuance gets you nowhere. Likewise, unlike bounds checks, nothing about std::optional is forced or baked into the language. If you don't want to do null checking, then just don't use std::optional: the fast option is still there, and unlike Rust, the fast option is the default.
Optionals are a great tool for eliminating null pointers from a codebase. That's a big enough problem that I would expect an optional type designed in 2018 to get it right. Allowing unchecked dereferencing is an oversight.
> And I mean, you could use this logic for everything else. Iterators, arrays, etc. should all have bounds checking too, right? If you want that, there's C#...
I mean yes, the logical conclusion is that trying to backport safety onto C++ is impossible and moving to another language is the only reasonable option. That's kind of the point of the article.
> I mean yes, the logical conclusion is that trying to backport safety onto C++ is impossible
The logical conclusion is that what some people here seem to want is so radically different from C++ that they really just want a different language altogether... which is fine. Nothing about this implies it's impossible to shoehorn C++ into something else (though that may still be true; I'm not sure). It just implies that another language should be considered when you want to prioritize memory safety, like maybe C# or Rust. (And nowhere here am I agreeing or disagreeing with the article.)
> Optionals are not zero-overhead in any other language.
This is a common misconception, and a pet peeve of mine. Properly designed optional types add neither space nor code-size overhead.
Consider the C function:
void foo(int* ptr) { ... }
If it is part of the API of foo that `ptr` may be null, then foo must not dereference `ptr` without verifying that it is non-null. So the body of foo must be something like:
void foo(int* ptr) {
    if (ptr != NULL) {
        ...
    }
}
Or, take the non-pointer case:
void foo(struct S s, bool s_present) { ... }
Clearly the intent of this function is that the contents of `s` should not be accessed if `s_present` is false, so the implementation of the function must check `s_present` before inspecting `s`. (Inspecting s incorrectly wouldn't necessarily be UB, like in the previous example, but it's clearly erroneous)
The only thing that a good optional type implementation does is make it a type error to fail to do what everyone agrees both of those functions must do anyway in order to be correct. It need not increase the size of the representations, nor must it emit a single unnecessary instruction when compiled. There are plenty of languages that do this, Rust being a prominent example.
> If it is part of the API of foo that `ptr` may be null, then foo must not dereference `ptr` without verifying that it is non-null. So the body of foo must be something like:
void foo(int* ptr) {
    if (ptr != NULL) {
        ...
    }
}
Then what?

void foo(int* ptr) {
    if (ptr != NULL) {
        *ptr += 1;                   // optional<T>::operator* would introduce a branch here?
        *ptr = *ptr / 2;             // same
        if (*ptr > some_constant) {  // same
            ...                      // etc etc
        }
    }
}
And compiler optimizations can't always be assumed; for instance, debug-mode (-O0) performance does matter.
But this declares a new variable - I find this super messy, and it really decreases readability in practice, since it's no longer obvious that what you're working with was a function parameter. It also adds a scope level. I much prefer the
if (!bla)
    return;
// use bla
early-return style. So that's really a no-go for me, from years of comparing the two styles.
Well, if you have aesthetic objections to the way Rust does it, I can't argue with you. Kotlin does it the way you like, though.
Anyway, I started in on this thread because I was objecting to the claim that optional types always have overhead. They don't. That's all I wanted to show.
> Well if you have aesthetic objections to the way rust does it,
I agree with GP that it is harder to maintain ("messy", they say) if there is more than one variable referring to the same value. It is not as subjective as you make it out to be.
The idiomatic way to solve that in rust is to re-bind to the same variable name.
if let Some(foo) = foo { /* ... */ }
That's possible because in Rust name shadowing
let foo = grab_foo_bytes();
let foo = parse_foo_bytes(foo);
makes the previous binding of the variable no longer namable and thus no longer accessible, but doesn't drop it (and trigger RAII destructors).
Now someone will probably come in and say "oh no, this isn't exactly like C, how will anyone ever understand it". To that I reply: why is it that C users get to say "if you don't know how C works exactly, you're holding it wrong", and then comment about other languages "I don't want to have to learn anything to hold it right"?
> The idiomatic way to solve that in rust is to re-bind to the same variable name.
OK, that's reasonable. Is the idiomatic way to use optionals to introduce a layer of nesting? I prefer keeping functions very "flat"-looking. It sounds like Rust's optionals will give people an excuse to create labyrinthine functions where I'm constantly scrolling around to remind myself of what level of nesting I'm at and whether I'm in a loop or not, etc.
As you say, you don't need a new block to shadow a previous var, so hopefully that style catches on.
> "oh no, this isn't exactly like c, how will anyone ever understand it"
Not very persuasive, sure, but the network effect of the C/C++ culture (including its general syntax and imperative nature) is a strength in and of itself. New languages would do well to coddle the existing C++, Java, et al. users wherever it doesn't contradict the language's central mission.
I definitely agree. One way I do that is by having an internal function that takes a valid value and a public function that does the validating/error handling.
That doesn't always make sense though. There's a few other idiomatic ways to avoid nesting. Since statements evaluate to values, you can write
let foo = if let Some(foo) = foo {
    foo
} else {
    // Something that either evaluates to the same type as foo or returns early
};
That's so common there's a special operator for it, ?. It essentially either early returns the sad path or evaluates to the happy path.
fn get_foo() -> Option<Foo>;

fn frob() -> Option<Bar> {
    let foo = get_foo()?;
    let bar = convert_to_bar(foo);
    Some(bar)
}
I prefer to use Result instead of Option to model missing-data-like cases because it composes better. So that might be:
fn get_foo() -> Option<Foo>;

fn frob() -> Result<Bar, BarNotFound> {
    let foo = get_foo().ok_or(BarNotFound)?;
    let bar = convert_to_bar(foo);
    Ok(bar)
}
#[derive(Debug, thiserror::Error)]
#[error("Bar not found")]
struct BarNotFound;
That last bit uses a stdlib macro and a very commonly used external lib macro to save a few lines of repetitive typing.
Edit: Also ? doesn't special case Result and Option. You can make your own type conform to the interface (trait) it requires. That would probably be weird though.
{
    int x = 42;
    {
        int x = x + 1; // x + 1 refers to this second x
        ...
    }
}
This is because the scope of the identifier being declared already starts at the =. So even if the redeclaration were allowed without opening a new block scope, it wouldn't work.
However, there is a good reason for that: initializers can be self-referential, so they have to have their own identifier in scope:
// define circular structure in one step, no assignments:
struct node n = { .next = &n, .prev = &n };
In this regard, the scoping rule is like letrec in Scheme or labels in Common Lisp.
Easily dealt with by a linter. C++ is not fundamentally a safe language. It's fundamentally a no-cost abstraction language. It's not a baffling design choice if you know C++.
> not an innocent-looking convenience syntax for flirting with the UB
I agree, but C++ uses this pattern in so many places that I think it is less confusing to be consistent. When I see a nice, concise syntax or function, I check cppreference for undefined behavior...
> The whole point of an optional type is to prevent accidental unchecked access to the value
No? It is to model the idea of "a value is there or is not there", definitely nothing more. If replacing a T* t by an optional<T>& incurs a meaningful performance cost (like a branch), then optional just won't be used.
boost::optional was designed to mimic pointers. Before boost::optional, some programmers would return a T* as a caveman's optional<T>. Boost wanted to preserve the syntax of *, ->, and operator bool. Dereferencing a null ptr is UB, so operator* for optional was the same.
I agree it would be nice if you could get an assert in operator*. But you can already fire up a debugger or sanitize build and get a nice error message anyway, so it's nbd.
> it would be nice if you could get an assert in operator*
If, as you wrote, operator* for an optional that doesn’t contain a value is UB, it can do whatever it wants, including assert.
I think the problem is that there’s an implicit requirement that operator* on an optional that does contain a value is as fast as a pointer dereference.
(Aside: reading https://en.cppreference.com/w/cpp/utility/optional, I wonder how one can misuse “When an object of type optional<T> is contextually converted to bool, the conversion returns true if the object contains a value and false if it does not contain a value.” to write obfuscated code or hide back doors using optional<bool>)
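For what it's worth, a small sketch of the trap that aside is hinting at (my own illustration): the contextual conversion answers "is a value present?", not "is the stored bool true?", which is easy to misread in review.

    #include <iostream>
    #include <optional>

    int main() {
        std::optional<bool> flag = false;  // engaged, holding the value false

        if (flag) {
            // Taken! The conversion only checks has_value(); to test the stored
            // bool you need *flag or flag.value().
            std::cout << std::boolalpha << *flag << '\n';  // prints: false
        }
    }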
As far as I'm concerned, the purpose of std::optional is so you don't have to allocate memory on the heap and then check for a null pointer. I don't want it to throw exceptions, just like I don't want the language to check the validity of a pointer every time I want to dereference it.
> There are significant challenges to migrating existing, large, C and C++ codebases to a different language – no one can deny this. Nonetheless, the question simply must be how we can accomplish it, rather than if we should try.
Um... no. Not every (or even most) existing large C/C++ codebase should be migrated to another language.
"But security vulnerabilities! And crashes!" I admit all that. But a program can be useful and therefore valuable, even if it crashes at times. Rewriting the program adds value only by removing crashes and exploitability. In many cases, that's effort that could be spent in more valuable ways.
This amounts to saying that we should accept codebases continuing to contain exploitable vulnerabilities indefinitely. Perhaps for codebases that have a finite expiry date that's tolerable, but for a codebase that's expected to be maintained indefinitely I don't see how it can possibly be worthwhile - a rewrite will be a one-time cost, whereas exploitation is an ongoing cost that will surely exceed the one-time cost eventually.
> This amounts to saying that we should accept codebases continuing to contain exploitable vulnerabilities indefinitely.
Right, why shouldn't we?
I have a CLI application for storing TODOs. It never accesses the network, I don't really care if it crashes, and it does not have an expiry date.
I have a constant amount of time to work on it. To me, it is more valuable to put that time into new features that save me a couple of seconds here and there as a daily user than to put months or a year of the time allocated for working on the app into rewriting it in a different language to remove the occasional monthly crash I don't care about.
For me, the decision is a no brainer: my time is better spent in the stuff that adds more value, and "avoiding exploitable vulnerabilities" is not it.
I suppose that this is the situation for many apps.
You are claiming that this is wrong. Prove your claim.
If it's a personal app that you're not going to share with others then that app does have a finite expiry, admittedly in a slightly morbid way. I'd submit that an app with a truly stable userbase requires an impossible level of fine tuning - in reality an app is either growing or shrinking.
A program can fail in exceedingly many ways. It is basically impossible to formally verify a program. It’s great to start a new project in a “safer” language, but porting to another language is a different thing.
So for example, let’s take SQLite. It is written in C, but it has an insane amount of tests. Would it benefit anyone to rewrite it in Rust? It will definitely be much more buggy for a long time.
> So for example, let’s take SQLite. It is written in C, but it has an insane amount of tests. Would it benefit anyone to rewrite it in Rust? It will definitely be much more buggy for a long time.
I bet it wouldn't be, actually. In my experience porting between languages is much easier and safer than people tend to think. Meanwhile even with all their tests (which certainly have a maintenance cost) SQLite has been known to have memory safety bugs.
An additional issue is that some sophisticated C++ doesn’t always translate easily into other languages. It isn’t just a fairly direct reimplementation but a legit redesign. That will be a bug factory, especially for the kinds of codes that tend to be difficult to translate, as proving equivalence won’t be trivial.
I'd submit that the kind of code that's difficult to translate - that is, code where it's not clear where the responsibility for the lifecycle of a given piece of memory lies - is already a bug factory.
Not at all. Modern C++ can express some memory safety and lifetime models simply and elegantly that are difficult to express in other systems languages. It doesn’t define one for you by default but it also doesn’t limit you to a single model that is clearly inappropriate for some important systems code.
The bug factory, in many cases, is a consequence of having no way to properly express lifetimes in languages that only support a single lifetime model (or no lifetime at all in the case of C), therefore requiring unsafe hacks and workarounds. If, for example, your entire address space is accessed via DMA then Rust’s memory lifecycle model breaks, and this is a canonical design characteristic of all high-performance database engines. You can trivially design C++ constructs that automagically handle lifetimes under these constraints; in other systems languages you have to do a lot of fiddly manual resource management in unsafe code blocks.
The idea that a single memory lifecycle model is appropriate or optimal for all systems applications is objectively wrong. C++ doesn’t implement other formally verifiable safety models but it provides the tools required to elegantly build applications using them and largely hide making them safe.
This is one of the well-known strengths of modern C++: the ability to implement many different formally verifiable memory safety and concurrency models as first-class constructs. Not every application needs it but some, certainly everything I work on, definitely do. I don’t disagree with the objective — I highly value the ability of the compiler to ensure that my code is safe.
Yet Microsoft and Google, despite their C++ investment into compilers and ISO seats, are also investing into hardware memory tagging, forcing static analysers down developer throats no matter what, while slowly adopting other AOT compiled languages on their products.
Because while Modern C++ does indeed improve the memory safety and lifetime models, a large majority of the C++ community doesn't care about modern C++ features and has even started the Orthodox C++ campaign.
Not every program has such a well-defined life cycle that it fits into Rust’s memory model. There was a great post on why the Wayland library’s Rust implementation was abandoned. There was basically no benefit to Rust’s memory model there over C.
For new code, absolutely. But rewrites, especially when the new language can’t necessarily give huge safety guarantees as in this case are very bug-prone.
You haven't known suffering until you've tried to rewrite a large C++ codebase in Java. No clear ownership, you say? All those members that clearly belong to one object suddenly have to be guarded against accidentally having their references shared, all that clear math code turns into an indecipherable mess, and you can forget about lifecycles unless you guard every resource with a try block. Sadly, I end up writing Java code from time to time because I can write passable Swing UIs faster than I can set up Qt.
Quake III appears to be C code. The argument for (automatically) rewriting C or nearly-C to Rust is stronger than for C++.
Curiously, they seem to have found just one, inconsequential, memory usage error in the entire, large C program. This calls into question the frequently repeated assertion that there is no substantial and correct C code.
Yeah, and clang-tidy gives another; I just typed the first example and got both -Wdangling-gsl and "std::basic_string_view outlives its value [bugprone-dangling-handle]" from clang-tidy (the pattern is sketched below).
It doesn't invalidate the point of the article, that these things perhaps should be easier to avoid, or impossible to express. You can write safe-ish C++ with high warning settings and enough linters and static analysis tools backing you, but it's not an ideal experience.
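For context, a minimal sketch of the classic dangling-view pattern such warnings are designed to catch (not necessarily the article's exact code): a std::string_view bound to a temporary std::string outlives the buffer it points into.

    #include <iostream>
    #include <string>
    #include <string_view>

    std::string make_greeting() { return "hello, world"; }

    int main() {
        // The temporary std::string dies at the end of this statement, so the
        // view dangles; clang reports -Wdangling-gsl on this line.
        std::string_view greeting = make_greeting();
        std::cout << greeting << '\n';  // undefined behaviour
    }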
I changed teams at work (Google) to one that uses a C++ codebase that has survived for 20 years. As someone writing C++ for the first time, clang-tidy has been huge. It stops so many screw ups before I get to the code reviewer.
I've not found C++ as bad as I feared, partly because Google's style guide limits the amount of silliness you can do, partly because I have avoided memory issues by never allocating on the heap with raw `new` and using smart pointers instead, but mostly because Clang and clang-tidy have helped me avoid all the footguns.
I wouldn't write a new project in C++ (especially outside Google, where the package management ecosystem is non-existent), but I'm no longer of the opinion that I would never join a team just because their current codebase was written in C++.
Those "C++ is bad, complex, unsafe, etc." articles are starting to get boring.
If one wants to shoot themselves in the foot, that is fine. C++ offers countless possibilities.
It (and a plethora of libraries) also offers a quick way to write sophisticated and performant applications without much fuss.
Make your choice. I personally use C++ to great advantage and find it very productive and safe. And while I am a good programmer / designer, I am not a C++ expert. Far from it.
Or use language of your choice. Nothing is wrong with it. We do not have to live in the world where "there can be only one".
In a way, I agree. To me, C++ is an absolute train wreck of a language and choosing it for a new project borders on malpractice.
But if people want to use it and it doesn't affect me, there's a limit to how much energy I'm willing to spend trying to talk them out of it... especially if they are a potential competitor, in which case I might nod encouragingly when I hear they're using it.
> To me, C++ is an absolute train wreck of a language and choosing it for a new project borders on malpractice.
Personally, I tend to think exactly the opposite. Choosing a brand-new hype language because "it's shiny and fun" for new projects that will have to be around for 20 years is just a sign of immaturity, and a malpractice.
There is no guarantee your shiny language will be alive, or even supported, on my next-gen platform. Whereas for sure the good old safe set: C++, C, JS, Python, Java will be there and alive, even 20 years from now.
And as a result you will probably struggle and spend more time getting your nice shiny Rust code running correctly on iPhone/Android 20/NG-Cloud than you will ever spend debugging a damn core file in C++.
To comment ironically on your post: it is rather "zealots" and "evangelists" like you that I personally refuse to talk to. These people are often more interested in playing with the latest fancy tech available than in producing anything useful and sustainable in their work.
> choosing it for a new project borders on malpractice
Did it multiple times recently and I’m fairly confident in my choice. Here’s the main reasons.
1. Interoperability. If you’re writing a web service which only needs TCP sockets and local files, standard libraries of all modern languages get you covered. However, many desktop applications need to consume large C or C++ APIs implemented by operating systems. Maintaining FFI wrappers is expensive in the long run.
2. Library ecosystem. For HPC, only Fortran has a comparable one; for game development, only C# does. The rest of the languages aren’t even close in these areas.
3. SIMD intrinsics are awesome for performance. They slowly appear in other languages, but so far, the support in C and C++ is just better. Probably because the support is first-party, by Intel and ARM.
4. Tooling is good. I use debugger, CPU and GPU profilers almost every day.
And I will nod encouragingly, knowing that you will spend hundreds of times as many hours waiting on builds as I will spend finding and fixing any bugs that using your compiler might have helped avoid; and knowing that you will find overwhelmingly fewer experienced coders available to help when you need them.
In the past decade, I have spent more time on filing compiler bug reports than I have on tracking down and fixing memory usage errors. Rust does not solve a problem I have. But its Node.js-like dependency milieu worries me.
That said, I wish you good fortune with your choice. I do not doubt you will find it. But if you do, it will be a result of your work, not your choice of language.
It's interesting that one of the things C++ got right was having value semantics by default and most of these "problems" are the result of using either references or types that behave like dumb references/pointers such as std::string_view.
The deliberate omission of std::span::at() is annoying. The paper that introduced std::span was titled "span: bounds-safe views for sequences of objects," but there is actually no bounds checking in the standard. You only get it with debug builds that also use slow, debug versions of the STL, which have almost negligible value compared to just using sanitizers.
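To illustrate the asymmetry (my own sketch, as of C++20): vector::at is range-checked, while span offers only the unchecked operator[].

    #include <iostream>
    #include <span>
    #include <stdexcept>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3};
        std::span<int> s(v);

        try {
            std::cout << v.at(10) << '\n';  // vector::at throws on a bad index
        } catch (const std::out_of_range&) {
            std::cout << "vector::at caught the bad index\n";
        }

        // std::cout << s[10] << '\n';      // span::operator[] is unchecked: UB,
                                            // and there is no s.at(10) to reach for
    }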
D is a garbage-collected language. I've spent the past weekend writing some toy programs in D, and it was a very pleasant experience, mainly because of the GC: I didn't have to think about allocating arrays or strings, I could just use them. There are ways to circumvent the GC, like RefCounted, but the baseline is different in D and C++.
Many of the comments here are focusing on Rust as the obvious alternative, but it's great that the article twice also mentions Swift and Rust together as the obvious choices. If Swift ever gets more adoption outside Apple's platforms, it will be a game-changer. It's a very elegant language with much stronger typing than C++. It feels like writing in a scripting language but retains the type and memory safety.
I moved away from full C++ development in 2006, yet C++ is still the tool I reach for in native code, because many libraries are only available in C++ or C, and I definitely will not be writing C unless obliged to do so.
Microsoft had a project to move Windows mostly onto .NET, which was sabotaged by their Windows team; a decade later they rebooted the ideas using COM instead, an idea that has been a commercial failure, while everyone just keeps migrating to .NET.
Microsoft's security team pushed a best-practices paper saying that new projects should use:
1. .NET
2. Rust
3. C++ with Core Guidelines, with Visual C++ analysers turned on
4. Security annotations for C and C++ code (SAL)
Google castrates the use of C++ for Android app developers, limiting it to writing native libraries to be consumed by Java and Kotlin; it is behind the effort to adopt hardware memory tagging alongside ARM, is the reason the Linux kernel no longer uses VLAs, and is the big pusher for Rust in the Linux kernel.
When you're writing systems software you need systems software engineers and most of them are far more competent in C/C++ than, say, Rust, Ada, Go, etc.
We choose C/C++ because it's what we know and what our colleagues know, but I think a lot of us wish there was a better alternative.
Not to take away anything from your post, but Go is not a systems language. No language with a mandatory garbage collector can make that claim since it makes some systems code effectively impossible to implement.
I agree, but some people don't. When I interviewed at Google a couple of years ago, they were rewriting the Fuchsia network stack in Rust. It was written in Go at the time. My jaw about hit the floor when I heard that. I'm guessing they realized it was a mistake, but then again maybe the Go version was just a temporary placeholder. IDK, didn't get the job.
Rust is an entirely adequate systems language for most purposes. I think it could probably replace C for almost all purposes except in cases where extreme portability is required, which C excels at.
C++ is really only the answer if you need extreme performance and/or expressiveness out of a systems language. Some code, like database engines, really benefits from that.
Uh, if you think that's bad, I once worked at a company that made communications gear for first responders and the military. Most of their stack was written in Python 2. I'm not talking UI, I'm talking systems software level stuff. Python tied together with dbus on top of a shake-n-bake Yocto distro that was out of date. People's lives depended on that tower of crap.
It isn’t an “anti-GC crowd”; it is based in pretty solid theoretical computer science with large amounts of empirical evidence behind it. GC-based environments are incompatible with schedule-based safety, optimization, etc., which are major optimizations and design elements in modern systems.
No one has ever demonstrated a systems architecture that can outperform a state-of-the-art schedule-optimized design. This result is expected in theory. There are several optimality theorems, treated as a soft limit, that only hold true if you don’t control the schedule. The requirements of GCs guarantee you don’t control the schedule. State-of-the-art system designs, in non-GC environments, aren’t limited by those theorems and frankly run circles around GC-based systems. I work with a lot of companies that run nothing but managed languages and even they don’t believe that produces an optimal system, just an adequate one for non-intensive use cases. And that is a legitimate position.
You clearly are deeply invested in the superiority of GCs for all use cases. That’s fine. I make a lot of money replacing them with empirically much higher performance systems. There isn’t a lot of computer science to support the pro-GC position if performance is the objective. And to be clear, the loss of performance is integer factor, not something that can be trivially dismissed.
And I make money replacing C++ systems with ones written in AOT/JIT managed languages, with different kinds of automatic memory management.
The last time I did full stack C++ development was in 2006, nowadays its use is constrained to a couple of unavoidable native libraries, or GPGPU shaders.
Not everything can be proved in practice when management prefers to give money to the ones that sabotage projects like Windows Dev team has done to Longhorn and Midori efforts.
Are you aware that for quite some time any of your Bing searches using Asian servers were powered by Midori?
Thankfully this is a problem that eventually will get fixed by generational evolution, pity I won't be around to fully witness it.
I agree with you that a GCd language, almost by definition can’t be a low-level language.
I’m not familiar with schedule-optimized design though, could you expand a bit on it?
But I assume it can’t easily be used with routinely changing design requirements and non-obvious object lifetimes, like most business applications and CRUD apps, which are the primary use case of high-level languages.
Each of those is also using and promoting Rust now as an alternative. Security is a major driver for that as each of those has been dealing with security issues in their C++ code bases regularly. They won't stop using C++ overnight of course but they are vastly less likely to use it for new things and are actively replacing C++ components with Rust equivalents at this point as well. In Google, which developed Go for the same reason, Rust is also competing with that.
Mission critical software projects, by their nature are started rarely and make very conservative language choices. By its nature, much of it is old, so it's in C++. There's also a fair amount of it in C# and Java, too.
The problem I see with the author's very narrow point of view is that Rust, C++, and C are just tools, which can all be misused and produce defects when used improperly or without a good understanding. A Rust-style borrow checker could be implemented in a C++ compiler over a subset of the language, which is ironically also what Rust does: the unsafe parts of Rust are still unsafe... Already, C++ compilers and static analyzers (thank you, LLVM) have gotten surprisingly good at detecting memory-unsafety issues (the author's string_view dangling reference, for instance). To really solve a broader class of issues (including logical issues), in 2021 I am frankly far more excited by languages with advanced type systems (for instance Idris) than by Rust.
Agreed! It's a tough world out there in the fight for relevance. Besides Go and Rust, the recent languages that more or less clamor to compete with C++ include Swift, Nim, Zig, D, etc., and honestly I don't know if Rust or any of those will save us from human error or incompetence.
Is there a safe subset of C++ defined somewhere? Can I turn off some of these ridiculous "features"? As we add so much to the language, the number of footguns increases quadratically.
C is still a widely used and beautiful language. It is "safe," in the sense that the machine does more or less exactly what you ask it to. Allocate memory when you need it, and free it when you consume it / are done with it.
You do not have to use these features if you do not want to.
C++ is very powerful, and has an amazing world of libraries available, but sometimes I feel like it is at least two languages at the same time.
For example, one benefit of using C++ over C is writing C-style programs with modern libraries and convenient containers from the STL, but you quickly learn that C-style programming isn't supported across the board, and hitting that wall can be jarring and unexpected (looking at you, OpenCV: just try to allocate space for a struct containing a cv::Mat; a sketch of the general problem follows below).
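A sketch of the general problem, using a hypothetical Record with a std::string member as a stand-in for a struct containing a cv::Mat: malloc hands back raw bytes and never runs constructors, so the C-style allocation pattern breaks as soon as a member is non-trivial.

    #include <cstdlib>
    #include <new>
    #include <string>

    struct Record {
        int id;
        std::string name;  // non-trivial member: its constructor must run
    };

    int main() {
        // C-style allocation: raw memory, no constructors. Touching r->name
        // right now would be undefined behaviour.
        Record* r = static_cast<Record*>(std::malloc(sizeof(Record)));

        new (r) Record{42, "ok"};  // placement new runs the constructors...
        r->~Record();              // ...and you must remember the matching destructor
        std::free(r);
    }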
I sympathize that compilers do not do "enough" of this for us. But there's really no substitute for a good code standard and developer culture focused on quality.
Unfortunately C++ inherited the performance-focused culture from C and sees such tooling as needless bloat, only required for developers that need an extra hand while coding.
If you check other well-known surveys among the C++ community, you will see similar results for static analysis tooling and coding standards adoption.
So while, contrary to the C community, WG21 members do strive to push code quality forward and publish tooling that helps to enforce it, the adoption could be much higher than it actually is in the field.
I used to write a lot of C++ (like 14 years ago at this point). I haven't really paid any attention to it in that time. I am looking at the code examples and what I see is a language trying to adopt features from Java, Rust and others - but where in those other respective languages there's like just 1 thing going on in a single line of code. In these C++ examples you have the namespacing stuff, templates/generics, implicit operators and constructor calls. It just feels like exactly what you would expect of a language like C++ to try to adopt the native features of these other languages. Which isn't a bad thing, but it just screams to me like wouldn't it just be easier to switch languages?
That all being said, I do get the "rush" C/C++ programmers feel when they write multi-threaded socket servers. You step back and go whoa - that worked?
It is, in fact, never a surprise when a modern C++ program runs the first time without faults. That is an important difference from C, and a quality it shares with Rust.
What is missed by the authors of all articles of this kind is that, while all of the errors cited are possible, none are tempting.
For example, while it is easy enough to overindex an std::vector<T>, the language provides a "for" statement that exactly walks the vector with no possibility of overrun.
Using an empty std::optional<T> may seem like a danger, but nobody uses one unaware that it might be empty; it needs different syntax to look into than a regular value.
std::string_view and span, similarly, may seem foolishly risky, but they are always safe when passed down a call chain. One does not find them returned from functions in a responsibly designed codebase.
C++ is a box of sharp tools that a responsible engineer can use to make fine, reliable software. We have, today, many, many times more responsible engineers writing good C++ code than we have Rust coders in total.
Work making C++ code safe and reliable thus has a much bigger impact on the world than anything ever done with Rust. That will be true for a long time to come. That is not a reason to avoid Rust, but it is a reason to remain respectful in interaction with those engineers.
Modern C++ is extremely powerful, more so than any other systems language, but not simple. I’m not even a fan of the language per se, but there aren’t any alternatives currently that are as powerful or expressive and I actually use that power pretty regularly. You can’t express modern C++ in other languages with similar code gen in remotely similar lines of code, which is its unique strength.
In my experience, if you use the full idiomatic toolbox of modern C++, code mostly works the first time. That was never the case when I was writing C, or even legacy C++ (which was terrible).
I did a lot of C++ pre-C++11, and then jumped back straight into C++17, and I confirm the experience of other commenters - in modern C++, "it works the first time" is normal. It wasn't in the old C++, usually when my code didn't crash after compilation, it was a sign there was a gnarly bug hidden :). But the improvements over the past ~15 years made a huge difference.
That said, even though modern C++ is a safety razor, you're still playing with a sharp object. Most of the footguns of yore are still there. Compilers got better at telling you when you're doing something stupid (I can't imagine writing C++ without turning on almost all available warnings), but if you stray away from (or abuse) the modern components, you'll have a bad day. Usually around unintentional dangling references.
The other day I did a quick refactor and managed to crash the app. The reason? I used a unique_ptr after moving from it. Compilers and linters usually catch such dumb mistakes, except in this one case they didn't (neither MSVC, nor Clang, nor clang-tidy). So you still end up overcompensating with tests, hedging against the edge cases of your tooling. But it's much, much better than it was before.
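A minimal sketch of that kind of mistake (not the commenter's actual code): the moved-from unique_ptr is left null, the code still compiles, and tooling does not reliably flag the later use.

    #include <cstdio>
    #include <memory>

    void consume(std::unique_ptr<int> p) { std::printf("%d\n", *p); }

    int main() {
        auto p = std::make_unique<int>(42);
        consume(std::move(p));

        // p is now empty. The next line compiles cleanly and is not reliably
        // diagnosed; at runtime it dereferences a null pointer.
        // std::printf("%d\n", *p);
    }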
> In these C++ examples you have the namespacing stuff, templates/generics, implicit operators and constructor calls.
Yeah, I have two separate pieces of opinion on it.
One, C++ is a peculiar language in which any new addition involves an extreme amount of "language lawyering". I imagine the committee and the authors of high-profile libraries to be like the people from Suits (the TV show): figuring out ways to "thread the needle", navigating around all the accumulated rules to make it possible to implement a new feature, possibly with a new DSL, using only the existing features of the language. That's how you end up with a ridiculously crazy mess in the templates[0].
Two, C++ is a poster child of Greenspun's Tenth Rule[3]. Standard library has drawn a lot of inspiration from Lisp over the years, and templates are essentially used as a half-working macro system these days. Some Lisp experience is actually useful here, because it lets you discover things you could do with a programming language that you never imagine you could. Knowing both the end goal and how it could be expressed in a language that was designed for it makes it easier to understand the crazier template constructs in C++.
> it just screams to me like wouldn't it just be easier to switch languages?
Maybe. If I were in charge of version 2.0 of the C++ codebase I'm working on, I'd probably consider Rust or .NET. But modern C++ is good enough, and we have good testing & QA culture, so there's no reason to change it. And for people like me, for whom C++ was their gateway to programming, it's actually somewhat pleasurable to work with.
--
[0] - Template metaprogramming is essentially its own separate language since at least C++11, but the community still isn't keen on treating it as such. I'm actually hoping C++ will adopt metaobject protoco... er[1], I mean metaclasses[2] - together with concepts (already in C++20), it might give a chance to "refactor the rules" as it is. Backwards compatibility is the sword of Damocles here, the promise to make a mess forever hanging above any improvement to the language, but the language lawyers are formidable, so maybe they'll make it work.
I completely agree with the points from this article. It is fundamentally infeasible to make all C and C++ code safe. The main reason: PAST code. There are countless lines of unsafe code already written. However, there is nothing magical about Rust's facilities (e.g. the borrow checker) that makes them inapplicable to FUTURE C and C++ code. I believe this is the only way forward for C and C++. They need to offer mechanisms that guarantee that any code written today can be provably memory- and thread-safe. That, of course, requires yet another tool in the myriad of tools that exist for C and C++.
Language semantics matter. If a language doesn't place very strong meaning on certain things from the get-go, no amount of static checking will ever be able to guarantee certain traits of the code written in it, because it doesn't have enough information to work off of. And I'm not just talking about undefined behavior.
People have been writing safety checkers/linters/etc for C and C++ for decades. Many of them are very impressive and useful. None of them can ever be totally sound, as a fact of the languages themselves.
By the same token you could say it's also fundamentally infeasible to make all Rust code safe, otherwise it would not have unsafe code... so in the end it's an article that says nothing interesting at all.
The ability to occasionally opt-out is both a feature and a necessity because certain systems are inherently unstable and offer varying guarantees. I’m arguing for an opt-in feature in C++. Of course there is no panacea.
The typical, well-known blog pattern for writing against C++: show some buggy code and then conclude that C++ is wrong. Instead of asking: Who is wrong, the programmer trying to speak the language or the language itself? As always: only use what you understand. Instead of directly escalating to cardinal questions of language superiority, he would have better spent his time constructively writing a tutorial on how to correctly use things like string_view and span, which are typed references to memory that must outlive the view itself.
> Who is wrong, the programmer trying to speak the language or the language itself?
This is more complicated a question than you think.
If the language is such that you always have to look up some particular feature, then it's likely the language is wrong in that the designer chose that syntax poorly. OTOH, if the programmer just "threw something together" that happened to be incorrect, then obviously the programmer.
This latter example is actually where having good error-messages can come in: pointing out what is wrong, and perhaps some relevant location in the language-definition. — Compare and contrast Ada and C++ error-messages (for "typical implementations") here as an example.
C++ is a systems programming language. Just like C, it allows you to do anything you want to do and be as close to the hardware as you want to be. This is completely unnecessary if you want to build something that displays pretty pictures on the screen in a web browser, and you really should not use C++ for a web back end unless you have really hard core performance requirements. But when you need to get close to the hardware and don't want to write assembly, your only real choices are C and C++.
To meet its design objectives, C++ NEEDS to allow you to do things that are unsafe, and it's up to you whether you want to use them or not. std::span literally exists only to allow aliasing, and the documentation warns you that it's up to you to ensure that there aren't lifecycle issues. If you want the lifecycle managed, don't use std::span, use copies, but then you have the performance hit of copies. If you don't want either, use one of the libraries that handle reference counting for you, but then you have the overhead of reference counting. If it's important not to have ANY overhead and you are certain there are no scope issues, how else would you handle it? The difference between std::span and a raw pointer is mainly a) documentation for programmers reading the code and b) making it easier for linters to find issues with how std::span is used. Similarly, if you want to capture by reference in a lambda, go right ahead, but then ensuring that the underlying objects don't go out of scope is up to you (a sketch of what goes wrong when they do is below).
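A small sketch of what goes wrong when they do (illustrative only): the lambda captures a local by reference and is then called after that local has been destroyed.

    #include <cstdio>
    #include <functional>

    std::function<int()> make_counter() {
        int count = 0;
        // count lives in make_counter's stack frame; capturing it by reference
        // means any call made after make_counter returns reads a dangling reference.
        return [&count] { return ++count; };
    }

    int main() {
        auto counter = make_counter();
        std::printf("%d\n", counter());  // undefined behaviour
    }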
Use something like Rust or <insert language du jour here>, and you have to lobotomize the language, giving up the "safety" benefits the proponents talk about. If you don't lobotomize the language, you either can't get what you want done or you give up a massive amount of efficiency.
When the rubber actually meets the road, and you try doing things like introducing Rust or <insert language du jour here> into something like drivers for the Linux kernel, you immediately run into serious problems that show you why systems programming languages are needed. See, for example, the recent discussion about introducing drivers written in Rust into the Linux kernel.
There are tools to do fairly hard-core checking of C++ code, and if you really must, you can write your own for things specific to your project using the clang libraries, which you can call even from Python and which do much of the heavy lifting for you. These tools just aren't built into most compilers and aren't required by the standard. They are, however, part of most serious development workflows. Further, if you actually make the effort to learn the language, code that looks remotely like what was written for the blog post is going to set off your spidey sense in about a millisecond. Nobody said using C++ was easy. The fact that it scares off dilettantes, and that when a dilettante makes it onto the team their lack of skill is immediately obvious in their code, is a feature, not a bug. It ensures that code quality and team quality remain high.
C++ (and C) is designed to solve hard problems or achieve performance you can't with other languages. Hard things are called hard because it's difficult to do them. Otherwise they would be called easy.
If you aren't even willing to read the documentation for std::span or lambda captures or plausibly read them and choose to use them in ways the documentation tells you will cause problems for a blog post, you are just wasting everyone's time.
Note that there is in fact a recommended subset and set of code guidelines of the C++ language that it's suggested be used for new projects. Still under development and heavy revision but already very useful. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines. clang-tidy has options to check for conformance to these and much more. The Core Guidelines aren't going to allow you to start writing production quality C++ in a day or anything like that. There will still be a steep learning curve. But if you setup your linters properly and read the Core Guidelines (and a couple of good C++ books) you won't be quite as bewildered about how to start.
C++ is a beautiful, pragmatic and very efficient language, just constrain your use of it to STL, and avoid Boost and other over-engineered dependencies.
The Ockham's razor principle of code design is not only important but essential when using large mature languages that have many surprising and nuanced ways to skin a cat. Being too clever with C++ is a recipe for disaster.
Avoid OOP (inheritance in particular) and the sacred art of templates metaprogramming, but also avoid the lowlands of using C++ as a version of C language (with malloc everywhere), and you shall be rewarded in this life or next.
OOP, Boost, and templates also have benefits when it comes to raising the barrier to static reverse engineering.
For example, Security researcher "Marcus Hutchins" famously echoed that Boost is a cluster fuck of OOP sadness [0]
Similarly, a close friend of mine "Omer Yair" mentioned [1],
"A well written OOP malware might be harder to RE statically than a poorly written C code. Writing OOP malware badly though just makes it similar to C code so not sure of the benefits going that route."
I'm not sure I understand your points about Boost - everything the article discusses is in the STL, is it not?
I'm also not sure I understand your point about being too clever and template metaprogramming - nothing the article discusses involves template metaprogramming, and I'd say it's doing very straightforward things. Do you think some part of this is too clever?
Also, I'm not sure I understand how to reconcile your "Avoid OOP" advice with the advice to use the STL. Doesn't using the STL mean making use of OOP?
STL and OOP are orthogonal, e.g. you can use STL without ever using the concept of inheritance.
Boost is too popular imo, and many people take it for granted when using C++ that it comes with this nice utility library to complement STL, kind of like another python package, no big deal.
From what I've seen on multiple projects it creates an explosion of poorly understood (and often poorly implemented) dependencies which leads to maintenance nightmare, memory leaks and bugs, performance issues and production outages.
If you can avoid using an external dependency such as a gigantic library full of experimental features created by thousands of C++ enthusiasts, avoid it at all costs, as it tempts the team to use it left and right when it's not really needed, just like templates.
Templates, if used without strictly adhering to the KISS principle, can get unwieldy, full of nuanced, intricate, and unnecessary abstraction, and make the code hard to understand and maintain.
C++ is a giant language and its use on a particular project has to be very tightly constrained to a very minimal set of features and dependencies.
What do you call "wisdom" that's passed from person to person, so everyone knows it, but it isn't actually true?
A huge amount of Boost is nothing but headers. And it's very easy to use those header-only libraries and avoid the rest. Sure, many people have just pulled in the entirety of Boost... but it's really strange to blame Boost for those bad choices.
By the way, a lot of what today is standard C++ had its origin in Boost. smart pointers, thread, regex, random, ratio, tuple, etc etc. All came from Boost, and those of us using Boost were happily taking advantage of it, while folks who believe the "wisdom" you just shared were building it yourselves.
Well, I've seen it used "enthusiastically", for lack of a better word, i.e. once someone on the team starts using one header with some relatively well-proven and robust feature (like smart pointers), it creates a temptation for them and other team members to use anything else available in that giant library "for free".
And some features there can be more experimental in nature, and some are either not intended for, or not really needed for, the simple use case you have at hand; so unless you veto each new #include by a panel, it becomes a giant cluster fsck full of infinite permutations of advanced, poorly understood features, which creates a very fragile foundation for the project and a maintenance nightmare.
Since the good parts of Boost are already in the STL, I always recommend just sticking with a (very minimal set of) STL, not using what you don't need right now, and avoiding the temptation to throw every available library or "neat" feature at a problem (which might save a bit of thinking time short term but opens a can of worms long term).
> And some features there can be more experimental in nature, and some are either not intended for, or not really needed for, the simple use case you have at hand; so unless you veto each new #include by a panel, it becomes a giant cluster fsck full of infinite permutations of advanced, poorly understood features, which creates a very fragile foundation for the project and a maintenance nightmare.
How is that better than the usual 2021 project having a package.json with 350 sub-dependencies?
Speed is a misnomer there, leading to "Java is faster than C" type of benchmarks. You (and most experienced programmers) care about determinism. If we can fix the minimum and maximum, we can make it work; otherwise, beware all ye who enter here.
Don't remember where I read this (twitter?) but it feels so true. Sure you can get a better chair arrangement with enough effort, but it doesn't matter since the boat is sinking.
I hate how people group C with C++ in conversations like this. Sure, C has problems, but nothing close to the insanity that C++ has, let alone hiding it the way C++ does.
There is a reason people jokingly say C++ can blow your leg off, while C is just a foot gun.
C++ is just a massively bloated language.
If you need native code, just use C, Rust, or even assembly. Stay away from C++.
C++ is a massively bloated language. But it is also vastly more expressive and more safe than C, and also significantly faster at runtime. It just has a stupidly high barrier to entry.
C++ cannot be safer than C as long as it remains backward compatible, which would mean becoming a fundamentally different language. You can't fix a leaky pot by adding another leaky pot on top (err, or something...).
Also "significantly faster at runtime" seems like a stretch. C++ doesn't have any fundamental features over C that would affect performance (for instance, it doesn't fix the pointer aliasing problem).
The poster child for C++ being faster than C is std::sort() vs qsort(). The latter requires a function pointer dereference for each call to the user-supplied ordering function - the former does not. There are many similar opportunities for C++ to be faster than C.
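A minimal sketch of the contrast being described (illustrative, not from any particular codebase): qsort receives the comparator through a function pointer, while std::sort receives it as part of the instantiated template, where the compiler can inline it.

    #include <algorithm>
    #include <cstdlib>

    static int cmp_int(const void* a, const void* b) {
        int lhs = *static_cast<const int*>(a);
        int rhs = *static_cast<const int*>(b);
        return (lhs > rhs) - (lhs < rhs);
    }

    void sort_c(int* data, std::size_t n) {
        // Every comparison is an indirect call through the cmp_int pointer.
        std::qsort(data, n, sizeof(int), cmp_int);
    }

    void sort_cpp(int* data, std::size_t n) {
        // The lambda's type is baked into the std::sort instantiation,
        // so the comparison can be inlined.
        std::sort(data, data + n, [](int lhs, int rhs) { return lhs < rhs; });
    }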
The solution for C would be to "stamp out" a specialization of qsort() for a specific data structure either manually or with code generation. The C++ template system really isn't some "magic performance pixie dust", it just saves some typing ;)
...or alternatively, with global optimization (LTO/LTCG) there's actually a good chance that the comparison function calls of a C qsort() implementation can be "dissolved" via aggressive inlining as long as all required parts are visible to the compiler - that requires a lot of trust in the compiler though (on the other hand, C++'s "zero cost abstraction" philosophy requires the same level of trust).
Much of what you say is true, but I'm not sure that this is:
> (on the other hand, C++'s "zero cost abstraction" philosophy requires the same level of trust).
The C++ Standard _requires_ that a Standard Library implementation implements certain performance guarantees - I don't think the C Standard has such requirements. This means that certain implementations are effectively required - for instance that std::map is implemented (for better or worse) as a red-black tree.
To be clear, the standard mandates the complexity of most standard library functions. In no way does it mandate zero-cost abstraction (and it is not clear how one would even word that). Features are usually defined in such a way that zero-cost abstractions are at least known to be possible, though.
Making a language more expressive at risk of making bugs less obvious is not a good thing. The goal with expressive syntax is to make code easier to read/write, not deceive the developer.
Also, C++ does not have a faster runtime than C. C has hardly any runtime requirements, mainly just setting up the stack; compared to C++'s runtime requirements, I don't see how anyone could claim the C++ runtime environment is faster.
Anyway, my point is that C and C++ have diverged far enough these days that bundling them together is not really productive or accurate.