
This scratches the surface of why I hope C slowly fades away as the default low-level language. C sounds simple when you look through K&R C. C lets you feel like you understand the stack, ALU and memory. A pointer is just an integer and I can manipulate it like an integer.

But the reality is filled with a staggering number of weird special cases that exist because memory doesn't work like a simple flat address space; or the compiler needs to optimise field layouts, loops, functions, allocations, register assignments and local variables; or your CPU doesn't use the lower 4 bits or upper 24 bits when addressing.

C has no way to shield common language constructs from these problems. So everything in the language is a little bit compromised. Program in C for long enough and you'll hit a lot of these special cases – usually in the worst way: runtime misbehavior. Type punning corruption, pointers with equal bit representations that are not equal, values that change mysteriously between lines of code, field offsets that differ between debug and release builds.

When using fundamental language constructs, we shouldn't need to worry about these ugly special cases – but C is built around these ideas. The need to specify memory layouts, direct memory access and other low level access should be gated by barriers that ensure the language's representation and the machine representation don't tread on each other's toes.

Rust has a long way to go but is so much more robust at runtime. I think there's room for other languages to occupy a similar space, but they'd need to focus on no-std-lib, no-runtime operation (not always the sexiest target).




The following is purely my opinion.

Rust might have gone too far the other way. Yes it strives to be a modern language, with a lot of functional programming features and "OOP" done right (aka no inheritance, simply method polymorphism). To me Rust is more a C++ replacement than a C replacement.

A replacement for C should try to be as simple as possible while fixing C's weak-typing mess with strong typing, including for pointers (for instance, as in Ada, where you can't just take the address of anything you want; you need to declare it as aliased to begin with). In fact Ada tries hard to make pointers redundant, which is a great thing. That, plus range types and bounds-checked arrays (Ada has a lot of great ideas, and "C style" Ada is a treat to use).

So a C replacement should keep the procedural nature of C, with closures perhaps, but not go further in terms of paradigm. C aficionados often claim they love C because it's "simple" (it isn't). That's what they mean: they love C because you can't really do OO with it.

On the other hand, Rust macros are exemplary of what a good macro system should be.


> To me Rust is more a C++ replacement than a C replacement.

People make this comparison a lot, but it's not fair. Rust is far simpler than C++ in terms of the number of language features. It's fair to say that Rust is far more expressive than C, but that just means an initial learning curve that is higher than C's, though not as high as C++'s. The type system in Rust is fairly simple, but it can be used to create very complex things. It does take time to be able to read it, but the guarantees you get from it are very powerful.

> a C replacement should keep the procedural nature of C, with closures perhaps, but not go further in terms of paradigm

Why leave all of the language development of the last ~50 years on the table? Iterators in Rust are amazing, algebraic data types (enums) allow for simple state machine constructs, lifetimes guarantee memory safety, and default-immutable memory plus a single mutable owner mean no more concurrent memory modifications. It's safer than Java, and as fast and as frugal with memory as C; what's not to love?

These are amazing features that allow for amazingly stable programs. To put it succinctly, you get to write code once and then rely on it for a long time, so you can focus on building features, not bugs.


The only thing I ask of any C-replacement languages is that they make it possible to describe ABIs, message layouts, and on-disk formats in detail. C historically does that well enough, though not really all that well (e.g., bitfields and enums have issues, and struct packing is not always easy to get right). There is a lot of value to this!

The ability to mix object code from multiple languages (FFI) is critical, and that's why C is the common denominator for FFIs.


I won't do this justice, so I'll point you to the Rust book on this subject: https://doc.rust-lang.org/book/second-edition/ch19-01-unsafe...

In short, at the moment, if you want a stable ABI, you must export an unmangled C interface. This is easy enough. One of my favorite sites for FFI stuff is this: http://jakegoulding.com/rust-ffi-omnibus/

It covers a bunch of languages, and suggests techniques for overcoming any complexity in exposing higher order types.
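
For illustration, here's roughly what consuming such an unmangled export looks like from the C side (the function name and signature below are hypothetical; on the Rust side it would be declared with #[no_mangle] and extern "C"):

    #include <stdint.h>
    #include <stdio.h>

    /* Declaration of a function exported with an unmangled C ABI from
       another language (Rust, in this case). Name and signature are
       made up for the example. */
    extern int32_t add_numbers(int32_t a, int32_t b);

    int main(void) {
        printf("%d\n", add_numbers(2, 3));
        return 0;
    }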


I wasn't criticizing Rust, and I'm aware of what you linked. The point I was making is that this sort of functionality is a sine qua non for C-replacement languages. That Rust has it is to its credit, and one reason that I think it is a very good C-replacement language.


C is the common denominator for FFI only on OSes that happen to be written in straight C, which then mix the C ABI with the OS ABI.

That is not true on mainframes, Windows (ongoing replacement of Win32 with COM; it is a lot of fun to do COM in C, more so UWP), Android (JNI, AIDL), the Web (JavaScript/WebAssembly), iOS/macOS (Objective-C runtime), or Fuchsia (IDL).


Re: COM, that works because MSFT can a) commit to an ABI for C++, b) they can stick to a subset of C++ that makes that easy enough. Similarly for OS X.

The point isn't to stay in a C-like world. The point is that it must be possible to specify an ABI, file formats, and so on, in the host language. Yes, we can use external IDLs (even use them as DSLs if you like). I know all about and have worked with quite a lot of them (e.g., ASN.1, XDR, and many many others). But when you get down to a certain level, a light-weight approach really pays off: for example, at the system call level, and for many sufficiently simple file formats and such. At the higher end of complexity, on the other hand, ASN.1 and friends pay off instead, mostly in a) obtaining interoperability with implementations written in other languages, and b) managing extensibility.

Another thing is that some of us think about how things work at many levels. It's nice to be able to think about an interface and accurately picture what goes on at a low layer. It's difficult to do that with HLLs. But HLLs can easily have features that allow fine-grained control of or exposure of low level details, and that's what I'm saying I want to have as an option.


COM and WinRT are still based on the C ABI though. So anything that can do that, can do them - hopefully with more layers on top for convenience sake, of course...


Not quite, COM yes, although it is masochistic to do so without any higher level help.

UWP on the other hand extends COM to support inheritance and value types across calls, and the lowest Microsoft was willing to go on their high performance components was the Windows Runtime C++ Template Library, now replaced by C++/WinRT bindings and updated MIDL compiler.


The ABI is still C. Value types are just structs, and inheritance is implemented via aggregation - under the hood, it's all just interfaces and conventions on how they all play together. So any language that can do C FFI, can do WinRT. We did most of it with Python and ctypes as a prototype.

And you could even use that from C back in the day if you #included the "Windows.*.h" headers that shipped with the SDK - they had a bunch of #ifdef __cplusplus etc in them. Not sure if that stuff is still there or not.


Why would you think that Rust is "far simpler" than C++? Exactly the same concepts (with extras on Rust's side because of the ML-like constructs) must be learned for both, except they're distributed differently on the learning curve.

And I wouldn't say that Rust's safer than Java either. Memory access errors are basically non-existent in Java, and it has quite robust concurrency primitives and libraries.


That’s based on my experience of working with C++ on and off over the last 18 years and reattempting to learn some of the modern pieces of it, as compared to learning and working with Rust over the last 3.

I find Rust simpler and more concise, with nearly no footguns.

Edit, I didn’t see your Java comment:

> I wouldn't say that Rust's safer than Java either.

Here's why it is safer: it has the same memory-safety guarantees as Java. In terms of concurrency, it is safer because of the single-mutable-reference guarantee across threads, enabled by a better type system. The type system is stronger, which allows more details about the code to be encoded in types, like Rust's algebraic data types, without the limitations of Java's non-generic enums.

And the big one: no bare nulls allowed. This eliminates a huge set of logic errors. Java is getting value types in a future release, which will allow for a true Option type in Java (i.e. one that cannot itself be null), but the other points still stand.

On top of that, in large applications the GC can become an Achilles' heel, eventually pausing the application with dire consequences. Rust of course has no GC. And a great side effect of this is that Drop (the analogue of Java finalizers) actually works in Rust, which is amazing.

Basically for these reasons, Rust is safer than Java.


Well, that's still personal opinion, even if grounded in experience.

My personal opinion is that they have similar complexity, but in general Rust decides to clearly define behaviour as much as possible and to fail fast, preferably at compile time. This can be seen when comparing features: memory management is there (supported by the borrow checker), smart pointers are there, references & pointers are there, generics are there (even if not exactly the same), integer and floating point handling are there, concurrency is there (compile-time checking is nice), macros are there (albeit different and perhaps less error-prone), string types are more complex than C++'s, enums are more complex (but also more powerful), operator overloading is there, iterators are clearly more complex (and the functional aspects in general), etc.

So the average programmer will have to learn a similar amount of concepts or even more in Rust's case.

Where Rust is indeed simpler is error handling and building.

I figured you would bring up concurrency, and while Rust's compile-time verification is really nice, Java also has a simpler form in threading annotations. Additionally, I believe it's rare to use such low-level primitives in Java.

Nulls (and type system holes in general) are a correctness issue, not a safety issue, unless we want to argue that all bugs are safety issues.

GC pauses are a performance issue, as you already know.

Saying that Rust threading is safer than Java's is technically correct and Rust is probably a bit safer than Java because of that, but it's hard to argue that Rust should be chosen for a project because of that (this is the whole point of saying X is safer than Y). Java is super-simple to learn compared to Rust.


> that's still personal opinion, even if grounded in experience

I just wanted to make it clear how and why I've formed that opinion.

> unless we want to argue that all bugs are safety issues

Yes, I can definitely be accused of expanding the definition of safety to include all things that cause programs to crash in horrible ways. I look at it as differing levels of guarantees the language makes for you, and Rust makes many more at compile time than others. This adds to the learning curve you mention; is it worth it? That completely depends on whether you see those guarantees as valuable.


> Why would you think that Rust is "far simpler" than C++?

C++ as a language is a lot more complicated. It has a lot of features with weird corner cases that interact in unfortunate ways.

> And I wouldn't say that Rust's safer than Java either.

Depends on what you mean with safe. If you mean memory safe, then they are about as safe. Rust has fewer runtime failures due to concurrency and null pointers though.


Not sure why the above comment is downvoted. The number of features in C++ isn't the root of the problem with C++. The real problem is that C++ both has a lot of features and that all of those features interact in subtle and surprising ways. Even if Rust had as many features as C++ (which it doesn't, not by a long shot; Rust is a medium-sized language like Python, though with a far less forgiving learning curve), one of Rust's design philosophies is that features should interact in predictable ways, and if two features can't interact predictably, then the compiler should do its damnedest to prevent them from being used together and explain why.


Rust also has features which interact in subtle and surprising ways. Generics and operator traits bumping into the "coherence" rules is my personal pet peeve. I doubt any user would have predicted that interaction, and even though the compiler gives a detailed pointer to why, it's cold comfort.

Honestly, after hating C++ for years, trying to use Rust made me appreciate more about C++. I wish someone would make a language which took the best parts of both.


> even though the compiler gives a detailed pointer to why, it's cold comfort

We'll just have to agree to disagree, because having the compiler watch my back and preemptively guard against unforeseen pitfalls is about the warmest comfort I could ask for. :P


Generics and operator traits ought to be orthogonal and composable. Would you still be comforted if the compiler didn't let you mix for loops and arithmetic in one function? I mean, even if it gave you a detailed error message...


The biggest issue is not how complicated it is; rather, there is no good way to prevent people from writing C in C++.


Ok then please prove that C++ is more complicated than Rust by listing which features of C++ are not available in Rust. It's nice that the compiler can tell me when I do something wrong, but I still have to know how to write it correctly in the first place. In Java it's more or less code and forget.

And yes, I meant memory safety.


Besides the (desirable) features already listed by someone else, here are a few others, some of which can have unfortunate interactions: implicit conversions, constructors, member initializer lists, inheritance, overloading, default arguments, capture lists, variadic templates.


The biggest ones people ask for are non-type templates, HKT, and sometimes move constructors.


You can still get NullPointerExceptions in Java, frequently.

Rust doesn't let you do that, at least not without consciously deciding to use unwrap() everywhere.


> To me Rust is more a C++ replacement than a C replacement

Precisely. Rust is what C++ would be if it didn't need to be backwards compatible (though I'm not a fan of some of the syntax choices in Rust). Unfortunately, there's a reason C++ needs to retain backwards compatibility, and it's for that reason that Rust isn't going to replace C++ anytime soon.

> That's what they mean, they love C because you can't really do OO with it.

Until they invent GObject or one of the other OO-on-C libraries. The anti-C++ sentiment is more because compiler support was really quite dire until fairly recently (C++11).


The good thing about no-runtime languages is that you can mix modules written in them, within reason.

So Rust can make inroads into existing C++ code bases where it makes sense, faster than we think, without replacing C++ wholesale for a long time. Firefox itself is likely such a codebase.


That's harder than you would think due to name mangling.

Basically they can only talk to each other through a C interface.


I've been programming C on and off for 30 years. Rust feels too complex to be a C replacement. C++, yes...

C's paradigm is really about working with memory. Anything that restricts memory access with bounds or type checking is going to feel like "not C"...


C doesn't have anything to say about whether bounds are checked or not. That's why so many things, including pointers to arbitrary locations in memory, are undefined: the standard was built to accommodate both embedded system implementations where scribbling all over RAM is desirable, and implementations that do things like bound checking.


While I am not a fan of it, D's BetterC mode kind of fits what you are looking for.

https://dlang.org/spec/betterc.html

It's kind of a subset of the D language, without a dependency on the GC or runtime. It feels like a cleaned-up C. Of course, without the runtime most of the standard library doesn't work, so you are on your own for the most part; but then, C doesn't really have that much of a standard library to begin with.


> C aficionados often claim they love C because it's "simple" (it isn't). That's what they mean, they love C because you can't really do OO with it

I do not think C programmers are anti-OO. In fact, a lot of C patterns are modeled on OO (struct + function). I think the appeal of C is that you are able to write the fastest implementation of any given algorithm, something that just isn't possible in most other languages.


> I think the appeal of C is that you are able to write the fastest implementation of any given algorithm, something that just isn't possible in most other languages.

Maybe you can, but a lot of the C code I've seen in recent years was pretty inefficient compared to what one would have gotten from a reasonable C++ or Rust implementation:

Examples are inefficient strlen() operations due to the default "string" type, unnecessary copies and allocations of things like strings due to ambiguous ownership, unnecessary null checks to quiet down static analyzers in the absence of non-null references, extra indirections or allocations to work around missing compile-time generics (macros aside), etc.
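
To make the strlen() point concrete, here's a quick sketch of my own (not from the comment above): with null-terminated strings the length has to be recomputed on every call, so an innocuous-looking loop degrades to quadratic time, whereas a length-carrying string type keeps it linear.

    #include <stddef.h>
    #include <string.h>

    /* Counts spaces in s. strlen() rescans the whole string each time the
       loop condition is evaluated, so this is O(n^2) unless the compiler
       manages to hoist the call out of the loop. */
    size_t count_spaces(const char *s) {
        size_t count = 0;
        for (size_t i = 0; i < strlen(s); i++) {
            if (s[i] == ' ')
                count++;
        }
        return count;
    }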


> I think the appeal of C is that you are able to write the fastest implementation of any given algorithm, something that just isn't possible in most other languages.

If you had said this out loud during the '80s and early '90s, everyone would just have laughed.


True, and I meant this in the context where you could embed assembly into your functions if needed.


That's not a very satisfying answer, as it doesn't apply to the languages typically compared with C: C++, Rust, or even D, Nim, Zig, etc.


I don't think struct + function is any more OO than tuple + lambda is. That's a very shallow definition of OO.


Well, that's how C++ does everything…


You don't need to use any polymorphism (generics or traits) in Rust. You can use it as a pure structs and functions language like C (especially if you're not using the standard library).

The argument in favour of at least some polymorphism is that it helps to minimise highly repetitive code that would be avoided in C through aliasing, type punning or other pointer conversion.


I love Rust, but I don't think that using a subset of Rust is a real option at this point. You'd have to stop using other libraries and even the standard library for this, and while it's possible, it definitely won't make your life easier. Rust has a wonderful community, but it's mostly filled with enthusiasts who are actively using nightly, so if you try to do things in a more limited way, you'll be on your own very often; much more often than, for example, I personally would be comfortable with.


While many people use nightly, the statistics show that most use stable, with some using nightly in addition. You just hear about nightly more.


This is true, but doesn't refute my point. If you want not only to write something in Rust, but also to participate in the community and get help from it, it's much easier to do it if you're using nightly.

I consider myself an experienced developer, but unlike with other new technologies, which I almost always learn just from documentation, with Rust I often turn to the community for help. I understand that this complexity is necessary and I wouldn't want Rust to be easier, of course. But still, there's Rust's complexity and an overwhelming amount of information that's already out of date with respect to best practices; it sometimes seems that over half of the Google results on Stack Overflow are still about pre-1.0 Rust. And when we (people who are learning Rust) turn to the community, it usually tells us (in the most helpful and nice way, of course) "just use nightly".


Yes, it is a bit unfortunate that many suggest nightly. It makes sense, given that they’re enthusiasts, and therefore know all about the latest and greatest. But, these days, it’s becoming rarer and rarer to actually need it. But that’s only generally; it’s true that embedded often still needs nightly, though that should change pretty soon!

I wish Google didn’t customize results so much; I rarely see anything pre-1.0. It really makes it hard to figure out how to best help :/


Is that still true for libraries? I think that's what he meant, not all rust users as a whole. My impression certainly agrees with him.


If a library uses nightly, then you have to use nightly to use it.

At this point, the only major library that requires nightly is Rocket, and Actix-web has been rapidly ascendant due to (rumor has it) being used in production by Microsoft, and to working on stable.

The data, from last year's survey (which means even more stuff was nightly-only then, for example the RLS), where you can choose multiple options: https://blog.rust-lang.org/images/2017-09-05-Rust-2017-Surve...


> You don't need to use any polymorphism (generics or traits) in Rust. You can use it as a pure structs and functions language like C

This just means that the Rust language does not have the feature of being simple. Of course, there are sane subsets of C++ also (e.g., do not use new/delete), but this does not mean that C++ is a sane language. The same goes for Rust, probably.


> The same goes for Rust, probably.

Well, at least you have the good grace to admit that you haven't used Rust. :P


I've used Rust. It's not simple. For instance, lifetime annotations are ugly and confusing.

What a snarky reply from someone who already knows that Rust is not simple. The person you replied to was right, and yet you cast doubt on what he guessed to be true.


The person that response was directed at was attempting to use a mistaken similarity to C++ to paint Rust as a language that is not "sane". You appear to be injecting your own interpretation as to what this subthread is about. :)


I assumed you were contradicting this quote:

> This just means that the Rust language does not have the feature of being simple.

I really don't know what it means for a language to be sane, so it didn't make any sense to me that you were replying to that part. My bad.


This is exactly what C++ tried to do, and failed.

Of course you can find a sane subset of C++ / Rust that most are comfortable with. But once your project accepts C++ / Rust, nothing can stop people from using the other "features" of the language.

Trying to enforce code style drains a lot of energy from the project. Maybe a good linter can help...


While I'm not a fan of polymorphism and actually avoid OO completely (except for making data structures out of classes, of course), I feel like this could be something easily solved with tools, if making tools were easy.


That was the argument of C++ advocates in the 90s.


Considering that C doesn't have any good facilities for polymorphism, it's going to continue to be the argument of any language that succeeds C. Unless, that is, that successor language also doesn't have any good facilities for polymorphism, in which case we have Go as a fascinating real-world study in how people will incessantly demand them and deride the language for lacking them (which is saying something, considering that interface{} in Go is already better than pretty much anything one would hack up in C via void pointers or macros).
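
For reference, the void-pointer style being alluded to is what the standard library itself uses; a minimal sketch with qsort (my own illustration):

    #include <stdio.h>
    #include <stdlib.h>

    /* The classic C substitute for polymorphism: void pointers plus a
       function pointer. Nothing checks that the comparator matches the
       element type; that's entirely the caller's responsibility. */
    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a;
        int y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        int v[] = {3, 1, 2};
        qsort(v, sizeof v / sizeof v[0], sizeof v[0], cmp_int);
        printf("%d %d %d\n", v[0], v[1], v[2]);
        return 0;
    }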


Apparently under certain conditions it is possible to emulate a crude form of ad hoc polymorphism in C.

https://codegolf.stackexchange.com/questions/2203/tips-for-g...


Wasn't it originally a much bigger deal to have facilities for more generic programming before templates? These days people's threshold for bloat seems much higher, while the penalties for allocation and jumping around in memory are much higher as well.


I presume that you are defining C++'s virtual functions as "not a good facility for polymorphism". Why?


OP said nothing about C++, just plain C.


Ah, I see. I agree.


At least C11 has light generics support.
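
That support is the C11 _Generic selection, which dispatches on the static type of an expression; a minimal sketch:

    #include <math.h>
    #include <stdio.h>

    /* _Generic picks one of the listed expressions based on the static
       type of its controlling argument, here selecting a sqrt variant. */
    #define my_sqrt(x) _Generic((x), \
        float:       sqrtf,          \
        long double: sqrtl,          \
        default:     sqrt)(x)

    int main(void) {
        printf("%f\n", my_sqrt(2.0));   /* dispatches to sqrt  */
        printf("%f\n", my_sqrt(2.0f));  /* dispatches to sqrtf */
        return 0;
    }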


It was just recently that C++ got some underpowered kind of generics (beyond templates). So, no, I don't think anybody advocated that in the 90's.

Late-90s advocacy was much more along the lines of "OOP lets you reuse code" and "widget libraries are great, and they all but require OOP". One of those is blatantly correct.


> A replacement for C should try to be as simple as possible while fixing C [...]

I have a feeling that such a language would offer too little to offset the costs of switching.


I have always kind of wondered why Ada did not catch on as a systems programming language (I think it was designed to fill that niche as well).


1. At the time (i.e. pre-C++), it was a gigantic language. Further, because of the Ada policies (IIRC), you could not subset the language; you had to include the full multi-process model to call it "Ada", for example.

2. There were no good, cheaply available implementations of Ada. GNAT was originally quite awkward and finicky when I first poked at it, years after seeing Ada in a class (and I can't remember what compiler we used there).

3. Most importantly, no common, popular systems picked it up as the preferred systems programming language. MacOS had Pascal for a while, Windows went with C or C++.


Rational Software actually started their business selling Ada machines (yep, just like Lisp machines, but with Ada), but they pivoted into selling just compilers.


> OOP" done right (aka no inheritance, simply method polymorphism).

Why do you consider inheritance a bad feature of OOP?


On a very high level: in the vast majority of cases you bring along a lot of baggage, both in code and in mental overhead, that you don't need, and compared to alternatives such as composition it is simply far less flexible. Inheritance fits a much smaller subset of problems than people think (one of them is GUI design, a problem inheritance fits nicely). But like everything in this industry, it gets shoe-horned in to solve every problem.


It's surprising when I go into interviews and the people interviewing me do not have enough experience working with object-oriented code (in the languages I know) to have come to realize this; it's one of those things that seems to have to be relearned over and over.


Think about it as the difference between only inheriting functionality (interfaces in Java 1.8+, Rust traits, etc) vs inheriting data.

OOP can become spaghetti very quickly, especially when people mistakenly allow children to modify parent memory directly (protected non-final fields in Java). Once that happens, and as the inheritance hierarchy gets deep, understanding what a field actually is at any point becomes very difficult.

Interface and trait inheritance at least removes one piece of confusion by only allowing functions to be inherited, which makes code much easier to reason about.


Inheritance says that a derived class is a subtype of its base class (this is formalized in the Liskov Substitution Principle). To me, this is strongly coupling two things that don't have a very good reason to be coupled: Code organization and expression substitutability.

Sometimes I want to derive a supertype. Sometimes I want to re-use behavior but I don't want the new type to be expression-compatible at all. But inheritance says that if I'm re-using code, I must accept certain expression substitution relationships between my code and the code I'm re-using.

As with anything, there are reasons to use it and situations where it makes sense. But it just seems like a very niche connection to use as the basis of a language design.


In my personal experience, overriding non-abstract methods is definitely the worst feature of OOP. Too often complex codebases have multi-level overrides: some call the base method at the beginning, some call it at the end, and some not at all. Almost always this leads to incoherent contracts and behaviour.

Interface-only inheritance and composition instead of inheritance (with components or otherwise) are almost always simpler and thus more error-proof. Even if you sometimes have to repeat yourself a couple of times, the decrease in mental complexity is definitely worth it.


Say you decide to make a video game, and you decide to make a couple of classes for your game objects to inherit from. You make the classes Tool and Weapon. You have just prevented yourself from making the game Clue. The solutions to this tend to be ugly, and your object relations will tend to change over time, often in ways that are difficult to reconcile. Thus, GoF suggests to prefer composition instead of inheritance.


GoF?


The "Gang of Four", the four authors who wrote the classic "Design Patterns" book, (subtitled "elements of re-usable object-oriented software").


As Wildgoose says, it's a frequently-used initialism for the Design Patterns book, which is otherwise somewhat awkward to refer to concisely and unambiguously.


> "OOP" done right

I wouldn't call the Rust semantics "OOP". But it has a pretty complete implementation of type classes, with class inheritance.


Zig again.


I think it will only happen the way of Luddites in systems programming.

Even if you redid Modula-2 with curly brackets, I am not sure it would catch on.


> I think there's room for other languages to occupy a similar space but they'd need to focus on no-std-lib no-runtime operation (not always the sexiest target).

You're describing the Zig language!

It aims to be as fast as C, and unlike most languages that say this is a goal, it means it. There's no "almost as fast as C if you ignore the garbage-collector and the array-bounds-checking", it's actually as fast as C.

Its author values the minimalism of C, but wants to create a far better language for doing that sort of work. [0]

They're doing surprisingly well. They even managed to make a faster-than-C (faster even than hand-tuned assembly) SHA-2 [1]

[0] https://andrewkelley.me/post/intro-to-zig.html , https://github.com/ziglang/zig/wiki/Why-Zig-When-There-is-Al...

[1] https://ziglang.org/download/0.2.0/release-notes.html , it's also mentioned somewhere in this talk https://youtu.be/Z4oYSByyRak


> faster even than hand-tuned assembly

I think you and I are working off different definitions of what that means, as my definition doesn't really allow for a faster implementation (unless there's some really spooky stuff going on in the compiler). I suspect you mean faster than a popular hand tuned implementation.


Obviously Zig's output (courtesy of its LLVM backend) can be expressed and distributed as assembly code, but it's not hand-tuned, it's compiler-generated.

Whether the hand-tuned assembly code they used was the fastest out there, I'm not sure. I'd hope so, or they're being misleading.

I might get time to re-watch the relevant part of the video - it's around the 21 minutes mark - https://youtu.be/Z4oYSByyRak?t=1292


My point is that all the fastest hand-tuned assembly has to do to match the compiled output is to do the same thing. The compiler has no ability to do anything a hand-tuned assembly cannot, by definition of how this all works.

So, saying it's faster than hand-tuned assembly doesn't really make sense, but saying it's faster than the current best hand-tuned assembly could; that, however, probably requires searching for a not-very-well-tuned problem, as any optimization the compiler could apply is something that could be applied by hand in the hand-tuned assembly (even if it requires a lot of work to do so).

A general statement that might be made (and would also be very impressive) would be that it compiles to implementations that are on par with the current best hand-tuned assembly versions.

The small edge case where the original statement might be true is where some AI/ML/evolutionary techniques are applied and the assembly is faster in the generated output but we don't know why. That is, the speedups are not because of rules and heuristics, but derived by some complex process which we have little insight into, and thus can't manually replicate (which is what I meant by spooky stuff).


I mean, there's nothing a computer does playing chess that a human couldn't, and yet the world's best players simply can't beat the world's best chess-playing programs.


This assumes that the problem of a chess game and providing optimal solutions for an algorithm defined in a higher level language (or pseudocode) are similar enough that the same strengths apply. They may be, but I don't think that's something you can assume without some sort of evidence.

Though, I guess the same could be said of my prior statements. To my knowledge, compilers aren't generally looking through a search space for optimized solutions as many common game AI algorithms do (which would be in line with my statements about AI/ML/etc. above). I don't have the relevant citations or experience to state that with authority though.


Andrew explains in the video that the speedup is mostly due to compile-time speed-ups and potentially the use of the rorx instruction. It cannot be faster than the fastest assembly, because the fastest assembly is, by definition, the fastest assembly.


> compile-time speed ups and potentially the use of the rorx instruction

Indeed, but the 'compile-time speedups' are a legitimate point in favour of the compiler. If they didn't occur to the assembly programmer, or struck them as too complex to pull off, then the compiler deserves the point.

Have to say I don't follow why the hand-tuned assembly doesn't use the rorx instruction. It's not mentioned in the assembly file, at least, but I thought that was the point? https://www.nayuki.io/page/fast-sha2-hashes-in-x86-assembly

Also, it's neat to see instruction-selection being so significant. Generally might expect cache/branch behaviour to be the kicker.

> It cannot be faster than the fastest assembly

Well, 'fastest assembly' is the domain of so-called 'superoptimisers', and has us pushing at the stubborn bounds of computability.

We were talking about hand-written assembly-code, compared to compiler-generated. Odds are that none of the binaries tested were the optimal assembly.

The only interesting question is whether the hand-tuned assembly code they tested, was the fastest available at the time. If not, the whole demonstration is a straw-man, of course.

Also I don't like that the winning Zig program runs for so much longer. A good benchmark should provide ironclad assurances that no candidate is getting an unfair advantage re. loading/'warm-up'.


I don't believe it is possible to hand tune every program to beat a compiler.


More a question of practicality, no?

Programmers are creative, compilers are well-tuned machines. Sometimes it's tough to beat a compiler's code, but I'm not sure that 'impossible' is meaningful.

Monkeys with typewriters could function as a highly parallel superoptimiser, after all.


When you realize the compiler's optimizations only account for about 10% of the total program's performance you find that the other 90% is entirely up to the programmer.

Architecture, data structures, batching operations, memory locality, and a bunch of other metrics are all concepts the compiler can't really help you with whatsoever and they have a much larger impact on performance than the 10% the compiler is actually able to optimize.

The problem is that either programmers don't care, or they can't make the distinction between premature optimizations and architecture planning.


You're right to emphasise good data-structures and algorithms (also concurrency, parallelism, etc), but compiler optimisation is nothing to sneeze at. '10%' is laughably off-base.

From a quick google: compiler optimisation can accelerate CPU-bound code to over 5x the unoptimised performance. https://www.phoronix.com/scan.php?page=article&item=clang-gc...


They seem to be benchmarking very specific things and not actual applications. These numbers do not hold in the real world.


In a sense you're right, but hand-tuning assembly is kind of an orthogonal problem to determining whether you're using the right algorithms and data structures.


Where do you get that 90/10 split from? Just curious.


Talks from Mike Acton and Scott Meyers, specifically "Data-Driven Development" and "CPU Caches and why you should care" respectively.

I forgot exactly where I got that number, but it's been a pretty good metric so far.

In a nutshell: the compiler is great at micro-optimizations and absolutely terrible at macro-optimizations. The former will get you a few percent of perf boost, while the latter usually results in orders of magnitude of performance gains.

It's near impossible to apply macro-optimizations at the end of a project without massive refactoring.


Cool, I'll check those out, thank you!


I believe the 10/90 number may be from this talk, though I don't have time to rewatch it to confirm: https://www.youtube.com/watch?v=rX0ItVEVjHc


But it should be possible to hand tune to match a compiler, right? And that would mean the compiler was not faster, which is the general claim I was calling into question.


How so? Compilers aren't magic; by definition (assuming the current state-machine style of compilers, no AI or nuthin'), you can just manually compile it.

Compilers are clever, but humans can also be really clever: They know how to think - You can change the algorithm entirely.

For practical reasons however (Premature optimisation also), trust your compiler.


It should be noted that 0.2.0 is significantly different than master in regards to pointer syntax due to pointer reform [0]. There are a lot of other changes too, but that one's the most noticeable.

[0] https://github.com/ziglang/zig/issues/770


Does Zig have a memory management story other than C's? Anything like Rust's borrow checker?


C sort of has this mindset built around it that you're writing portable assembly, except for messy bits like stack frame layout or register allocation. But that doesn't really hold true anymore, and arguably hasn't for decades.

The first obvious issue is that C is specified by an abstract machine that doesn't really correspond to actual hardware. There's no concept of segmented memory, or multiple address spaces in C. Traps don't quite work the way you'd want in C [1]; there's no way to catch a trap that occurs in a specific region of code. Worse, the abstract machine leans on a definition of undefined behavior in such a way that it's painful for programmers but not useful enough to make automatic security permissible. (Unsigned overflow, which is going to be a more common cause of overflows that lead to security vulnerabilities, is well-defined).

However, the more real problem with C is that it is outright missing hardware features that you'd like access to. Things like rotate instructions or wide multiply are easy enough to pattern-match, so their lack isn't so devastating. But flag bits, particularly add-with-carry or checked-overflow operations, aren't catered for at all. SIMD vectors have no representation in the language. Returning multiple values via registers. Traps [1]. Coroutines. Multiple address spaces (especially important for GPUs).

[1] Admittedly, the OS is also to blame here: POSIX signals are a really atrocious way to handle machine-generated faults.
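
To make the checked-overflow point concrete: in practice you reach for compiler extensions rather than the language itself. A sketch using the GCC/Clang overflow builtins (an extension, not standard C):

    #include <stdint.h>
    #include <stdio.h>

    /* __builtin_add_overflow (GCC/Clang extension) returns nonzero if the
       addition overflowed, exposing what is a single flag check on most
       CPUs. Standard C has no equivalent, so portable code re-derives the
       condition with extra comparisons. */
    int checked_add(int32_t a, int32_t b, int32_t *out) {
        return __builtin_add_overflow(a, b, out);
    }

    int main(void) {
        int32_t r;
        if (checked_add(INT32_MAX, 1, &r))
            puts("overflow detected");
        return 0;
    }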


> flag bits

That's sort of a processor specific implementation detail, IIRC you can access them via GNU extensions.

> SIMD vectors

Not having SIMD is IMO a feature, as doing useful things would expose processor-specific details; you can use SWAR to do some of what SIMD does.


Exposing processor-specific details is the desired feature, not a bug. The people who want to write C as portable assembly do so because they want to take advantage of things they know about the architecture: the same source code isn't really portable across different architectures without #ifdefs anyway.


But if you don't care about portability, and want more hardware control, you can do inline assembly, or something like Intel's AVX intrinsics.


> Traps don't quite work the way you'd want in C [1]; there's no way to catch a trap that occurs in a specific region of code.

Ah, unless you're in NT land, which makes lexical scoping of traps very easy: https://github.com/tpn/tracer/blob/0224d94b8d17fe74c39cec285...

        //
        // Verify the guard page is working properly by wrapping an attempt to
        // write to it in a structured exception handler that will catch the
        // access violation trap.
        //
        // N.B. We only do this if we're not actively being debugged, as the
        //      traps get dispatched to the debugger engine first as part of
        //      the "first-pass" handling logic of the kernel.
        //

        if (!IsDebuggerPresent()) {

            CaughtException = FALSE;

            TRY_PROBE_MEMORY{

                *Unusable = 1;

            } CATCH_EXCEPTION_ACCESS_VIOLATION{

                CaughtException = TRUE;

            }

            ASSERT(CaughtException);
        }
The helper #defines: https://github.com/tpn/tracer/blob/0224d94b8d17fe74c39cec285..., e.g.

    #define TRY_TSX __try
    #define TRY_AVX __try
    #define TRY_AVX512 __try
    #define TRY_AVX_ALIGNED __try
    #define TRY_AVX_UNALIGNED __try
    
    #define TRY_SSE42 __try
    #define TRY_SSE42_ALIGNED __try
    #define TRY_SSE42_UNALIGNED __try
    
    #define TRY_PROBE_MEMORY __try
    #define TRY_MAPPED_MEMORY_OP __try
    
    #define CATCH_EXCEPTION_ILLEGAL_INSTRUCTION __except(     \
        GetExceptionCode() == EXCEPTION_ILLEGAL_INSTRUCTION ? \
            EXCEPTION_EXECUTE_HANDLER :                       \
            EXCEPTION_CONTINUE_SEARCH                         \
        )
    
    #define CATCH_EXCEPTION_ACCESS_VIOLATION __except(     \
        GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION ? \
            EXCEPTION_EXECUTE_HANDLER :                    \
            EXCEPTION_CONTINUE_SEARCH                      \
        )
    
    #define CATCH_STATUS_IN_PAGE_ERROR __except(     \
        GetExceptionCode() == STATUS_IN_PAGE_ERROR ? \
            EXCEPTION_EXECUTE_HANDLER :              \
            EXCEPTION_CONTINUE_SEARCH                \
        )
    
    #define CATCH_STATUS_IN_PAGE_ERROR_OR_ACCESS_VIOLATION __except( \
        GetExceptionCode() == STATUS_IN_PAGE_ERROR ||                \
        GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION ?           \
            EXCEPTION_EXECUTE_HANDLER :                              \
            EXCEPTION_CONTINUE_SEARCH                                \
        )

Also allows you to do fun things like this for testing if you can do an AVX512 op (although this is not the supported way of doing things):

    #pragma optimize("", off)
    static
    NOINLINE
    VOID
    CanWeUseAvx512(PBOOLEAN UseAvx512Pointer)
    {
        BOOLEAN UseAvx512 = TRUE;
        TRY_AVX512 {
            ZMMWORD Test1 = _mm512_set1_epi64(1);
            ZMMWORD Test2 = _mm512_add_epi64(Test1, Test1);
            UNREFERENCED_PARAMETER(Test2);
        } CATCH_EXCEPTION_ILLEGAL_INSTRUCTION{
            UseAvx512 = FALSE;
        }
        *UseAvx512Pointer = UseAvx512;
    }
    #pragma optimize("", on)
https://github.com/tpn/tracer/blob/0224d94b8d17fe74c39cec285...

The structured exception handling protocol used by NT is really quite elegant.


> A pointer is just an integer and I can manipulate it like an integer.

Except it's not. Not unless you cast it to uintptr_t. Doy. :) Understanding the underlying asm can give you context for what a pointer is, but you can't assume that a pointer is an integer, any more than you can assume 'int' is a machine word in length.
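
A small sketch of the distinction (my own illustration): converting a pointer to uintptr_t is well-defined, but arithmetic on the integer representation is not the same thing as pointer arithmetic.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int arr[4] = {10, 20, 30, 40};
        int *p = &arr[0];

        int *q = p + 1;                  /* pointer arithmetic: scaled by
                                            sizeof(int), points at arr[1] */

        uintptr_t bits = (uintptr_t)p;   /* well-defined conversion */
        int *r = (int *)(bits + 1);      /* one *byte* further: misaligned,
                                            and dereferencing it is undefined */

        printf("%p %p %p\n", (void *)p, (void *)q, (void *)r);
        return 0;
    }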

There are reasons why C seems so abstruse and abstract. It has to exist on a huge variety of architectures and environments, from supercomputers to tiny embedded microcontrollers. Fun fact: The Lisp Machine had a C compiler! In Lisp Machine C, pointers were fat: they consisted of a machine pointer to an allocated chunk of memory and an offset within that chunk. And of course it was invalid to have a pointer outside an allocated object; the machine would throw an exception if you even tried. Hence why C pointers are only well-defined over valid references to objects and arrays, one past the end of an array, and NULL. And lest you say, "well no modern cpu works like that anymore", there are programs out there written for the Burroughs architecture that are still running. Are you going to cut off the maintainers of those programs because the architecture they're coding for doesn't resemble an x86?

This is part of why I disdain the call for modernity in programming. "Modern" means "limited". "Modern" means "I have no intention of testing this on anything but my laptop, so I will make assumptions that suit my use case and fuck you if yours differs".

C is a pain in the ass, but it becomes more approachable if you take into account its history. And it still fits in your head more easily than Rust.


Burroughs does not have a modern C compiler. As far as modern C standards go, the Burroughs architecture does not exist.

The same is true for the rest of the oddball stuff. It's dead now. Nobody is even trying. There is no longer any reason to make the C standard nasty in order to support these systems. The plug should have been pulled on them long ago, when they were well under 1% of the market and declining, but now they are nothing. No standards-conforming compiler supports that old stuff.

Even back at 1%, hurting everybody to support that stuff was not worthwhile. It was a terrible trade-off to make.


The obvious flipside of this is that those C programs written for Burroughs never had any chance of running on anything but Burroughs. The only difference is that C offloads the effort of portability onto the user, whereas other languages at least attempt to make it portable in the compiler.


Why not? As long as your program is standards-compliant, it should work on any computer with a working C compiler…


- integral sizes aren't standard

- little vs big endian

- signed int layout

- float layout

- includes vary

- bit field implementations vary

- loads of undefined and implementation-defined behavior


The only thing that can change is implementation-defined behavior, and this can be detected and guarded against. Undefined behavior cannot exist in a standards-compliant program.
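
A sketch of what "detected and guarded against" can look like in practice, using C11 static assertions (my own illustration):

    #include <assert.h>   /* static_assert (C11) */
    #include <limits.h>

    /* Fail the build, rather than misbehave at runtime, if the
       implementation-defined properties this code relies on don't hold. */
    static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
    static_assert(sizeof(int) >= 4, "this code assumes int is at least 32 bits");

    int main(void) { return 0; }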


What?! Since when did Burroughs have a C compiler?

NEWP was and is more than enough, while being much safer.


Safety is a hardware feature of the Burroughs architecture. C is no less safe there than NEWP.


Sure it is; NEWP only allows C-style unsafety in UNSAFE blocks, clearly tagged by the system administrator as allowed for execution.

So not only is the lack of safety expressed in the type system, something C doesn't have, it also requires someone to explicitly state that it is OK to risk executing it.


You're assuming that C runs on the Burroughs the way it commonly does on x86, with arbitrary access to all memory within its address space. This is not specified in the C standard, and it's not the case for C on the Burroughs. In particular, since C was not the OS implementation language, but was mainly used to port programs from other architectures, the C heap lived entirely within a single memory block and C code could not interfere with other programs. See: https://en.wikipedia.org/wiki/Burroughs_large_systems_descri...


It can still corrupt its own memory space, which is the whole point of C being unsuitable for any kind of safe software.

C also does not interfere with other programs on its own turf, UNIX.


> the compiler needs to optimise ...

wants to. There, fixed that for you ;-)

The weird special cases were largely(1) introduced recently by optimizer writers hijacking the standard in order to soften the semantics so that previously illegal optimizations would now be legal, by simply declaring the vast majority of existing C code as "undefined" and up for grabs.

Which is why the Linux kernel, among others, has to set special flags in order to get back to a somewhat sane variant.

(1) Except for those special architectures where the hardware is that weird, but in that case you typically know about it.


> The weird special cases were largely(1) introduced recently by optimizer writers hijacking the standard in order to soften the semantics so that previously illegal optimizations would now be legal, by simply declaring the vast majority of existing C code as "undefined" and up for grabs.

Citation needed.

Signed overflow being undefined behavior was a direct consequence of different hardware representations of numbers. Inserting the instructions required to catch overflow and make it wrapping would be expensive, at least on some systems.

The fact that arrays decay to pointers, C has no dynamic array support, and so on are decisions that go way back and more or less prohibit array bounds checking. After preprocessing, inlining, and various forms of optimization it can be very far from obvious that some random NULL check after the pointer has already been dereferenced is intentional and shouldn't be removed.

Things like overlapping pointers have always had undefined behavior in the sense that inserting the instructions to check for the overlap all over the place would have a large performance impact, no one would use such a compiler, and thus there is no market for one. The standards committee largely documented existing practices rather than inventing new undefined behaviors.

I keep seeing people like you claim we should just "fix" undefined behavior or the standards committee gleefully inserted undefined behaviors because they hate programmers. If you actually sat down and disassembled the C code you write then went through the exercise of inserting instructions to eliminate most types of undefined behavior I suspect it would be a very illuminating experience. I also doubt you'd be so certain of your position.
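
The null-check case mentioned above is worth a short sketch (my own illustration): because dereferencing a null pointer is undefined, the compiler may infer the pointer is non-null and drop the later check.

    struct dev { int flags; };

    int get_flags(struct dev *p) {
        int f = p->flags;   /* if p were NULL, this line is already UB... */
        if (p == NULL)      /* ...so an optimizer may treat p as non-null
                               and delete this branch entirely */
            return -1;
        return f;
    }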


Signed overflow was made undefined so that it could do whatever the CPU naturally did, not so that the compiler could delete huge chunks of supposedly dead code.

In the old days, it thus wasn't truly undefined. It was undefined by the language, but you could just look in the CPU documentation to see what would happen. There was some crazy behavior in the old days, but nothing you wouldn't expect from looking at the CPU documentation.

These days, nobody is shipping a C99 or newer compiler for any CPU with weird integers. Everything is two's complement, without padding or trap values. All of that "undefined" stuff should thus be entirely compatible across all modern C compilers.
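
For contrast, a sketch of the kind of code deletion this enables today (my own illustration): since signed overflow is undefined, a compiler is entitled to assume x + 1 never wraps and fold the intended check away.

    #include <limits.h>
    #include <stdio.h>

    void report(int x) {
        if (x + 1 < x)          /* intended as an overflow check */
            puts("overflow");   /* typically removed at -O2, because signed
                                   overflow "cannot happen" */
        else
            puts("ok");
    }

    int main(void) {
        report(INT_MAX);        /* prints "ok" with common optimizing compilers */
        return 0;
    }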


IMO, signed overflow should be implementation defined behavior, rather than the anything goes of undefined behavior.


> I keep seeing people like you claim we should just "fix" undefined behavior

Nope. Just don't take undefined behavior to mean "do arbitrary optimizations that dramatically alter the behavior of programs".

Let's see what C89 says about undefined behavior:

"3.4.3 Undefined behavior --- behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

Note the "permissible".

Newer versions of the standard:

"3.4.3 undefined behavior behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message)."

Note that the wording has changed from "permissible" to "possible". To me the older version indicates that this is the range of things the compiler is allowed to do, and that range is from "do nothing" to "behaving during translation or execution in a documented manner characteristic of the environment".

I don't think "silently do arbitrary optimizations, removing arbitrary code or let demons fly out of your nose" is in the range between "nothing" and "documented behavior characteristic of the environment". And also not between that and "terminate execution".

However, the newer version says "possible", and this to me sounds a lot less like a restriction of what the compiler is allowed to do, and much more like an illustration of things the compiler could do.

And of course that is exactly what has been happening. And it is detrimental.

See also:

"What every compiler writer should know about programmers or “Optimization” based on undefined behaviour hurts performance"

http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201...

Oh, and I was using C compilers before there was a C standard (and got the update when ANSI C was released). At that point all behavior was "undefined". And "unspecified". And yet compilers did not take the liberties they take today.


Your post basically boils down to "don't do any optimizations that might possibly exploit undefined behavior". That covers most optimizations. That means compilers can only use -O1. Proving you can still optimize is often equivalent to the halting problem (or you emit lots of branches and duplicate implementations for each function so the code can test the parameters and take the "safe" path in some cases or the "fast" path in others).

> silently do arbitrary optimizations

What are arbitrary optimizations to you? This isn't the magical land of unicorns here. Compilers have to be pedantic by nature so you have to sit down and come up with what is allowed and isn't allowed.

I happen to agree that some instances of undefined behavior should be defined and eliminated. For example, uninitialized values. Require the compiler to zero-initialize any storage location before use unless it can prove an unconditional write occurs to that location. This will have a small but manageable code size impact and a relatively small runtime perf impact.

> removing arbitrary code

The compiler is just a series of meaningless instructions executing on meaningless data on a dumb machine. It has absolutely no way to understand what is "arbitrary" code and what isn't. This is basically another argument for "compilers should disable 95% of all optimizations".


Firstly, "ignoring the situation completely with unpredictable results" is a pretty clear permission to do arbitrary optimizations, as long as they preserve the semantics in the case where the execution doesn't exhibit UB.

Secondly, in ISO standards Notes are by definition non-normative, hence your exegesis as it pertains to C99 and later versions is invalid anyway.


Your points are really good. It's a pity that -std=c89 doesn't act as a time machine in this respect... it would have been useful: https://godbolt.org/g/fKfDFq


What changes are you thinking of? The only change with respect to undefined behavior I can think of off the top of my head is punning via unions, which C99 changed from undefined behavior to well-defined behavior.


According to

http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201...

new undefined behaviors were added, and for significant bits.

I think the bigger issue is that compiler writers have taken "undefined" to mean complete liberty to do whatever the hell they want, even fairly far away from the actual location of the undefined behavior.

And it turns out that the C89 standard had a range of "permissible" actions on undefined behavior. To me "permissible" is restrictive, as in you are not allowed to do anything outside that range.

Newer standards kept that section (3.4.3) the same, word for word, except for changing "permissible" to "possible". To me, "possible" is not restrictive but illustrative.

Now you might think this is reading tea-leaves on my part, and you might be correct, but on the other hand that change of interpretation is exactly what happened. Compilers used to not do these things, and nowadays compilers do do these things, and compiler writers cite the standard as giving them permission. Also consider to what extent wording of standards is finessed and fought over, and that exactly this one word was changed, in a section that otherwise remained the same word for word.


What does "softening" the semantics mean? Aren't they pretty unambiguously defined by the standard?


Newer versions of the standard.


But as you know, CPU ISAs are designed for C programs and compilers are optimized for C programs. So everything, even a new language like Rust, has to buy into C's model to some extent.

Truly getting out from under C's shadow is going to be very difficult. Maybe better languages are a first step on that path but they are only a small step.


That is said a lot, but what are the actual decisions that are made to conform to the C model?

What would be good ideas in CPU design that aren't pursued because they are not compatible with C?

(I get it that CPUs have to support some common paradigms and use cases. For example, virtual memory / process isolation, maybe branch prediction things, or support for calling conventions. However, I don't think that is specific to C).


Support for unaligned memory accesses in hardware is one. The overwhelming majority of variable accesses are through pointers of the correct type, and are therefore to variables that are at least naturally aligned.

But a very small handful of systems/procedures make unaligned accesses through type-punned pointers. Rather than just crash on these programs, processors will perform multiple memory accesses and stitch the results back together in hardware (worst case behavior: accesses that cross page boundaries). So there's extra hardware to perform this extra state tracking and detect the unaligned accesses in the first place.

Hardware doesn't just get turned off for conforming programs. It's there consuming energy and giving off heat on all of them.
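
As a concrete sketch of the kind of access in question (my own illustration): reading a 32-bit value from an arbitrary byte offset, either through a type-punned pointer or through memcpy.

    #include <stdint.h>
    #include <string.h>

    /* Type-punned read: the pointer may be misaligned, so the hardware
     * (or a trap handler) has to stitch the value together. Strictly
     * speaking this is also undefined behavior in C. */
    uint32_t read_u32_punned(const unsigned char *p) {
        return *(const uint32_t *)p;
    }

    /* Portable version: memcpy expresses the same byte-wise read, and
     * compilers lower it to a single load where unaligned loads are cheap. */
    uint32_t read_u32_memcpy(const unsigned char *p) {
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }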

Edit/addendum: Here's another example: What should hardware do on integer overflow? There are several options, including signed saturation, unsigned saturation, unsigned wraparound, and trapping. The semantics of C are such that most hardware just picks unsigned wraparound and makes programs eat the consequences. If you're very lucky you also get an overflow bit that gets clobbered on the next instruction.

One way to think about this problem might be to ask the question, "What hardware features would reduce the runtime cost of instrumenting programs with the various sanitizers that are becoming available?"


Regarding alignment, I think it's not C that caused aligned reads to be the standard, but rather hardware itself. There isn't anything in C that prevents unaligned reads, is there? Aligned reads are just a consequence of storing data as vectors of a type, which is a very natural and efficient approach. Unless that's somehow a bad idea or prevents other good ideas, the point is moot.

Also caching and page boundaries have nothing to do with C. They are hardware issues (that good C programmers have to deal with, but anyway).

Contemporary Intel x86 haven't had significant problems with unaligned reads for a long time; do a web search for Lemire's post. (I guess there are still restrictions for atomic reads/writes).

I've long wanted to have access to overflow bits or traps on overflow. C certainly isn't in the way of adding them to CPUs, and signed overflow is even undefined in C, which is an opportunity to make it raise a signal or some other exception. But I agree, C culture did not require hardware vendors to add overflow bits or traps, so maybe that is C's fault.
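
For what it's worth, GCC and Clang do expose overflow detection through builtins these days (a hedged sketch; __builtin_add_overflow exists in GCC 5+ and recent Clang, and -fsanitize=signed-integer-overflow or -ftrapv can turn overflow into a trap):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int sum;
        /* Returns nonzero if the mathematically correct result does not
         * fit in the destination; the wrapped result is stored in sum. */
        if (__builtin_add_overflow(INT_MAX, 1, &sum))
            puts("overflow detected");
        else
            printf("%d\n", sum);
        return 0;
    }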


> Hardware doesn't just get turned off for conforming programs. Its there consuming energy and giving off heat on all of them.

Idle CMOS transistors don't have much leakage current. They do consume some power, but not much. Intel has gotten a lot better about clock gating, ensuring that more of the chip is truly idle.


AFAIK, SPARC and many other traditional Unix-running CPUs do not support unaligned memory access.


Technically, neither does ARM, although you can enable emulation for unaligned accesses. One of the first things I did when setting up QNX was enabling emulation for unaligned accesses, to eliminate a SIGBUS termination on some of our applications.


ARM supports it now, and with next to no perf cost.


Very speculative, don't take any of this as gospel:

The very concept of "the stack" implies a C-style evaluation model.

Having a single address space for both code and data mainly just gives an opportunity for optimisation to fail.

Everything-is-mutable gives a lot of opportunity for optimisation to fail. The compiler wants the program in SSA form; it register-allocates to a bunch of reused names, only for the CPU to figure out which uses of those names can actually be mapped to separate physical registers.

Arguably the failure of VLIW (in particular Itanium) was because of C: those processors could (maybe) have run more constrained languages very fast, but it was difficult to compile C to run fast on them.


The stack is one of those things that would fall under "support for common paradigms". I checked: the PDP-11 (the machine on which C was created) already had a hardware stack, and probably even older machines did. There are many other programming languages that have the concept of a call stack. Hardware stacks are just an optimization to support this paradigm, and you are not required to use them. Conversely, C does not require any hardware support for implementing the call stack.

The main problem with hardware call stacks is that they can't be relocated if you need predictable performance and/or hold pointers into the stack rather than offsets. That means they are of bounded size. With 64-bit addresses that may be less of a problem, but still. Functional languages do not need to use the hardware stack, so that shouldn't be an issue.

> Having a single address space for both code and data mainly just gives an opportunity for optimisation to fail.

I don't think C requires that in any way, but since void pointers are allowed to point to either code or data, representation of void pointers must be somewhat "compatible" with both address spaces.

> Everything-is-mutable gives a lot of opportunity for optimisation to fail.

Not a hardware issue. Also not everything is mutable, not even in C. But you need some mutability for performance even at more abstract levels.


"The stack is one of those things that would fall under "support for common paradigms"."

Maybe for imperative languages where there is no support for lambda expressions (like C), but in Lisp or in functional languages call stacks are not part of the basic semantics. To put this in C terms, compilers for those languages will rewrite functions to take an extra argument called the "continuation", which is itself a pointer to a function, that is called when a function "returns." The result is that all function calls are tail calls, and by heap-allocating arguments and local variables (which is also necessary because it is possible to create and return a lexical closure), no call stack is needed (but you do wind up relying on a garbage collector). Other continuations are also used e.g. to support exceptions in Common Lisp.
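
A tiny sketch of that transformation, written in C for concreteness (my own illustration, not from any particular compiler):

    #include <stdio.h>

    typedef void (*cont)(int result);

    /* Instead of returning, the function "returns" by calling the
     * continuation it was handed; every call becomes a tail call. */
    static void add_cps(int a, int b, cont k) {
        k(a + b);
    }

    static void print_result(int result) {
        printf("%d\n", result);
    }

    int main(void) {
        add_cps(1, 2, print_result);   /* prints 3 */
        return 0;
    }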


Call stacks are almost always used in implementations of functional languages as well, even those that support first-class continuations. Even implementations that use CPS as you describe don't necessarily lose call-stack discipline (e.g. guile); that's a separate choice that some (e.g. SML/NJ) choose to make.

The call stack provides very cache-friendly and gc-friendly memory. It's still a very useful construct in functional languages.


C doesn't assume a single address space for code and data. Function pointers are quite different beasts than regular pointers (and this is where the "two bit-identical pointers can be unequal in C" bit comes in).


"That is said a lot, but what are actual desicions that are made to conform to the C model?"

Among other things, special instructions to support a function call stack, which is very much specific to the C model. In Lisp the call stack is unnecessary because of the use of continuation passing style, which implies that all function calls are tail calls and mostly requires that everything be heap-allocated (many compilers will use a call stack as an optimization to avoid excessive garbage collections; unlike C, it is not a fundamental part of the lower level semantics). If you wanted to design a CPU for Lisp or functional languages, you would probably choose to provide special instructions to support continuation passing style or normal order evaluation.

You could also imagine a CPU that does not support general pointer arithmetic or which has some form of built-in array bounds checks, which would all be fine if supporting C were not a requirement.


> unlike C, it is not a fundamental part of the lower level semantics

I don't think this is an accurate characterization of C, actually. Unless I'm forgetting something, an implementation of C that heap-allocated automatic storage and used continuation passing style could still be conforming.

Or in other words, the C standard doesn't specify anything about the memory layout of the stack- the word "stack" isn't even in there!


Sure, you can apply a CPS conversion to a C program, but what you would get would actually be a (abstract) function call stack because there is nothing in C that breaks the call stack paradigm (setjmp/longjmp come close but the semantics are carefully defined to avoid breaking the semantics of a call stack; in particular they cannot be used to "restore" stack frames that were previously destroyed by a return statement or a call to longjmp). In some languages (like Scheme) it is possible to write code that will not have that "de facto" call stack because there is first-class support for things that completely break the call stack paradigm (e.g. call/cc).
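
A minimal sketch of why setjmp/longjmp stay within the call-stack paradigm: longjmp may only unwind to a frame that is still live; jumping into a frame that has already returned is undefined.

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf env;

    static void fail(void) {
        longjmp(env, 1);   /* unwinds to the still-live setjmp in main */
    }

    int main(void) {
        if (setjmp(env) == 0) {
            fail();                    /* does not return normally */
        } else {
            puts("recovered via longjmp");
        }
        return 0;
    }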


Okay, but most Scheme code doesn't use call/cc, and even when it does the vast majority of the continuations in use are also "abstract function call stacks."

You said it yourself- a stack is a useful tool for implementing these non-C-like languages. Its presence does not force you to "buy into the C model to some extent."


> You could also imagine a CPU that does not support general pointer arithmetic or which has some form of built-in array bounds checks, which would all be fine if supporting C were not a requirement.

I am happy to correct you: C does not require that. But also, general array bounds checks are not something you can simply implement in hardware. There is not necessarily a single array length corresponding to any given address.

Imagine a virtual machine that just allocates a chunk of RAM using malloc to provide virtual memory to the emulated machine. Now add another level of memory management on top.

Or imagine a function that gets just a subrange of an array, for example to search for an element in it (see the sketch below). The actual "array" might be much larger, and it might be unclear even to the compiler how large it is!
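
A quick sketch of that subrange case (hypothetical function names of mine): the callee only sees a pointer and a length, and the pointer may point into the middle of a larger object, so no single bound is attached to the address.

    #include <stddef.h>

    /* Searches n elements starting at p; p may point anywhere inside a
     * larger array, so the "array length" for this address is whatever
     * the caller says it is. */
    static int find(const int *p, size_t n, int needle) {
        for (size_t i = 0; i < n; i++)
            if (p[i] == needle)
                return (int)i;
        return -1;
    }

    int main(void) {
        int big[100] = {0};
        big[17] = 7;
        /* Search only elements 10..19 of the larger array. */
        return find(big + 10, 10, 7);   /* returns 7, the offset within the subrange */
    }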


x86 does have array bounds checking now, with MPX. Further, everything you've mentioned, including the stack, is useful to all languages. Can you give an example of a negative trade-off made for C?


In some languages a call stack is only useful as an optimization and some other form of control flow must be emitted for certain constructs that violate the stack paradigm (e.g. call/cc in Scheme, conditions/restarts in Common Lisp). C is also one of the only languages where general pointer arithmetic is necessary; in most languages used today more restricted forms of pointer arithmetic would be more than adequate.

I am not saying these are negative tradeoffs as far as the ISA goes, but it is absolutely the case that modern CPUs are designed with C (or some similar language) in mind.


Processors exist to run code quickly. Slow ones don't sell very well, so we'd see stack-oriented opcodes regardless of C's existence. You would also still use instructions like lea and indexed moves in any language with the concept of an array, so that too is not strictly related to C.

Most of x86's CISC "style" predates C, and the reverse is mostly true: C was heavily inspired by the PDP-11's instruction set. When the 8086 originally came out, BASIC was a far more important language for that class of processor. "Serious" software was much more likely to be raw assembler than C. The processor was just too small and compilers weren't good enough.


Right. That's what I actually wanted to ask with my first question (but didn't).


To get maximum performance on a modern CPU the code has to be cache-aware. It is painful to program for that in C. One often ends up with very complex macros or writes code generators that emit CPU-specific C code. So modern ISAs are not designed for C.

By such arguments one could claim that CPUs are designed for Fortran, as its compilers allow writing fast code with less effort than in C/C++.


Actually NVidia's Volta was designed specifically for running C++ code.

http://cppcast.com/2017/09/olivier-giroux/

https://www.youtube.com/watch?v=Ud1savVLALA


I think it is fair to say that modern ISAs are built around a C-speaking world, if that's more reasonable.

This may also be true on a technicality: does the ISA actually specify the exact cache structure of a CPU? If not, then it is reasonable to say that the ISA (if not the whole microarchitecture) is designed with C in mind.


Surely nobody designs an ISA that performs badly for C programs. But that does not mean it was designed to run average C programs optimally.

If one searches for "data oriented programming", a typical link is about game programming and how using a struct of arrays instead of an array of structs improves performance due to better cache locality. But such structs of arrays are hard to use and maintain in general-purpose code. They work well only in game engines or numerical code, where one has a lot of things with the same attribute and the total number of different attributes is low (see the sketch below). So if anything, the ISA is tailored to allow maximum performance when coding games or numerical recipes in a particular style. That style is not feasible to apply, for example, in a kernel, as C has very poor support for it and there are too many different attributes to get real benefits from it.
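
To make the layout difference concrete, here is a minimal sketch with hypothetical particle fields; summing only the x coordinates touches far fewer cache lines with the struct-of-arrays layout:

    #define N 1024

    /* Array of structs: one particle's fields sit next to each other. */
    struct particle { float x, y, z, mass; };
    static struct particle aos[N];

    /* Struct of arrays: each field gets its own contiguous array. */
    static struct {
        float x[N], y[N], z[N], mass[N];
    } soa;

    float sum_x_aos(void) {
        float s = 0.0f;
        for (int i = 0; i < N; i++)
            s += aos[i].x;     /* strides past y, z and mass on every step */
        return s;
    }

    float sum_x_soa(void) {
        float s = 0.0f;
        for (int i = 0; i < N; i++)
            s += soa.x[i];     /* contiguous reads of x only */
        return s;
    }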


Modern CPUs, yes, but it used to be that they were designed for high-level languages in general, before C took over the show.


Some of your complaints could be made about assembly too, but do you want to get rid of that as well? Some of your other complaints are about how processors and other hardware work, not C. In many ways you are asking for C to be like other languages, but then it wouldn't be C, and it would lose the advantages it does have.

If you get rid of C, you get rid of most of the software your operating system runs on, which raises the question you need to ask yourself: if C is so bad, why is everyone using it?


> if C is so bad, why is everyone using it?

Don't confuse success with correctness; survivorship bias is something to consider here. As others will surely point out, when C was written there were other languages available that were as fast but much safer.

In many ways C won because Unix won; to talk about them separately ignores how important each was to the other's success. (Yes, other operating systems adopted C as well, but Unix is its foundation.)


Some mainframes like Unisys ClearPath don't have assembly support at all, zero, nada; all necessary low-level primitives are exposed as compiler intrinsics.


I feel like C++11 and Rust have shown a way forward and the C standard should focus on a design that takes into account ownership and protected arrays in a simple way, hopefully as backwards compatible as possible.

Maybe the solution is for the major compilers to default to strong static analysis.

Even though there are safe languages, the enormous integration and compatibility of C means that it isn't going anywhere soon, so I think even imperfect solutions can offer a lot if they are low hanging fruit.


It is a matter of mindset; multiple safe C variants have been attempted over the years.

Even lint was already available in 1979, yet hardly anyone used static analysis tools on C codebases.


Which is why I mentioned defaulting to strong static analysis. Having it as a separate tool or an unknown flag is not enough.


I've been writing C on and off since 1990, across embedded systems, game consoles, massive SGI supercomputers, and modern kernels.

C is not fading away, because of its portability and low overhead. It has some nasty behaviors, as the original post shows, but you hit those once in a blue moon. The language is not unsound because of them; it just has some warts like any other language.

I don't particularly enjoy writing C, as it's too low-level for my liking and I have to write too much code to do basic things (slist_* makes me shudder), but the reason I have used it is its tremendous portability and direct compilation to machine code. That still matters in some places.

In places where memory is precious, or realtime performance precludes things like memory managers, C will continue to thrive, alongside languages like Rust that are great for different goals. There's room for both.

C, in my opinion, has stood the test of time on its simplicity, portability, and the ubiquity of the C standard library, tiny as it may be.


On top of that, C only got here thanks to the widespread adoption of UNIX and POSIX, and we had much better options.

Yes, they were languages with a helmet, seat belts and airbags bolted on, but just as with old cars crashing on a motorway, the CVE database shows what happens when things go wrong.


These architectural oddities exist and have to be dealt with by someone. Of course it is better if that someone isn't you, and some Rust maintainer ports the language to the obscure hardware where the CPU ignores the lower four bits when addressing and presents nice, neat, and performant abstractions for you to use.

But who is going to pay for that effort? If the hardware manufacturer cared enough to invest a lot in developer tooling, it probably wouldn't have picked such a hostile interface to begin with.


Is there a site or book out there that covers these common pitfalls? I work in C89, so K&R C does seem pretty simple to me. I clearly haven't worked with it long enough to spot these things, or I have code in the wild that is unknowingly a victim of these issues and I've yet to stumble on it.


> memory doesn't work like a simple flat address space

Can you expand on that? I thought that modern (non-segmented) memory does work like a simple flat address space, at least from the perspective of userspace programs. Isn't all the weirdness only a result of compiler optimisations?


Just to take one example: multisocket x86 is NUMA. There’s HW to paper over that fact, but the power/perf cost is quite high.


Also, Epyc and (even more so) Threadripper have somewhat NUMA-like memory access within a single CPU.


The road to Rust is paved with good intentions. (The same is true of C++.)



