> There is little of needless code in standard library. And what there is, believe me, over the decades was really polished.
Wrong. Counter-example: locales [1]
Also: String handling, which is responsible for so, so, so many security vulnerabilities. The underlying cause is zero-terminated strings (instead of using start/end pairs or start/length pairs). You can't even tokenize a string without either copying or modifying it!
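For illustration, here's a minimal sketch of the start/length alternative: a tokenizer that neither copies nor modifies its input (str_span and next_token are made-up names, nothing standard):

#include <stdio.h>
#include <string.h>

/* hypothetical start/length pair instead of a zero-terminated string */
typedef struct { const char *ptr; size_t len; } str_span;

/* next token up to 'sep'; the input is neither copied nor modified */
static str_span next_token(str_span *rest, char sep) {
    const char *p = memchr(rest->ptr, sep, rest->len);
    size_t n = p ? (size_t)(p - rest->ptr) : rest->len;
    str_span tok = { rest->ptr, n };
    size_t skip = n + (p ? 1 : 0);
    rest->ptr += skip;
    rest->len -= skip;
    return tok;
}

int main(void) {
    str_span s = { "alpha,beta,gamma", 16 };
    while (s.len > 0) {
        str_span t = next_token(&s, ',');
        printf("%.*s\n", (int)t.len, t.ptr);
    }
}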
Also also: just a single, apologetic mention of undefined behavior. Responsible – in cooperation with over-zealous compiler writers – for so many more bugs not already caused by improper string handling.
I love C but I think most of the C standard library is useless and outdated rubbish (pretty much everything above the mem...() and math functions). The entire IO and memory-management areas should have been either thrown out of the standard library, or updated 20 years ago to keep up with low-level operating system features.
But one of the best features of C is that you can mostly ignore the standard library and still enjoy "C the language", e.g. nobody ever chose C for its standard library ;)
(also re UB etc...: use the mighty trio UBSAN, ASAN and TSAN!)
For IO, memory management and similar "low-level services" it's often fine to call the underlying operating system functions directly and with this also make use of more powerful features than the standard library can provide, such as async IO, virtual memory etc (or wrap those OS calls into your own cross-platform wrapper functions). There's also plenty of libraries which do this for you, like libuv (but integrating such big libs is often not trivial).
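As a small illustration of that idea (POSIX/Linux-flavored, so a sketch rather than portable C): a large scratch buffer can come straight from the virtual memory system instead of going through malloc:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t size = 1 << 20;  /* 1 MiB */
    /* ask the OS directly for anonymous, zero-filled pages */
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    /* ... use buf ... */
    munmap(buf, size);
    return 0;
}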
Same for string handling/processing, there are specialized libraries out there which do a specific job better than the rather generic standard library functions (this is also true for the C++ stdlib), one just has to find those libraries.
PS: I don't think that a "batteries included" standard library (like python has) would even make sense for C, such a library would need to be opinionated by definition, better to keep (too many) opinions out of the standard, as this road just leads to another C++ ;)
C++ is IMO defined by a complete lack of opinion. They just standardized every opinion. For everything. C++ would be a much better language if it showed some opinion, even if I disagreed with it.
In my long C career, I invented string library after string library. They all turned out to be garbage. The common cause of their failures is their inability to interact with any other C code.
Agreed, but that's an issue for all "interface types" on API boundaries (another similar issue is vector and matrix types in math libraries). IMHO a C string library must still be able to efficiently consume and produce vanilla zero-terminated "const char*" string data (and at a stable address, unlike for instance std::string) even if they mostly work with {pointer,length} ranges internally.
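Something along these lines, for example (str_range and the helpers are invented names, just to show the interop in both directions):

#include <stddef.h>
#include <string.h>

typedef struct { const char *ptr; size_t len; } str_range;

/* consume a vanilla zero-terminated string without copying */
static str_range range_from_cstr(const char *s) {
    str_range r = { s, strlen(s) };
    return r;
}

/* produce a vanilla zero-terminated string into a caller-provided buffer
   (cap must be at least 1); the data keeps a stable address in 'buf' */
static const char *range_to_cstr(str_range r, char *buf, size_t cap) {
    size_t n = r.len < cap - 1 ? r.len : cap - 1;
    memcpy(buf, r.ptr, n);
    buf[n] = '\0';
    return buf;
}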
You can use something like GitHub or Conan to find specific useful C libraries.
There isn't a universally recognized "standard library replacement" (and people argue about what parts of the standard library they don't like), but here:
> But one of the best features of C is that you can mostly ignore the standard library and still enjoy "C the language", e.g. nobody ever chose C for its standard library ;)
C is one of the hardest languages to do this with given its anaemic dependency management.
> (also re UB etc...: use the mighty trio UBSAN, ASAN and TSAN!)
All of which will miss some cases, even in combination.
There's a lot of subtly broken stuff in C stdlib. For example, time.h calls that may touch the TZ env var aren't thread-safe, because getenv is unsafe (it gives a pointer to its internal globally mutable data structure without locks).
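For example, the classic localtime() returns a pointer to shared static storage and may consult TZ via getenv() under the hood; the POSIX _r variant at least removes the shared buffer (a sketch; localtime_r is POSIX, not ISO C):

#include <stdio.h>
#include <time.h>

int main(void) {
    time_t now = time(NULL);

    /* not thread-safe: points into internal static storage, and the
       call may read the TZ environment variable behind your back */
    struct tm *shared = localtime(&now);
    (void)shared;

    /* POSIX alternative: caller supplies the buffer (TZ handling can still race) */
    struct tm local;
    localtime_r(&now, &local);

    printf("%d-%02d-%02d\n", local.tm_year + 1900, local.tm_mon + 1, local.tm_mday);
    return 0;
}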
Because C is so bare-bones, it usually leans on POSIX as its extended standard library, and that is also full of old cruft.
And even MSVC does not actually implement the API specified in Annex K. Annex K was inspired by MSVC's design, but they diverged:
> Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.
This makes sense when you consider the history of C. Multi-threading was a relatively late add on, and outside of the standard library.
When I learned C (early 90's), multi-threading wasn't even something you considered. It was either multiple processes or an event loop with select/poll.
Yes they do. What they don't do is search for multi-byte characters. You would not use them for that; you use them to look for ASCII delimiters. You can tokenize a UTF-8 string on ASCII delimiters using functions that are oblivious to UTF-8, and this is in fact a preferred technique in many programs.
I think I was going for more like: strpbrk and friends can't use a multi-byte sequence as a delimiter. And if a delimiter appears as part of a multi byte sequence, you may see strange results.
In a localized world searching for delimiters also starts to make less sense. Eg. I have been told it doesn't make sense to break on whitespace for Chinese text.
To be clear, I am not saying these functions are bad or evil or to blame for their limitations (a bunch of the problem space wasn't invented yet when they were introduced), just noting they have limits. A bunch of more recent languages and libraries have the same or similar issues, too.
strstr can find a multi-byte character in a multi-byte sequence. There is no ISO C function to find the first occurrence of any one of a bag of multi-byte characters, though it's not difficult to write one.
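A quick sketch of such a function, leaning on the fact that a valid UTF-8 sequence can never start in the middle of another one, so strstr can't produce a false match:

#include <stddef.h>
#include <string.h>

/* find the first occurrence in 'hay' of any of the 'n' UTF-8 needles;
   returns NULL if none occur (simple O(len * n) sketch) */
static const char *str_first_of(const char *hay, const char *const needles[], size_t n) {
    const char *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const char *p = strstr(hay, needles[i]);
        if (p && (!best || p < best))
            best = p;
    }
    return best;
}

/* e.g.: const char *delims[] = { "。", "、" };  str_first_of(text, delims, 2); */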
> I have been told it doesn't make sense to break on whitespace for Chinese text.
It could make sense to break on whitespace in some programming language or data format that allows Chinese (and other) identifiers.
A command interpreter that allows Chinese arguments (such as file names) wants to break on spaces, as usual.
> if a delimiter appears as part of a multi byte sequence, you may see strange results.
UTF-8 was designed by a dyed-in-the-wool C-and-Unix engineer, who ensured that such a thing can't happen. No character in the 0x00-0x7F range can occur in a multi-byte character.
The byte which starts a UTF-8 character cannot occur anywhere other than at the start, which is why we can use strstr to look for it.
UTF-8 continuation bytes are limited to the range \200 through \277, so there's basically zero chance that if you choose something like comma as your delimiter that it's going to tokenize the middle of a multibyte sequence.
Also take into consideration that, under the hood, functions like strpbrk() are typically accelerated by CPU instructions such as PCMPISTRI which doesn't support UTF-8 natively but it does support UCS-2.
> so there's basically zero chance that if you choose something like comma as your delimiter that it's going to tokenize the middle of a multibyte sequence.
Not just "basically;" there is no possible collision between ASCII characters and any valid multibyte encoding. This can be seen somewhat visually in this table[1] and is an intentional aspect of the UTF-8 design.
How about with joiners and combining characters? Eg. If you encode é as U+0065, U+0301 (\x65\xcc\x81), then search for 'e' and act on the result somehow, you fail to consider the whole glyph.
Sure. You're talking about glyphs that are composed of multiple unicode codepoints; my earlier comment is true of single codepoints only. The comment I was responding to is also talking only about single codepoints (wcspbrk cannot represent delimiters longer than a single codepoint).
On joiners / combining characters: I'd encourage using composed normalization (NFC) rather than decomposed normalization (NFD).
Just curiosity: are there any glyphs that lack a single codepoint representation, where one of the joined codepoints is an ASCII character? (That only helps after normalization, of course.)
Yes. ASCII uses \b as the combining character mark which is a convention that's always been widely supported by typesetting programs such as less and nroff. For example, A\b_ is A̲, and you can do the same thing with apostrophe and tilde for accent marks. There's also UNICODE emojis where two codepoints in sequence get joined together as a single glyph. Never underestimate the creative ways text can be used, or that standards just codify a long history of practices.
Er, I was asking about unicode joining, not this roff \b thing. Sorry for the confusion. I'm aware that multiple-codepoint unicode glyphs exist; I'm asking if any of those involve a codepoint in the ASCII (1-127) range which cannot be normalized to a single codepoint (e.g., e + ' normalizes to a single codepoint é).
Of course. Take for example mͫ (m+m) there's no way to represent that as a single codepoint. Combining marks can also be overlaid multiple times, e.g. m͚ͫ (m+m+∞) so the number of glyphs you can create is limitless. There's only a tiny number of the combinations that are possible which have a tinier normalized form. The new UNICODE combining marks work by almost exactly the same principles as the \b ASCII combining mark. That's why I mentioned it earlier.
In what data format or programming language is 'e' a delimiter? One situation is floating-point constants, where 'e' is a delimiter indicating the exponent. However, if an é occurs in the middle of such a constant, whether as a single code point or a combined character, that is an error. The 'e' must be followed by an optional sign and one or more decimal digits.
The ISO C library string handling stuff is for systems programming, not for scanners and parsers for natural written language.
Every time I review code and see strncpy, I look closer because it's always used incorrectly. It's always about the terminating 0. Is the 0 there or not? Is the 0 part of n or not? Does the destination have to be n+1 in size for the 0?
I quit using it myself because I could never remember just what the exact protocol was for 0.
Yeah used naively strncpy leaves you an unterminated string. Also like all of them it's up to the caller to predetermine if they will fail if called. So instead of having the checks in one place inside the string function. You have them scattered all over the code if at all.
I think I read somewhere the provenance of strncpy was to copy strings into a fixed-length field, which is why it has the deranged behavior of not terminating the string. Think file systems where the max file name is 8 characters. Or compilers that truncated variable names at 31.
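Seeing both uses side by side shows why it trips people up (a sketch):

#include <string.h>

char field[8];   /* fixed-width record field, the original use case */
char name[32];   /* an actual C string */

void copy_examples(const char *src) {
    /* fixed-width field: zero-padded, but NOT terminated if src
       has 8 or more characters -- exactly the historical behavior */
    strncpy(field, src, sizeof field);

    /* if you want a C string, you must leave room for and write the 0 yourself */
    strncpy(name, src, sizeof name - 1);
    name[sizeof name - 1] = '\0';
}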
Exhibit B: yes, the command that repeatedly prints `y` to stdout. Exhibit C: all handling of time, with time_t, struct timespec, timeval etc. We should just have a 64 bit uint representing nanoseconds from epoch, which would give us 585 years and get rid of much nonsense.
It's not that GNU is incompetent or overly bureaucratic or anything. It's that they really didn't want to infringe on Unix, and they presumed that Unix's yes would have been implemented in the most obvious, dead-simple way. And, as a plus, GNU yes can stream y's at like 10 GB/s, if you need that.
> enough bits to go backward from the epoch at t=0 to the big bang at t=negative whatever.
For reference, the universe is about 13.787 billion years old [0]. That's about 13.787 * 10^9 * 365 * 24 * 3600 = 4.348 * 10^17 seconds, which (I think?) is a 59-bit number [1]. 10ths of a second will require 62 bits, which is right about at the edge of what a 64-bit signed integer will allow.
If you want milliseconds, you'll need at least 69 bits. For nanoseconds, you'll need at least 89 bits.
So you'll either need an integer type that's wider than what's natively supported in most hardware (thus potentially sacrificing performance), or you'll have to sacrifice precision.
So, the problem is a 64 bit int only gives you ~585 years around the epoch at nanosecond resolution. That's sufficient for timestamping, but not great for a general purpose time library where even ordinary civil calculations might exceed that range, particularly in intermediate arithmetic.
Since larger than 64 bit ints are a disaster for portability, the reasonable solution is to go with a 64 bit signed seconds, 32 bit nano offset field. A lot of language std libs have adopted something along these lines.
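For reference, the shape in question looks something like this (a sketch, similar in spirit to struct timespec and to what several standard libraries use):

#include <stdint.h>

/* 64-bit signed seconds since the epoch plus a nanosecond offset */
typedef struct {
    int64_t  sec;   /* covers roughly +/- 292 billion years */
    uint32_t nsec;  /* 0 .. 999999999 */
} timestamp64;

/* a single 64-bit nanosecond count only covers ~584 years around the
   epoch, which is why the field is split */
static int64_t to_unix_nanos(timestamp64 t) {
    return t.sec * INT64_C(1000000000) + (int64_t)t.nsec;
}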
DJB was advocating for everything to be in a format like this, referenced to TAI (UTC without leap seconds, basically). Sadly that didn't get any traction.
I remember in my University years I was astounded to find out that none of the programming languages we were taught could represent the Jurassic period, or the day the earth was formed, as a date.
I was horrified even more when I learned that future leap seconds are undefined, and we literally can't tell what the time on the clock will be, say, a million seconds from now.
In my firmware I keep time as a 64 bit signed number that counts 32.768 kHz ticks since the unit powered up. I think it'll roll over in 8 million years or something like that.
If you want to know what's broken? Most real time clock modules. They almost all want to store time as HH:MM:SS MM:DD:YY and sometimes 1/256 of a second but sometimes not.
Probably something like how msgpack (or even variable-length Unicode encoding) works, where the lowest 127 values encode a single uint8 char in place, the next 121 values signal that a string of 1-121 bytes follows, and after that come a handful of sentinel values for "length follows and it's 1/2/4/8 bytes", plus maybe some special cases like a zero-length string.
I can understand that this might have been too much implementation complexity/risk to contemplate 40 years ago, but this kind of pattern is very well established at this point, especially in scripting languages with loose typing.
> Also: String handling, which is responsible for so, so, so many security vulnerabilities.
That's not really a counter-example. Sure the string handling was an unfortunate choice but it is the standard. The standard library must implement it as defined. Being standard-compliant is not a lack of polish.
It takes more code to use it correctly than to hand-code what you think it is supposed to be doing for you. strtok is Cursed.
strlcpy is similar. If you don't write that much more code, you are not using it correctly, and it is not giving you the value that is the reason you thought was why you were using it.
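For the record, strlcpy (a BSD extension, not ISO C) returns the length it tried to create, so the "extra code" in question is the truncation check on every call site, something like:

#include <string.h>   /* strlcpy is declared here on BSD/macOS; elsewhere you provide it */

int copy_name(char *dst, size_t dstsize, const char *src) {
    size_t needed = strlcpy(dst, src, dstsize);
    if (needed >= dstsize) {
        /* truncated: now decide whether that's an error, whether to
           retry with a bigger buffer, etc. */
        return -1;
    }
    return 0;
}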
> There is little of needless code in standard library. And what there is, believe me, over the decades was really polished.
Ha ha ha ha ha ha ha ha ha ha.
Locales are a massive clusterfuck, basically too simple to handle localization if you actually care about it, but supports enough of it to screw you over if you don't care about it. The "wide character" support is also a nightmare. The time library support is also quite a bit wonky (years are measured as years since 1900 because Y2K is definitely not a pressing issue in 1989!).
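A small example of the "screws you over if you don't care" part: setlocale() silently changes the radix character for the whole process, so printf and strtod stop agreeing with your file formats (assuming a German locale is installed; exact behavior is platform-dependent):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("%g\n", 3.5);                  /* "3.5" in the default "C" locale */

    /* adopt the user's locale, as plenty of library init paths do */
    setlocale(LC_ALL, "de_DE.UTF-8");

    printf("%g\n", 3.5);                  /* now prints "3,5" */
    printf("%g\n", strtod("3.5", NULL));  /* parsing stops at the '.', prints "3" */
    return 0;
}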
> inline Assembly
Fun fact, here is the C specification's entire mention of inline assembly:
> The asm keyword may be used to insert assembly language directly into the translator output (6.8). The most common implementation is via a statement of the form: asm (character-string-literal);
There is no discussion of what inline assembly can and cannot do, how it interacts with the rest of the code in terms of semantics, how to pass arguments to and from inline assembly, etc. You might get some of this information from the manuals of compiler implementations, but even that can be surprisingly free of necessary information. Compare this to Rust's inline assembly documentation: https://rust-lang.github.io/rfcs/2873-inline-asm.html (which is more detailed than even gcc's or LLVM's inline assembly documentation).
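For comparison, here is what the unstandardized reality looks like in practice: GCC/Clang "extended asm", where the operand constraints, the clobber list and even the template syntax are all vendor extensions the C standard never mentions (x86-64 sketch):

#include <stdint.h>

static inline uint64_t add_asm(uint64_t a, uint64_t b) {
    uint64_t result;
    __asm__ ("addq %2, %0"
             : "=r" (result)        /* output operand */
             : "0" (a), "r" (b)     /* inputs; "0" ties a to the output register */
             : "cc");               /* clobbers the condition flags */
    return result;
}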
I have been programming in C for the last 25 years, and every year or so someone comes up with a new shiny thing that will "replace" C. First it was C++ (we know how well it did...), then Objective-C, then Java, then C#, then Go, and now Rust.
Every one of these languages brought new ideas, but they don't stand a chance because their designers don't understand the point of C. The C language didn't win because it was the "best" language or had the best set of features. Far from it. Even in the mid 70s it was a backward language compared to other cool languages of the day like Algol and Lisp.
C won the competition because it just gives programmers the bare minimum functionality to put an operating system and a compiler in place! It is flexible, you can provide your own library if you want, and therefore it gives you easy portability. OS writers will choose C any hour of the day or night because it makes their job easier.
By comparison, other languages will require a huge library to be available, and sometimes a complex runtime system, just for you to write a simple "hello world"! Imagine if you need to write a new OS, a compiler, a linker, or a shell interpreter... you get the idea.
My conclusion is that language designers still didn't get what made C so successful and therefore keep coming up with shiny complex things that don't stand a chance to become the next C.
> C won the competition because it just gives programmers the bare minimum functionality to put an operating system and a compiler in place!
Not really. C won because it was the standard compiler for Unix, and was a free compiler for a free operating system in a time when both were highly unusual.
But in the past few decades, C has lost a lot of its market share to other programming languages. In the realm of desktop (and mobile) applications, C is basically unused for new projects--its standard library is truly anemic here, and major support libraries (e.g., GUI toolkits) are often in C++ and not C. Where C is still the dominant language is the land of embedded applications, and it's not had a lot of competition here since most languages don't bother trying to define a freestanding implementation.
Rust is really the first language to try to contend for this space. And there are signs that it may supplant C: Intel is apparently looking to move its firmware to Rust; Linux is allowing Rust for device drivers and kernel modules. Hell, even some OS programming courses (e.g., Stanford, Georgia Tech) have moved their curriculum to use Rust instead of C.
As a C (not C++) programmer, I understand where you're coming from — Rust is a big complicated language, like C++, in comparison to C — but personally, I really want to use Rust in some applications where I would otherwise use C. I don't think it can be said universally that Rust has no value to C practitioners or that Rust's designers don't understand why people use C.
For sure. I do see the strengths of rust as well. And would most likely use it if i had to write software that absolutely has to be memory safe. It wont prevent other kind of issues however.
This comment irks me in a strange way. I've found very little software where what you care about doesn't build upon the fact that you trust your program's memory safety. You can use C or any other language, but if your tools can't free you from that worry, memory safety should be something that your programs "absolutely" have.
Not every program is connected to the internet, nor do they all run with elevated privileges or do anything where people's lives would be at stake.
In addition, operating systems give you a bunch of safety measures if you design around them (the obvious one being separate processes).
One also has to remember that even if your program is memory safe, that still does not guarantee it is flawless or secure. One very good recent personal experience I can give you is a certain popular Swift SQL library. It was written in Swift, which also boasts being "memory safe by default". However, due to its design that library was subject to very basic SQL injection attacks. Needless to say, I ended up writing my own SQL bindings. I hope that library is now fixed or has been replaced; at the time I had to write Swift (and no, I didn't like it), and it was the only and most popular (scary) option on GitHub.
I could also write a long rant about threads, and how the implementations and the threading primitives often have subtle bugs or are broken in different ways. And how using threads in general makes your application basically undefined unless you absolutely know every piece of code that runs on that thread.
I wasn't talking about security, but correctness, although statistically people underestimate the level of security required by their projects more often than not. Memory safety does not imply correctness, but if you want to reason about correctness you really have to trust that your program doesn't do something funky with your bits. And my programs, at least, like to prove me wrong about what I believe is possible.
If you want correctness, you need proofs, or generate the code from specification. And even then cosmic rays and bit-banging and faulty hardware can break your application. Or simple change of stack size :)
It may replace some software, but it won't really lure many of the C / embedded developers. The problem is that Rust developers miss the point of C. People who program in C want the bare minimum; they don't care about classes or other OOP features. If there was something that sort of groks what C programmers want, it's probably Zig.
To put it in short, C programmers hate needless abstraction and work with raw memory and hand-crafted data structures whenever possible. More data, less code.
Putting on my embedded engineering hat, it's not that I wanted the bare minimum it's that I couldn't afford anything more than the bare minimum.
I couldn't afford abstractions, memory safety guarantees, algebraic data types, generics, pattern matching, thread data safety (at least in the context of interrupts). The languages that had these things were hulking languages with giant runtimes and exactly zero support for embedded development. Not to mention no vendor toolchains.
Rust supports all these things, with no allocator, no standard library, and often with zero additional cost -- in terms of compute and in terms of memory. Then, being built on LLVM means that the vendor toolchain support is quickly becoming a non-issue. I suspect we'll start to see more and more of Rust in embedded, but only time will tell.
And as others have called out, Rust has very few OOP features like an optional notion of a `self` on a function bound to a structure. There's no classes, no subclassing, no message passing, no inheritance at all (structure, interface or implementation), limited dynamic dispatch, no polymorphism.
> Rust has very few OOP features like an optional notion of a `self` on a function bound to a structure. There's no classes, no subclassing, no message passing, no inheritance at all (structure, interface or implementation), limited dynamic dispatch, no polymorphism.
As an embedded developer for a living, if you put it that way, then the first thing that comes to my mind is "so why should I bother learning Rust" for embedded.
Other than the usual "please consider UB and network/memory handling/security issues" (reasons that don't really affect me), so far nobody could provide a convincing answer.
It's not that the usual answer I get is wrong or invalid or doesn't have a point. But if I ask "ok, what else?" there is really little motivation for me to move on.
Someone once told me I'll become an outdated curmudgeon here on HN, but I think I'll be long gone before something truly deserves to replace C for embedded (and low level).
Personally, I always encourage people to learn new and (particularly) different languages from time to time and see what concepts and design patterns they can learn and take with them into their day to day. All I can say is give it a shot sometime and see what you think. Worst case you've learned something :)
> Someone once told me I'll become an outdated curmudgeon here on HN, but I think I'll be long gone before something truly deserves to replace C for embedded (and low level).
You (and I) may become an outdated curmudgeon before long, but it won't be because you refused to learn Rust haha.
Embedded developers use C because they have to, not typically because they want to. And they have to use C because of proprietary toolchains and existing libraries, among other similar reasons. Good reasons, sure, but not because C the language is so great.
> hand-crafted data structures [...] More data, less code
You're paying lip service to the principle of datastructures-over-algorithms, yet you're advocating for a language which has neither algebraic data types nor tuples? Come on. That's like praising functional programming but using Fortran.
> To put it in short, C programmers hate needless abstraction and work with raw memory and hand-crafted data structures whenever possible.
I wrote Monocypher in C for one reason: portability.
From a systems point of view, crypto libraries are trivial: you don't need any dependency, code is pathologically straight-line, and there is almost no data structure to speak of beyond arrays of bytes and arrays of words.
Yet I can tell you that if not for portability, Rust would have been a better fit. So I could group buffers and their size in a single argument. So I could provide genuinely high-level interface. So I could use types to avoid silly mistakes and enforce some invariants. Portability won over all that goodness: worst case I can have a Rust wrapper. Heck, someone else already wrote one for me.
These zero-cost-abstraction languages don't pay much attention to binary code size. They can't compete with C in that regard. For the low-cost, high-volume embedded devices, code size relates to the unit cost, and thus the profit margin.
> To put it in short, C programmers hate needless abstraction and work with raw memory and hand-crafted data structures whenever possible. More data, less code.
> Rust doesn't get why C programmers still use C at all.
Because Rust is redundant: its safety guarantees are a subset of the safety guarantees of ATS[1], a project in plain C integrates more naturally with it at any point of its development cycle[2], and it doesn't require giving up safe pointer arithmetic[3].
EDIT: those who downvote, let's discuss the topic in substance and let's avoid zealotry. If you promote and pitch Rust to the audience of C by advertising its safety guarantees, zero-cost abstractions, and how well it integrates with the C ABI, at least be consistent when it turns out that there's another $TECH that does it more safely and more consistently with C programmers' reliance on certain useful features of C.
Are you really arguing that Rust is currently redundant because of a language that hasn't hit v1.0.0 yet, has no industry support, and doesn't use a modern build system or package manager?
I do really argue that Rust is redundant for C developers who want to bring some extra safety into their projects without disrupting their toolchains and practices. I also argue that this extra safety can bring more than Rust is capable of providing. Now, what's the technical merit of Rust that justifies spending time learning it when the same time can be spent on bringing ATS into the same codebase/toolchain one function at a time?
What's the point of waiting for 1.0.0? It's just a tag that doesn't save you from bugs and breaking changes.
What is industry support? What does it have to do with a team where everyone can read the documentation of the tool that is already built on top of a mature GCC ecosystem and that adheres to existing approaches to debugging, profiling, releasing and maintaining C codebases?
> and doesn't use a modern build system or package manager
Why do I need a separate solution such as Cargo if I can build, package, and distribute everything with Nix and get reproducibility, distributed builds, transparent caching, and environment isolation along the way for free?
C won because it was the only low level language that was competently implemented for DOS for years. DOS was where 90% of the programming action was in the '80s, and C was a perfect fit for DOS. C also was easily extended to handle segmented memory.
(Turbo/Borland) Pascal was comparable to C in terms of closeness to hardware on DOS - I mean, it even had language facilities specifically for interrupt handlers! A lot of DOS software was written in that. In some countries, it was the case for most DOS software made in them.
So I'd amend this to: C won because it was the only cross-platform low level language that was competently implemented for DOS.
I'm enrolled in a master's at Georgia Tech right now and last I checked the OS course is in C; I would kinda prefer to learn Rust anyway, so I hope you are right.
Why would you prefer to learn Rust at this point, especially during OS course when all relevant OS code you will encounter is in C and understanding C is crucial to understand low level programming concepts. You can always learn Rust later when/if it becomes more relevant. Without C a lot of programming areas will be a closed door forever.
> understanding C is crucial to understand low level programming concepts
That is a patently false statement. To understand low-level programming concepts, you need to understand fundamental notions about how machines represent state in registers and memory, how memory is organized (including primarily the concept of function calls and the stack), and the indirect referencing of memory via pointers. Note that nowhere in that list did I describe a concept that is unique to C.
In fact, one of the more common approaches to introducing developers to low-level programming is to introduce them to these concepts via assembly (say, Nand2tetris). In my own experience TA'ing such a course, I am more than willing to translate code into whatever language the student is most comfortable with to express the concepts as necessary. You can absolutely learn these concepts in other languages, and my own suspicion is that unsafe Rust does a slightly better job of it than regular C does.
C does not have a monopoly on understanding the low-level organization of code, and quite frankly, C's lack of coverage here can be frustrating. C has no concept of multiple return values, functions with multiple entry points, unwinding the stack, computed goto, SIMD vector types, nested functions, discontinuous structures, or tail calls, and these are all concepts that are present in other languages that cannot be expressed in standard C or often even in vendor-extended C.
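To pick one item from that list: the closest standard C gets to multiple return values is returning a struct (or threading out-parameters through the call), which is a workaround rather than a language feature -- a sketch:

#include <stdbool.h>

typedef struct { long quot; long rem; bool ok; } divmod_result;

/* "multiple return values" in standard C: bundle them in a struct */
static divmod_result divmod(long a, long b) {
    divmod_result r = { 0, 0, false };
    if (b != 0) { r.quot = a / b; r.rem = a % b; r.ok = true; }
    return r;
}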
> You can absolutely learn these concepts in other languages, and my own suspicion is that unsafe Rust does a slightly better job of it than regular C does.
> C has no concept of multiple return values, functions with multiple entry points, unwinding the stack, computed goto, SIMD vector types, nested functions, discontinuous structures, or tail calls, and these are all concepts that are present in other languages that cannot be expressed in standard C or often even in vendor-extended C.
You probably don't realise it, but you're cheating.
Operating systems are currently written in C, and therefore have a C interface. Which language is best at interfacing with C? (No trick here, just a rhetorical question.) C of course. Other languages would need some sort of FFI, which is generally unwieldy enough that the designers hid it behind a comprehensive standard library.
C doesn't need a huge library to be available, you say? Oh but it does. It's called the kernel. Comes with a freaking huge runtime too.
> Imagine if you need to write a new OS, a compiler, a linker, or a shell interpreter... you get the idea.
I think I do, but I'm afraid you don't. Writing an OS (and the rest) in Pascal (I'm thinking of Oberon specifically) is no harder than to write it in C. If you write your whole OS in Blub, interfacing with Blub will be easiest in Blub, you won't need an extensive Blub standard library because you already have the kernel, all the tools (debuggers, editors…) will be Blub friendly…
Lisp machines used to be a thing, you know.
> My conclusion is that language designers still didn't get what made C so successful […]
Language designers can't even address what makes C so successful: network effects.
The point is that it has a C interface despite being written in assembly. Your original post said the only reason it had a C interface was that it was written in C.
My original sentence was "Operating systems are currently written in C, and therefore have a C interface."
I stand by that claim: the only reason operating systems have a C interface is because they are (at least originally) written in C. The fact that the kernel has some parts in assembly is immaterial, even if those parts happen to comprise the userland interface.
And it's not just the kernel: when UNIX was re-written in C, everything was written in C. The compiler, the core utilities, the editor, the shell… the whole OS, not just the kernel. Of course it was easier to interface to that in the same language everything else was written in.
Likewise, the Oberon operating system, which was written almost entirely in the Oberon language (which should have been named Pascal-3) has an… Oberon interface. Want to use C to interface with it? My, you'd have to write a whole compiler, perhaps a non-trivial runtime, debugging tools, and of course an FFI to interface to Oberon, the de-facto lingua franca.
C looks much worse when it's not already the king of the hill. Its strength lies in its ubiquity more than in any quality of the language itself.
> Writing an OS (and the rest) in Pascal (I'm thinking of Oberon specifically) is no harder than to write it in C.
You can't because it is impossible to escape the type system. Without the ability to cast pointers you can't write a memory allocator. You can't write a function like dlopen()...
This article talks about Wirth's Pascal as originally defined. But various language dialects evolved way beyond that, and some of them became effectively dominant.
Thanks for the link, I'll go take a look. I'm pretty sure Oberon addressed some of the pitfalls cited there, if only so Wirth could write an OS with it. Not every language is OS worthy, after all.
it did well enough that C stdlibs (at least MSVC's, LLVM's) and compilers (... pretty much all the big ones) are implemented in C++ and just export C symbols nowadays, likewise for newer OSes like Fuchsia.
SerenityOS (https://github.com/SerenityOS/serenity) was written from scratch in two years in C++ and goes as far as having a custom web browser & JS engine. Where is the equivalent in C? Where are the C web browsers, C office suites, C Godot/Unity/Unreal-like game engines? Why is Arduino being programmed in C++ and not C?
Language designers are pretty clever people and understand a great deal about the successes and failures of languages. I think if you look at a lot of what Rust is doing today (and C++ yesterday, for better or worse) you'll see a lot of inspiration and influence from the successes and failures of C. Particularly when it comes to safety and generic programming.
Objective-C, Java, Swift, and C# have become massively successful as application programming languages because C is/was terrible at it. They learned a lot about how painful it was to do basic higher level programming tasks when you are restricted to C's semantics and memory model.
C is great but I don't think it's worth romanticizing since history has shown that C isn't that great for writing anything but systems code. Which is a restricted domain to begin with, and isn't even that attractive for it anymore.
The one thing C has over anything else is interop. The language of FFI is C. There's no inherent reason for that other than history, and it's not super broken so we're not going to fix it.
Games (many in C++ with one or two features beyond basic C). All kinds of solvers. GPU programming.
C sucks when you need the convenience of a big standard library or safety above performance. There is nothing like C when you care about speed, memory footprint and efficient memory management. It's great that other languages took over in areas C is terrible at, but it's not like the areas where it's the best and often the only option have disappeared.
These specific claims are flaky at best but hard to argue over since anyone can construct enough cases for or against their position to be right enough not to change their opinion of C.
You can absolutely beat C in everything you list. And you also can't. I don't agree that C is the end all be all of performance or code size, except in a handful of cases where there's nothing else available.
C won over because of the mystique of the syntax in which you can load multiple side effects into expressions. The intuition was that this leads to faster code, and in fact, with naive compilers, it did lead to faster code. C's terseness won over programmers who hated typing things like BEGIN and END for delimiting blocks. Unlike Pascal or Modula, C came with something very useful: a macro preprocessor. This is such an advantage, that it's better to have a crappy one than none at all. Those programmers who did not hate BEGIN and END could have them, thanks to that preprocessor. The preprocessor also ticked off a checkbox for those programmers who were used to doing systems programming using a macro assembler.
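And indeed that happened in practice; the original Bourne shell source famously wrapped C in Algol-style macros roughly along these lines (a loose sketch, not the actual header):

/* BEGIN/END for the programmers who missed them */
#define BEGIN {
#define END   }
#define IF    if (
#define THEN  ) {
#define FI    }

int sign(int x)
BEGIN
    IF x < 0 THEN return -1; FI
    return x > 0;
END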
C started to be popular in the microcomputer world at a time when systems programming was being done in assembly languages. For instance, most arcade games for 8-bit microcomputers were written in assembler. Some applications for the IBM PC were written in assembler, such as WordPerfect.
The freedom with pointers thanks to arithmetic would instantly make sense and appeal to assembly people, who would find a systems language without pointer flexibility to be too strait-jacketed.
I see C as a minimally-viable HLL. Just the essential high level stuff – expression-oriented syntax (a huge thing over assembly), structured programming support (conditionals, loops), automatic storage management (no worries about what goes in registers and what goes at certain offsets in the stack frame), abstraction over calling conventions, and a rudimentary type system around scalars with a certain size and pointers pointing at them.
I'm in a similar position (only 10 years professionally though). I agree with everything you said, except I think Rust genuinely has a shot here. It doesn't have a runtime, and you can choose to use just the core libraries or even no standard libraries. It exports to the C ABI, so it's compatible with existing software and libraries. It isn't a garbage pile like C++. And it solves a real problem C has, code safety. It's silly to say any language will completely replace any other language, but I think Rust fits into the slot pretty well for the vast majority of cases where C is currently the best choice.
Well, that is a particularly complicated example. I actually shipped my first Rust code just a couple months ago and it rarely used lifetimes. There is definitely a learning curve, but that's true of any language. Rust's insistence on safety makes it less forgiving than other languages, but I think that's a plus. We have enough unsafe languages already, and we're paying the price for it now.
E: If you want to take another stab, the way I learned Rust was the book: https://doc.rust-lang.org/book/ Actually type out every code example, it will help your fingers learn the "feel" of the language, and give you an opportunity to break things on purpose to test your understanding. Learning something new is always a challenge, but I really do like Rust and think it's worth the trouble.
Yes, definitely. You only need explicit lifetimes for when it's impossible for the borrow checker to figure out the lifetimes for itself. For example if one struct references another, the borrow checker can't know which struct should outlive the other. The vast majority of the time, the borrow checker can figure it out automatically without your help. It's discussed here[1] but note that it's Chapter 10 of the book, so it's probably not the best place to start reading if you're new to Rust :)
E: There are only three instances of using explicit lifetimes in the entirety of the project I linked, if you want to see some real-world examples of it. In all of these, it is used to indicate that the struct being declared references another struct which must outlive that struct. That way we don't end up with a dangling reference if the referenced struct failed to outlive this struct.
You're always using lifetimes, but you rarely have to write them out. Writing out lifetimes is only needed when you have a particularly complicated function or data structure; usually the compiler can just compute what the right lifetime should be from the control flow and type of the function.
In general you only need to specify lifetime parameters when there's an ambiguous situation. For instance, the following builds and runs just fine:

fn print_refs(x: &i32, y: &i32) {
    println!("x is {} and y is {}", x, y);
}

You don't need to specify the lifetime annotations on print_refs because the references aren't returned and are unambiguously just the lifetime of the function. The compiler will fill those in for you -- even if the lifetimes of x and y are different, just so long as x and y live at least as long as the function.
I'll be the first to admit it takes a while to ramp up into Rust. Part of that is learning to let go and trust the compiler. Unlike many other languages, it won't hurt you haha. It's on your side. Sometimes you need to, you know, give it a little more info so it can do its job.
The Rust team has made huge strides in improving writability of the language, especially with non-lexical lifetimes.
Soon, `fn` and `impl` and `Vec` fade into the background, like `char` and `short` and `long`.
I think it's a subconscious habit of how your eyes traverse the code. Your mind has been trained to skip over a lot of those symbols and get the meaning from the surrounding code, and in the near term, it costs extra work to suppress that habit and actually see the symbols.
Over the long term, I don't think symbols are harder to read than keywords. C itself provides some evidence for that: imagine reading C code if experience with another language had conditioned you to look past * and &. That would be terribly confusing, but for an experienced C programmer, those symbols leap out of the code at you because you know that they carry a lot of meaning.
aside from the lifetime annotations, how is that worse than C to an unfamiliar reader? the format specifiers for printf and friends are way worse than anything in your rust example.
keeping in mind that this is about syntax, here's what I think is actually worse (but I understand that everyone is different so if you like these, it doesn't really matter what I think):
#[] - why the [] if # already makes that line different from the usual code? seems superfluous
'a - is that thing next to 'a' a smudge on my display? Did I forget a quote? better wipe the display with my finger
'&a - "mut" and "ref" but ' ?
foo! - yelling out function calls. "print!" "exit!" "macro!". Angry Codes!
foo? - when you're done yelling, make sure to ask existential questions of the return result. This one I have the least problems with since it actually makes me question the return value ("hm, something's weird here, it could be null, pay attention"), but when coupled with the yelling, just makes the whole thing look dramatic. "do_it_now!(); did_we()?"
fn - by itself not a huge deal, but the list of truncated words that are used frequently is "impl", "mut" and "pub". I save some characters (am I really in such a rush?) at the cost of reading this broken English "f-n impl moot pahb". At least C doesn't have that. Pascal had "interface" "implementation", "begin", "end", etc. Java has "public", "interface", "extends", "class".
Default for Borrowed - suddenly English! No time to type 'function' or 'mutable' but "Default for Borrowed" is a-ok?
underscores all over the place in the standard library. Even C doesn't have that many, mostly in the _r variants that were added.
unwrap() - what does that word have to do with errors? (I know what it does) It wouldn't be my first choice.
I have a whole list of awesome stuff, meh-stuff and wtf-stuff I noted down about Rust while trying to learn it (on multiple occasions), and there are a lot of excellent things about Rust, but the aesthetics of the language are important, otherwise we'd all have no issues coding in Brainf*ck.
Yes it's possible to go nuts with syntax, but this just feels like the "programmer art" of language syntax. I think it's usable (clearly), it's just not elegant to me.
> fn - by itself not a huge deal, but the list of truncated words that are used frequently is "impl", "mut" and "pub". I save some characters (am I really in such a rush?) at the cost of reading this broken English "f-n impl moot pahb". At least C doesn't have that.
int and char aren't exactly words. And let's not forget that a type like "double" makes no sense whatsoever by itself. (It's two of something, but two of what? Oh, it's "double-precision" floating point! How could I miss that?)
Now, let's look at C's standard library:
strcmp, strpbrk, isalnum, ispunct, setjmp, SIGSEGV, SIGFPE (that's the divide-by-zero exception, isn't it obvious?), SIGABRT
Sure seems like C has its own massive issue with "we can't let a name be long"
> underscores all over the place in the standard library. Even C doesn't have that many, mostly in the _r variants that were added.
va_arg, FE_DFL_ENV, int32_t, etc. Continue on into most C libraries, and underscores are pretty common because C has no other namespacing mechanism.
Syntax is pretty strange in any language if you're not used to it. If you're comfortable with C, C's syntax and spelling quirks don't stand out to you.
I fully agree with you. C, a language designed in 1972, has the issues you outlined.
I personally have higher expectations of a language designed in the 2000s by people standing on the shoulders of 40 years of computer science and language research.
Here's a contrived analogy:
Imagine if you bought the newest Tesla truck meant to replace old Ford model trucks and you had to vigorously shake the steering wheel in order to lower the driver's window.
You're saying: "Ford trucks have always used a hand crank and that feels strange too if you're not used to it!"
I'm saying: "Why did they choose to make it equally strange in the first place?"
I realize you're joking, but probably some of it came from limitations of the era. As recently as the early 2000s, I had to use a 68k assembler that had a limit of something like 8 or 10 characters for a label. The really annoying part was you could have longer names, but it would silently ignore any extra. We're still stuck with other decisions that were driven by size constraints (like the separation of /bin and /usr/bin, IIRC).
It comes from the limitations of some early pre-standard C compilers. It was fairly common to use fixed-length arrays to avoid heap allocations, for perf and memory usage reasons. This included identifiers, but then you had to decide on the maximum size; since most identifiers would be fairly short anyway, a large array would be wasted space.
Since the first edition of ANSI/ISO C was trying to codify the already-existing common practices for maximum portability, it reflects those existing limits (5.2.4.1 "Translation limits"):
"The implementation shall be able to translate and execute at least one program that contains
at least one instance of every one of the following limits:
...
31 significant initial characters in an internal identifier or a macro name
6 significant initial characters in an external identifier"
If you look at many abbreviated function names from the C stdlib, they are specifically 6 characters long - strcpy etc. I wouldn't be surprised if the 6-char external identifier limit goes all the way back to the first K&R C compilers.
Just a few small comments. Maybe understanding some rationale will help.
> #[] - why the [] if # already makes that line different from the usual code? seems superfluous
# does not "make that line different from the usual code." The whole #[] construct is it, it has nothing to do with lines. You could put "#[foo] #[bar]" on one line if you wanted, you could write "#[foo] fn lol()" if you wanted...
> foo! - yelling out function calls. "print!" "exit!" "macro!". Angry Codes!
This helps both humans and computers parse; macro invocations don't have to follow regular Rust syntax, and the ! helps indicate that that's true.
> Default for Borrowed -
This is not language syntax, this is the name of two types. You can name your types however you'd like.
> #[] - why the [] if # already makes that line different from the usual code? seems superfluous
I don't even know what #[bla(foo)] does, or why all that punctuation is needed. maybe nesting is allowed?
> 'a - is that thing next to 'a' a smudge on my display? Did I forget a quote? better wipe the display with my finger
this is a lifetime annotation, which is a genuinely noisy bit of syntax.
> foo! - yelling out function calls. "print!" "exit!" "macro!". Angry Codes!
I guess they really want to make sure you know when a macro is being used. probably a result of ptsd from debugging c and c++ code :)
> fn - by itself not a huge deal, but the list of truncated words that are used frequently is "impl", "mut" and "pub". I save some characters (am I really in such a rush?) at the cost of reading this broken English "f-n impl moot pahb". At least C doesn't have that.
the c keywords aren't too bad, but the standard library is full of this kind of thing. stdio.h and string.h immediately come to mind.
I'm not so much defending rust as I am pointing out that c's syntax isn't that great to begin with. I've been writing c and c++ code every day for several years now, so it's usually pretty easy for me to skim and understand what is going on. but if I try and place myself in the shoes of a newcomer, I don't think the syntax is much better than rust. remember the first time you tried to parse the type of a nontrivial function pointer?
Yah, my take on Rust is that it's needlessly verbose in places where it doesn't make sense (structures, anyone?) and too terse in places where it needs more verbosity, and the rules frequently don't make sense (implicit types). Much of it appears to just have been handicaps in the original compiler implementation. At this point someone should step away from the compiler implementation, look at how people are working around the language, and do a 2.0/"use strict" type thing where they redefine much of the language. Then, like what happens with new C++ revisions, let the compilers catch up.
People proposing every other language that tried to replace C thought the same. Only time will tell, of course, but I wouldn't bet on it. Nowadays C++ is seen by many as a "garbage pile", the same can happen to rust.
There is an important difference between C++ and Rust: C++ tried to play the game by being a superset of C (it even started as a precompiler to C), and many C programs are valid C++ (well, many C programs aren't valid C++, but let's ignore that here). Thus C++ always carries C's legacy along, so that there is always the C way of doing things and then the C++ way of doing things. Rust cut that part off, while still allowing calls to C functions.
Also, Rust has taken the path of backwards-incompatible changes, while the C++ language designers/committee go to great lengths to maintain near-perfect backwards compatibility with older versions of the language (and refuse to allow for ABI-breaking changes! Grr!)
"there is a non-trivial amount of performance that we cannot recoup because of ABI concerns. We cannot remove runtime overhead involved in passing unique_ptr by value, nor can we change std::hash or class layout for unordered_map , without forcing a recompile everywhere etc. etc."
I thought passing template instantiations over an ABI boundary was generally discouraged and thought to be asking for trouble. I guess this isn't really at odds with what the paper is saying though - it could still be that a lot of people are doing so.
edit Thinking about Hyrum's Law, [0] mentioned in the article, makes me think perhaps there was an upside to Java firmly refusing to support any kind of ahead-of-time compilation for so long. It fully closed the door on any funny business distributing Java packages as brittle precompiled native-code blobs, ensuring the bytecode format remained the way that Java packages were distributed, presumably avoiding some fraction of the issues C++ now faces.
Of course, Java still has backward-compatibility obligations, but unlike in C++ they align pretty well with API compatibility, if I understand things correctly.
There's nothing particular about template instantiations that makes them any different from any other object. The only problem with using them at ABI (or rather, linker) boundary is that instantiations have to be explicit - but that's what "extern template" is for.
Are a couple of ABI breaks for std::string between C++98 and C++20 really that unstable?
You can write code that uses std::string today and links against a .so built a long time ago (modulo compiler bugs of course, thus the various versions here: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Optio...). GCC & libstdc++ go to great lengths to preserve ABI compatibility.
Rust has a terrible, verbose syntax and solves a problem that is a non issue in many applications where C shines while making ordinary things difficult (like recursive data structures).
I am sure it has its place but I think it's just too ugly to be attractive. There is something pleasing about writing C, Python or even Javascript which you will never get with Rust. It will never be a language a lot of people enjoy writing imo.
Ugly is a matter of opinion of course, but once you start to write it you get a sense of what "looks good" and what "looks bad" and you start to write nice looking Rust if you want to. I can show you some powerful ugly C haha.
I would argue that Java and C# completely succeeded. Who is out there writing Desktop applications or web services in pure C anymore?
C++ too to a lesser extent. I work on spacecraft flight software, and there's a significant push to move from pure C to modern C++.
No single language is going to (or wants to IMO) replace C in every single use case, but replacing C in specific use cases has been a huge boon for productivity.
Rust, I'd argue, is the language trying to really take on most of the C use cases. The only one it's really not trying to take over is old embedded work where C is pretty much the only option.
Otherwise, I can't think of any circumstance Rust isn't trying to muscle in on C and C++'s territory.
I like C fine, and I also taught it when I was in grad school.
New languages mostly don't replace existing ones. Rather, they supplant them for some uses, and open up new kinds of software which are easier to implement or to conceive with the new language. Now, you referred to C++ specifically, and since I'm somewhat familiar with it I'll address some of the points you made with respect to just that one:
Bjarne Stroustrup said: "If you want to create a new language, a new system - it's quite useful not to try to invent every wheel." For a long while now, C++ teachers/trainers encourage their audience _not_ to think of C++ as an "augmented C" or "C with feature X Y and Z", and to avoid most "C-style" code in favor of idioms appropriate to what the language offers today.
Also, C didn't "win the competition" because there isn't a "bestest language for everything" competition. It has been, and is, a popular language with many uses. Writing operating system kernels is one kind of programming task, where C is the most popular. Even at this level (and lower still), other languages are potentially interesting and often used. See, for example:
Finally, C++ doesn't require a huge library nor a complex runtime system because it has a "freestanding mode" in which the requirements are very limited (although more is required than for C). See:
https://en.cppreference.com/w/cpp/freestanding
When it comes to declarations, the world is already moving over to Pascal's syntactic conventions in some ways. In newer languages, you see something like this:
var x: *int;
more often than:
int* x;
I.e. the type name follows the variable name, and type modifiers work more like unary prefix operators on types. This is because it's less ambiguous to parse and makes more complex types a lot easier to read: you simply go left to right, instead of following C's "spiraling" declarations.
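To make the "spiral" point concrete, here's a small sketch (the names are made up): the raw C declarator has to be read inside-out, while the typedef'd version - or the Pascal-style var x: *int form above - reads straight through.

int (*handlers_spiral[4])(const char *);   /* array of 4 pointers to functions taking const char* and returning int */

typedef int handler_fn(const char *);      /* a function type */
typedef handler_fn *handler_ptr;           /* pointer to such a function */
handler_ptr handlers_flat[4];              /* array of 4 of those pointers */

Both declare exactly the same object; only the second one can be read left to right.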
The rest of Pascal's syntax is not particularly problematic, either. I'd say that the two biggest problems with it were begin..end for blocks, and having all local declarations in a block separate from code. But Modula-2 already dropped "begin", and various Pascal dialects added inline variable declarations eventually. So, on the whole, I think we'd actually be better off in terms of code readability if Modula-2 rather than C became the standard systems programming language.
The thing that makes me somewhat sad is that there isn't much reason why you couldn't fix some of C's syntax and other issues. I feel like breaking changes wouldn't be that big of a deal as long as you keep a backwards-compatible ABI.
Say you add fn and var as keywords. Exactly how hard would that be to fix? You could probably write a tool to do that.
As opposed to C's syntactic conventions, the absolutely gorgeous mush of *s, &s, __MACRO_LIKE_FUNCTIONS___() and #preprocessor directives
The default is not the best; the default has just beaten the world over the head so many times that anything else became foreign and weird and got laughed out of the room before it even had the chance to say anything.
One programming-language-history book/blog/paper a day keeps the nonsense notions away. C "won" the way people win the lottery or roulette.
> Trust the programmer.
Programmers have been shown, time after time, not to be particularly trustworthy. Have we not learned the lesson that it's really easy to make mistakes, and that we should trust tools instead of people to check our work?
> Don’t prevent the programmer from doing what needs to be done.
Ditto the above.
> Keep the language small and simple.
It is in some ways, but its "smallness" leads to a serious lack of simplicity, as seemingly simple things are incredibly hard to do right consistently. For instance, avoiding indexing past the end of an array, or rolling over an integer.
> Provide only one way to do an operation.
That's nice, I'll grant you, although there are of course exceptions that prove the rule, like:
a[b]
is synonymous with
*(a + b)
*((uint8_t *)a + (b * sizeof(*a)))
> Make it fast, even if it is not guaranteed to be portable.
The funny thing is that rolling over an integer is not easy to catch, and that's hard to ignore. The PDP-11 'add' instruction sets the 'C' bit if there is a carry from the MSB; the Z bit is set if the result == 0; the N bit is set if the result < 0; and the V bit is set if there is arithmetic overflow (both operands were of the same sign, but the result is of the opposite sign). By simply making these bits available as, say, special names that could be tested (e.g. C N Z V) after an operation, you could determine what happened (if appropriate) and take action. HP's SPL had such a feature (used on the HP3000 series). C is not a well-designed language, but an improved 'lifeform' along the way. Today, I would like to see a C-like language developed specifically for RISC-V; especially one that had many, many fewer edge cases.
> Today, I would like to see a c-like language developed specifically for RISC-V
When I learned that RISC-V has no carry bit, I couldn't help but think it might have been designed for C to begin with. Sure, they give reasons for this choice, none of them linked to C. Still, it hurts the multi-precision arithmetic that any language with BigInts would have benefited from (I recall Python, Haskell, and Scheme at the very least).
if (((a > 0) && (b > 0) && (a + b) < 0) ||
    ((a < 0) && (b < 0) && (a + b) > 0)) { /* Overflow */ }
Now you may be saying, well, isn't there a branch-if-arithmetic-overflow instruction in practically every single architecture ever? To which I would say simplicity matters.
IMHO, C really should just standardize checked-overflow intrinsics. It's a lot saner than having users try to guess the correct overflow-matching pattern (it's worse for multiplication), and many architectures make detecting overflow pretty trivial.
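For what it's worth, the intrinsics already exist in GCC and Clang, and C23 finally standardized the same idea in <stdckdint.h>; a minimal sketch (the variable names are mine):

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    int sum;
    /* GCC/Clang builtin: returns true on overflow, otherwise stores the result in sum. */
    if (__builtin_add_overflow(INT_MAX, 1, &sum))
        puts("overflow");
    else
        printf("%d\n", sum);
    return 0;
}

/* C23 spells this ckd_add(&sum, a, b), from <stdckdint.h>. */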
I think I might be missing the intent of this exercise, but that expression risks overflow (and undefined behaviour) when b is not equal to zero, right?
You might be interested in a related challenge: write a function in standard C++ that returns the difference between any pair of int32_t values. This cropped up on StackOverflow. It's tricky enough to trip up the incautious.
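The original challenge is posed for C++, but the pitfall is identical in C: the naive a - b is evaluated in (typically 32-bit) int and can overflow, which is undefined behaviour. One way out, assuming the goal is the mathematically exact difference, is to widen before subtracting - a sketch, not necessarily what the StackOverflow thread settled on:

#include <stdint.h>
#include <stdio.h>

/* Naive version: undefined behaviour when the result doesn't fit in int,
   e.g. for difference(INT32_MAX, INT32_MIN).
   int32_t bad_difference(int32_t a, int32_t b) { return a - b; } */

/* Widen first: every difference of two int32_t values fits in int64_t. */
static int64_t difference(int32_t a, int32_t b) {
    return (int64_t)a - (int64_t)b;
}

int main(void) {
    printf("%lld\n", (long long)difference(INT32_MIN, INT32_MAX));
    return 0;
}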
The gotcha is that the operation is commutative at the type-system level. C could - as many languages do - restrict the type of the left operand of [] to "something indexable".
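Concretely, because a[b] is defined as *(a + b) and + commutes, the "index" and the "thing being indexed" are interchangeable as far as the type system is concerned - a well-known curiosity, shown here only to illustrate the point:

#include <stdio.h>

int main(void) {
    const char *s = "hello";
    /* All three expressions denote the same 'l'. */
    printf("%c %c %c\n", s[3], *(s + 3), 3[s]);
    return 0;
}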
There's one? The "C" ABI is just the ABI of whatever platform it's running on, which may or may not have funky behaviour that vendor-provided compilers kindly hide for you - functions being prepended with '_' on macOS, the two dozen calling conventions on Windows with i386, SysV and Itanium ABIs...
Do you think you can tell what the ABI of
struct foo some_function(struct bar);
is? Will bar be passed in a register, or on the stack? Who knows; that depends on your platform, your compiler, etc. Things going cleanly on the stack is just a convenient lie that your first-year comp-sci teachers tell you because it's too early to talk about how the real world works yet.
The leading '_' before symbols isn't just a macOS thing; AFAIK this is the only "name mangling" that C does, and from what I've seen so far it's the same across platforms and compilers.
As far as the ABI goes: the important thing is that there is a standard ABI on a specific platform that all compilers on that platform agree on. Sounds kinda obvious, but it's not common in other languages.
The only cases where you stumble over it are when you're trying to do linking / symbol loading yourself manually (e.g. in my case it was because I was looking into JIT compilation mechanisms).
ABIs have always been platform-specific in that sense. C ABI is beneficial in that on any specific platform, it's interoperable - which is good enough, because native code has to be compiled for a specific platform, anyway. In your specific example, I don't really care how "struct foo" is passed, so long as all shared libraries compiled for that platform agree on how it's done. And we have that on C level today on all major platforms.
Sure, there are compiler switches and language extensions that can break the ABI if you use them. But, well, you don't have to use them (at the interop boundary), and neither do your API clients.
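One common convention (by no means required, just one way to stop caring about how struct bar gets passed) is to keep value-passing of structs off the API boundary entirely: pass pointers and use fixed-width fields so the layout is unambiguous. A sketch with made-up names:

#include <stdint.h>

struct bar { uint32_t id; uint32_t flags; };   /* explicit, fixed-width layout */
struct foo { int64_t result; };

/* Only pointers cross the boundary, so the register-vs-stack question
   for struct passing never comes into play. */
int some_function(const struct bar *in, struct foo *out) {
    out->result = (int64_t)in->id + in->flags;
    return 0;   /* 0 = success, a common C convention */
}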
C was one of those "write once, compile anywhere" things they used to tell us about. Then when we actually tried it, we found every CRT was different on all of the platforms. Even for something as ubiquitous as printf, I think I have encountered at least 8 different versions of it. At least from compiler chain to compiler chain they are usually similar in calling conventions (but not always). But try mixing an msvcrt with a glibc one and you are in for some fun...
The annoying nit of it is that it almost works. You have a good shot at getting it to compile in a short amount of time. The rest of the work will be lots of time in ye old debugger and going over the docs for your platform. The fun part is the bugs you find: were they there already, are they just part of the platform, or were you using it wrong?
In effect it is yet another CRT, and the idea is sound. But many times what I found was that you may have one that works on, say, Linux and Windows and BSD. All the same "code", but you dig under the covers a bit and it is a maze of ifdefs, so each platform has its own quirks. For example, threading between fork and CreateThread is on the surface not too different, and you can wrap CreateThread with it (several libs did). But you dig into it a bit and you find portions that just do not map at all between the systems (usually with IPC and locks). At best they do not compile, slightly worse they return error codes, at worst they act like they work.
A really good example of what I am talking about is the pthread library. It works up to a point, but it is a very Linux/BSD-oriented library. There are some gaps coming from Windows that just do not map, and the other way around. What is worse is that the docs on some of these do not talk about cross-platform issues. Luckily you can see the source code of most of them and can tell what is going on. Annoying, but one of the things I learned moving code between platforms is that each one has its own way of doing things. You can try to work against it, or sit down and unwind what is going on, which takes time. I have even seen this sort of issue in Python and Java, where you get down to some low-level thing and it just is different on different platforms.
C is write once, compile anywhere... so long as you restrict yourself to standard C, including the library. Different implementations have different levels of standard compliance, but you are still able to say, "this will do X on any conformant hosted C89 implementation".
The moment you start doing things like threads or shared libraries, yeah, it all breaks down very quickly. But that isn't standard C.
Even things like printf/sprintf act differently from library to library. Trust me, read the docs on your lib.
Most of the time they are the same but not always. It is one of those things that looks like it works but in practice there are a ton of gotchas. Most of the more mature libraries are getting better but there are some edge cases out there.
They are different mostly because of different standards that libraries support - C90 has a baseline list of %-specifiers, but C99 added a bunch more, and then I think POSIX also has some now? Plus extensions.
But I can't think of any implementation that doesn't conform to C90 in that regard. So long as you don't venture into implementation-defined / UB territory...
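A concrete example of where the standards diverge: the z and ll length modifiers only arrived with C99, so a library that still tracks C90 may not understand them even though the rest of printf works fine.

#include <stdio.h>

int main(void) {
    size_t n = sizeof(long long);
    printf("%zu\n", n);                  /* C99: z modifier for size_t */
    printf("%lld\n", -1LL);              /* C99: ll modifier */
    printf("%lu\n", (unsigned long)n);   /* C90-safe fallback via a cast */
    return 0;
}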
An advantage of Rust and Golang (and OCaml actually [1]) is that they let you write to the C ABI. We've written both components that call into an existing C ABI (nbdkit-{rust|golang}-plugin), and also Rust/Golang/OCaml shared libraries that present C ABI functions / structures to the world.
[1] But you need to write a bit of C glue code for OCaml so it's not quite so seamless.
Some form of C FFI has been standard for most languages that appeared after C became ubiquitous. Even something as dynamic and high-level as Python has ctypes.
The ABI may come from C, but it's not limited to C. Many other languages have FFI and use the C ABI as the lowest common denominator.
But the C ABI is awful to work with. The C language itself offers no help to guarantee ABI compatibility. What ABI it compiles for depends on headers, which may depend on a jungle of ifdefs and typedefs.
The part "if you want to program video games, use C++" is a bit weird though. I remember a talk from a video-game designer who, when asked "why do you use C++?", answered "because we have to", in the sense that everybody does it and management requires it. The features used were basically C.
One reason would be template metaprogramming. In game development and simulations we use numerical computing most of the time, a domain where templates come in handy. The rest is just C. Modern C++ for highly efficient systems is not C with classes anymore; it is more like C with templates. And also functional constructs for specific components (physics) where it makes sense to use them. I have high hopes that a language like Zig will prove less complex than, and as powerful as, C++ for game dev. It won't be a replacement for C++, just an alternative; it's hard to believe that anything will dethrone the king.
It's certainly more than just templates, but it's also not the whole hairy C++ - it's C + templates, RAII, and namespaces.
It's less template metaprogramming and more just templates to generate efficient code - stuff that used to be done by abusing the preprocessor can now be handled by a (slightly) more elegant templating engine rather than a string-pasting engine.
Classes are used for resource management/RAII; for example, we have an AQUIRE_MUTEX_IN_SCOPE() macro which will release the mutex when the scope is exited. This is supremely useful and generalizes to many resources.
Lastly, namespaces are huge. In big C codebases you have to be super pedantic about naming modules and APIs consistently, because otherwise it becomes a nightmare.
C++ does still get in the way sometimes, like when you want to do something slightly dirty for perf reasons, say aliasing between structs. You first write it in a way that makes sense, basically how you would write it in C, but it's UB in C++, so you rewrite it with virtual calls or memcpys such that the compiler should be smart enough to arrive at the same result as C would have with the straightforward implementation. This works great until it doesn't; your last option is to try to solve it with templates, and that hole is very deep.
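A small illustration of the memcpy route mentioned above: the cast version breaks the strict-aliasing rules (UB in both C and C++), while the memcpy version is well-defined, and an optimizing compiler will typically turn it into the same single load. A sketch, assuming float and uint32_t have the same size:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* UB: reads a float object through a uint32_t lvalue.
   uint32_t bits_bad(float f) { return *(uint32_t *)&f; } */

/* Well-defined: copy the object representation instead. */
static uint32_t bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

int main(void) {
    printf("0x%08x\n", (unsigned)bits(1.0f));   /* 0x3f800000 on IEEE-754 targets */
    return 0;
}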
C++ templates have a poor reputation in the gamedev community. Years ago some companies went too far with them and got burned. EA has their own fork of the STL that's gotten increasingly crufty and behind in performance.
Gamedev tends to use its own patterns, particularly arena allocation for long-lived fixed-size tables, or its own internal object / entity / component model. It's not really C with classes or C with templates, just kind of its own dialect.
From what I've heard in presentations, many game developers seem to be some of the biggest opponents of templates, especially because of the horrible performance on non-optimized builds of template-heavy code, and in general because of the opaqueness of performance of template-heavy code.
For numerical computing: yes. For games, don't you just use float (for non-integer values)? Found the video I was thinking about: https://www.youtube.com/watch?v=rX0ItVEVjHc
from 1:24:00 onwards.
The "we have to" is because all the game development middleware is written in C++ (and sometimes C), the same way you "have to" use Javascript for writing web applications (even though WASM is beginning to change that, but it took 25 years to get there).
Actually, only half of the universe is using C++ to write games; the other half is using Unity and writing their game code in C#.
If language interoperability were dramatically better, this "lock-in" to a specific programming language wouldn't be half as bad as it currently is, and it would be much easier and less risky to use "fringe languages" for game development.
Would anyone have an up-to-date book recommendation for someone looking to learn C? Is K&R still the best way, or are much more recent things like Modern C [1] the way to go? I am concerned that if I learn techniques that are too "modern," I might not be able to contribute to open source projects that employ older conventions.
I still consider K&R to be "the" book to learn and appreciate C. Over two decades, I have seen/read several books, but keep coming back to K&R. It isn't just the language that makes K&R great -- along with the ride also come valuable (programming) life lessons, experience with terse language (minimum words to express maximum thought), and a lot of historical perspective that most modern books omit.
And don't skip the exercises at the end of each chapter. The discovery of solving the exercises on your own is a revelation that far surpasses any benefit obtained by being told how to do it! :-)
I just read it and like it very much, especially for its brevity and because it is on-point.
If you have experience with low level languages, pointers, memory and actually have some programming experience, then this is your book, especially because it is so up-to-date.
But I would not recommend it to someone who has no experience and wants to learn C as a first language.
The latest K&R edition is from 1988, and C has changed quite a lot since then... especially with C99, which almost feels like a new and much friendlier language because of the new initialization features (compound literals and designated initializers).
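For anyone who last looked at C in the K&R era, this is the kind of thing meant by the new initialization features; designated initializers and compound literals are both C99 (the struct and function names are just for illustration):

#include <stdio.h>

struct point { int x, y; };
struct rect  { struct point min, max; };

static void print_rect(const struct rect *r) {
    printf("(%d,%d)-(%d,%d)\n", r->min.x, r->min.y, r->max.x, r->max.y);
}

int main(void) {
    /* Designated initializers: name the members; anything unnamed is zeroed. */
    struct rect r = { .max = { .x = 640, .y = 480 } };
    print_rect(&r);

    /* Compound literal: an unnamed struct value created in place. */
    print_rect(&(struct rect){ .min = { 10, 10 }, .max = { 20, 20 } });
    return 0;
}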
I guess we should ditch C from "Programming 101" classes, yes? Better to start with friendlier languages like Java/C#/Python etc., where beginners can focus on implementing their algorithms without messing with lower-level details like pointers, manual memory management, etc.
Of course I'm not saying C is not useful anymore (it still is, for example if you are doing system/kernel programming, very likely you'll deal with C). But this is not the typical case for beginners.
I was trolled into learning C by 4chan when I was 15. I would say I am grateful to them. I feel like I owe my programming career and very deep understanding of computers and computer science to that. Learning pthreads, file descriptors, client/server architecture made me a really good back-end developer.
Most other languages are implemented in C, so there's an obvious reason that it has some advantages over everything else. That the OS typically depends on the C ABI makes it mandatory. You can find warts, but as of now, there isn't a broadly proven alternative for some use cases.
I'm not entirely sure what you mean here, but this is true for neither of the interpretations I have. Most compilers are not written in the C language (indeed, even the major C compilers are all written in C++ now!). Alternatively, most languages compile down into a bytecode language, or into native assembly, without converting into anything that is or looks like C in the process. Indeed, for languages targeting native assembly, C is an unwelcome intermediate step precisely because its semantics can be too constraining.
That doesn't clear up any of my confusion, although that may well be because your experience is mostly with managed languages where the distinction between compiler and runtime is less obvious.
Even looking at major languages [1], what do we have:
* Self-hosting languages: C#, Go, Haskell, Java, most LISP implementations, OCaml, Pascal, Rust, Swift
* C++ implementation: C/C++/Objective-C compilers, Fortran compilers (using the same toolchains as the former), JavaScript, probably Visual BASIC (although that may be C# instead)
* C implementation: Perl, PHP, Prolog, Python, R, Ruby, Shell (although note that many of these languages have their libraries largely written in their own language).
* Not sure: ALGOL, APL, Cobol, Erlang, Forth, Kotlin, SQL, Simula, Smalltalk. Although I suspect that many of these are self-hosted.
7 of 32 is a far cry from "most languages".
[1] Using the list of programming languages in Wikipedia's category box at the bottom here.
"Self-hosting" can be a bit tricky to define for Forth. A typical implementation would start with a few words defined manually in assembly, and build the rest up from there.
GCC is written in C++ these days, and LLVM has always been C++. Perhaps some languages are written in C, but C itself hasn't been for some time, if you look at the most popular compilers.
Ah so, "implemented in C" usually does not mean "uses the C ABI even though it's implemented in another language." If that's what you meant, then I very seriously misunderstood you.
Perl too. But yes, it's certainly not "most languages". It's "most interpreted languages" at best, but even then C++ is probably about tied with C. And for compilers, using the language itself is much more popular, often targeting LLVM which is written in C++.
C developers are denying the sad reality of obsolete, archaic ideas, poor ergonomics, crappy tooling, security nightmares, unreliability, and a language that is impossible to scale in bigger team settings (don't bring up Linux; I am a kernel dev; the Linux kernel is a relatively small, carefully engineered core, and the majority of the weight is in leaf nodes of device drivers implementing small and carefully engineered interfaces).