> There is little of needless code in standard library. And what there is, believe me, over the decades was really polished.
Wrong. Counter-example: locales [1]
Also: String handling, which is responsible for so, so, so many security vulnerabilities. The underlying cause is zero-terminated strings (instead of using start/end pairs or start/length pairs). You can't even tokenize a string without either copying or modifying it!
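For illustration, here's a minimal sketch of the start/length alternative: a tokenizer that neither copies nor modifies its input (str_span and next_token are made-up names, nothing standard):

#include <stdio.h>
#include <string.h>

/* hypothetical start/length pair instead of a zero-terminated string */
typedef struct { const char *ptr; size_t len; } str_span;

/* next token up to 'sep'; the input is neither copied nor modified */
static str_span next_token(str_span *rest, char sep) {
    const char *p = memchr(rest->ptr, sep, rest->len);
    size_t n = p ? (size_t)(p - rest->ptr) : rest->len;
    str_span tok = { rest->ptr, n };
    size_t skip = n + (p ? 1 : 0);
    rest->ptr += skip;
    rest->len -= skip;
    return tok;
}

int main(void) {
    str_span s = { "alpha,beta,gamma", 16 };
    while (s.len > 0) {
        str_span t = next_token(&s, ',');
        printf("%.*s\n", (int)t.len, t.ptr);
    }
}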
Also also: just a single, apologetic mention of undefined behavior. Responsible – in cooperation with over-zealous compiler writers – for so many more bugs not already caused by improper string handling.
I love C but I think most of the C standard library is useless and outdated rubbish (pretty much everything above the mem...() and math functions). The entire IO and memory-management areas should have been either thrown out of the standard library, or updated 20 years ago to keep up with low-level operating system features.
But one of the best features of C is that you can mostly ignore the standard library and still enjoy "C the language", e.g. nobody ever chose C for its standard library ;)
(also re UB etc...: use the mighty trio UBSAN, ASAN and TSAN!)
For IO, memory management and similar "low-level services" it's often fine to call the underlying operating system functions directly and with this also make use of more powerful features than the standard library can provide, such as async IO, virtual memory etc (or wrap those OS calls into your own cross-platform wrapper functions). There's also plenty of libraries which do this for you, like libuv (but integrating such big libs is often not trivial).
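As a small illustration of that idea (POSIX/Linux-flavored, so a sketch rather than portable C): a large scratch buffer can come straight from the virtual memory system instead of going through malloc:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t size = 1 << 20;  /* 1 MiB */
    /* ask the OS directly for anonymous, zero-filled pages */
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    /* ... use buf ... */
    munmap(buf, size);
    return 0;
}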
Same for string handling/processing, there are specialized libraries out there which do a specific job better than the rather generic standard library functions (this is also true for the C++ stdlib), one just has to find those libraries.
PS: I don't think that a "batteries included" standard library (like python has) would even make sense for C, such a library would need to be opinionated by definition, better to keep (too many) opinions out of the standard, as this road just leads to another C++ ;)
C++ is IMO defined by a complete lack of opinion. They just standardized every opinion. For everything. C++ would be a much better language if it showed some opinion, even if I disagreed with it.
In my long C career, I invented string library after string library. They all turned out to be garbage. The common cause of their failures is their inability to interact with any other C code.
Agreed, but that's an issue for all "interface types" on API boundaries (another similar issue is vector and matrix types in math libraries). IMHO a C string library must still be able to efficiently consume and produce vanilla zero-terminated "const char*" string data (and at a stable address, unlike for instance std::string) even if they mostly work with {pointer,length} ranges internally.
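Something along these lines, for example (str_range and the helpers are invented names, just to show the interop in both directions):

#include <stddef.h>
#include <string.h>

typedef struct { const char *ptr; size_t len; } str_range;

/* consume a vanilla zero-terminated string without copying */
static str_range range_from_cstr(const char *s) {
    str_range r = { s, strlen(s) };
    return r;
}

/* produce a vanilla zero-terminated string into a caller-provided buffer
   (cap must be at least 1); the data keeps a stable address in 'buf' */
static const char *range_to_cstr(str_range r, char *buf, size_t cap) {
    size_t n = r.len < cap - 1 ? r.len : cap - 1;
    memcpy(buf, r.ptr, n);
    buf[n] = '\0';
    return buf;
}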
You can use something like GitHub or Conan to find specific useful C libraries.
There isn't a universally recognized "standard library replacement" (and people argue about what parts of the standard library they don't like), but here:
> But one of the best features of C is that you can mostly ignore the standard library and still enjoy "C the language", e.g. nobody ever chose C for its standard library ;)
C is one of the hardest languages to do this with given its anaemic dependency management.
> (also re UB etc...: use the mighty trio UBSAN, ASAN and TSAN!)
All of which will miss some cases, even in combination.
There's a lot of subtly broken stuff in C stdlib. For example, time.h calls that may touch the TZ env var aren't thread-safe, because getenv is unsafe (it gives a pointer to its internal globally mutable data structure without locks).
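For example, the classic localtime() returns a pointer to shared static storage and may consult TZ via getenv() under the hood; the POSIX _r variant at least removes the shared buffer (a sketch; localtime_r is POSIX, not ISO C):

#include <stdio.h>
#include <time.h>

int main(void) {
    time_t now = time(NULL);

    /* not thread-safe: points into internal static storage, and the
       call may read the TZ environment variable behind your back */
    struct tm *shared = localtime(&now);
    (void)shared;

    /* POSIX alternative: caller supplies the buffer (TZ handling can still race) */
    struct tm local;
    localtime_r(&now, &local);

    printf("%d-%02d-%02d\n", local.tm_year + 1900, local.tm_mon + 1, local.tm_mday);
    return 0;
}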
Because C is so bare-bones, it usually leans on POSIX as its extended standard library, and that is also full of old cruft.
And even MSVC does not actually implement the API specified in Annex K. Annex K was inspired by MSVC's design, but they diverged:
> Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.
This makes sense when you consider the history of C. Multi-threading was a relatively late add on, and outside of the standard library.
When I learned C (early 90's), multi-threading wasn't even something you considered. It was either multiple processes or an event loop with select/poll.
Yes they do. What they don't do is search for multi-byte characters. You would not use them for that; you use them to look for ASCII delimiters. You can tokenize a UTF-8 string on ASCII delimiters using functions that are oblivious to UTF-8, and this is in fact a preferred technique in many programs.
I think I was going for more like: strpbrk and friends can't use a multi-byte sequence as a delimiter. And if a delimiter appears as part of a multi byte sequence, you may see strange results.
In a localized world searching for delimiters also starts to make less sense. Eg. I have been told it doesn't make sense to break on whitespace for Chinese text.
To be clear, I am not saying these functions are bad or evil or to blame for their limitations (a bunch of the problem space wasn't invented yet when they were introduced), just noting they have limits. A bunch of more recent languages and libraries have the same or similar issues, too.
strstr can find a multi-byte character in a multi-byte sequence. There is no ISO C function to find the first occurrence of any one of a bag of multi-byte characters, though it's not difficult to write one.
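A quick sketch of such a function, leaning on the fact that a valid UTF-8 sequence can never start in the middle of another one, so strstr can't produce a false match:

#include <stddef.h>
#include <string.h>

/* find the first occurrence in 'hay' of any of the 'n' UTF-8 needles;
   returns NULL if none occur (simple O(len * n) sketch) */
static const char *str_first_of(const char *hay, const char *const needles[], size_t n) {
    const char *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const char *p = strstr(hay, needles[i]);
        if (p && (!best || p < best))
            best = p;
    }
    return best;
}

/* e.g.: const char *delims[] = { "。", "、" };  str_first_of(text, delims, 2); */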
> I have been told it doesn't make sense to break on whitespace for Chinese text.
It could make sense to break on whitespace in some programming language or data format that allows Chinese (and other) identifiers.
A command interpreter that allows Chinese arguments (such as file names) wants to break on spaces, as usual.
> if a delimiter appears as part of a multi byte sequence, you may see strange results.
UTF-8 was designed by a dyed-in-the-wool C-and-Unix engineer, who ensured that such a thing can't happen. No character in the 0x00-0x7F range can occur in a multi-byte character.
The byte which starts a UTF-8 character cannot occur anywhere other than at the start, which is why we can use strstr to look for it.
UTF-8 continuation bytes are limited to the range \200 through \277, so there's basically zero chance that if you choose something like comma as your delimiter that it's going to tokenize the middle of a multibyte sequence.
Also take into consideration that, under the hood, functions like strpbrk() are typically accelerated by CPU instructions such as PCMPISTRI which doesn't support UTF-8 natively but it does support UCS-2.
> so there's basically zero chance that if you choose something like comma as your delimiter that it's going to tokenize the middle of a multibyte sequence.
Not just "basically;" there is no possible collision between ASCII characters and any valid multibyte encoding. This can be seen somewhat visually in this table[1] and is an intentional aspect of the UTF-8 design.
How about with joiners and combining characters? Eg. If you encode é as U+0065, U+0301 (\x65\xcc\x81), then search for 'e' and act on the result somehow, you fail to consider the whole glyph.
Sure. You're talking about glyphs that are composed of multiple unicode codepoints; my earlier comment is true of single codepoints only. The comment I was responding to is also talking only about single codepoints (wcspbrk cannot represent delimiters longer than a single codepoint).
On joiners / combining characters: I'd encourage using composed normalization (NFC) rather than decomposed normalization (NFD).
Just curiosity: are there any glyphs that lack a single codepoint representation, where one of the joined codepoints is an ASCII character? (That only helps after normalization, of course.)
Yes. ASCII uses \b as the combining character mark which is a convention that's always been widely supported by typesetting programs such as less and nroff. For example, A\b_ is A̲, and you can do the same thing with apostrophe and tilde for accent marks. There's also UNICODE emojis where two codepoints in sequence get joined together as a single glyph. Never underestimate the creative ways text can be used, or that standards just codify a long history of practices.
Er, I was asking about unicode joining, not this roff \b thing. Sorry for the confusion. I'm aware that multiple-codepoint unicode glyphs exist; I'm asking if any of those involve a codepoint in the ASCII (1-127) range which cannot be normalized to a single codepoint (e.g., e + ' normalizes to a single codepoint é).
Of course. Take for example mͫ (m+m) there's no way to represent that as a single codepoint. Combining marks can also be overlaid multiple times, e.g. m͚ͫ (m+m+∞) so the number of glyphs you can create is limitless. There's only a tiny number of the combinations that are possible which have a tinier normalized form. The new UNICODE combining marks work by almost exactly the same principles as the \b ASCII combining mark. That's why I mentioned it earlier.
In what data format or programming language is 'e' a delimiter? One situation is floating-point constants, where 'e' is a delimiter indicating the exponent. However, if an é occurs in the middle of such a constant, whether as a single code point or a combined character, that is an error. The 'e' must be followed by an optional sign and one or more decimal digits.
The ISO C library string handling stuff is for systems programming, not for scanners and parsers for natural written language.
Every time I review code and see strncpy, I look closer because it's always used incorrectly. It's always about the terminating 0. Is the 0 there or not? Is the 0 part of n or not? Does the destination have to be n+1 in size for the 0?
I quit using it myself because I could never remember just what the exact protocol was for 0.
Yeah used naively strncpy leaves you an unterminated string. Also like all of them it's up to the caller to predetermine if they will fail if called. So instead of having the checks in one place inside the string function. You have them scattered all over the code if at all.
I think I read somewhere the provenance of strncpy was to copy strings into a fixed-length field, which is why it has the deranged behavior of not terminating the string. Think file systems where the max file name is 8 characters. Or compilers that truncated variable names at 31.
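Seeing both uses side by side shows why it trips people up (a sketch):

#include <string.h>

char field[8];   /* fixed-width record field, the original use case */
char name[32];   /* an actual C string */

void copy_examples(const char *src) {
    /* fixed-width field: zero-padded, but NOT terminated if src
       has 8 or more characters -- exactly the historical behavior */
    strncpy(field, src, sizeof field);

    /* if you want a C string, you must leave room for and write the 0 yourself */
    strncpy(name, src, sizeof name - 1);
    name[sizeof name - 1] = '\0';
}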
Exhibit B: yes, the command that repeatedly prints `y` to stdout. Exhibit C: all handling of time, with time_t, struct timespec, timeval etc. We should just have a 64 bit uint representing nanoseconds from epoch, which would give us 585 years and get rid of much nonsense.
It's not that GNU is incompetent or overly bureaucratic or anything. It's that they really didn't want to infringe on Unix, and they presumed that Unix's yes would have been implemented in the most obvious, dead-simple way. And, as a plus, GNU yes can stream y's at like 10 GB/s, if you need that.
> enough bits to go backward from the epoch at t=0 to the big bang at t=negative whatever.
For reference, the universe is about 13.787 billion years old [0]. That's about 13.787 * 10^9 * 365 * 24 * 3600 = 4.348 * 10^17 seconds, which (I think?) is a 59-bit number [1]. 10ths of a second will require 62 bits, which is right about at the edge of what a 64-bit signed integer will allow.
If you want milliseconds, you'll need at least 69 bits. For nanoseconds, you'll need at least 89 bits.
So you'll either need an integer type that's wider than what's natively supported in most hardware (thus potentially sacrificing performance), or you'll have to sacrifice precision.
So, the problem is a 64 bit int only gives you ~585 years around the epoch at nanosecond resolution. That's sufficient for timestamping, but not great for a general purpose time library where even ordinary civil calculations might exceed that range, particularly in intermediate arithmetic.
Since larger than 64 bit ints are a disaster for portability, the reasonable solution is to go with a 64 bit signed seconds, 32 bit nano offset field. A lot of language std libs have adopted something along these lines.
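For reference, the shape in question looks something like this (a sketch, similar in spirit to struct timespec and to what several standard libraries use):

#include <stdint.h>

/* 64-bit signed seconds since the epoch plus a nanosecond offset */
typedef struct {
    int64_t  sec;   /* covers roughly +/- 292 billion years */
    uint32_t nsec;  /* 0 .. 999999999 */
} timestamp64;

/* a single 64-bit nanosecond count only covers ~584 years around the
   epoch, which is why the field is split */
static int64_t to_unix_nanos(timestamp64 t) {
    return t.sec * INT64_C(1000000000) + (int64_t)t.nsec;
}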
DJB was advocating for everything to be in a format like this, referenced to TAI (UTC without leap seconds, basically). Sadly that didn't get any traction.
I remember in my University years I was astounded to find out that none of the programming languages we were taught could represent the Jurassic period, or the day the earth was formed, as a date.
I was horrified even more when I learned that future leap seconds are undefined, and we literally can't tell what the time on the clock will be, say, a million seconds from now.
In my firmware I keep time as a 64 bit signed number that counts 32.768 kHz ticks since the unit powered up. I think it'll roll over in 8 million years or something like that.
If you want to know what's broken? Most real time clock modules. They almost all want to store time as HH:MM:SS MM:DD:YY and sometimes 1/256 of a second but sometimes not.
Probably something like how msgpack (or even variable-length Unicode encoding) works, where the lowest 127 values encode a single uint8 char in place, the next 121 values signal that a string of 1-121 bytes follows, and after that come a handful of sentinel values for "length follows and it's 1/2/4/8 bytes", plus maybe some special cases like a zero-length string.
I can understand that this might have been too much implementation complexity/risk to contemplate 40 years ago, but this kind of pattern is very well established at this point, especially in scripting languages with loose typing.
> Also: String handling, which is responsible for so, so, so many security vulnerabilities.
That's not really a counter-example. Sure the string handling was an unfortunate choice but it is the standard. The standard library must implement it as defined. Being standard-compliant is not a lack of polish.
It takes more code to use it correctly than to hand-code what you think it is supposed to be doing for you. strtok is Cursed.
strlcpy is similar. If you don't write that much more code, you are not using it correctly, and it is not giving you the value that is the reason you thought was why you were using it.
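For the record, strlcpy (a BSD extension, not ISO C) returns the length it tried to create, so the "extra code" in question is the truncation check on every call site, something like:

#include <string.h>   /* strlcpy is declared here on BSD/macOS; elsewhere you provide it */

int copy_name(char *dst, size_t dstsize, const char *src) {
    size_t needed = strlcpy(dst, src, dstsize);
    if (needed >= dstsize) {
        /* truncated: now decide whether that's an error, whether to
           retry with a bigger buffer, etc. */
        return -1;
    }
    return 0;
}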
> There is little of needless code in standard library. And what there is, believe me, over the decades was really polished.
Ha ha ha ha ha ha ha ha ha ha.
Locales are a massive clusterfuck, basically too simple to handle localization if you actually care about it, but supports enough of it to screw you over if you don't care about it. The "wide character" support is also a nightmare. The time library support is also quite a bit wonky (years are measured as years since 1900 because Y2K is definitely not a pressing issue in 1989!).
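A small example of the "screws you over if you don't care" part: setlocale() silently changes the radix character for the whole process, so printf and strtod stop agreeing with your file formats (assuming a German locale is installed; exact behavior is platform-dependent):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("%g\n", 3.5);                  /* "3.5" in the default "C" locale */

    /* adopt the user's locale, as plenty of library init paths do */
    setlocale(LC_ALL, "de_DE.UTF-8");

    printf("%g\n", 3.5);                  /* now prints "3,5" */
    printf("%g\n", strtod("3.5", NULL));  /* parsing stops at the '.', prints "3" */
    return 0;
}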
> inline Assembly
Fun fact, here is the C specification's entire mention of inline assembly:
> The asm keyword may be used to insert assembly language directly into the translator output (6.8). The most common implementation is via a statement of the form: asm (character-string-literal);
There is no discussion of what inline assembly can and cannot do, how it interacts with the rest of the code in terms of semantics, how to pass arguments to and from inline assembly, etc. You might get some of this information from the manuals of compiler implementations, but even that can be surprisingly free of necessary information. Compare this to Rust's inline assembly documentation: https://rust-lang.github.io/rfcs/2873-inline-asm.html (which is more detailed than even gcc's or LLVM's inline assembly documentation).
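For comparison, here is what the unstandardized reality looks like in practice: GCC/Clang "extended asm", where the operand constraints, the clobber list and even the template syntax are all vendor extensions the C standard never mentions (x86-64 sketch):

#include <stdint.h>

static inline uint64_t add_asm(uint64_t a, uint64_t b) {
    uint64_t result;
    __asm__ ("addq %2, %0"
             : "=r" (result)        /* output operand */
             : "0" (a), "r" (b)     /* inputs; "0" ties a to the output register */
             : "cc");               /* clobbers the condition flags */
    return result;
}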
I have been programming in C for the last 25 years, and every year or so someone comes up with a new shiny thing that will "replace" C. First it was C++ (we know how well it did...), then Objective-C, then Java, then C#, then Go, and now Rust.
Every one of these languages brought new ideas, but they don't stand a chance because their designers don't understand the point of C. The C language didn't win because it was the "best" language or had the best set of features. Far from it. Even in the mid 70s it was a backward language compared to other cool languages of the day like Algol and Lisp.
C won the competition because it just gives programmers the bare minimum functionality to put an operating system and a compiler in place! It is flexible, you can provide your own library if you want, and therefore it gives you easy portability. OS writers will choose C any hour of the day or night because it makes their job easier.
By comparison, other languages will require a huge library to be available, and sometimes a complex runtime system, just for you to write a simple "hello world"! Imagine if you need to write a new OS, a compiler, a linker, or a shell interpreter... you get the idea.
My conclusion is that language designers still didn't get what made C so successful and therefore keep coming up with shiny complex things that don't stand a chance to become the next C.
> C won the competition because it just gives programmers the bare minimum functionality to put an operating system and a compiler in place!
Not really. C won because it was the standard compiler for Unix, and was a free compiler for a free operating system in a time when both were highly unusual.
But in the past few decades, C has lost a lot of its market share to other programming languages. In the realm of desktop (and mobile) applications, C is basically unused for new projects--its standard library is truly anemic here, and major support libraries (e.g., GUI toolkits) are often in C++ and not C. Where C is still the dominant language is the land of embedded applications, and it's not had a lot of competition here since most languages don't bother trying to define a freestanding implementation.
Rust is really the first language to try to contend for this space. And there are signs that it may supplant C: Intel is apparently looking to move its firmware to Rust; Linux is allowing Rust for device drivers and kernel modules. Hell, even some OS programming courses (e.g., Stanford, Georgia Tech) have moved their curriculum to use Rust instead of C.
As a C (not C++) programmer, I understand where you're coming from — Rust is a big complicated language, like C++, in comparison to C — but personally, I really want to use Rust in some applications where I would otherwise use C. I don't think it can be said universally that Rust has no value to C practitioners or that Rust's designers don't understand why people use C.
For sure. I do see the strengths of rust as well. And would most likely use it if i had to write software that absolutely has to be memory safe. It wont prevent other kind of issues however.
This comment irks me in a strange way. I've found very little software where what you care about doesn't build upon the fact that you trust your program's memory safety. You can use C or any other language, but if your tools can't free you from that worry, memory safety should be something that your programs "absolutely" have.
Not every program is connected to the internet, nor do they all run with elevated privileges or do anything where people's lives would be at stake.
In addition, operating systems give you a bunch of safety measures if you design around them (the obvious one being separate processes).
One also has to remember that even if your program is memory safe, that still does not guarantee it is flawless or secure. One very good recent personal experience I can give you is a certain popular Swift SQL library. It was written in Swift, which also boasts being "memory safe by default". However, due to its design that library was subject to very basic SQL injection attacks. Needless to say, I ended up writing my own SQL bindings. I hope that library is now fixed or has been replaced; at the time I had to write Swift (and no, I didn't like it), and it was the only and most popular (scary) option on GitHub.
I could also write a long rant about threads, and how the implementations and the threading primitives often have subtle bugs or are broken in different ways. And how using threads in general makes your application basically undefined unless you absolutely know every piece of code that runs on that thread.
I wasn't talking about security, but correctness, although statistically people underestimate the level of security required by their projects more often than not. Memory safety does not imply correctness, but if you want to reason about correctness you really have to trust that your program doesn't do something funky with your bits. And my programs, at least, like to prove me wrong about what I believe is possible.
If you want correctness, you need proofs, or generate the code from specification. And even then cosmic rays and bit-banging and faulty hardware can break your application. Or simple change of stack size :)
It may replace some software, but it won't really lure many of the C / embedded developers. The problem is that Rust developers miss the point of C. People who program in C want the bare minimum; they don't care about classes or other OOP features. If there was something that sort of groks what C programmers want, it's probably Zig.
To put it in short, C programmers hate needless abstraction and work with raw memory and hand-crafted data structures whenever possible. More data, less code.
Putting on my embedded engineering hat, it's not that I wanted the bare minimum it's that I couldn't afford anything more than the bare minimum.
I couldn't afford abstractions, memory safety guarantees, algebraic data types, generics, pattern matching, thread data safety (at least in the context of interrupts). The languages that had these things were hulking languages with giant runtimes and exactly zero support for embedded development. Not to mention no vendor toolchains.
Rust supports all these things, with no allocator, no standard library, and often with zero additional cost -- in terms of compute and in terms of memory. Then, being built on LLVM means that the vendor toolchain support is quickly becoming a non-issue. I suspect we'll start to see more and more of Rust in embedded, but only time will tell.
And as others have called out, Rust has very few OOP features like an optional notion of a `self` on a function bound to a structure. There's no classes, no subclassing, no message passing, no inheritance at all (structure, interface or implementation), limited dynamic dispatch, no polymorphism.
> Rust has very few OOP features like an optional notion of a `self` on a function bound to a structure. There's no classes, no subclassing, no message passing, no inheritance at all (structure, interface or implementation), limited dynamic dispatch, no polymorphism.
As an embedded developer for a living, if you put it that way, then the first thing that comes to my mind is "so why should I bother learning Rust" for embedded.
Other than the usual "please consider UB and network/memory handling/security issues" (reasons that don't really affect me), so far nobody could provide a convincing answer.
It's not that the usual answer I get is wrong or invalid or doesn't have a point. But if I ask "ok, what else?" there is really little motivation for me to move on.
Someone once told me I'll become an outdated curmudgeon here on HN, but I think I'll be long gone before something truly deserves to replace C for embedded (and low level).
Personally, I always encourage people to learn new and (particularly) different languages from time to time and see what concepts and design patterns they can learn and take with them into their day to day. All I can say is give it a shot sometime and see what you think. Worst case you've learned something :)
> Someone once told me I'll become an outdated curmudgeon here on HN, but I think I'll be long gone before something truly deserves to replace C for embedded (and low level).
You (and I) may become an outdated curmudgeon before long, but it won't be because you refused to learn Rust haha.
Embedded developers use C because they have to, not typically because they want to. And they have to use C because of proprietary toolchains and existing libraries, among other similar reasons. Good reasons, sure, but not because C the language is so great.
> hand-crafted data structures [...] More data, less code
You're paying lip service to the principle of datastructures-over-algorithms, yet you're advocating for a language which has neither algebraic data types nor tuples? Come on. That's like praising functional programming but using Fortran.
> To put it in short, C programmers hate needless abstraction and work with raw memory and hand-crafted data structures whenever possible.
I wrote Monocypher in C for one reason: portability.
From a systems point of view, crypto libraries are trivial: you don't need any dependency, code is pathologically straight-line, and there is almost no data structure to speak of beyond arrays of bytes and arrays of words.
Yet I can tell you that if not for portability, Rust would have been a better fit. So I could group buffers and their size in a single argument. So I could provide genuinely high-level interface. So I could use types to avoid silly mistakes and enforce some invariants. Portability won over all that goodness: worst case I can have a Rust wrapper. Heck, someone else already wrote one for me.
These zero-cost-abstraction languages don't pay much attention to binary code size. They can't compete with C in that regard. For the low-cost, high-volume embedded devices, code size relates to the unit cost, and thus the profit margin.
> To put it in short, C programmers hate needless abstraction and work with raw memory and hand-crafted data structures whenever possible. More data, less code.
> Rust doesn't get why C programmers still use C at all.
Because Rust is redundant: its safety guarantees are a subset of the safety guarantees of ATS[1], a project in plain C integrates more naturally with it at any point of its development cycle[2], and it doesn't require giving up safe pointer arithmetic[3].
EDIT: those who downvote, let's discuss the topic in substance and let's avoid zealotry. If you promote and pitch Rust to the audience of C by advertising its safety guarantees, zero-cost abstractions, and how well it integrates with the C ABI, at least be consistent when it turns out that there's another $TECH that does it more safely and more consistently with C programmers' reliance on certain useful features of C.
Are you really arguing that Rust is currently redundant because of a language that hasn't hit v1.0.0 yet, has no industry support, and doesn't use a modern build system or package manager?
I do really argue that Rust is redundant for C developers who want to bring some extra safety into their projects without disrupting their toolchains and practices. I also argue that this extra safety can bring more than Rust is capable of providing. Now, what's the technical merit of Rust that justifies spending time learning it when the same time can be spent on bringing ATS into the same codebase/toolchain one function at a time?
What's the point of waiting for 1.0.0? It's just a tag that doesn't save you from bugs and breaking changes.
What is industry support? What does it have to do with a team where everyone can read the documentation of the tool that is already built on top of a mature GCC ecosystem and that adheres to existing approaches to debugging, profiling, releasing and maintaining C codebases?
> and doesn't use a modern build system or package manager
Why do I need a separate solution such as Cargo if I can build, package, and distribute everything with Nix and get reproducibility, distributed builds, transparent caching, and environment isolation along the way for free?
C won because it was the only low level language that was competently implemented for DOS for years. DOS was where 90% of the programming action was in the '80s, and C was a perfect fit for DOS. C also was easily extended to handle segmented memory.
(Turbo/Borland) Pascal was comparable to C in terms of closeness to hardware on DOS - I mean, it even had language facilities specifically for interrupt handlers! A lot of DOS software was written in that. In some countries, it was the case for most DOS software made in them.
So I'd amend this to: C won because it was the only cross-platform low level language that was competently implemented for DOS.
I'm enrolled in a master's at Georgia Tech right now and last I checked the OS course is in C; I would kinda prefer to learn Rust anyway, so I hope you are right.
Why would you prefer to learn Rust at this point, especially during OS course when all relevant OS code you will encounter is in C and understanding C is crucial to understand low level programming concepts. You can always learn Rust later when/if it becomes more relevant. Without C a lot of programming areas will be a closed door forever.
> understanding C is crucial to understand low level programming concepts
That is a patently false statement. To understand low-level programming concepts, you need to understand fundamental notions about how machines represent state in registers and memory, how memory is organized (including primarily the concept of function calls and the stack), and the indirect referencing of memory via pointers. Note that nowhere in that list did I describe a concept that is unique to C.
In fact, one of the more common approaches to introducing developers to low-level programming is to introduce them to these concepts via assembly (say, Nand2tetris). In my own experience TA'ing such a course, I am more than willing to translate code into whatever language the student is most comfortable with to express the concepts as necessary. You can absolutely learn these concepts in other languages, and my own suspicion is that unsafe Rust does a slightly better job of it than regular C does.
C does not have a monopoly on understanding the low-level organization of code, and quite frankly, C's lack of coverage here can be frustrating. C has no concept of multiple return values, functions with multiple entry points, unwinding the stack, computed goto, SIMD vector types, nested functions, discontinuous structures, or tail calls, and these are all concepts that are present in other languages that cannot be expressed in standard C or often even in vendor-extended C.
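To pick one item from that list: the closest standard C gets to multiple return values is returning a struct (or threading out-parameters through the call), which is a workaround rather than a language feature -- a sketch:

#include <stdbool.h>

typedef struct { long quot; long rem; bool ok; } divmod_result;

/* "multiple return values" in standard C: bundle them in a struct */
static divmod_result divmod(long a, long b) {
    divmod_result r = { 0, 0, false };
    if (b != 0) { r.quot = a / b; r.rem = a % b; r.ok = true; }
    return r;
}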
> You can absolutely learn these concepts in other languages, and my own suspicion is that unsafe Rust does a slightly better job of it than regular C does.
> C has no concept of multiple return values, functions with multiple entry points, unwinding the stack, computed goto, SIMD vector types, nested functions, discontinuous structures, or tail calls, and these are all concepts that are present in other languages that cannot be expressed in standard C or often even in vendor-extended C.
You probably don't realise it, but you're cheating.
Operating systems are currently written in C, and therefore have a C interface. Which language is best at interfacing with C? (No trick here, just a rhetorical question.) C of course. Other languages would need some sort of FFI, which is generally unwieldy enough that the designers hid it behind a comprehensive standard library.
C doesn't need a huge library to be available, you say? Oh but it does. It's called the kernel. Comes with a freaking huge runtime too.
> Imagine if you need to write a new OS, a compiler, a linker, or a shell interpreter... you get the idea.
I think I do, but I'm afraid you don't. Writing an OS (and the rest) in Pascal (I'm thinking of Oberon specifically) is no harder than to write it in C. If you write your whole OS in Blub, interfacing with Blub will be easiest in Blub, you won't need an extensive Blub standard library because you already have the kernel, all the tools (debuggers, editors…) will be Blub friendly…
Lisp machines used to be a thing, you know.
> My conclusion is that language designers still didn't get what made C so successful […]
Language designers can't even address what makes C so successful: network effects.
The point is that it has a C interface despite being written in assembly. Your original post said the only reason it had a C interface was that it was written in C.
My original sentence was "Operating systems are currently written in C, and therefore have a C interface."
I stand by that claim: the only reason operating systems have a C interface is because they are (at least originally) written in C. The fact that the kernel has some parts in assembly is immaterial, even if those parts happen to comprise the userland interface.
And it's not just the kernel: when UNIX was re-written in C, everything was written in C. The compiler, the core utilities, the editor, the shell… the whole OS, not just the kernel. Of course it was easier to interface to that in the same language everything else was written in.
Likewise, the Oberon operating system, which was written almost entirely in the Oberon language (which should have been named Pascal-3) has an… Oberon interface. Want to use C to interface with it? My, you'd have to write a whole compiler, perhaps a non-trivial runtime, debugging tools, and of course an FFI to interface to Oberon, the de-facto lingua franca.
C looks much worse when it's not already the king of the hill. Its strength lies in its ubiquity more than in any quality of the language itself.
> Writing an OS (and the rest) in Pascal (I'm thinking of Oberon specifically) is no harder than to write it in C.
You can't because it is impossible to escape the type system. Without the ability to cast pointers you can't write a memory allocator. You can't write a function like dlopen()...
This article talks about Wirth's Pascal as originally defined. But various language dialects evolved way beyond that, and some of them became effectively dominant.
Thanks for the link, I'll go take a look. I'm pretty sure Oberon addressed some of the pitfalls cited there, if only so Wirth could write an OS with it. Not every language is OS worthy, after all.
it did well enough that C stdlibs (at least MSVC's, LLVM's) and compilers (... pretty much all the big ones) are implemented in C++ and just export C symbols nowadays, likewise for newer OSes like Fuchsia.
SerenityOS (https://github.com/SerenityOS/serenity) was written from scratch in two years in C++ and goes as far as having a custom web browser & JS engine. Where is the equivalent in C? Where are the C web browsers, C office suites, C Godot/Unity/Unreal-like game engines? Why is Arduino being programmed in C++ and not C?
Language designers are pretty clever people and understand a great deal about the successes and failures of languages. I think if you look at a lot of what Rust is doing today (and C++ yesterday, for better or worse) you'll see a lot of inspiration and influence from the successes and failures of C. Particularly when it comes to safety and generic programming.
Objective-C, Java, Swift, and C# have become massively successful as application programming languages because C is/was terrible at it. They learned a lot about how painful it was to do basic higher level programming tasks when you are restricted to C's semantics and memory model.
C is great but I don't think it's worth romanticizing since history has shown that C isn't that great for writing anything but systems code. Which is a restricted domain to begin with, and isn't even that attractive for it anymore.
The one thing C has over anything else is interop. The language of FFI is C. There's no inherent reason for that other than history, and it's not super broken so we're not going to fix it.
Games (many in C++ with one or two features beyond basic C). All kinds of solvers. GPU programming.
C sucks when you need the convenience of a big standard library or safety above performance. There is nothing like C when you care about speed, memory footprint and efficient memory management. It's great that other languages took over in areas C is terrible at, but it's not like the areas where it's the best and often the only option have disappeared.
These specific claims are flaky at best but hard to argue over since anyone can construct enough cases for or against their position to be right enough not to change their opinion of C.
You can absolutely beat C in everything you list. And you also can't. I don't agree that C is the end all be all of performance or code size, except in a handful of cases where there's nothing else available.
C won over because of the mystique of the syntax in which you can load multiple side effects into expressions. The intuition was that this leads to faster code, and in fact, with naive compilers, it did lead to faster code. C's terseness won over programmers who hated typing things like BEGIN and END for delimiting blocks. Unlike Pascal or Modula, C came with something very useful: a macro preprocessor. This is such an advantage, that it's better to have a crappy one than none at all. Those programmers who did not hate BEGIN and END could have them, thanks to that preprocessor. The preprocessor also ticked off a checkbox for those programmers who were used to doing systems programming using a macro assembler.
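And indeed that happened in practice; the original Bourne shell source famously wrapped C in Algol-style macros roughly along these lines (a loose sketch, not the actual header):

/* BEGIN/END for the programmers who missed them */
#define BEGIN {
#define END   }
#define IF    if (
#define THEN  ) {
#define FI    }

int sign(int x)
BEGIN
    IF x < 0 THEN return -1; FI
    return x > 0;
END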
C started to be popular in the microcomputer world at a time when systems programming was being done in assembly languages. For instance, most arcade games for 8-bit microcomputers were written in assembler. Some applications for the IBM PC were written in assembler, such as WordPerfect.
The freedom with pointers thanks to arithmetic would instantly make sense and appeal to assembly people, who would find a systems language without pointer flexibility to be too strait-jacketed.
I see C as a minimally-viable HLL. Just the essential high level stuff – expression-oriented syntax (a huge thing over assembly), structured programming support (conditionals, loops), automatic storage management (no worries about what goes in registers and what goes at certain offsets in the stack frame), abstraction over calling conventions, and a rudimentary type system around scalars with a certain size and pointers pointing at them.
I'm in a similar position (only 10 years professionally though). I agree with everything you said, except I think Rust genuinely has a shot here. It doesn't have a runtime, and you can choose to use just the core libraries or even no standard libraries. It exports to the C ABI, so it's compatible with existing software and libraries. It isn't a garbage pile like C++. And it solves a real problem C has, code safety. It's silly to say any language will completely replace any other language, but I think Rust fits into the slot pretty well for the vast majority of cases where C is currently the best choice.
Well, that is a particularly complicated example. I actually shipped my first Rust code just a couple months ago and it rarely used lifetimes. There is definitely a learning curve, but that's true of any language. Rust's insistence on safety makes it less forgiving than other languages, but I think that's a plus. We have enough unsafe languages already, and we're paying the price for it now.
E: If you want to take another stab, the way I learned Rust was the book: https://doc.rust-lang.org/book/ Actually type out every code example, it will help your fingers learn the "feel" of the language, and give you an opportunity to break things on purpose to test your understanding. Learning something new is always a challenge, but I really do like Rust and think it's worth the trouble.
Yes, definitely. You only need explicit lifetimes for when it's impossible for the borrow checker to figure out the lifetimes for itself. For example if one struct references another, the borrow checker can't know which struct should outlive the other. The vast majority of the time, the borrow checker can figure it out automatically without your help. It's discussed here[1] but note that it's Chapter 10 of the book, so it's probably not the best place to start reading if you're new to Rust :)
E: There are only three instances of using explicit lifetimes in the entirety of the project I linked, if you want to see some real-world examples of it. In all of these, it is used to indicate that the struct being declared references another struct which must outlive that struct. That way we don't end up with a dangling reference if the referenced struct failed to outlive this struct.
You're always using lifetimes, but you rarely have to write them out. Writing out lifetimes is only needed when you have a particularly complicated function or data structure; usually the compiler can just compute what the right lifetime should be from the control flow and type of the function.
In general you only need to specify lifetime parameters when there's an ambiguous situation. For instance, the following builds and runs just fine:

fn print_refs(x: &i32, y: &i32) {
    println!("x is {} and y is {}", x, y);
}

You don't need to specify the lifetime annotations on print_refs because the references aren't returned and are unambiguously just the lifetime of the function. The compiler will fill those in for you -- even if the lifetimes of x and y are different, just so long as x and y live at least as long as the function.
I'll be the first to admit it takes a while to ramp up into Rust. Part of that is learning to let go and trust the compiler. Unlike many other languages, it won't hurt you haha. It's on your side. Sometimes you need to, you know, give it a little more info so it can do its job.
The Rust team has made huge strides in improving writability of the language, especially with non-lexical lifetimes.
Soon, `fn` and `impl` and `Vec` fade into the background, like `char` and `short` and `long`.
I think it's a subconscious habit of how your eyes traverse the code. Your mind has been trained to skip over a lot of those symbols and get the meaning from the surrounding code, and in the near term, it costs extra work to suppress that habit and actually see the symbols.
Over the long term, I don't think symbols are harder to read than keywords. C itself provides some evidence for that: imagine reading C code if experience with another language had conditioned you to look past * and &. That would be terribly confusing, but for an experienced C programmer, those symbols leap out of the code at you because you know that they carry a lot of meaning.
aside from the lifetime annotations, how is that worse than C to an unfamiliar reader? the format specifiers for printf and friends are way worse than anything in your rust example.
keeping in mind that this is about syntax, here's what I think is actually worse (but I understand that everyone is different so if you like these, it doesn't really matter what I think):
#[] - why the [] if # already makes that line different from the usual code? seems superfluous
'a - is that thing next to 'a' a smudge on my display? Did I forget a quote? better wipe the display with my finger
'&a - "mut" and "ref" but ' ?
foo! - yelling out function calls. "print!" "exit!" "macro!". Angry Codes!
foo? - when you're done yelling, make sure to ask existential questions of the return result. This one I have the least problems with since it actually makes me question the return value ("hm, something's weird here, it could be null, pay attention"), but when coupled with the yelling, just makes the whole thing look dramatic. "do_it_now!(); did_we()?"
fn - by itself not a huge deal, but the list of truncated words that are used frequently is "impl", "mut" and "pub". I save some characters (am I really in such a rush?) at the cost of reading this broken English "f-n impl moot pahb". At least C doesn't have that. Pascal had "interface" "implementation", "begin", "end", etc. Java has "public", "interface", "extends", "class".
Default for Borrowed - suddenly English! No time to type 'function' or 'mutable' but "Default for Borrowed" is a-ok?
underscores all over the place in the standard library. Even C doesn't have that many, mostly in the _r variants that were added.
unwrap() - what does that word have to do with errors? (I know what it does) It wouldn't be my first choice.
I have a whole list of awesome stuff, meh-stuff and wtf-stuff I noted down about Rust while trying to learn it (on multiple occasions), and there are a lot of excellent things about Rust, but the aesthetics of the language are important, otherwise we'd all have no issues coding in Brainf*ck.
Yes it's possible to go nuts with syntax, but this just feels like the "programmer art" of language syntax. I think it's usable (clearly), it's just not elegant to me.
> fn - by itself not a huge deal, but the list of truncated words that are used frequently is "impl", "mut" and "pub". I save some characters (am I really in such a rush?) at the cost of reading this broken English "f-n impl moot pahb". At least C doesn't have that.
int and char aren't exactly words. And let's not forget that a type like "double" makes no sense whatsoever by itself. (It's two of something, but two of what? Oh, it's "double-precision" floating point! How could I miss that?)
Now, let's look at C's standard library:
strcmp, strpbrk, isalnum, ispunct, setjmp, SIGSEGV, SIGFPE (that's the divide-by-zero exception, isn't it obvious?), SIGABRT
Sure seems like C has its own massive issue with "we can't let a name be long"
> underscores all over the place in the standard library. Even C doesn't have that many, mostly in the _r variants that were added.
va_arg, FE_DFL_ENV, int32_t, etc. Continue on into most C libraries, and underscores are pretty common because C has no other namespacing mechanism.
Syntax is pretty strange in any language if you're not used to it. If you're comfortable with C, C's syntax and spelling quirks don't stand out to you.
I fully agree with you. C, a language designed in 1972, has the issues you outlined.
I personally have higher expectations of a language designed in the 2000s by people standing on the shoulders of 40 years of computer science and language research.
Here's a contrived analogy:
Imagine if you bought the newest Tesla truck meant to replace old Ford model trucks and you had to vigorously shake the steering wheel in order to lower the driver's window.
You're saying: "Ford trucks have always used a hand crank and that feels strange too if you're not used to it!"
I'm saying: "Why did they choose to make it equally strange in the first place?"
I realize you're joking, but probably some of it came from limitations of the era. As recently as the early 2000s, I had to use a 68k assembler that had a limit of something like 8 or 10 characters for a label. The really annoying part was you could have longer names, but it would silently ignore any extra. We're still stuck with other decisions that were driven by size constraints (like the separation of /bin and /usr/bin, IIRC).
It comes from the limitations of some early pre-standard C compilers. It was fairly common to use fixed-length arrays to avoid heap allocations, for perf and memory usage reasons. This included identifiers, but then you had to decide on the maximum size; since most identifiers would be fairly short anyway, a large array would be wasted space.
Since the first edition of ANSI/ISO C was trying to codify the already-existing common practices for maximum portability, it reflects those existing limits (5.2.4.1 "Translation limits"):
"The implementation shall be able to translate and execute at least one program that contains
at least one instance of every one of the following limits:
...
31 significant initial characters in an internal identifier or a macro name
6 significant initial characters in an external identifier"
If you look at many abbreviated function names from the C stdlib, they are specifically 6 characters long - strcpy etc. I wouldn't be surprised if the 6-char external identifier limit goes all the way back to the first K&R C compilers.
Just a few small comments. Maybe understanding some rationale will help.
> #[] - why the [] if # already makes that line different from the usual code? seems superfluous
# does not "make that line different from the usual code." The whole #[] construct is it, it has nothing to do with lines. You could put "#[foo] #[bar]" on one line if you wanted, you could write "#[foo] fn lol()" if you wanted...
> foo! - yelling out function calls. "print!" "exit!" "macro!". Angry Codes!
This helps both humans and computers parse; macro invocations don't have to follow regular Rust syntax, and the ! helps indicate that that's true.
> Default for Borrowed -
This is not language syntax, this is the name of two types. You can name your types however you'd like.
> #[] - why the [] if # already makes that line different from the usual code? seems superfluous
I don't even know what #[bla(foo)] does, or why all that punctuation is needed. maybe nesting is allowed?
> 'a - is that thing next to 'a' a smudge on my display? Did I forget a quote? better wipe the display with my finger
this is a lifetime annotation, which is a genuinely noisy bit of syntax.
> foo! - yelling out function calls. "print!" "exit!" "macro!". Angry Codes!
I guess they really want to make sure you know when a macro is being used. probably a result of ptsd from debugging c and c++ code :)
> fn - by itself not a huge deal, but the list of truncated words that are used frequently is "impl", "mut" and "pub". I save some characters (am I really in such a rush?) at the cost of reading this broken English "f-n impl moot pahb". At least C doesn't have that.
the c keywords aren't too bad, but the standard library is full of this kind of thing. stdio.h and string.h immediately come to mind.
I'm not so much defending rust as I am pointing out that c's syntax isn't that great to begin with. I've been writing c and c++ code every day for several years now, so it's usually pretty easy for me to skim and understand what is going on. but if I try and place myself in the shoes of a newcomer, I don't think the syntax is much better than rust. remember the first time you tried to parse the type of a nontrivial function pointer?
Yah, my take on Rust is that it's needlessly verbose in places where it doesn't make sense (structures, anyone?) and too terse in places where it needs more verbosity, and the rules frequently don't make sense (implicit types). Much of it appears to just have been handicaps in the original compiler implementation. At this point someone should step away from the compiler implementation, look at how people are working around the language, and do a 2.0/"use strict" type thing where they redefine much of the language. Then, like what happens with new C++ revisions, let the compilers catch up.
People proposing every other language that tried to replace C thought the same. Only time will tell, of course, but I wouldn't bet on it. Nowadays C++ is seen by many as a "garbage pile", the same can happen to rust.
There is an important difference between C++ and Rust: C++ tried to play the game by being a superset of C (it even started as a precompiler to C), and many C programs are valid C++ (well, many C programs aren't valid C++, but let's ignore that here). Thus C++ always carries C's legacy along, so that there is always the C way of doing things and then the C++ way of doing things. Rust cut that part off, while still allowing calls to C functions.
Also, Rust has taken the path of backwards-incompatible changes, while the C++ language designers/committee go to great lengths to maintain near-perfect backwards compatibility with older versions of the language (and refuse to allow for ABI-breaking changes! Grr!)
"there is a non-trivial amount of performance that we cannot recoup because of ABI concerns. We cannot remove runtime overhead involved in passing unique_ptr by value, nor can we change std::hash or class layout for unordered_map , without forcing a recompile everywhere etc. etc."
I thought passing template instantiations over an ABI boundary was generally discouraged and thought to be asking for trouble. I guess this isn't really at odds with what the paper is saying though - it could still be that a lot of people are doing so.
edit Thinking about Hyrum's Law, [0] mentioned in the article, makes me think perhaps there was an upside to Java firmly refusing to support any kind of ahead-of-time compilation for so long. It fully closed the door on any funny business distributing Java packages as brittle precompiled native-code blobs, ensuring the bytecode format remained the way that Java packages were distributed, presumably avoiding some fraction of the issues C++ now faces.
Of course, Java still has backward-compatibility obligations, but unlike in C++ they align pretty well with API compatibility, if I understand things correctly.
There's nothing particular about template instantiations that makes them any different from any other object. The only problem with using them at ABI (or rather, linker) boundary is that instantiations have to be explicit - but that's what "extern template" is for.
Are a couple of ABI breaks for std::string between C++98 and C++20 really that unstable?
You can write code that uses std::string today and links against a .so built a long time ago (modulo compiler bugs of course, thus the various versions here: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Optio...). GCC & libstdc++ go to great lengths to preserve ABI compatibility.
Rust has a terrible, verbose syntax and solves a problem that is a non issue in many applications where C shines while making ordinary things difficult (like recursive data structures).
I am sure it has its place but I think it's just too ugly to be attractive. There is something pleasing about writing C, Python or even Javascript which you will never get with Rust. It will never be a language a lot of people enjoy writing imo.
Ugly is a matter of opinion of course, but once you start to write it you get a sense of what "looks good" and what "looks bad" and you start to write nice looking Rust if you want to. I can show you some powerful ugly C haha.
I would argue that Java and C# completely succeeded. Who is out there writing Desktop applications or web services in pure C anymore?
C++ too to a lesser extent. I work on spacecraft flight software, and there's a significant push to move from pure C to modern C++.
No single language is going to (or wants to IMO) replace C in every single use case, but replacing C in specific use cases has been a huge boon for productivity.
Rust, I'd argue, is the language trying to really take on most of the C use cases. The only one it's really not trying to take over is old embedded work where C is pretty much the only option.
Otherwise, I can't think of any circumstance Rust isn't trying to muscle in on C and C++'s territory.
I like C fine, and I also taught it when I was in grad school.
New languages mostly don't replace existing ones. Rather, they supplant them for some uses, and open up new kinds of software which are easier to implement or to conceive with the new language. Now, you referred to C++ specifically, and since I'm somewhat familiar with it I'll address some of the points you made with respect to just that one:
Bjarne Stroustrup said: "If you want to create a new language, a new system - it's quite useful not to try to invent every wheel." For a long while now, C++ teachers/trainers encourage their audience _not_ to think of C++ as an "augmented C" or "C with feature X Y and Z", and to avoid most "C-style" code in favor of idioms appropriate to what the language offers today.
Also, C didn't "win the competition" because there isn't a "bestest language for everything" competition. It has been, and is, a popular language with many uses. Writing operating system kernels is one kind of programming task, where C is the most popular. Even at this level (and lower still), other languages are potentially interesting and often used. See, for example:
Finally, C++ doesn't require a huge library nor a complex runtime system because it has a "freestanding mode" in which the requirements are very limited (although more is required than for C). See:
https://en.cppreference.com/w/cpp/freestanding
When it comes to declarations, the world is already moving over to Pascal's syntactic conventions in some ways. In newer languages, you see something like this:
var x: *int;
more often than:
int* x;
I.e. the type name follows the variable name, and type modifiers work more like unary prefix operators on types. This is because it's less ambiguous to parse and makes more complex types a lot easier to read: you simply go left to right, instead of following C's "spiraling" declarations.
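To make the "spiral" point concrete, here's a small sketch (the names are made up): the raw C declarator has to be read inside-out, while the typedef'd version - or the Pascal-style var x: *int form above - reads straight through.

int (*handlers_spiral[4])(const char *);   /* array of 4 pointers to functions taking const char* and returning int */

typedef int handler_fn(const char *);      /* a function type */
typedef handler_fn *handler_ptr;           /* pointer to such a function */
handler_ptr handlers_flat[4];              /* array of 4 of those pointers */

Both declare exactly the same object; only the second one can be read left to right.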
The rest of Pascal's syntax is not particularly problematic, either. I'd say that the two biggest problems with it were begin..end for blocks, and having all local declarations in a block separate from code. But Modula-2 already dropped "begin", and various Pascal dialects added inline variable declarations eventually. So, on the whole, I think we'd actually be better off in terms of code readability if Modula-2 rather than C became the standard systems programming language.
The thing that makes me somewhat sad is that there isn't much reason why you couldn't fix some of C's syntax and other issues. I feel like breaking changes wouldn't be that big of a deal as long as you keep a backwards-compatible ABI.
Say you add fn and var as keywords. Exactly how hard would that be to fix? You could probably write a tool to do that.
As opposed to C's syntactic conventions, the absolutely gorgeous mush of *s, &s, __MACRO_LIKE_FUNCTIONS___() and #preprocessor directives
The default is not the best; the default has just beaten the world over the head so many times that anything else became foreign and weird and got laughed out of the room before it even had the chance to say anything.
One programming-language-history book/blog/paper a day keeps the nonsense notions away. C "won" the way people win the lottery or roulette.
> Trust the programmer.
Programmers have been shown, time after time, not to be particularly trustworthy. Have we not learned the lesson that it's really easy to make mistakes, and that we should trust tools instead of people to check our work?
> Don’t prevent the programmer from doing what needs to be done.
Ditto the above.
> Keep the language small and simple.
It is in some ways, but its "smallness" leads to a serious lack of simplicity, as seemingly simple things are incredibly hard to do right consistently. For instance, avoiding indexing past the end of an array, or rolling over an integer.
> Provide only one way to do an operation.
That's nice, I'll grant you, although there are of course exceptions that prove the rule, like:
a[b]
is synonymous with
*(a + b)
*((uint8_t *)a + (b * sizeof(*a)))
> Make it fast, even if it is not guaranteed to be portable.
The funny thing is that rolling over an integer is not easy to catch, and that's hard to ignore. The PDP-11 'add' instruction sets the 'C' bit if there is a carry from the MSB; the Z bit is set if the result == 0; the N bit is set if the result < 0; and the V bit is set if there is arithmetic overflow (both operands were of the same sign, but the result is of the opposite sign). By simply making these bits available as, say, special names that could be tested (e.g. C N Z V) after an operation, you could determine what happened (if appropriate) and take action. HP's SPL had such a feature (used on the HP3000 series). C is not a well-designed language, but an improved 'lifeform' along the way. Today, I would like to see a C-like language developed specifically for RISC-V; especially one that had many, many fewer edge cases.
> Today, I would like to see a c-like language developed specifically for RISC-V
When I learned that RISC-V has no carry bit, I couldn't help but think it might have been designed for C to begin with. Sure, they give reasons for this choice, none of them linked to C. Still, it hurts the multi-precision arithmetic that any language with BigInts would have benefited from (I recall Python, Haskell, and Scheme at the very least).
if (((a > 0) && (b > 0) && (a + b) < 0) ||
    ((a < 0) && (b < 0) && (a + b) > 0)) { /* Overflow */ }
Now you may be saying, well, isn't there a branch-if-arithmetic-overflow instruction in practically every single architecture ever? To which I would say simplicity matters.
IMHO, C really should just standardize checked-overflow intrinsics. It's a lot saner than having users try to guess the correct overflow-matching pattern (it's worse for multiplication), and many architectures make detecting overflow pretty trivial.
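For what it's worth, the intrinsics already exist in GCC and Clang, and C23 finally standardized the same idea in <stdckdint.h>; a minimal sketch (the variable names are mine):

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    int sum;
    /* GCC/Clang builtin: returns true on overflow, otherwise stores the result in sum. */
    if (__builtin_add_overflow(INT_MAX, 1, &sum))
        puts("overflow");
    else
        printf("%d\n", sum);
    return 0;
}

/* C23 spells this ckd_add(&sum, a, b), from <stdckdint.h>. */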
I think I might be missing the intent of this exercise, but that expression risks overflow (and undefined behaviour) when b is not equal to zero, right?
You might be interested in a related challenge: write a function in standard C++ that returns the difference between any pair of int32_t values. This cropped up on StackOverflow. It's tricky enough to trip up the incautious.
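The original challenge is posed for C++, but the pitfall is identical in C: the naive a - b is evaluated in (typically 32-bit) int and can overflow, which is undefined behaviour. One way out, assuming the goal is the mathematically exact difference, is to widen before subtracting - a sketch, not necessarily what the StackOverflow thread settled on:

#include <stdint.h>
#include <stdio.h>

/* Naive version: undefined behaviour when the result doesn't fit in int,
   e.g. for difference(INT32_MAX, INT32_MIN).
   int32_t bad_difference(int32_t a, int32_t b) { return a - b; } */

/* Widen first: every difference of two int32_t values fits in int64_t. */
static int64_t difference(int32_t a, int32_t b) {
    return (int64_t)a - (int64_t)b;
}

int main(void) {
    printf("%lld\n", (long long)difference(INT32_MIN, INT32_MAX));
    return 0;
}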
The gotcha is that the operation is commutative at the type-system level. C could - as many languages do - restrict the type of the left operand of [] to "something indexable".
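Concretely, because a[b] is defined as *(a + b) and + commutes, the "index" and the "thing being indexed" are interchangeable as far as the type system is concerned - a well-known curiosity, shown here only to illustrate the point:

#include <stdio.h>

int main(void) {
    const char *s = "hello";
    /* All three expressions denote the same 'l'. */
    printf("%c %c %c\n", s[3], *(s + 3), 3[s]);
    return 0;
}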
There's one? The "C" ABI is just the ABI of whatever platform it's running on, which may or may not have funky behaviour that vendor-provided compilers kindly hide for you - functions being prepended with '_' on macOS, the two dozen calling conventions on Windows with i386, SysV and Itanium ABIs...
Do you think you can tell what the ABI of
struct foo some_function(struct bar);
is? Will bar be passed in a register, or on the stack? Who knows; that depends on your platform, your compiler, etc. Things going cleanly on the stack is just a convenient lie that your first-year comp-sci teachers tell you because it's too early to talk about how the real world works yet.
The leading '_' before symbols isn't just a macOS thing; AFAIK this is the only "name mangling" that C does, and from what I've seen so far it's the same across platforms and compilers.
As far as the ABI goes: the important thing is that there is a standard ABI on a specific platform that all compilers on that platform agree on. Sounds kinda obvious, but it's not common in other languages.
The only cases where you stumble over it are when you're trying to do linking / symbol loading yourself manually (e.g. in my case it was because I was looking into JIT compilation mechanisms).
ABIs have always been platform-specific in that sense. C ABI is beneficial in that on any specific platform, it's interoperable - which is good enough, because native code has to be compiled for a specific platform, anyway. In your specific example, I don't really care how "struct foo" is passed, so long as all shared libraries compiled for that platform agree on how it's done. And we have that on C level today on all major platforms.
Sure, there are compiler switches and language extensions that can break the ABI if you use them. But, well, you don't have to use them (at the interop boundary), and neither do your API clients.
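One common convention (by no means required, just one way to stop caring about how struct bar gets passed) is to keep value-passing of structs off the API boundary entirely: pass pointers and use fixed-width fields so the layout is unambiguous. A sketch with made-up names:

#include <stdint.h>

struct bar { uint32_t id; uint32_t flags; };   /* explicit, fixed-width layout */
struct foo { int64_t result; };

/* Only pointers cross the boundary, so the register-vs-stack question
   for struct passing never comes into play. */
int some_function(const struct bar *in, struct foo *out) {
    out->result = (int64_t)in->id + in->flags;
    return 0;   /* 0 = success, a common C convention */
}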
C was one of those "write once, compile anywhere" things they used to tell us about. Then when we actually tried it, we found every CRT was different on all of the platforms. Even for something as ubiquitous as printf, I think I have encountered at least 8 different versions of it. At least from compiler chain to compiler chain they are usually similar in calling conventions (but not always). But try mixing an msvcrt with a glibc one and you are in for some fun...
The annoying nit of it is that it almost works. You have a good shot at getting it to compile in a short amount of time. The rest of the work will be lots of time in ye old debugger and going over the docs for your platform. The fun part is the bugs you find: were they there already, are they just part of the platform, or were you using it wrong?
In effect it is yet another CRT, and the idea is sound. But many times what I found was that you may have one that works on, say, Linux and Windows and BSD. All the same "code", but you dig under the covers a bit and it is a maze of ifdefs, so each platform has its own quirks. For example, threading between fork and CreateThread is on the surface not too different, and you can wrap CreateThread with it (several libs did). But you dig into it a bit and you find portions that just do not map at all between the systems (usually with IPC and locks). At best they do not compile, slightly worse they return error codes, at worst they act like they work.
A really good example of what I am talking about is the pthread library. It works up to a point, but it is a very Linux/BSD-oriented library. There are some gaps coming from Windows that just do not map, and the other way around. What is worse is that the docs on some of these do not talk about cross-platform issues. Luckily you can see the source code of most of them and can tell what is going on. Annoying, but one of the things I learned moving code between platforms is that each one has its own way of doing things. You can try to work against it, or sit down and unwind what is going on, which takes time. I have even seen this sort of issue in Python and Java, where you get down to some low-level thing and it just is different on different platforms.
C is write once, compile anywhere... so long as you restrict yourself to standard C, including the library. Different implementations have different levels of standard compliance, but you are still able to say, "this will do X on any conformant hosted C89 implementation".
The moment you start doing things like threads or shared libraries, yeah, it all breaks down very quickly. But that isn't standard C.
Even things like printf/sprintf act differently from library to library. Trust me, read the docs on your lib.
Most of the time they are the same but not always. It is one of those things that looks like it works but in practice there are a ton of gotchas. Most of the more mature libraries are getting better but there are some edge cases out there.
They are different mostly because of different standards that libraries support - C90 has a baseline list of %-specifiers, but C99 added a bunch more, and then I think POSIX also has some now? Plus extensions.
But I can't think of any implementation that doesn't conform to C90 in that regard. So long as you don't venture into implementation-defined / UB territory...
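A concrete example of where the standards diverge: the z and ll length modifiers only arrived with C99, so a library that still tracks C90 may not understand them even though the rest of printf works fine.

#include <stdio.h>

int main(void) {
    size_t n = sizeof(long long);
    printf("%zu\n", n);                  /* C99: z modifier for size_t */
    printf("%lld\n", -1LL);              /* C99: ll modifier */
    printf("%lu\n", (unsigned long)n);   /* C90-safe fallback via a cast */
    return 0;
}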
An advantage of Rust and Golang (and OCaml actually [1]) is that they let you write to the C ABI. We've written both components that call into an existing C ABI (nbdkit-{rust|golang}-plugin), and also Rust/Golang/OCaml shared libraries that present C ABI functions / structures to the world.
[1] But you need to write a bit of C glue code for OCaml so it's not quite so seamless.
Some form of C FFI has been standard for most languages that appeared after C became ubiquitous. Even something as dynamic and high-level as Python has ctypes.
The ABI may come from C, but it's not limited to C. Many other languages have FFI and use the C ABI as the lowest common denominator.
But the C ABI is awful to work with. The C language itself offers no help to guarantee ABI compatibility. What ABI it compiles for depends on headers, which may depend on a jungle of ifdefs and typedefs.
The part "if you want to program video games, use C++" is a bit weird though. I remember a talk from a video-game designer who, when asked "why do you use C++?", answered "because we have to", in the sense that everybody does it and management requires it. The features used were basically C.
One reason would be template metaprogramming. In game development and simulations we use numerical computing most of the time, a domain where templates come in handy. The rest is just C. Modern C++ for highly efficient systems is not C with classes anymore; it is more like C with templates. And also functional constructs for specific components (physics) where it makes sense to use them. I have high hopes that a language like Zig will prove less complex than, and as powerful as, C++ for game dev. It won't be a replacement for C++, just an alternative; it's hard to believe that anything will dethrone the king.
It's certainly more than just templates, but it's also not the whole hairy C++ - it's C + templates, RAII, and namespaces.
It's less template metaprogramming and more just templates to generate efficient code - stuff that used to be done by abusing the preprocessor can now be handled by a (slightly) more elegant templating engine rather than a string-pasting engine.
Classes are used for resource management/RAII; for example, we have an AQUIRE_MUTEX_IN_SCOPE() macro which will release the mutex when the scope is exited. This is supremely useful and generalizes to many resources.
Lastly, namespaces are huge. In big C codebases you have to be super pedantic about naming modules and APIs consistently, because otherwise it becomes a nightmare.
C++ does still get in the way sometimes, like when you want to do something slightly dirty for perf reasons, say aliasing between structs. You first write it in a way that makes sense, basically how you would write it in C, but it's UB in C++, so you rewrite it with virtual calls or memcpys such that the compiler should be smart enough to arrive at the same result as C would have with the straightforward implementation. This works great until it doesn't; your last option is to try to solve it with templates, and that hole is very deep.
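A small illustration of the memcpy route mentioned above: the cast version breaks the strict-aliasing rules (UB in both C and C++), while the memcpy version is well-defined, and an optimizing compiler will typically turn it into the same single load. A sketch, assuming float and uint32_t have the same size:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* UB: reads a float object through a uint32_t lvalue.
   uint32_t bits_bad(float f) { return *(uint32_t *)&f; } */

/* Well-defined: copy the object representation instead. */
static uint32_t bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

int main(void) {
    printf("0x%08x\n", (unsigned)bits(1.0f));   /* 0x3f800000 on IEEE-754 targets */
    return 0;
}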
C++ templates have a poor reputation in the gamedev community. Years ago some companies went too far with them and got burned. EA has their own fork of the STL that's gotten increasingly crufty and behind in performance.
Gamedev tends to use its own patterns, particularly arena allocation for long-lived fixed-size tables, or its own internal object / entity / component model. It's not really C with classes or C with templates, just kind of its own dialect.
From what I've heard in presentations, many game developers seem to be some of the biggest opponents of templates, especially because of the horrible performance on non-optimized builds of template-heavy code, and in general because of the opaqueness of performance of template-heavy code.
For numerical computing: yes. For games, don't you just use float (for non-integer values)? Found the video I was thinking about: https://www.youtube.com/watch?v=rX0ItVEVjHc
from 1:24:00 onwards.
The "we have to" is because all the game development middleware is written in C++ (and sometimes C), the same way you "have to" use Javascript for writing web applications (even though WASM is beginning to change that, but it took 25 years to get there).
Actually, only half of the universe is using C++ to write games; the other half is using Unity and writing their game code in C#.
If language interoperability were dramatically better, this "lock-in" to a specific programming language wouldn't be half as bad as it currently is, and it would be much easier and less risky to use "fringe languages" for game development.
Would anyone have an up-to-date book recommendation for someone looking to learn C? Is K&R still the best way, or are much more recent things like Modern C [1] the way to go? I am concerned that if I learn techniques that are too "modern," I might not be able to contribute to open source projects that employ older conventions.
I still consider K&R to be "the" book to learn and appreciate C. Over two decades, I have seen/read several books, but keep coming back to K&R. It isn't just the language that makes K&R great -- along with the ride also come valuable (programming) life lessons, experience with terse language (minimum words to express maximum thought), and a lot of historical perspective that most modern books omit.
And don't skip the exercises at the end of each chapter. The discovery of solving the exercises on your own is a revelation that far surpasses any benefit obtained by being told how to do it! :-)
I just read it and like it very much, especially for its brevity and because it is on-point.
If you have experience with low level languages, pointers, memory and actually have some programming experience, then this is your book, especially because it is so up-to-date.
But I would not recommend it to someone who has no experience and wants to learn C as a first language.
The latest K&R edition is from 1988, and C has changed quite a lot since then... especially with C99, which almost feels like a new and much friendlier language because of the new initialization features (compound literals and designated initializers).
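For anyone who last looked at C in the K&R era, this is the kind of thing meant by the new initialization features; designated initializers and compound literals are both C99 (the struct and function names are just for illustration):

#include <stdio.h>

struct point { int x, y; };
struct rect  { struct point min, max; };

static void print_rect(const struct rect *r) {
    printf("(%d,%d)-(%d,%d)\n", r->min.x, r->min.y, r->max.x, r->max.y);
}

int main(void) {
    /* Designated initializers: name the members; anything unnamed is zeroed. */
    struct rect r = { .max = { .x = 640, .y = 480 } };
    print_rect(&r);

    /* Compound literal: an unnamed struct value created in place. */
    print_rect(&(struct rect){ .min = { 10, 10 }, .max = { 20, 20 } });
    return 0;
}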
I guess we should ditch C from "Programming 101" classes, yes? Better to start with friendlier languages like Java/C#/Python etc., where beginners can focus on implementing their algorithms without messing with lower-level details like pointers, manual memory management, etc.
Of course I'm not saying C is not useful anymore (it still is, for example if you are doing system/kernel programming, very likely you'll deal with C). But this is not the typical case for beginners.
I was trolled into learning C by 4chan when I was 15. I would say I am grateful to them. I feel like I owe my programming career and very deep understanding of computers and computer science to that. Learning pthreads, file descriptors, client/server architecture made me a really good back-end developer.
Most other languages are implemented in C, so there's an obvious reason that it has some advantages over everything else. That the OS typically depends on the C ABI makes it mandatory. You can find warts, but as of now, there isn't a broadly proven alternative for some use cases.
I'm not entirely sure what you mean here, but this is true for neither of the interpretations I have. Most compilers are not written in the C language (indeed, even the major C compilers are all written in C++ now!). Alternatively, most languages compile down into a bytecode language, or into native assembly, without converting into anything that is or looks like C in the process. Indeed, for languages targeting native assembly, C is an unwelcome intermediate step precisely because its semantics can be too constraining.
That doesn't clear up any of my confusion, although that may well be because your experience is mostly with managed languages where the distinction between compiler and runtime is less obvious.
Even looking at major languages [1], what do we have:
* Self-hosting languages: C#, Go, Haskell, Java, most LISP implementations, OCaml, Pascal, Rust, Swift
* C++ implementation: C/C++/Objective-C compilers, Fortran compilers (using the same toolchains as the former), JavaScript, probably Visual BASIC (although that may be C# instead)
* C implementation: Perl, PHP, Prolog, Python, R, Ruby, Shell (although note that many of these languages have their libraries largely written in their own language).
* Not sure: ALGOL, APL, Cobol, Erlang, Forth, Kotlin, SQL, Simula, Smalltalk. Although I suspect that many of these are self-hosted.
7 of 32 is a far cry from "most languages".
[1] Using the list of programming languages in Wikipedia's category box at the bottom here.
"Self-hosting" can be a bit tricky to define for Forth. A typical implementation would start with a few words defined manually in assembly, and build the rest up from there.
GCC is written in C++ these days, and LLVM has always been C++. Perhaps some languages are written in C, but C itself hasn't been for some time, if you look at the most popular compilers.
Ah so, "implemented in C" usually does not mean "uses the C ABI even though it's implemented in another language." If that's what you meant, then I very seriously misunderstood you.
Perl too. But yes, it's certainly not "most languages". It's "most interpreted languages" at best, but even then C++ is probably about tied with C. And for compilers, using the language itself is much more popular, often targeting LLVM which is written in C++.
C developers are denying the sad reality of obsolete, archaic ideas, poor ergonomics, crappy tooling, security nightmares, unreliability, and a language that is impossible to scale in bigger team settings (don't bring up Linux; I am a kernel dev; the Linux kernel is a relatively small, carefully engineered core, and the majority of the weight is in leaf nodes of device drivers implementing small and carefully engineered interfaces).