Hacker News
Some Were Meant for C: The Endurance of an Unmanageable Language [pdf] (cam.ac.uk)
165 points by ingve 97 days ago | 240 comments



This is a long paper and the author has 2 main claims:

1) C's popularity has more to do with the cognitive ease of memory addresses as a conceptual model for inspection and change. The author claims this memory-address mental model overshadows runtime performance as the reason programmers stick with C.

2) Switching to "safe" languages like Java/C#/Rust is not necessary. With no changes to (or violations of) the existing C Language specification, a new/different implementation (compiler) could add runtime safety checks similar to those of managed languages. An example from the paper:

>Consider unchecked array accesses. Nowhere does C define that array accesses are unchecked. It just happens that implementations don’t check them. This is an implementation norm, not a fact of the language.

Those 2 ideas look orthogonal but he ties them together at the end.

I'll take some poetic license (i.e., a little exaggeration) to reword the author's idea and help spur discussion...

Consider the idea of the Sufficiently Smart Compiler[1] that claims that a "slow" and "high-level" language like Python/Ruby could be theoretically analyzed and compiled to be as fast as C or handcrafted assembly.

In a way, the author is coming from the opposite direction. If you had a "Sufficiently Smart Runtime" for a new C Language compiler implementation, it could (theoretically) do all sorts of extra checks and bookkeeping that wouldn't require any changes to C source code and wouldn't violate the existing C Language standard. (E.g. Imagine a new C runtime that did many checks similar to Valgrind + UBSAN + ASAN + debugger memory fences, etc.)
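For a concrete flavor of what that bookkeeping might mean, here is a minimal sketch of the "fat pointer" approach such a runtime could use under the hood. All names here are invented for illustration; real systems like SoftBound are far more involved:

```c
#include <stdio.h>
#include <stdlib.h>

/* A checking implementation could carry bounds alongside every pointer
 * and trap on a bad index, with no change to source-level C semantics. */
typedef struct {
    int    *base;  /* start of the underlying object */
    size_t  len;   /* number of elements */
} fat_ptr;

int checked_index(fat_ptr p, size_t i)
{
    if (i >= p.len) {   /* the check ASAN/Valgrind do after the fact */
        fprintf(stderr, "out-of-bounds access: index %zu, length %zu\n",
                i, p.len);
        abort();
    }
    return p.base[i];
}
```

The source-level program never sees `fat_ptr`; the compiler would lower ordinary `p[i]` accesses into `checked_index` calls behind the scenes.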

Would the program execution be slower? Yes, but by the author's claim #1 that's not really an issue: what programmers really like about C is the mental ease of accessing memory addresses. The performance is important, but it's a secondary benefit -- according to the author.

[1] http://wiki.c2.com/?SufficientlySmartCompiler


Excellent, I think the author would do well to re-frame the question as you have. If nothing else, it puts the question more clearly into the space of provable compilation.

When I transferred into the "Oak" group that later became the "Java" organization, the team I was on was looking at whether or not you could write an OS in Java sort of in spite of its safety rules. This sort of concept has been revisited by Rust with its safe/unsafe modal operation.

What both of those efforts have in common is that determining safety may be impossible at the construct level but provable if you were to exhaustively search all possible outcomes.

What the paper and your comment add to the discussion is the intriguing idea that you could create a 'safe' backend (say, the equivalent of the JVM) as a target for a C compiler. Code that could not be compiled safely would be flagged for later analysis. Much as VHDL can express hardware that cannot be synthesized, you might end up with a C compiler that could compile code that could not be executed. It could be fun to spend a bit of time poking around that rabbit hole.


It's a very intriguing idea and you may be pleased to know that such research lines are still being explored:

http://ssw.jku.at/General/Staff/ManuelRigger/ManLang17.pdf


The idea that you can have a safe C compiler or runtime seems totally absurd to me. Why is any of this even considered seriously?


Indeed. Even the "obvious" example of the compiler inserting bounds checks in the generated code does not work with the well-known idiom of marking the beginning of a variable-length memory block at the end of a struct with an array of some fixed size, say, 1 (or even 0).
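The idiom in question is the pre-C99 "struct hack". A sketch with hypothetical names -- the declared `data[1]` really marks the start of `len` bytes of payload:

```c
#include <stdlib.h>
#include <string.h>

struct packet {
    size_t len;
    char   data[1];   /* actually len bytes long at runtime */
};

struct packet *packet_new(const char *payload, size_t len)
{
    /* sizeof *p already includes 1 byte of data, hence the -1 */
    struct packet *p = malloc(sizeof *p + len - 1);
    if (p) {
        p->len = len;
        memcpy(p->data, payload, len);
    }
    return p;
}
```

Any access to `p->data[i]` with `i > 0` is backed by real storage at runtime but is out of bounds for the declared type, which is exactly what a naive bounds-checking compiler would (wrongly) reject.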


For that situation, you would need to mark the end of the variable-length memory block at run-time.


The problem is that it isn't a new idea. People keep trying it, as shown below. Unfortunately, C wasn't so much designed as it was a modified version of something (BCPL) that was the only collection of features Richards could get to compile on his crappy hardware. It's not designed for easy analysis or safety. So all the attempts hit problems in what legacy code they can support, in their performance, or even in their effectiveness at reliability/security in a pointer-heavy language. Compare that to Ada, Wirth's stuff, or Modula-3 to find they don't have that problem, or have much less of it, because they were carefully designed while balancing the various tradeoffs. Ada even meets the author's criteria for a safe language with explicit memory representation, despite his saying safe languages don't have that.

To back that up with references: the first is a bunch of attempts at safer C's or C-like languages, with their performance issues. The next two are among the most recent and practical attempts at memory safety for C apps, as far as CompSci goes. The last one is an Ada book that lists, chapter by chapter, each technique its designers used to systematically mitigate bugs or vulnerabilities in systems code.

https://pdfs.semanticscholar.org/a890/a850dc78e65e26f8f4def4...

https://llvm.org/pubs/2006-06-12-PLDI-SAFECode.html

https://www.cs.rutgers.edu/~santosh.nagarakatte/softbound/

http://www.adacore.com/uploads/technical-papers/SafeSecureAd...


1% inspiration, 99% perspiration. Needs more sweat.


> mental ease

I'm a long-time C programmer, and I was struck by how clumsy and error-prone any manipulation of C strings turns out to be. It's really hard to look at a mass of strlen/strcpy/memcpy/etc. and see just what is happening. Contrast that with, say, BASIC or JavaScript, where string manipulation is easy, natural, and bug-free.

I'm going to disagree about the mental ease of programming in C, and a large part of that is the difficulty of building useful abstractions around the pointer model.
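To make the contrast concrete, here's roughly what "a + b" costs in plain C (str_concat is my own hypothetical helper, not a library function): two length scans, an allocation the caller must remember to free, and an off-by-one trap around the terminating NUL.

```c
#include <stdlib.h>
#include <string.h>

char *str_concat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *r = malloc(la + lb + 1);   /* +1 for the terminating NUL */
    if (!r)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);       /* copies b plus its NUL */
    return r;
}
```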


I think the mental model isn't the issue, it's that the C standard library is very anemic. When writing a C application, either you're using a big library like APR or GLib, or you're rolling your own, and since rolling your own is a pretty big, complicated, and fraught proposition, it's no surprise bugs creep in. Furthermore, you can't really interoperate with other libraries if they also rolled their own data structures, because theirs probably aren't like yours. Consequently, libraries tend not to do that at all, settling for things like NULL-terminated lists and special, opaque data structures.

I feel like if someone wants to throw C a life vest, they should start with a meaningful standard library that engineers can build on to provide functionality we pretty much consider standard now (HTTP libraries, JSON libraries, database libraries) with a consistent interface.


Completely agree. It's like when people say programming in Python is fast. No, it just has almost everything pre-built and you glue it together.

C's standard library is just sad.


That particular problem (strlen/strcpy/memcpy) comes from the standard library's string functions. It can be solved by creating your own string library; then string manipulation is easy.


That falls over as soon as you integrate with anybody else's C code, including the operating system APIs, and with C string literals :-(

If it was as easy as you say, it would have happened.

And heaven knows I wrote my own string packages, one after the other, and so did everyone else. I eventually abandoned all of them. C's abstraction abilities are simply not good enough to do a decent string encapsulation.


No other language solves this perfectly either, certainly not in a way that interoperates _across_ languages and environments.[1] Which is pretty much the whole point of the article. But what C excels at is the ability to write code which can examine and work with the representation of most string-like objects exported from any environment. The difficulty of doing so is a function of how opaque and complex the alien implementation is.

I gave up on trying to solve strings in C applications a long time ago, too, much as you have. I did so not because I found C too inexpressive, but because I realized that I was trying to shoe-horn too many concepts into a "string". A string is almost by definition the wrong data structure--either too abstract or not abstract enough--for almost everything. Not coincidentally, that was about the same time I stopped abusing regular expressions for parsing data.

[1] Even C++ didn't solve this. We're still in the midst of a std::string ABI compatibility break in the C++ ecosystem. Granted, it's been about 12 years since the last one, but these last fairly long because systems software (i.e. infrastructure software) has a really long tail.


Not to mention that in C++ there are plenty of string implementations predating std::string (e.g QT's QString, ROOT's TString)


*shrug* It doesn't fall over. I've done it, the OpenBSD team has done it, DJB has done it. Maybe something is wrong with your implementation that I can help you with?


I'm curious. Got links?


OpenBSD takes a fairly minimalist approach, which is vaguely described here: http://www.freebsdforums.org/forums/showthread.php?threadid=... They basically replace the unsafe functions with things that are easier to use. Their idea is that it isn't the format of the C string (NUL-terminated) that causes security issues, it's the poorly defined functions, with weird corner cases that are hard to get right. It's worked well for their use cases.

DJB did something similar in qmail, I don't recall the details but you can look at the source code as easily as I can, and it eliminated security problems.

When I'm working in Java, I find that most of my string parsing uses the split() function. This is a pain in C, because even if you had a split() function you'd need to deal with memory allocations. Most of these are solved with a memory pool. In my own library, I also added runtime, grammar-based parsing functionality. So to parse a CSV line you might do something like this:

    char *g = " S   -> WORD | WORD , S;"
              "WORD -> [^,]";
    results = parsegram(g, inputString);
Grammar parsing + memory pools make string parsing in C easier than in Java. The biggest difficulty with this kind of library is that to do it right, you need to be something of a Unicode expert, and that's tough.


I used snprintf(), too, but it is only a minor improvement. Problematic in C is something as simple as concatenating strings:

    Mystring s, t;
    s = "hello";
    t = cat(s,s);
    t = cat(s,s,s);
    t = cat("hello",s);
    t = cat(s,"world");
    t = cat("hello","world");
Even such a simple use case is fraught with major problems:

1. who allocates needed memory?

2. who frees it?

3. can the compiler constant fold cat("hello","world") ? Does the result wind up allocating memory anyway?

4. what about the lack of function overloading to handle the permutations?


Here's roughly what that would look like using Bernstein's C string library (which was not only used in qmail).

    #include "stralloc.h"
    ...
    static stralloc s, t;
    ...
    if (!stralloc_ready(&s, 0)) die_nomem();

    if (!stralloc_copys(&t, "hello")) die_nomem();

    if (!stralloc_copy(&t, &s)) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();

    if (!stralloc_copy(&t, &s)) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();

    if (!stralloc_copys(&t, "hello")) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();

    if (!stralloc_copy(&t, &s)) die_nomem();
    if (!stralloc_cats(&t, "hello")) die_nomem();

    if (!stralloc_copys(&t, "hello")) die_nomem();
    if (!stralloc_cats(&t, "world")) die_nomem();


Yes, that does work. But it's not without problems, not the least of which is that it's just not attractive to look at. For example, concatenating "hello" and "world" allocates memory, when it should instead give you a "helloworld" string literal. In fact, simply initializing `s` with a string literal needlessly allocates memory, and that's antithetical to performance. Calling die_nomem() leaks memory if it does anything but terminate the program. And all those tests for memory exhaustion are tedious.


> Even such a simple use case is fraught with major problems:

> 1. who allocates needed memory?

> 2. who frees it?

That's also a major feature. It allows people to write systems that are resilient in the face of tight memory limitations. It's not cool when a language forces string operations to allocate & duplicate memory willy-nilly.

> 3. can the compiler constant fold cat("hello","world") ? Does the result wind up allocating memory anyway?

I fail to see how that's a major problem. Why are you concatenating string literals? How common is that?

> 4. what about the lack of function overloading to handle the permutations?

I consider lack of overloading to be a feature. Overloading is one of the things that are way too easily abused, and it makes code auditing harder than it needs to be. Please just type out the different function names so I can see exactly what is going to be called when I read the code. Or use the sprintf family of variadic functions.


It's the opposite. I've seen lots of code written in C that pretends to be out-of-memory safe. I've never once seen such a program that actually is. Invariably, the codepaths triggered by malloc returning NULL are never exercised.

With a GC and exceptions you can theoretically be quite resistant to OOM conditions, not that anyone really cares.


> I've never once seen such a program that actually is out of memory safe. Invariably the codepaths triggered by malloc returning null are never exercised.

sqlite takes care to correctly deal with out of memory conditions. It has explicit tests for that code too. See section 3.1, Out-Of-Memory Testing, of [1].

[1] https://sqlite.org/testing.html


Now I've found my first program that actually tests it properly :)

I knew you had to systematically drive the code through every OOM codepath to even have a shot at doing that in an unmanaged language. Sadly a lot of C code is written by people who think:

    if ((ptr = malloc(sizeof(struct foo))) == NULL)
        return -1;
is the same thing as being OOM safe.
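Systematic OOM testing usually means a fault-injecting allocator in the spirit of SQLite's harness: run the code under test repeatedly, failing the 1st, then 2nd, then Nth allocation, until a full run sees no failure. A minimal sketch (all names invented):

```c
#include <stdlib.h>

/* Fail the fail_at-th allocation; fail_at <= 0 means never fail. */
static int alloc_count;
static int fail_at;

void test_malloc_reset(int fail_on)
{
    alloc_count = 0;
    fail_at = fail_on;
}

void *test_malloc(size_t n)
{
    if (++alloc_count == fail_at)
        return NULL;   /* simulated out-of-memory */
    return malloc(n);
}
```

The test driver sweeps fail_at upward, checking after each run that the code under test neither crashed nor leaked.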


One of the things with tight memory systems is that you don't use malloc to begin with, if you can avoid it. C gives you the option.

When you're concatenating strings, you already have storage for those strings. Maybe you can re-use that storage. Maybe you have a static buffer. Maybe you have a fixed size buffer on the stack and the stack use is bounded.

A language that forces you into making redundant duplicates onto the heap is terrible in these situations.

And yes there are programs that try to deal with failing mallocs. Again, C gives you the option.


Very, very few C programs can handle running out of disk space. This includes the operating system(s). Get close to filling up the disk, and try various things.

Just recently, I was having a lot of trouble with Windows Update hanging. I finally noticed that free disk space was low. Freed up more space, and WU started working again.

For fun, try:

    #include <stdio.h>
    int main() { printf("hello world\n"); return 0; }
and redirect stdout to a file on a device that is full. Amazingly, it succeeds!
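It "succeeds" because the bytes only reach stdio's buffer; the failure surfaces, if at all, when the buffer is flushed at exit, and nobody checks that. A sketch of actually catching it (say_hello is a hypothetical name):

```c
#include <stdio.h>

/* Report the write error instead of silently exiting 0: fprintf's return
 * value and an explicit fflush are where stdio admits the device was full. */
int say_hello(FILE *out)
{
    if (fprintf(out, "hello world\n") < 0)
        return -1;
    if (fflush(out) != 0)   /* catches errors deferred by buffering */
        return -1;
    return 0;
}
```

On Linux, for example, calling this with `fopen("/dev/full", "w")` should return -1, where the original program exits 0.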


I assume you're referring to OpenBSD here, they didn't use snprintf(). They used asnprintf(), which solves the problem of who should allocate (but not who should free).


From the link:

"That means that we have been going through the tree cleaning out all calls to sprintf(), strcpy(), and strcat(). Instead, these things are being rewritten to use asprintf(), snprintf(), strlcpy(), and strlcat()."

Maybe the author made a typo.


Oh yeah, you're right.

Another thing I've done, which works if you have a lot of strcat(), is to make a string struct:

    typedef struct {
       int len;
       int memlen;
       char *str;
    } ktString;
It keeps track of the string's actual length and the size of the underlying buffer. Then you can 'override' the various string functions:

    bool ktStrcat(ktString *dst, const ktString *src);
    bool ktSprintf(ktString *dst, const char *fmt, ...);
These functions will take care of buffer-size checking, and reallocation if necessary. For cases where you need to interface with pre-existing libraries, you can return the underlying C string. Make it a function/macro so you can change the struct definition in the future:

     #define ktCstr(x) (x)->str

then you can pass it into write() or whatever you need:

    write(sock, ktCstr(s), s->len);


... and end up with silent truncation unless you happen to always remember to use only C library functions with explicit length arguments (and which do not assume NUL-terminated strings).

Look, I get that there is a place for C, but string manipulation is absurdly bad and error-prone.


Hi! I can't imagine how you arrived at that reading of what I wrote. I specifically said not to use those C library string functions.

I fully admitted that string manipulation is absurdly bad and error-prone, then built on that by showing a way to make it better. Use ktStrcat() instead of strcat(), and you don't have to worry about truncation. Use ktSprintf() instead of snprintf(), and you don't have to worry about truncation. I wish you had understood.


Yes, I agree. If everyone would just avoid those C stdlib functions everything would be peachy. :)

I was agreeing with you, but just adding caveats. :)

Well, except... some problems surface when interfacing with "things" (libraries, OSes) written by other people... and there's no escaping those problems, fundamentally. It's C. Of course, UTF-8 was invented with the express purpose of being "C-compatible", but... what happens if you have a string with a NUL in it and you pass that to the POSIX (I think?) printf function as an argument for a "%s" format string? Well, it gets truncated. Did you mean for that to happen, or didn't you? Who knows? That's the problem.
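The embedded-NUL ambiguity fits in a few lines (formatted_length is a hypothetical helper): the byte array genuinely contains eleven characters, but %s can't know that.

```c
#include <stdio.h>
#include <string.h>

/* Format a string through "%s" and report how much of it survived. */
static size_t formatted_length(const char *s)
{
    char out[32];
    snprintf(out, sizeof out, "%s", s);
    return strlen(out);   /* %s stopped at the first NUL */
}
```

Was "hello" the intent, or silent truncation of "hello\0world"? The type system has no way to say.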

Honestly, I'm not trying to win "internet points" or something. It's just that C, as I'm trying to point out, is a bad language for almost everything that's required of "user-facing" software these days. Write the thing in C#, Java, OCaml, Qt[1], or Haskell, or whatever... but please don't think you need to write in a sort of weird approximation of the old PDP.

[1] Yeah, yeah, not a language, but it's at least an ecosystem that seems to be moderately successful.


This problem was actually solved, but almost nobody uses the solution. Safe variants of most of those string, memory, io, wchar, stdlib, and misc functions were defined in the C11 standard's Annex K (finally, after 9 years), but nobody is using them; people would rather keep using known-unsafe variants like the truncating versions with an n -- snprintf, rather than the safe variant sprintf_s. glibc, the BSDs, Darwin, musl, newlib: nobody cares to implement the safe bounds-checking variants. They rely solely on compile-time size checks, which fail to check any dynamic boundaries. Only Microsoft, Android, Cisco, and Embarcadero implement the safe libc functions.

I recently took over Cisco's safeclib (MIT licensed) and extended it to more platforms, all the C11 APIs, and an improved testsuite. And boy was I surprised to find so many missing APIs, upstream libc bugs, and wrong APIs everywhere. Flawless were only musl and the BSDs. But musl is lacking in its errno handling and of course has zero C11 Annex K. Only ReactOS has a proper testsuite for their libc. glibc is somewhat OK, but I still find crashes daily.

https://github.com/rurban/safeclib

So why is nobody else implementing C11 Annex K? I'll write a blog post when I've finished my C11 efforts. Maybe at least FreeBSD will take it then.


Annex K is not safe, it just pretends to be.

By passing the pointer and its size as separate function arguments, the possibility of mixing up parameters, leading to memory corruption, is still there.

This is a major reason why almost nobody uses it and why it was made into an optional annex.


No. The major motivation not to use it was _FORTIFY_SOURCE, with its checks for compile-time-known buffer sizes and its accompanying _chk functions. This leaves out all dynamic buffers.

You cannot mix PTR + LONG args without serious compile-time errors


I don't have any idea how _FORTIFY_SOURCE works, other than that it is GCC-specific and as such has no place in ANSI C.

What I know is that having something like strcpy_s() does not provide any actual safety, because with the prototype "strcpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2)" there is no guarantee that s1max is a valid size for s1.
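Since most libcs don't ship Annex K, here is a minimal stand-in (my_strcpy_s, an invented name) following the same prototype. The point stands out in the code: s1max is taken on faith from the caller, so the "check" is only as good as the size the caller passes.

```c
#include <string.h>

/* Stand-in for Annex K's strcpy_s: 0 on success, nonzero on a constraint
 * violation. The bound s1max is whatever the caller claims -- pass
 * sizeof the *wrong* buffer and the check is meaningless. */
int my_strcpy_s(char *s1, size_t s1max, const char *s2)
{
    if (s1 == NULL || s2 == NULL || s1max == 0)
        return -1;
    if (strlen(s2) >= s1max)   /* would not fit, counting the NUL */
        return -1;
    memcpy(s1, s2, strlen(s2) + 1);
    return 0;
}
```

A call like `my_strcpy_s(small_buf, sizeof big_buf, src)` sails through the check and still overflows small_buf.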


This is what the _chk functions do. In most cases the compiler knows the compile-time size of s1. But in dynamic cases the _s functions are far better than the truncating 'n' versions. Read the rationale.



It's not just the mental ease, it's also the physical typing ease (and in some cases, the possibility).

For example, he points out that to connect C to existing parts of the system (which is the OS and OS level tools), all you have to do is call the functions. If you want to call a C library from a Java program, it's a lot more work. Furthermore, C has the capability of understanding Java structures (although it's awkward), but Java has no way of understanding C structures from within the language. There is no way to model a driver I/O port in Java, but in C there is.

The paper is worth thinking about. If you are creating a language, take interoperability between already existing languages into consideration. JNI is ok, but think how much better it could be if it did auto-marshalling of objects!


That's not inherent to the Java language. You can implement garbage collectors and kernels in Java if you extend the JIT compiler:

http://jnode.org/


I've been out of the C world for a long, long time, but it seems to me that anywhere C's pointer arithmetic and ability to cast pointers to/from other types is objectively appealing is going to be one of the cases a compiler can't understand.

Of course there's always the subjective "everything looks like a nail" usage as well, which makes every problem seem like a pointer problem because you've never tried to think of them as anything other than a pointer problem. I'm sure you could cater to that usage with a proper runtime but really, it doesn't hurt to try new things sometimes...


In my case, nearly 100% of the C code I write is for embedded systems. Casting a hex literal to a pointer type that is a volatile hardware register is better than dropping into asm....

So yes, compilers will always have a hard time understanding device drivers and such unless you turn hardware device concepts into language primitives.
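The idiom being described, with an invented address and bit layout (real firmware would take these from the chip's datasheet):

```c
#include <stdint.h>

/* Hypothetical memory-mapped status register; address and bit are made up. */
#define UART_STATUS_ADDR 0x40021000u
#define UART_TX_READY    (1u << 7)

/* volatile forces a fresh load on every read; without it, the compiler
 * may hoist the load out of a polling loop and spin on a stale value. */
static inline int uart_tx_ready(volatile const uint32_t *status_reg)
{
    return (*status_reg & UART_TX_READY) != 0;
}

/* In real firmware: uart_tx_ready((volatile uint32_t *)UART_STATUS_ADDR) */
```

That cast from a hex literal to a volatile pointer is exactly the construct a "safe" analysis has no way to reason about: the object lives outside the program entirely.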


A more correct thing would probably be to create linker scripts that expose symbols for the registers. It's probably not worth the trouble now but the hypothetical compiler would understand it better.


There still needs to be a description of the underlying hardware behavior somehow. The hardware engineers often give you a somewhat correct Excel sheet or force you to look at the HDL to figure it out.


C's popularity is due to the fact that it is predictable within certain bounds (single thread or limited concurrency).

No GC pauses, no weird runtime crashes due to a strange constructor, no gigantic exception chains, etc.

The only languages in the TIOBE index that can even try to make that claim are: C at #2, C++(if you subset it) at #3, Objective-C/Swift(#18/#11), Assembly at #14, Ada at #29, and maybe FORTRAN(#35).

That's not a lot of options if you need runtime predictability. Basically C, C(with additions), C(with additions), assembly(hack, spit), Ada (okay), and FORTRAN (God help you).

Even now, that means C or Ada--and the first free Ada compiler was 1992.

Yes, Rust is coming. But it's got a way to go yet.


The idea that C is predictable is, in my view, a sign of someone who hasn't gotten to know C really well.

The trends around undefined behaviour will hopefully put a bullet in the head of this idea for good. It's extremely hard to look at C and reason about what an optimising compiler will turn it into.

Malloc is not more predictable than a GC pause. Both malloc and free can take unpredictable amounts of time. If anything it's less predictable because modern GCs at least have pause time targets, but mallocs never do. You just don't notice it because people don't tend to measure malloc latency. In turn that's because malloc pauses only affect memory allocation operations, they don't stop every thread, which is a benefit it's true, but it's not about predictability and more about UI latency.

C not having exceptions doesn't make it more predictable. It just means that if something goes wrong you get a useless and probably corrupted core dump. The number of times I've been able to fix a bug in a piece of managed code given only a stack trace from the end user is huge. The number of times I've been able to fix a bug given "Segmentation fault" with no other info is zero.


> The trends around undefined behaviour will hopefully put a bullet in the head of this idea for good. It's extremely hard to look at C and reason about what an optimising compiler will turn it into.

Sure when you turn on -Oinfinity. Nobody does that in embedded unless they are hard pressed on some metric (RAM size, generally, or CPU flops occasionally).

Overall, though, C is really fairly predictable. Unsigned arithmetic does what you expect--the fact that signed arithmetic doesn't under higher optimizations is a fairly recent phenomenon (and not an uncontroversial one). Variables go where you expect. Pointers act like you expect. Casting and precedence sometimes sneak up on you, but parentheses generally manage that.
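That signed/unsigned split fits in one pair of functions (names invented). Whether the first is "predictable" depends entirely on the optimizer:

```c
#include <limits.h>

/* Signed overflow is undefined, so an optimizing compiler may fold this
 * whole function to "return 1" -- even though x + 1 visibly wraps on
 * real hardware when x == INT_MAX. */
int signed_succ_is_bigger(int x)
{
    return x + 1 > x;
}

/* Unsigned arithmetic is defined to wrap modulo 2^N, so this is fully
 * predictable: UINT_MAX + 1 == 0 on every conforming implementation. */
unsigned unsigned_succ(unsigned x)
{
    return x + 1u;
}
```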

Const has issues at the boundary cases. Trying to stuff something into ROM and then telling the rest of the system that "really-no-you-cant-cast-that" can make things tricky with "incompatible pointer" issues.

Floating point arithmetic, though, is just a disaster.

> Malloc is not more predictable than a GC pause.

Ayup. And what's the first thing real-time embedded folks do? Throw out malloc (which is library, not language, but that's pedantic). Real-time embedded systems tend to allocate all memory statically, up front. Or they use a custom malloc whose behavior they control.

> C not having exceptions doesn't make it more predictable. It just means that if something goes wrong you get a useless and probably corrupted core dump.

Predictable and useful are orthogonal.

And the fact that I can't attach to the running state of a crashed program is a failure of TOOLS, not the language. The fact that I can't attach to a system that crashed, examine the state, fix what I need to, and continue is a fault of the people who make C IDEs. There is no reason, other than lack of monetary incentive, that this cannot be done.


> C's popularity is due to the fact that it is predictable within certain bounds (single thread or limited concurrency).

Your post, and reading a discussion further down about Rust's reference counting, has made me realize something primitive that Rust is getting right--a real move forward--with which even those who don't enjoy C's default "safety switch" being flipped (like me) may agree.

The C model for memory in time and space is so clean for heap data and the function-call stack of one thread (plus global registers), but C has no commonly understood model when it comes to concurrency.

Rust, older C++ libraries, C malloc implementations, and others are all alluding to the simplest memory model for multiple threads, which is reference-counted pointers, IMO. Basically, use a separate type of pointer for heap data, where the max size of the heap is divided by whatever binary power of 2^p processors exist.

Rust folks or other languages are welcome to add more ownership semantics or whatever, but the whole family of languages could benefit from this extension to the lingua franca of C. We may not even need to add a new nominal pointer type to C; just by fiat, understand and expect shared free-store objects to always live inside the lowest 1/p-th portion of the word address space.


Once upon a time it was common to write non-consing Lisp code precisely in order to get predictable behaviour; I think that it worked pretty well. Non-consing code won't have GC pauses; it won't have weird runtime crashes; and it probably wouldn't have gigantic exception chains unless it needs them.


The fact that there are these other languages with the same properties means that predictability isn't the real reason, right? It's that C is also sparse in its specification and easy to implement a compiler for.


Please name those languages. I'm serious.


Ada, Modula-2, Pascal dialects, Algol 68, PL/M, PL/8, PL/S, NEWP...


Isn't this a circular argument: C is popular because no other language on the top popularity chart does <x>.

There are many languages with these properties and better safety, but they aren't popular like C.


Really? I'd love their names. I'm not being snarky here.

I'd love to have a nice language alternative to C.


I'm going to link to pjmlp's comment who knows more about these: https://news.ycombinator.com/item?id=14700251#14701140

I was more referring to the historical perspective of how C became popular; many alternatives have fallen by the wayside. Though there are certainly current alternatives to C besides those on TIOBE.

Also there are real-time extensions to current GC'd languages like Java.

(Though standard C isn't very predictable timing-wise either, or suited to real-time work)


And, with the exception of Ada and Pascal, most of those languages have been dead for at least 20 years--for various good reasons.

And, please do remember that Apple switched away from Pascal when writing its operating systems in spite of an enormous code base. That's pretty damning--apparently C's "undefined behavior" didn't seem to matter.

So, we're back to: the only alternative to C is Ada.

> Though there are certainly current alternatives to C besides those on TIOBE.

Let me make it easy. Give me a list of languages that have been used to build an operating system in a product in the last 20 years. It doesn't have to be Linux, even a small RTOS counts.

I'll start the list:

C family--C, C++, ObjC/Swift

Forth(?)--probably counts as it runs on pretty bare metal

Ada--not sure anybody has used it to build an OS, but I don't debate that they could

Rust--there's a fair set of articles about this

Pascal--the original Lisa and Macintosh OS (probably stretching that 20 year limit a bit).

And?


Apple switched away from Pascal due to UNIX market pressure.

http://basalgangster.macgui.com/RetroMacComputing/The_Long_V...

http://basalgangster.macgui.com/RetroMacComputing/The_Long_V...

And it was mostly to C++, not C.

Pascal is used daily for embedded system work by MikroElektronika customers using mikroPascal.

https://www.mikroe.com/mikropascal/

Any embedded application using Ada's Ravenscar profile, is an OS.

https://en.wikipedia.org/wiki/Ravenscar_profile

Additionally regarding Pascal, it was used to build Corvus Systems OS, MicroEngine and Solo OS.

Modula-2 was used to build Lilith and Delco's engine control units.

Mesa was used to create the Xerox Star workstation.

ESPOL, followed by NEWP, was used for the Burroughs B5500 in the '60s, and its line is still sold by Unisys as ClearPath.

IBM created their RISC architecture OS using PL/8, with a compiler architecture that has now resurfaced in LLVM.

OS/400, nowadays known as IBM i, was originally written in PL/I.

If it wasn't for UNIX's adoption, C would have joined many of those languages many moons ago.


So, C, Pascal, and Ada with a smidge of effectively dead languages.

Okay, sad to know I'm not missing anything.


I am happy with your list of languages to write an OS in, maybe add D and Oberon. I'd point out that you can also use managed languages, see MS Singularity, or the various Lisp and Smalltalk operating systems, or the UCSD P-system, etc - there is a list at https://en.m.wikipedia.org/wiki/Language-based_system .

Counting new commercial operating systems is not a useful benchmark as they are very rare, and we already agreed that the alternatives are not popular.


"Consider the idea of the Sufficiently Smart Compiler[1] that claims that a "slow" and "high-level" language like Python/Ruby could be theoretically analyzed and compiled to be as fast as C or handcrafted assembly."

Nim seems to be trying to fit that space.


This is another article overanalyzing the success of C, when in fact the reason for the success of C is very simple and obvious: Unix was free and in a lucky position in 1973; Unix got popular; C is the language of Unix; therefore C got popular. There is no inherent benefit in C that, for example, a somewhat modified version of Pascal or Algol wouldn't have inherited. And these kinds of articles always ignore the fact that in the past decade or so, C and C++ have been declining in popularity. By and large, new programmers are not learning C the way they were in the '90s. For better or worse (personally, I think, for the better), they're starting with JavaScript, Python, Ruby, or even PHP.

I'm highly skeptical of the conclusion that what we need is a new safer implementation of C, too. Switching to a new compiler is a very high burden for a lot of projects, and at the end of the day they're still left with all the problems of C, like header files, no namespaces, terrible standard library, etc. etc. (Even adding compiler switches is a high burden, which is why Linux distros took so long to widely deploy basic things like -fstack-protector.) By contrast, switching to a new language (or incrementally writing new components in a new language, which is how this always goes in practice) is also a very high burden, but the benefits are larger: you don't have to deal with all the problems of C.

In my view, this is why safer versions of C have repeatedly failed over the years, while new languages have flourished. Migration to a new language or a new compiler is expensive no matter what, so teams will only do it if they see enough benefit to justify the expense of doing so. Merely adding some amount of safety to C isn't worth it, but the large safety and productivity gains you can get from a different language can be.


> Switching to a new compiler is a very high burden for a lot of projects ... By contrast, switching to a new language ... is also a very high burden, but the benefits are larger

I guess switching to a new compiler (or newer version from the same vendor) is much less of a burden than switching to a new language.

Don't forget that all "new safe languages" are simply new. Why people like C is familiarity: known practices and known issues to avoid ironed out through 30 years of usage.

Although Rust/JavaScript/your-favorite-new-language brings to the table fixes for known C issues, they introduce many unknowns. Remember Java? It promised compile-once/run-everywhere and an automatic memory management approach, but introduced bloat, extremely hard-to-catch GC leaks no one talks about, JVM implementation differences (Oracle JVM vs IBM JVM vs OpenJDK speed), and JVM security issues only a few can fix.

> why safer versions of C have repeatedly failed over the years

I guess this will be like giving a skilled hunter a toy water gun - simply a different mindset. Imagine an unsafe Python with pointers and mallocs; how would Python devs deal with it?


> I guess switching to a new compiler (or newer version from the same vendor) is much less of a burden than switching to a new language.

If that were true, then Linux distros wouldn't still be using GCC. Switching to a new compiler (like clang) is a huge burden.

Both switching compilers and switching languages are enormously expensive, to be sure. But I think people (especially people in academia) consistently underestimate the cost of switching compilers and overestimate the cost of writing new components in a different language.

> Don't forget that all "new safe languages" are simply new. Why people like C is familiarity: known practices and known issues to avoid ironed out through 30 years of usage.

Most programmers haven't been programming for 30 years. New programmers, by and large, don't even know C anymore.

The problems with C haven't been so much "ironed out" as ignored since C99.

> Remember Java? It promised compile-once/run-everywhere and an automatic memory management approach, but introduced bloat, extremely hard-to-catch GC leaks no one talks about, JVM implementation differences (Oracle JVM vs IBM JVM vs OpenJDK speed), and JVM security issues only a few can fix.

You bring up Java as though it were a failure! Java has in fact been beating C++ in total usage for years. If I could point to one thing that was responsible for kickstarting the slow decline of C++ that has continued to this day, Java would be it.


If that were true, then Linux distros wouldn't still be using GCC. Switching to a new compiler (like clang) is a huge burden.

FreeBSD switched to clang, and could (with some work) be made to use whatever safe C compiler people come up with. That's much easier than rewriting the entire FreeBSD system in a new language.


> If that were true, then Linux distros wouldn't still be using GCC.

Simply because there is no better alternative (clang doesn't bring anything new). However, distros did switch to the egcs fork when gcc wasn't up to date.

> New programmers, by and large, don't even know C anymore.

New programmers are interested in the web, just like they aren't interested in desktop GUI development. Should I say that the desktop is dead? I don't see us booting usable machines in browsers yet.

> The problems with C haven't been so much "ironed out" as ignored since C99.

I think you need to hang out more with embedded/kernel/C devs and get insight into their mindset. They aren't interested in new stuff as much as in language stability.

> You bring up Java as though it were a failure! Java has in fact been beating C++ in total usage for years.

You read it wrong - I didn't say Java failed, but that it introduced new stuff to cope with. Java owes its popularity to its huge ecosystem and libraries, Sun's aggressive marketing, the JVM, and extreme language stability.

> If I could point to one thing that was responsible for kickstarting the slow decline of C++ that has continued to this day, Java would be it.

Did I mention C++ here?


> If that were true, then Linux distros wouldn't still be using GCC.

You're presupposing they want to switch. Most of the switchers to clang seem to have done so for ideological reasons more than anything.

Nevertheless, most of Debian can be built with clang: http://clang.debian.net/


Great overview. To support it on the design side, the video below shows the evolution of the language from CPL to BCPL to B to C. In it, you see they don't start with what's great for analysis, safety, or efficiency so much as what can compile on terrible hardware. Thompson's modifications are a mix of the arbitrary and what will make it work on a PDP. Ritchie enhances it a bit for operating systems. This is in stark contrast to the careful design of languages like Ada or Modula-3, balancing expressiveness vs safety vs performance. No surprise a bunch of problems followed. And the same ones today, 30+ years later, in the average app even with better tooling available, since the language itself defaults to making simple stuff require extra work to do safely. Not necessary, as Wirth, Morrisett, and others showed.

https://vimeo.com/132192250


I've used both C and Pascal in embedded systems. Pascal is painful compared to C. A "somewhat modified" version might help, but I doubt it would be enough. To steal a phrase from my friend Michael Pavlinch: Pascal was like picking your nose with boxing gloves on. A modified boxing glove isn't really going to solve the problem.

For that matter, once we weren't on Unix but rather on the PC, and we had a nicely-modified Pascal (Turbo Pascal), why did C/C++ win there, too?


> For that matter, once we weren't on Unix but rather on the PC, and we had a nicely-modified Pascal (Turbo Pascal), why did C/C++ win there, too?

Turbo Pascal was quite successful in its day. But Microsoft chose C, and the rest is history. Absent Microsoft's decision, Pascal might still be around.

If you look at early Mac development, for instance, Pascal was actually preferred. C only ended up winning due to being better known, which was a result of the critical mass of programmers trained on Unix and Microsoft's offerings.


That doesn't agree with my experience. I switched from Turbo Pascal to Turbo C in the late 80s while doing DOS development because it was a better tool for the job. It had nothing to do with Microsoft or Windows (v3.0 was not yet out, and few people developed Windows apps before v3.0). Pascal (the language) was definitely not preferred for DOS development at that time - it's just that until 1987 there wasn't really a C development environment that could compete with Turbo Pascal.

I did some Amiga development back then also and that was exclusively in C with some 68k assembly. I don't really recall anyone hoping for a pascal environment to replace their C tools, but the Amiga OS was more C-oriented than DOS at the time.


> Pascal might still be around.

Well that's a bit rude. Pascal is still around! And I love it. Behold:

https://www.freepascal.org/

https://www.embarcadero.com/products/delphi


> But Microsoft chose C

Ironically, the early Windows API used Pascal calling conventions.


> why did C/C++ win there, too?

Maybe C because of its legacy and ubiquity, and C++ because it was one of the few languages with 1) serious compatibility with C, 2) a good feature set, and 3) ISO standardization


Maybe C because of its legacy and ubiquity

Not in the 80s.


Basically because Microsoft picked C++ to be the main language for Windows development.

Turbo Pascal was all very well but from Microsoft's point of view it was Not Invented Here.


C++ wasn't invented at Microsoft, either. The DOS C++ train had already left the station (Zortech C++) and Microsoft wasn't about to be left behind.

(Zortech didn't invent C++, either, I don't want to give that impression.)


Actually if I remember correctly they were the very last PC C compiler vendor to add support for C++.


> Pascal in embedded systems

aka Modula-2

http://cms.edn.com/ContentEETimes/Documents/ESC%20Proceeding...


I never liked Pascal and I love C, but Bill Atkinson did amazing things with Pascal (see for example https://www.folklore.org/StoryView.py?story=Hungarian.txt) so it must have something going for it.


You are ignoring the fact that many problems can't be solved in the higher level languages.

Also, for some, having the C/C++ level of control is preferred.


This doesn't have to be zero sum. We don't need to choose between safe and unsafe. Safety should be a default, with unsafety being something you opt into.

C/C++ are both unsafe. Rust is safe by default, and for the cases where you want/need the C/C++ level of access to the system, you can opt into unsafe. I believe Rust is more safe than Go and Java as well, because of the type safety in the threading model.

There are existing applications in C out there, there will be for a long time. Personally I frown upon anyone starting a new project in C or C++, or any unsafe language, especially when you have such a strong language in Rust that exists with a very healthy and growing community around it.


Notice that "scoff at" turned to "frown upon" here, but same difference.

I like Rust, but the kind of attitude you've shown in this comment is a) distressingly common among Rustaceans, and b) makes everyone in the embedded world who would like to get away from C/C++ keep the whole Rust ecosystem at arm's length.


Scoff: speak to someone or about something in a scornfully derisive or mocking way.

Frown: furrow one's brow in an expression of disapproval, displeasure, or concentration.

Those are definitely not the same thing. Language is deliberate. I'd prefer it if you don't change mine to mean something I do not.

I frown at a lot of things in software I review, then someone explains to me why they decided to do something the way they did, in which case they may convince me that they are correct. In the case of C/C++, you can convince me easily in the embedded space that C is still the best choice, and I'd agree. I wouldn't even debate it, I might personally go try and see if there is an option there, but it's clearly a space that Rust is still getting bootstrapped into.


You wrote "scoff" first before you edited it, that's why gens wrote "You can scoff all you want". From the tone of your comments, I think that's what you're really doing.


No, he did not. Not as far as i remember.

I am not British, and i did want to dramatize it a bit. In my defense, words are to convey meaning as much as to describe action. (you don't literally "beat a dead horse" or "fart in your general direction")

EDIT: In bluejekyll's defense, we here are a bit chafed (hehe) by extreme fans of certain programming languages (IMO even some functional programming fans get a tad too.. unrealistic in their talks)


hmm... I did not. But ok.


You sure would frown a lot in the embedded world. Rust isn't even a contender for most hard-real time or safety critical projects. Is it even supported by any RTOS today?


Recently I had the pleasure of learning how to develop for this https://www.tockos.org/, which was a lot of fun.

But yes, I can still frown at the C in the embedded world, and bite my tongue when forced to use it. It's only a matter of time IMO before Rust gets onto more of these devices.


I don't like to be told what programming language i should use. You can scoff all you want, but i will be using C for most of my projects.


I don't "tell" people what language to use. But I do encourage them to look at new languages, especially ones that fit in the same place as one that they like.

I love C. It was my first programming language. I love the syntax, I love the semantics. Years ago, I left it though, because I could deliver higher quality applications with fewer unknown bugs with Java, but I always wanted to find a reason to go back to C. And for some projects I did go back, and it was important (though in each case I'd run into some error or bug that took weeks to track down).

When I grabbed Rust 2 years ago, it was like getting everything I loved about C paired with everything I love about Java, and none of the stuff I hate in either. Feel free to use whatever language you wish, and I hope schools continue to teach C (though that seems to be dwindling), but I highly encourage people to check out Rust and see what it's like to not worry about pointer management all the time.


When I checked out Rust a year ago, I immediately discovered it has no SIMD (SSE, AVX, Neon), that it has no sane way to implement a graph structure, and that it’s hard to compose data structures into higher-level specialized ones.

Also, I highly encourage people who worry about pointer management all the time to check out modern C++.


> That it has no sane ways to implement a graph structure.

Sure it does. I work with graphs in Rust all the time.

They may not be "sane" in your view because they're different from the way you implement them in C++, but I could equally well say that there's no "sane" way to implement a safe owning pointer in C++ (since there's no memory-safe way to do so).

> Also, I highly encourage people who worry about pointer management all the time to checkout modern C++.

Modern C++ has no protection against the most pernicious memory management errors, particularly use-after-free.


> I could equally well say that there's no "sane" way to implement a safe owning pointer in C++

unique_ptr + std::move gives you that semantically. Sure it won't stop you from dereferencing a null pointer but, in all the years i've been writing C++, finding and fixing null pointer dereferences wouldn't rank very high on my list of things to worry about. They always kill your program and are easy to spot in an IDE or debugger.

Rust's choice to make pointers either mutable and owning, or shared and immutable and garbage-collected, is no doubt the right one, but there are code styles in C++ where this can be achieved with a very low fuck-up rate.

The modern C++ way is not to use pointers, except as an implementation detail. A pointer (raw or smart) of any type, other than perhaps char*, as a function parameter is a sure sign of code smell, and raw pointers as data members have very limited use.


Unique pointers provide no protection against use after free, because you can take a reference to their contents and that reference can become dangling. Because the destructor of a unique pointer is invoked automatically per the language rules, as opposed to in C where an explicit call to free is required, this makes C++ more prone to UAF than C.


It's unusual to take references to the contents of a unique pointer. There is one idiom which says that if one has a smart ptr and a function taking a ref, the raw ptr should be passed, but that's it. It's frowned upon... nay scoffed at to store references one receives as parameters, so that temporary ref will go away after the function call, leaving the smart ptr as unique owner.

This should not be a problem and it certainly doesn't make C++ more prone to use after free. Null pointers are the problem.

Both can be solved though by creating e.g. a safe smart ptr which does null checks and only exposes operator->.


> It's unusual to take references to the contents of a unique pointer.

No, it's not. It happens every time you call a method on the referent (well, OK, this is technically not a reference, but it doesn't matter to the argument).


One is not taking the reference. When writing ptr->foo() the ref is temporary, not accessible and will be cleaned up when the method call on the raw ptr finishes.

Taking the reference would be "auto foo = ptr.operator->()". This could be forbidden by not providing op->, and instead having an apply function which takes a method name and the parameters. That would be safer, but probably too much effort for little gain.


If you're calling a method on the object referred to by a unique_ptr, then you won't have a use-after-free because the thing will exist. The only way it wouldn't exist would be if you typed "delete myuniquePtr.get()", which would be dumb.

It could be a null unique_ptr of course, but I don't see how this is anything worse than a denial-of-service.


No, you could get a reference to the container of the object and indirectly delete it. For example, if the unique pointer were part of a global std::vector, clearing the vector would invalidate the this pointer.

Keep in mind that you are at this point arguing against the existence of actual zero-days that have occurred in Firefox (and lots of other software). This is not a theoretical concern.


This goes back to what I said about using pointers as implementation details. If you put unique_ptrs into a _global_ then you're doing something stupid already. Just think about it. Why does something that can only be pointed to from one place need to be visible globally in the first place?

My point is all Rust does is force you stop and think, and while C++ lets you do dumb shit, it's hardly fair to blame the language when almost-safe C++ is actually cleaner and easier to read and write than dumb C++.

Build your own handle types with well-defined ownership semantics, use explicit move() sparingly, pass objects by value, use references, utilize the stack and temporaries. Only put pointers inside the guts of classes and your data structure implementations. These techniques go quite far.

And Firefox as an appeal to authority is hardly compelling. As another example, I've seen bug fixes in Chromium where the original code quality was so poor it was hard to believe it came out of Google. Of course, since then I've learned most C++ out of Google is total crap.


> Unique pointers provide no protection against use after free, because you can take a reference to their contents and that reference can become dangling.

Yes, it's possible, however references should only ever have local scope so, unless you're dealing with threads or asynchrony it's hard to write sane code where this happens, and if you have those things, and you're passing refs or ptrs, then I don't have to tell you that things are bad.

> as opposed to in C where an explicit call to free is required, this makes C++ more prone to UAF than C.

I don't see this. C++ destructors run after the last line of your code block, if something uses a destructed object then it can only be because you passed a pointer or reference to it to something else that wasn't yet destroyed.

Mentioning free() just implies that you're willing to accept resource leaks to avoid UAF bugs, which is nuts because UAFs can be a lot easier to debug.


> Mentioning free() just implies that you're willing to accept resource leaks to avoid UAF bugs, which is nuts because UAFs can be a lot easier to debug.

If you're focused on security, it goes in the opposite direction: a resource leak can lead to a denial of service, but a use-after-free can lead to remote code execution, which is much worse. From that point of view, it's worth risking a resource leak if by doing so you prevent a potential instance of remote code execution.

By the way,

> C++ destructors run after the last line of your code block

Aren't there many situations where the C++ destructor runs at the end of the current statement? IIRC, if you call a function which returns a temporary, then call a method on that temporary which returns a reference to within the temporary, and assign the result to a variable, all in a single statement, the temporary will be destructed while the reference to its contents is still live.


> then call a method on that temporary which returns a reference to within the temporary

The obvious answer to this would be never to return references to members (or anything tied to the object's lifetime), but if you really must then you can always use a qualifier to prevent this pattern from compiling.

https://ideone.com/UuZYJe


> They may not be "sane" in your view because they're different from the way you implement them in C++

I saw two kinds of Rust graphs.

One is safe, easy to understand, but slow (e.g. reference counting).

Another one (unsafe Rust) is very hard to implement, thousands of lines of code. Also, modern C++ is much safer than unsafe Rust.

> C++ has no protection against the most pernicious memory management errors, particularly use-after-free

CRT debug heap / MALLOC_CHECK_ / libefence, depending on the platform/compiler


I can't reply directly to your other comment, so I'll do it here. In the case of petgraph, I wouldn't say it has large amounts of unsafe (I reviewed it recently b/c I might start using it), but it does use unsafe. Most data structures in Rust require unsafe for performance or memory access patterns. In these cases the developer is the one responsible for guaranteeing that it is safe (no loss over an unsafe language).

> modern C++ is much safer than unsafe rust

This is an interesting comment. I can't say if you are right or wrong, but it's thought provoking. So I'll quote the rustinomicon here:

    Unsafe Rust is exactly like Safe Rust with all the same 
    rules and semantics. However Unsafe Rust lets you do 
    some extra things that are Definitely Not Safe.

    The only things that are different in Unsafe Rust are 
    that you can:

    -Dereference raw pointers
    -Call unsafe functions (including C functions, intrinsics, and the raw allocator)
    -Implement unsafe traits
    -Mutate statics
Point being Rust doesn't just throw out all the rules. But it's a very interesting assertion.

https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html


> I can't reply directly to your other comment

Next time just wait 5-10 minutes.

> Most data structures in Rust require unsafe for performance or memory access patterns.

And when I want to compose 2 data structures into my own higher-level one, for performance and memory access patterns I need these two lower-level structures to expose unsafe stuff at their API boundaries. The data structures I saw don’t do that, they’re designed to be consumed from safe Rust instead.

> Rust doesn't just throw out all the rules

I think in modern C++, with these iterators and smart pointers, you’re less likely to screw up dereferencing a wrong pointer.


> And when I want to compose 2 data structures into my own higher-level one, for performance and memory access patterns I need these two lower-level structures to expose unsafe stuff at their API boundaries. The data structures I saw don’t do that, they’re designed to be consumed from safe Rust instead.

Can you give a specific example of something you want to do that you can't?

> I think in modern C++, with these iterators and smart pointers, you’re less likely to screw up dereferencing a wrong pointer.

I don't think this is empirically true relative to C, but even if it is, use after free is still far too common in C++ code.


> Can you give a specific example of something you want to do that you can't?

Compose a hash map + linked list into an LRU cache. Rust now has that in the standard library, but they had to implement their own linked list for it. In C++ it’s just a few lines of code, because standard maps+lists compose just fine.

Or (a more generic example and thus harder to put in the standard library): add an index to an existing collection. I have a large collection of some values. I want to build an index allowing lookups of values by some key. The values are not small; I can’t afford duplicating them. If you say “just move the values into a hashmap”, my response is “and I also want another, different index of the same set of values by a different key”. Again, very easy in C++: encapsulate both the original collection and a hashmap from key to value pointer.

> use after free is still far too common in C++ code.

In my experience, use after free = instant crash in debug build. Quite easy to detect and fix.


> In my experience, use after free = instant crash in debug build. Quite easy to detect and fix.

The security track records of major network-facing C++ apps disagree with you.


Use an Rc? If you can't afford the reference count, then use unsafe/raw pointers, which it sounds like you'd do in C++ anyway.


> then use unsafe/raw pointers

Even if I manage to extract an unsafe pointer from that Rust collection, I don’t know how long it will work. For C++ collections, iterator invalidation rules tell me that.


> Even if I manage to extract an unsafe pointer from that Rust collection,

It's easy: just get the & or &mut to the value (as if you were accessing it), and cast it to *const or *mut respectively.

> I don’t know for how long will it work. For C++ collections, iterator invalidation rules tell me that.

It's the same in Rust. Whenever the iterator would be invalidated in C++, the pointer you stashed above might point to the wrong place. This is not usually documented in Rust, because its borrow rules prevent you from stashing a reference while the collection mutates, but once you start playing with raw pointers, the borrow checker gets out of the way (references have a lifetime, pointers don't).

You just have to be careful when casting the pointer back to a mutable ref ("unsafe { &mut *ptr }" is the trick, see the documentation for std::mem::transmute): mutable references are like C99's "restrict", so you should make sure to only ever have one live for each pointer at every moment, otherwise you're in undefined behavior land.

----

Anyway, going back to the parent comment, you said "Values are not small, can’t afford duplicating them". Might I suggest keeping the values in a Box<T> then, and making both collections point to the box? That way, you don't have to worry about a mutation in one of the collections invalidating the pointer, since the contents of a Box won't move in memory.

And in fact, the usual Rust style for keeping a value in more than one collection would be to use a Rc<T>, which is basically a Box with a reference counter. That way, you don't need to play with raw pointers, and have no risk of a misstep. You pay the cost of incrementing/decrementing the reference counter only when adding/removing from the collection, and the reference counter is small.


> keeping the values in a Box<T>

> for keeping a value in more than one collection would be to use a Rc<T>

Indeed, both methods are simple and elegant ways to approach the problems.

The bad thing with both of them is performance.

Box<T> means when I need to iterate through all the values in a collection, I’ll get a random memory access for each item. Rc<T> is even worse: not only is it RAM read latency per item, but also ref-counting overhead per item (AFAIK even when reading stuff).


> also ref.counting overhead per item (AFAIK even when reading stuff).

That's the beauty of the borrow checker: no, there's no reference counting overhead when reading stuff. The borrow checker guarantees that the reference you used to access the value won't go away until you're done with it, so it doesn't have to increment the reference counter.


You'd have to box your values. And if you didn't want to do that, then I'd just use the indexing method mentioned elsewhere in this thread. I've used such things in performance-critical code.


> the indexing method mentioned elsewhere in this thread

https://news.ycombinator.com/item?id=15180649


> For C++ collections, iterator invalidation rules tell me that.

The iterator invalidation rules in Rust are straightforward, more straightforward than those in C++. They have to be, because the compiler actually checks them.


More like 20 it seems ;)

> I need these two lower-level structures to expose unsafe stuff at the API boundary

Two thoughts: 1) I think you can always use unsafe to get access to a raw pointer (I honestly don't use unsafe often); 2) you need some way to express ownership between both data structures, which can be annoying, no doubt.

> C++, with these iterators and smart pointers

Does that make it safer than unsafe Rust? Maybe, but there's a lot less unsafe Rust even in these graphs...


> I think you can always use unsafe to get access to a raw pointer

I’m not sure about that. Also, I don’t know how long it would work; C++ has iterator invalidation rules.

> you need someway to express ownership between both data structures

Not every relation is ownership. Graph nodes don’t own each other, an external index doesn’t own the indexed items, etc.


I'm really not sure, but I think there's a miscommunication about raw pointers in this thread. I think other folks are suggesting that you can solve the container composition problems you're talking about by inserting raw pointers into a HashMap. But I think you're reading that as obtaining raw pointers to the storage that a HashMap owns, which is why you're worried about iterator invalidation rules and stuff like that? (My understanding is that any reference into storage that a container owns is / could be completely invalidated by any &mut operation on that container.)


To create an external index for an existing collection, I need to get a pointer to the item stored in that existing collection, compute a key, and put both into the HashMap<tKey, tValue*>.

So I need to do both. And also, I need to know when these pointers expire so I can rebuild my index when that happens.


I think I understand. The idea would be to have one HashMap<tKey, tValue> that holds the objects themselves, and then a secondary HashMap<tKey2, *tValue> (with a star this time) that indexes on some other key and points to values stored directly in the first map?

What's the benefit of doing that, compared to making both the HashMaps store pointers to independently allocated objects on the heap, such that insertions into one map never invalidate the other map? Is the hope to avoid paying the cost of an extra pointer dereference when we're using the first map? Or does independently allocating each object hurt cache locality or something like that?


> Is the hope to avoid paying the cost of an extra pointer dereference when we're using the first map? Or does independently allocating each object hurt cache locality or something like that?

Both.

In practice, I probably wouldn’t use a hashmap for the first container that actually owns these items. When I do expect gigabytes of data, in C++ I use something like vector<vector<tValue>>, where the inner vectors are of the same fixed size (except for the last one), e.g. 2-16MB RAM / each. If I need to erase elements, I include a free list such as this one: https://github.com/Const-me/CollectionMicrobench/blob/master...

But the exact container is not that important here. If you don’t have that many values, it can just as well be a standard vector.

The point is, C++ allows composing these containers into higher-level ones, such as this indexed-array example, using pointers to link individual items across them. That makes it possible to build sophisticated, efficient data structures that are still easy to reason about.


Right, and I don't understand why you think that same exact approach wouldn't work in Rust either. If you have a `Vec<Vec<tValue>>`, then you can spread all the raw pointers you want everywhere without any additional boxing of `tValue`, and you know exactly when those pointers might become invalidated: whenever you call an `&mut` method on your `Vec<Vec<tValue>>` (or rather, on one of the interior `Vec<tValue>`s). Because of that, you can even build safe abstractions on top of such data structures such that your callers can't possibly misuse it (without themselves using `unsafe`).

The technique of giving stable addresses to things by stuffing them into vectors isn't unique to C++. People do it in Rust too: https://github.com/SimonSapin/rust-typed-arena/blob/master/s...


The exact container is not that important here. The point is, C++ allows composing these containers into higher-level ones, such as this indexed-array example.

They can be standard, third-party, my own, I still can compose them.

About my particular example, I’m not sure you can easily implement a free list in Rust, to reuse space from de-allocated items. Especially if these items have non-trivial constructors and destructors.


> The point is, C++ allows composing these containers making higher-level ones, such as this indexed array example.

What I---and others---are trying to tell you is that it's perfectly possible in Rust too. I don't think you've pointed out anything that isn't possible in Rust. My previous comment was exactly about composing containers to make higher-level ones.

Have you tried building such things? Did you get stuck? Maybe someone can help.

> About my particular example, I’m not sure you can easily implement a free list in rust, to reuse space from de-allocated items. Especially if these items have non-empty constructor and destructor.

I don't see any reason why implementing a free list in Rust wouldn't be possible either.


It seems like one of Const-me's objections is that Rust data structures like HashMap don't document a lot of guarantees about when they would and wouldn't invalidate unsafe interior pointers. That said, for Vec in particular, Rust actually makes a ton of guarantees about its layout (more than C++'s std::vector, I think): https://doc.rust-lang.org/std/vec/struct.Vec.html


Correct.

While vectors are comparable, C++ also guarantees a lot about the rest of the containers. E.g. unordered associative containers never invalidate pointers to keys or values. Linked lists never invalidate pointers or iterators.

In C++ I can create an efficient LRU cache in a dozen lines of code, combining list<const tKey*> with unordered_map<tKey, struct{tValue, list<const tKey*>::iterator}> (this assumes tKey is not an int; otherwise list<tKey> is more efficient). Rust’s LinkedHashMap had to reimplement a linked list instead.


Thanks for the example -- it was hard for me to picture what was going on before. I wonder if the unstable "Placer" APIs will make the standard LinkedList more capable of this sort of thing, but I'm not really familiar with them. Still, I suspect it'll always require `unsafe`.


> I don't see any reason why implementing a free list in Rust wouldn't be possible either.

Is placement new available in rust stable?


Why do you think placement new is necessary to implement a free list?


Object size possibly.


If you click directly on the time of a comment, it's a link to a direct reply page.


> And when I want to compose 2 data structures into my own higher-level one

Could you give a specific example please? I compose data structures in Rust all the time.



There is another kind of Rust graph that uses indices. Indices are just bounds checked addresses, so this is a natural fit. The most popular graph crate, petgraph, is of this type.

> CRT debug heap / MALLOC_CHECK_ / libefence, depending on the platform/compiler

None of these are effective at preventing use after free problems in production.


There’s another problem with arrays: they don’t scale to very large sizes.

First reason is address space fragmentation, esp. on 32-bit platforms.

Second reason is that insert time can be very high. Sure, the average cost is amortized by exponential growth. The worst case, however, is horrible: you copy 1 GB of RAM just to insert another 16-byte item.


> I saw two kinds of Rust graphs.

There's a third kind, which uses indexes into arrays containing the nodes and edges, instead of direct pointers to the node/edge.

> CRT debug heap / MALLOC_CHECK_ / libefence, depending on the platform/compiler

Can any of these protect against the scenario where a block of memory is freed, allocated again for another purpose, but still accessed through the old dangling pointer?

Also, are they always present at runtime, or are they used only on debug builds and turned off on production? The use-after-free might happen only after a specific sequence of uncommon operations confuses the code enough that it either frees something before its time, or keeps and uses a stale pointer.


> There's a third kind, which uses indexes into arrays containing the nodes and edges

Pointers are still faster. Also, with arrays it’s expensive to reduce RAM usage after a lot of nodes have been removed.

> Can any of these protect against the scenario where a block of memory is freed, allocated again for another purpose, but still accessed through the old dangling pointer?

No 100% guarantee, but AFAIR CRT debug heap takes measures to reduce RAM reuse when it can.

> are they always present at runtime, or are they used only on debug builds

Not present. Yes, only on debug builds. Still, these early debug traps are quite helpful during development.


> Pointers are still faster.

Depending on cache effects, indexes might or might not be faster than pointers. With indexes into an array, the nodes or edges will be sequential in memory, which depending on their size and access patterns might increase the cache hit rate. Furthermore, while pointers will always be 8 or 4 bytes, indexes can be as small as 2 or even 1 byte for smaller graphs (reducing structure sizes and potentially leading again to a higher cache hit rate).

As for the costs of indexing, on x86 a single instruction can add the array base, the index, and a constant offset, and do a load or store from/to the resulting address. Other architectures might need a few more instructions, but that is dwarfed by the cost of a cache miss, which can be hundreds of instructions.

Another cost is the bounds check for every indexing into the array, which the compiler can't elide because it can't easily prove that the index is within the array bounds. That is the main reason you saw "unsafe" code in the petgraph crate; there are places where the programmer knows the indexes are within bounds, since they came from a trusted place (the graph itself), but the compiler isn't smart enough to prove it, so the programmer manually bypasses the array bounds checks in those cases.

All in all, I wouldn't know a priori which would be faster for a particular use case, pointers or array indices. I'd have to benchmark first.

> Also with arrays it’s expensive to reduce RAM usage after a lot of nodes were removed.

True, the "array indexes" approach is not as good for algorithms which need to remove many nodes (or edges, depending on how they're represented) from the graph. As long as you don't need the indexes to be stable across deletions, you can use a simple trick to make deletions cheaper (move the last element of the array into the newly freed place, so all empty places are at the end of the array), but that trick can't be used if you need the indexes to be stable (because they're referenced from outside the graph).


Your point about reducing pointer size when the allocation pool is smaller makes sense. But I think it's not fair or realistic to imply that simply using an array the size of all RAM, starting at zero, leads to more fragmentation than referencing all of RAM relative to some other point, assuming the same algorithms working on the same-sized data sets. The relative locations of objects don't depend on where the coordinate origin is, although choosing a good zero may make the engineer's life easier.



When I checked out rust a year ago, I tried to implement three things:

- Some OT code. This went ok, but I still have no idea which of the 6 string types I should use for a library like that. I think I ended up settling on Rc<Cow<String>> or something, but it still wasn't ideal. Swift, Go, and C each have a canonical string type.

- First I tried to make a skip list with performance matching my C implementation. I discovered that even with unsafe there was no way to make a struct with a dynamically sized array at the end, like I can easily do in C.

- Then I tried to make a networked server using tokio. Despite all the hype, adding a dynamic item to the event bus didn't work because it wasn't 'static. After spending a few hours fighting the borrow checker, I went online and was told that this would get better with impl trait or something.

I'd really like to use Rust, but as far as I can tell it's not mature enough for what I want. I've started a new server project recently and I'm writing it in straight C, as none of the newcomer C-replacement languages I tried seem good enough to replace C.


> This went ok, but I still have no idea which of the 6 string types I should use for a library like that. None of Swift, Go or C have this problem.

There are two string types: a string that owns its contents and a string that references its contents. This is the same as in any language that uses smart pointers for resource management.

Can you name a string type that you think should be removed, and explain why?

> First I tried to make a skip list with performance matching the performance of my C implementation. I discovered that even with unsafe there was no way to make a struct with a dynamically sized array at the end, like I can easily do in C.

Yes, you can. You can make a one element array and allocate and deallocate manually, just as you do in C. The offset method on pointers allows for arbitrary pointer arithmetic.

> Despite all the hype, adding a dynamic item to the event bus didn't work because it wasn't 'static didn't work.

I haven't used Tokio, but couldn't you use a boxed trait?


> There are two string types: a string that owns its contents and a string that references its contents. This is the same as in any language that uses smart pointers for resource management.

There's String, &str, Cow<?>, Rc<?> and other variants. None is canonical. I spent about 2 hours reading documentation trying to pick the right type to use and I think I ended up with Rc<Cow<String>>. But in this instance my strings represent character edits in a document. 90% of the time they're < 5 bytes long. So in 90+% of cases I should be able to avoid allocations and memory dereferencing entirely, and store the string inside the pointer. What I actually want is an efficient version of enum Str { ShortStr(char[X]), Ref(Rc<Cow<String>>) }, but encapsulated behind a common string interface. Coincidentally, this is exactly how the canonical string implementation works in obj-c and (I think) swift. Despite having 6 different options maybe the string type I actually want is buried in Cargo. I'm not sure - at this point I was tired and I stopped trying.

To me this is a classic symptom of a language trying to do too much. Having all this choice is great for systems development, but for application-level development I don't want X different string options. You want 1. And I want it to be good. Having lots of options would be fine if the language was more opinionated - "Unless you know what you're doing you should just use String, which is efficient, immutable, copy-on-write and ref counted. Click here (link to advanced section of book) to read about your other options if you want more control over allocations."

> Yes, you can. You can make a one element array and allocate and deallocate manually, just as you do in C. The offset method on pointers allows for arbitrary pointer arithmetic.

Does it? At the time even with unsafe there was no way to directly call malloc. Maybe I just couldn't find it in the docs, or maybe thats changed now. I spent weeks on and off trying to get it working, including reading the rust unsafe nomicon and writing dozens of linked list implementations. I tried out all sorts of weird ways to allocate and initialize the array. I kept thinking of new ideas, only to find out a critical piece of syntax was missing. In the end I could allocate the struct I wanted but discovered it was syntactically impossible to initialize, or something silly like that. And at that point I gave up. Maybe this problem has been fixed since. And maybe if I spent even more time trying I would have figured it out. But I was tired and I had work to do.

> I haven't used Tokio, but couldn't you use a boxed trait?

I don't know what that is. Frankly I'm still confused why Rc<> didn't work. I got about 6 different answers when I asked the rust subreddit how to fix this. Some people suggested things that also didn't compile. Some people said I should make my object 'static (no thanks). And others said the problem would be fixed when trait impl lands (whatever that is - is that what you're talking about?). This use case is literally the 'hello world' of nodejs code - attach an event handler to an object, interact with local variables each time the event fires. At least as of a year ago the tokio devs clearly thought all network servers only did request/response style interaction. All the examples on their website were either an echo server or an http server. I need streams.

I really want to be able to use rust. But so far my only experience with it has been one of frustration. It seems too immature to replace C as a systems language, and tokio seems too immature to replace Nodejs for network services. Maybe I'll revisit it in a few years, but at this point I'm more hopeful either someone will bolt decent syntax on top of Go a la coffeescript, or that Swift will add language level support for concurrency. (I'd be happy with either async or go's actor model.)


> There's String, &str, Cow<?>, Rc<?> and other variants.

To be fair, that's like saying std::string and std::shared_ptr<std::string> are two different string types in C++, and that neither is canonical.

In Rust, String/&str are the canonical string types. String is an owned growable buffer, &str is an immutable slice. That's it. Adding Cow<_>, Rc<_> or Arc<_> to the mix is orthogonal to the specific string type you're using. They are smart pointers and can work with various types other than strings.

> What I actually want is an efficient version of enum Str { ShortStr(char[X]), Ref(Rc<Cow<String>>) }, but encapsulated behind a common string interface.

We couldn't get away with adding this as the standard library string type because it would impose non-zero costs on every use of a string. The use of Rc is particularly grating because it's not thread safe, which means you wouldn't even be able to send strings across threads. That would suck. So then you might say to use an Arc---atomic ref counting, thread safe---but that's even more costly.

I'm honestly kind of confused at your feedback here. At first it just sounded like you were bewildered by the various string types---which is a fair criticism, getting strings right is hard and everyone has opinions on what they should look like---but it actually sounds like you knew exactly what you wanted, and were frustrated that the standard library didn't have it. Instead, the standard library gives you a fundamental string type that one could use to build other more advanced string types when you need them.

The typical solution to problems like that is to go out and build what you need and put it on crates.io. Or, use one that already exists. :-) https://docs.rs/inlinable_string/0.1.8/inlinable_string/


Have you considered C#? It only has a single string class. With async-await, concurrency is fine too.

Not long ago, they open sourced the compiler and a subset of the runtime, making it cross-platform: https://github.com/dotnet/core It’s a bit tricky to install on Linux, but for me it works OK, at least so far (an embedded TCP/UDP server app).


I haven't, and it's a good idea. I wrote a bunch of C# code back in 2007 and I consistently enjoyed it. It seems like a very pragmatic language choice.

But if I'm going to move further away from the hardware in exchange for some language comforts & quality of life improvements, Elixir is the next language I want to try. I think both its concurrency primitives and immutability rules might be the right language-level defaults.


> move further away from the hardware in exchange for some language comforts & quality of life improvements

C# has decent native interop, i.e. [DllImport]. On Linux it imports from .so dynamic libraries. When you want to be closer to the hardware, because of SIMD, or system calls not exposed to .NET, or integration with third-party C code, it usually works OK.


C# has some SIMD support since version 6.


Very limited support. On x86-64, the only languages that have good SIMD support are C (C++ gets that for free, ‘coz compatibility) and Fortran.


There is also D and Object Pascal, unless you won't consider inline Assembly as having support.

Oh and Fortran of course.

But yeah, I also find it sad having to go down to Assembly to make use of them.


Rust has two string types, not six. If you're referring to the `OsString` type, that exists only for platform interop, and there's no confusion as to when one needs to use it.


Graph in Rust: https://github.com/bluss/petgraph

SIMD in Rust: http://huonw.github.io/blog/2015/08/simd-in-rust/ (yes, still in nightly)


> Graph in Rust: https://github.com/bluss/petgraph

Thousands of lines of code. A large amount of that is unsafe Rust; even C++ is safer than that :-)

> SIMD in Rust: http://huonw.github.io/blog/2015/08/simd-in-rust/ (yes, still in nightly)

It was already “still in nightly” a year ago. Also, it's harder to do integer math with it because of type safety: very often, even consecutive instructions interpret the same __m128i register as different datatypes, u8x16 / i32x4 / u64x2 / etc.


> Thousands lines of code.

Have you taken a look at petgraph? It does quite a lot of things. The same functionality in C would be thousands of lines as well.

> It was already “still in nightly” a year ago.

SIMD is in Rust nightly not because it's immature, but because the Rust developers would rather design a portable interface than quickly standardize a nonportable one. Given that AFAIK neither the C nor C++ specifications include provisions for SIMD and all support is compiler-specific, the only difference between C/C++ and Rust here is that Rust follows a release model that features a nightly channel.


> neither the C nor C++ specifications include provisions for SIMD and all support is compiler-specific

The support is portable across compilers. You #include <[xepsiz]mmintrin.h>, and you’ll get these SIMD intrinsics as documented on intel.com.

BTW, OpenMP isn’t in the C++ language spec either; that doesn’t prevent it from working on most compilers and platforms.


petgraph is a lot of code because it has a lot of features. You could equally well criticize C++ because boost is so much code. A simple graph with indices is a lot smaller, and it doesn't need unsafe either.


I haven't learned rust because of how annoying the rust evangelism is everywhere I read about it.


I am fluent in C, C++, and C#. The performance difference between implementations is in some cases over five orders of magnitude.

I envy people that don't have to worry or use pointer management. I imagine them coding in rust with one hand while drinking martinies with the other :)


This is exactly how I code, well s/martinies/burbon/. ;)

The compiler makes sure those types I'm seeing double of are legit.

Seriously, there's a reason the motto 'fearless programming' has been applied to Rust.


I don't know why it is not common knowledge that different people have different priorities, different ways of thinking about things and, well, that they are different. Not only that, but people can even have opinions that contradict their other opinions. I write C because it's powerful and it stays out of the way. I also like assembly for the same reasons. But i also like Scheme, which is the most opposite language to C that i know of.

Another reason i like these three languages in particular is that they are simple. There are no added features that would make me look them up if, for example, i was reading somebody else's code or if i hadn't coded in a few months.

I like my data to be laid out how i want it and process it how i want it. You can say "but rust lets you do it with this", when in C i "just do it".

The whole memory safety argument is.. i want to say "fine" as people do make mistakes, but tools like static analyzers (aka linters) and valgrind exist.

As for the higher social aspect;

> I don't "tell" people, what language to use.

But you do. If you say to a newbie that their C project is "bad" because it is written in C, be it directly or indirectly, they will take it as if you are telling them to use another language (and when you say that that language is Rust, then.. well..). When the truth is that programming languages don't matter in most cases.

> But I do encourage them to look at new languages, especially ones that fit in the same place as one that they like.

I couldn't agree more, with the first part at least. I like imperative programming more than pure functional, but i did learn a lot by programming functionally in scheme (and by playing with some other languages that i will never use), and now the C that i write is better for it. edit: assembly is a good example of a language to learn just for the sake of knowing it.

Let us mention that learning abstract theory also influences how we write things in a specific language. Things like the million forms of data structures and various.. ways of doing things (graph theory (which is surprisingly relevant to concurrency), Turing machines vs lambda calculus, various ways of sharing state between parts of code, and.. i can't think of more now). In addition to that, how a computer works can also shape how we program. We have limitations; for example, it is very important for performance not to thrash the cpu cache all the time.

Now to go back to "reality": Software is (usually) written to be used. The things that matter are what it does and its cpu/memory usage. For a program that is used to, idk, process some text once a month, it doesn't matter what it is written in. But if the program is to be used daily on thousands of computers, it starts to be more important that it performs well. That is my opinion at least, as the general opinion seems to be "modern computers have so much processing power and gigabytes of memory".

Personally it pisses me off when a vital part of a system is hacked together so it "works", where examples are Glib usage in NetworkManager (and many others), and python (i rewrote phwmon in C because it uses too much memory and cpu; maybe some day i'll clean it and post it on github). If something is vital for day-to-day usage of a computer, then it had better not use hundreds of megabytes of memory and 100 times more cpu than it should (note that 100 times more cpu usage for many "system" things is still a tiny amount, but regardless). To paraphrase a quote that i can't find anymore, "the best daemon/program is one that does its job but you don't know it is running", where an example would be dhcpcd (currently uses 196kB of memory and has used a total of 0.05 seconds of a cpu core; klogd is even better with 80kB of memory and 0.02sec cpu time).

Then there are domain specific languages..

EDIT: I would also like to add that as much as we are different from each other, we are also much the same. And that we can change the ways we think, for better or worse (not that anything is either black or white).


Thanks for this post, it's got a lot of wonderful things that we can agree upon.

> I don't know why it is not common knowledge that different people have different priorities, different ways of thinking about things and, well, that they are different.

Obviously there are a lot of things that I will never understand. And people definitely have different priorities. In some cases C (or insert your favorite programming language here) is chosen purely because you've built up a huge amount of experience with it and, for the project you're working on, you don't want to learn something new. That makes a lot of sense. For me the reason Rust is so attractive comes down to a few things, and none of them are about it being zero-overhead; that's just icing on the cake. The Rust design philosophy, type system, threading model, mutability rules and trait system literally hit every sore spot I've encountered in my career working on distributed systems, especially around concurrency. Things that I had started practicing in all of my code were actually enforced by the compiler; this was so profound that I jumped in whole hog. To your comment about different ways of thinking about things: it's definitely changed the way I approach problems.

> The whole memory safety argument is.. i want to say "fine" as people do make mistakes, but tools like static analyzers (aka linters) and valgrind exist.

Back to your comment about priorities: I definitely value correctness over working code (i.e. it can sometimes take longer to get something working in Rust) and over performance (I still opt for correct and working first, then go after performance). And clearly valgrind and lints aren't enough for security-sensitive software. But it does really depend on what you're targeting and why.

>> I don't "tell" people, what language to use.

> But you do. If you say to a newbie that their C project is "bad" because it is written in C, be it directly or indirectly, they will take it as if you are telling them to use another language

That's a fair criticism, and I certainly hope I've never said it directly. Here's a recent story I can relate: a friend showed me a really cool embedded system he had built; video, lens, accelerometer, etc. He had gotten it all working, but told me he had wasted an entire day on a use-after-free bug in his code. I had definitely been strongly encouraging him to look at Rust before this. He showed me what he'd done, and it was amazing. I looked at the board type, and he could have used Rust, but should he have stopped to learn Rust just to save himself this one day? Probably not. It took me three hard weeks to learn Rust, but I pushed through and am very happy I did. Should he do this? I guess it becomes a question of how many of those days he thinks he'll run into as he builds in more features... Is it possible he's building the next IoT security disaster? Doubtful, as it's a wearable... If he had already known Rust, though, I am convinced he could have written the code more quickly and run into fewer bugs, like the one he related.

For the newbies getting started in C, I do think it's a great foundational language to learn. In fact, if for no other reason, everyone should learn it because it's currently the only reliable FFI lingua franca on most platforms out there, so its ABI is needed for doing anything between various languages. But at the same time, Rust holds your hand in a way that C does not. It does not let you shoot yourself in the foot in the ways that C does, unless you tell it to take the safety off the trigger. So, for newbies getting into systems programming, even though the language can be a little daunting initially, it increases the success rate of producing correct systems code.

> When the truth is that programming languages don't matter in most cases.

Well, clearly in some cases they do, or we wouldn't be having this conversation ;)

> If the program is to be used daily on thousands of computers it starts to be more important that it performs well.

I think we're in agreement through this entire paragraph, and this is where C really excels, but everything you list is also exactly what Rust is really good at. Again, I go back to correctness, this is a huge value of mine in my code. And it's not just the compiler in Rust that helps guarantee correctness, it's also the really excellent #[test] support that is integrated directly into the language. It was the first time I had ever seen that done, where an external testing framework wasn't needed to write tests. It saves so much time in setting up proper tooling and build files, etc. It's freeing.

It sounds like we've had experience with much the same languages, but have come out with some different priorities based on experience. And what person has the right to question another's experiences? Certainly not I.


That's one of the main reasons Ada lost to C and C++ in the DoD world. The other (and more significant) being cost of implementations.

Personally, I prefer my work to be in the language or languages which suit it best rather than making my choice to spite others for a perceived and, usually, non-existent slight.


> There is no inherent benefit in C that, for example, a somewhat modified version of Pascal or Algol wouldn't have inherited.

I happen to like C and think it has a lot going for it, but you're definitely right about this. The original Mac OS was written in Pascal (and assembly), not C. And Turbo Pascal was deservedly popular for a good while. In an alternate universe, Pascal rather than C could be the incumbent legacy systems language.


Pascal suffered because its standard was woefully incomplete, and every implementation added mutually incompatible proprietary extensions. Turbo Pascal and all the TP developed code died when DOS died.


Turbo Pascal did not die; it was reborn as Delphi, and it only lost market share because many key developers ended up going to Microsoft while Borland management decided enterprise customers were more important than indies.


What year did Borland make that call? I remember my HS programming class in the late 1990s used a Borland compiler and development environment. I don't recall it being "enterprisey" at the time. That being said I was in HS, and likely associated the word "enterprise" with starships more than big companies.


When they rebranded themselves as Inprise, in 1998.

https://en.wikipedia.org/wiki/Borland


Hmm... I first touched a Borland compiler in the 1998-99 school year. I had programmed before that, but it was my first time programming a GUI with the Win32 APIs. I made a Battleship game with a very rudimentary AI so you could play against the computer. Not sure if Borland was good or evil at the time, but those memories will certainly endear me to them.


You can revive Turbo Pascal code with Free Pascal. Here's a port guide:

https://www.freepascal.org/port.var


You have to wonder, though, why the enduring big dogs of OSs are written in C.

Classic Mac OS was written in Pascal, but it had to be scrapped and replaced.


I remember using System 7, which I later learned was the result of a big rewrite of much of the OS in C++ rather than Pascal. It had some nice features, and the rewrite was probably a wise decision for maintainability and future expansion, but compared to System 6 it was sloooooooooow.


Pressure from customers coming from UNIX, and Apple trying to get into UNIX with A/UX.


It's pretty simple: C was already popular -- meaning: at least decent-ish compilers for most platforms.


C got popular because it turned out to be ideally suited for DOS, which was by far the dominant target for programmers for over a decade.

For example, C could deal with near/far pointers. No other language could. Early C implementations were also usable on DOS, early [other language] implementations were unbelievably unusable, and believe me I tried.

There were many diverse, usable, and cheap C implementations available.


The popularity of C had absolutely nothing to do with DOS. The most popular languages on early PCs were BASIC and Pascal. The popularity of C is linked to UNIX, from where it was brought to DOS. The far/near pointers did not play especially well with C's view of pointers as integers of some fixed size. The 32-bit flat memory model made C programming somewhat closer to what it used to be in UNIX.


> The popularity of C had absolutely nothing to do with DOS.

You might want to read popular programming magazines from the early 80s, and the attention given to C on DOS. (It was enormous.) At one time, I counted 30 C compilers available for DOS. What other platform came remotely close to that?

> The most popular languages on early PCs were BASIC and Pascal.

BASIC was indeed popular, but generally not for professional programming. Pascal had nowhere near the penetration of C in the early days (1982, 1983, etc.). Microsoft Pascal 1.0 was unusable, the top C compiler was Lattice C.

> The popularity of C is linked to UNIX, from where it was brought to DOS.

Unix was nowhere remotely as popular as DOS.

> The far/near pointers did not play especially well with C's view of pointers as integers of some fixed size.

As a DOS C compiler writer, I can attest that near/far mapped very well onto C semantics. The C Standard in 1989 was very careful to not upset that.

> The 32-bit flat memory model made C programming somewhat closer to what it used to be in UNIX.

That was much later. But since you brought it up (!), 32 bit DOS extenders were in wide use on DOS, and were programmed with C. I don't recall Pascal ever existing on them, but perhaps I misremember. C was also popular on the 16 bit DOS extenders, I don't remember Pascal on those, either.


As mentioned on some other thread, there was hardly any C being used outside UNIX during those days in Portugal.

It was all about Pascal, Basic, Assembly and of course xBase notably Clipper.


> For better or worse (personally, I think, for the better), they're starting with JavaScript, Python, Ruby, or even PHP.

These higher-level languages are still built with C (Python, PHP) or C++ (the V8 JavaScript engine) though, so C is still a language of choice for that task, and writing a library in C allows integration with all these high-level languages at very little cost. So there is still incentive for people using higher-level languages to write C. I mean, PHP is useless if the goal is to share code with Python or JavaScript.


There's an incentive to learn assembly language, too. C is useless if the goal is to, for example, write constant-time crypto implementations.

I'm not saying there is no reason for anybody to learn C, just that most programmers aren't learning C anymore.


Did you even read the article? How are you comparing JS, Python (or any language "gaining popularity" in your metrics) to C, when the author clearly describes the set of problems only solvable in C?


The point is that those problems are (a) not only solvable in C; (b) not problems that most programmers need to solve. That is why C and C++ are declining in usage.


Are they, really?


A "somewhat modified version of Pascal or Algol" would be just C with a different syntax.

C is remarkable for what it doesn't have. It sits at a nice local optimum point that a portable assembler must occupy.


Is such a safe implementation of C really suitable for systems programming, rather than merely application programming? If we understand system-building as communicativity, then certainly such a system retains communicativity—so long as alien objects can be described to it in a manner sufficient for dispatching the same dynamic checks. If I memory-map a file, say, I can safely access that memory only if the structure and meaning—the bounds and the types, roughly—are described much like those of other in-memory objects. Tools and systems for providing these descriptions are currently lacking—but are a logical extension of the runtime type information already developed in recent work. In the case of file formats, some cases like the ELF example we saw earlier (§5.5) show that the format has already been defined for us, thanks to the manifest layout of objects declared in C.

This is a key point. There are scattered systems for describing the layout of arbitrary binary data—C structs/unions, Erlang binary patterns, ASN.1 Encoding Control Notation, Kaitai Struct[1]—but nothing has really caught on across language boundaries. It's hard not to feel this data format barrier when you're using a C API from another language. We'll need to do something about this barrier if we want a true multi-language system (not just a bunch of awkward C FFIs).

[1] http://kaitai.io/


Certainly, but for instance, take one of your examples: Kaitai Struct. It doesn't have support for C (at least it's not listed among the languages on its homepage). OTOH, for more complex payloads I've often seen Protocol Buffers used (yes, I know they don't have native C support either but there's lots of good libraries for using `protobuf`s with C).

The thing with FFIs is that above all we want them to be fast and simple. C rules for laying out structs generally mean no parsing necessary, with direct access to fixed offsets for everything you want. If you're ever having problems figuring out the layout of a struct, it's relatively straightforward to just dump some simple load/store code into a compiler and have a look at what it does (assuming you can understand assembly at a basic level): https://godbolt.org/g/khGPWA


Related: "Safe Systems Software and the Future of Computing by Joe Duffy" at RustConf 2017.

https://www.youtube.com/watch?v=CuD7SCqHB7k

I summarized this excellent talk here [1], but one of the main points is that compatibility with existing systems is important for adoption. (They learned that the hard way -- by having their entire project cancelled and almost everything thrown out.) He advocates unit-by-unit rewrites rather than big-bang rewrites, just like Kell does in this conference article.

And compatibility with C in Windows should be easier than it is in the Unix world, because the whole OS is architected around a binary protocol AFAIK -- COM.

My sense is that Rust may not have thought enough about compatibility early in its life. Only later when they ran into adoption problems did they start talking more about compatibility.

Also, it seems Rust competes more with C++ than C, and there seems to be very little attempt to be compatible with C++ (although perhaps that problem is intractable.)

Personally I don't think Rust will be a successful C replacement. It will have some adoption, but the Linux kernel will still be running on bajillions of devices 10 years from now, written in C. And in 20 years, something else will come along to replace either C or Linux, but that thing won't involve Rust.

[1] https://www.reddit.com/r/ProgrammingLanguages/comments/6y6gx...


> My sense is that Rust may not have thought enough about compatibility early in its life. Only later when they ran into adoption problems did they start talking more about compatibility.

Of course Rust thought a lot about compatibility with C in its early days. I remember fast FFI was in Graydon's very first presentation about the language in 2010. Almost everything about the language changed, but that focus did not.

> Also, it seems Rust competes more with C++ than C, and there seems to be very little attempt to be compatible with C++ (although perhaps that problem is intractable.)

Rust has gone pretty far in wanting to be compatible with C++, with the C++ stuff added to bindgen for Stylo. We've gone further than most other languages. It's not fair to say there's been "very little attempt": we literally couldn't have shipped Stylo to Nightly Firefox without doing the work to bridge C++ and Rust.

From your other post, it seems that one of your main complaints is that Cargo exists instead of having Rust use Makefiles. All I can say is that the reaction to Cargo from Rust programmers is overwhelmingly, almost universally positive, and abandoning Cargo in favor of Makefiles would instantly result in a fork of the language that would take Rust's entire userbase. Not solving builds and package management is not a realistic option for a language in 2017.


Following the logic of the article, Rust has made the exact same mistake every other language has made, which is to conceptualize compatibility with the C ecosystem as merely an issue of FFI. Rust is hardly the first language to focus on easy FFI from day 1, but according to the article that's not nearly sufficient. And like most other modern so-called systems languages, Rust hasn't gotten around to committing to a stable, exportable ABI. In fact, I think much like Go the general sentiment is that this is largely undesirable, as stable ABIs can cripple evolution of the implementation, especially those that rely on sophisticated type systems.


> And like most other modern so-called systems language, Rust hasn't gotten around to committing to a stable, exportable ABI.

That's not true. The C ABI is stable and exportable, and you can opt into it on a per-function basis. We do that for integration with existing projects all the time.

Again: All of you are talking as though the idea of integrating Rust into a large C++ project is some far-fetched theoretical idea, and that we made some obvious mistakes that make this goal impossible. In fact, we're shipping an integrated Rust-C++ project today: stable Firefox, used by millions of users.


I'm not arguing that it's too difficult to integrate Rust with C or C++ projects. I'm simply trying to get at the distinctions that the article is making, which are rather subtle.

One aspect of Rust that fits well, IMO, with the characteristics the article argues are under appreciated is its emphasis on POD--objects as compact, flat bytes. That puts Rust much closer to achieving what C does best (again, according to the article), which is first-class syntactic constructs over memory--namely, pointers. But it falls short in the sense that to _export_ Rust objects (rather than import alien objects into Rust) you have to do so explicitly. And presumably the author would argue that Rust is significantly undervaluing the benefit of a stable ABI that would allow other applications to import Rust objects without an explicit language-level construct (i.e. explicitly annotating APIs with no_mangle).

Obviously when you're building a large application, cathedral style, the requirement to explicitly annotate is not only less burdensome, but quite useful (for many reasons). But in a larger, heterarchical ecosystem of software, that's actually quite limiting. Our first instinct is to argue that permitting such unintended peeking behind the curtain is dangerous and unnecessary, but the article speaks directly to that.

Imagine a Rust with a stable ABI that was exported via Sun's CTF format. CTF is like DWARF but much simpler (and thus little incentive to strip it), and it's being integrated into both OpenBSD and (I think) FreeBSD to facilitate improved dynamic linking infrastructure. Rust could even, theoretically, continue randomizing member fields. And this data could be consumed by any language's toolchain, not simply Rust's toolchain. That sort of language-agnostic, holistic approach to interoperability is largely what I think the article is getting at.


I'd be all for a standard language agnostic ABI. I'm not on the language design team anymore, but I suspect you wouldn't have any trouble convincing them to get on board with such a thing either. The ones you'd need to convince would be the C++ folks, I suspect :)


Yes, that is what I was referring to. Calling sin() is not enough. It's messy but C programs need more than that.

And I was also referring to the similar issue in Go where calling C -> Go and Go -> C isn't symmetrical. Not sure if that's true for Rust or not.


> It's messy but C programs need more than that.

Of course they do. That's why Rust has a sophisticated tool, bindgen, which is used in production right now in Nightly Firefox (among other places) to export complex C++ interfaces in both directions across the language boundary.

> And I was also referring to the similar issue in Go where calling C -> Go and Go -> C isn't symmetrical. Not sure if that's true for Rust or not.

It's not. You just write "#[no_mangle] extern" on your Rust function and C can easily call it, with a stable ABI.

In order to meaningfully criticize Rust's FFI, you need to be aware of how it works.


Well, just saying it has fast FFI doesn't tell me much. Being able to wrap something like sin() was possible in Python 1.0, but most applications need more help than that. There have been 5+ popular systems since then trying to make the experience better... it still is barely solved.

That said, I admit I'm more on the pessimistic side. Having touched Go before its open-source release in 2009, I didn't think they thought enough about integration either. I think it was worse than Rust, because you couldn't call Go from C or C++, unless the main program was in Go.

Also their build system isn't used inside Google. And they do nontrivial stuff with signals and threads.

But Go seems to be being adopted. However there is an important distinction: Everybody is rewriting new versions of Google-style servers in the open source world. But all the stuff at Google is still in C++.

So I think nobody ever rewrites old software. They write new versions of similar things, and then hopefully those new things get adopted. But the old thing will probably be around for a long time too.

And to be fair C didn't replace Fortran or Cobol either -- scientific applications still use Fortran and old banks (apparently?) still use Cobol on mainframes.

Maybe that's the most you can expect. But in that case there still does need to be a "plan" for making existing C code like the Linux kernel and OpenSSL safer. I think my issue is that some people apparently think that plan involves Rust when it doesn't. Maybe the core team has never pushed that idea but some other people seem to be under that illusion.

-----

This is a different argument, but a language only needs to "solve" package management if it always assumes it has main(). I was looking for something more humble that you could stick in a file in an existing C or C++ project, e.g. for writing a safe parser.

Also the 5+ different Python + C/C++ solutions now need a Python + Rust analogue. For a language at the Rust layer, there's this O(m*n) problem or strong network effect to deal with.

Actually that was thing I was thinking while reading this PDF -- a lot of it can be boiled down to "C and C++ have network effects". Particularly C++.

Asking Rust to break the network effect is like asking Apple to break the Windows monopoly with Mac OS X. That didn't happen -- they built the new thing iOS, and beat Windows with that. So then the question is if Rust is more like OS X or iOS.


> So I think nobody ever rewrites old software. They write new versions of similar things, and then hopefully those new things get adopted. But the old thing will probably be around for a long time too.

That's very true. The most we can hope for is that Rust and other languages, such as Go and Swift, continue to chip away at the market share of C and C++. It'll be a long process.

I'm not a "rewrite everything in Rust" booster; as much as I would like to, that won't realistically happen. Instead, I see Rust as another player in the "programming language Renaissance" that has been going on since the mid-2000s. C and C++ are losing their dominance and instead are becoming part of a broad ecosystem of languages. And that's great: the fact that we have so many choices in languages now has been a very good thing for productivity and security.

> Actually that was thing I was thinking while reading this PDF -- a lot of it can be boiled down to "C and C++ have network effects". Particularly C++.

I agree. That's why I think this paper overanalyzes the success of C and C++. They became dominant because of network effects: simple as that.


I think the article helps to explain why C was able to leverage network effects so well. Neither C nor Unix came out of the gate in a dominating position. Indeed, it's arguably only in the past 20 years that it clearly dominated. Fortran, Pascal, and a bevy of other languages were at times much more widespread and influential. Even today C isn't the most used language. And yet its influence continues to be outsized.

C isn't just a language, it's an entire ecosystem of toolchains and software that facilitate network effects. "Chance" is far too convenient an explanation. No doubt chance had a significant role, but if C were as useless, unsafe, and devoid of redeeming qualities as many people argue, then I don't see how C could have benefited so strongly from network effects.


C wasn't useless and unsafe at the time it became dominant. It was quite state-of-the-art at the time. We've just learned more about what works well and what doesn't in programming languages since 1978, which is why C is no longer as dominant as it once was.


Hoare did not think like that.

"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

This was in 1981, and "language designers and users have not learned this lesson." was a jab at C.

Also Xerox and ETHZ were busy using safer systems programming languages.

ESPOL and NEWP, already using UNSAFE blocks were the state of the art systems programing in the 60's.


It was quite state-of-the-art at the time.

Most of the criticisms you hear about C today (type safety, memory safety, no garbage collection) were criticisms that C got when it was first invented.

In fact there are fewer criticisms of C than there used to be: a lot of the early criticism centered around syntax, but the C syntax kind of won, so you don't hear that anymore.


> All I can say is that the reaction to Cargo from Rust programmers is overwhelmingly, almost universally positive

You can prove anything when you introduce a sampling bias that large.

> Not solving builds and package management is not a realistic option for a language in 2017.

Package management was solved decades ago, my OS manages the packages and a lean and mean system is the result. The Rust solution results in massive binary sizes for simple command line tools. This is fine if their goal is to replace Java, but not if they want to replace C.


Many existing Rust users were extremely skeptical when Cargo was announced, many said they'd stick with Makefiles. In the end, they didn't.

> Package management was solved decades ago

If it was, there wouldn't be new package managers popping up all the time; it's a non-trivial problem. They're not created for no reason.

> The rust solution results in massive binary sizes for simple command line tools.

This isn't exactly true, or rather, you're comparing two different things. https://lifthrasiir.github.io/rustlog/why-is-a-rust-executab... has the details.


> If it was, there wouldn't be new package managers popping up all the time; it's a non-trivial problem. They're not created for no reason.

Notice how all those package managers are for platforms or create platforms in their own right. Rust is meant to be a systems language; that means its platform is the OS and it doesn't get to be a world unto itself like Java.

> This isn't exactly true, or rather, you're comparing two different things. https://lifthrasiir.github.io/rustlog/why-is-a-rust-executab.... has the details.

So if you jump through a million hoops and limit yourself to C libraries, you can produce small executables. At that point it's more complicated than just writing an app in C in the first place.

I'm not interested in what it can technically do though, I'm interested in what is practically happening. In practice most rust programmers seem to be writing apps for the cargo platform. In practice rust developers are producing huge executables. In practice rust has no stable ABI so all rust libraries get statically compiled. In practice this is incompatible with the LGPL. In practice a security vulnerability means every app using the library has to be recompiled to be secure.


> Rust is meant to be a systems language, that means it's platform is the OS and it doesn't get to be a world unto itself like java.

Rust is meant to be a cross-platform systems language, and sadly there does not exist a cross-platform package manager. Until one exists (and I'm not holding my breath here), every language which intends to be cross-platform will continue inventing its own package management.


That's a great point: we effectively tried the "just use Makefiles" solution already. It failed.


> You can prove anything when you introduce a sampling bias that large.

I'm confident that programmers don't want to be writing Makefiles. We don't have to take formal surveys to observe the obvious trend away from raw "make" that has been occurring for decades.

Besides, if Rust programmers really had a problem with Cargo, they would tell us. Programmers don't suffer in silence.

> Package management was solved decades ago, my OS manages the packages and a lean and mean system is the result.

I'm glad you like your package manager. Most programmers, including me, don't want to have to deal with it when the goal is simply to put a Rust project together. Besides, we ship desktop software on Windows: we cannot tell our users "sorry, you need to install Ubuntu".

> The rust solution results in massive binary sizes for simple command line tools. This is fine if there goal is to replace java, but not if they want to replace c.

The Rust solution is customizable. You can use dynamic libraries if you like, and earlier prerelease versions of Rust did in fact do that. Dynamic libraries are a single rustc flag away.

The feedback we got was that people preferred the convenience of a single standalone binary to the complexity of dynamic linking managed by the OS.


> I'm glad you like your package manager. Most programmers, including me, don't want to have to deal with it when the goal is simply to put a Rust project together.

This is the same attitude that makes Electron so attractive. As a user, I don't care what makes your life easier as a developer, I care that I'm getting a more bloated and less secure result. This is an awful attitude that's crept into software development lately.

> Besides, we ship desktop software on Windows: we cannot tell our users "sorry, you need to install Ubuntu".

So bundle them on Windows; bloated installers are the norm there already. You're probably going to have to include an auto updater and a lot of other stuff that Windows doesn't provide as well. Not having to deal with that stuff is part of why I use Ubuntu in the first place.

> The Rust solution is customizable. You can use dynamic libraries if you like, and earlier prerelease versions of Rust did in fact do that. Dynamic libraries are a single rustc flag away.

Until there is a stable ABI that isn't a solution because you have to distribute those libraries with the app.


> As a user, I don't care what makes your life easier as a developer

You should. The easier my life is, the faster I can fix bugs and put out new releases.


I'm dealing with the result of this attitude on my phone right now. The end result is I can't even install your app because I'm out of space on my phone. I'm out of space because every other app maker favored developer productivity over being conservative with users resources.

It's a tragedy of the commons.


Seems like you are shifting the goal posts. If I'm building something to run on resource constrained devices, then it makes sense to value use of resources more highly! But otherwise, most of your comments just seem to repeat the same old dynamic vs static linking debate that had been hashed out already for decades. There is no one right answer. Trade offs abound.

People who expect a stable ABI from Rust such that normal Rust libraries can be dynamically linked like you would C libraries would do well to adjust their expectations. It isn't happening any time soon.


> most of your comments just seem to repeat the same old dynamic vs static linking debate that had been hashed out already for decades. There is no one right answer. Trade offs abound.

Rust doesn't let me make that trade off, it's made the decision for me.

> People who expect a stable ABI from Rust such that normal Rust libraries can be dynamically linked like you would C libraries would do well to adjust their expectations. It isn't happening any time soon.

I think it's the rustaceans that need to adjust their expectations, as long as this holds rust won't be a real systems language, it stands a better chance of unseating java than c.


> Rust doesn't let me make that trade off, it's made the decision for me.

Umm, right, exactly, the state of having a stable ABI is one set of trade offs, and even if that were an option, electing to use it for dynamic linking is another set of trade offs. I feel like I was obviously referring to the former, but if that wasn't clear, it should be now. An obvious negative is exactly what you say: you can't use standard Rust libraries like you would C libraries. That's what I meant by trade offs. But there are plenty on the other side of things as well.

> I think it's the rustaceans that need to adjust their expectations

Sure! We do all the time! I'm just trying to tell you the reality of the situation. The reality is that Rust won't be getting a stable ABI (outside of explicitly exporting a stable C ABI) any time soon. If that means flukus doesn't consider Rust a systems language, then that's exactly what I meant by adjusting your expectations. But don't expect everyone to agree with you.

From personal experience, a lot of folks don't care nearly as much as you do about things like "the binary is using 2.6MB instead of the equivalent C binary which is using only 156KB." Now if you're in a resource constrained environment where that size difference is important, then that's a different story, and you might want to spend more effort to use dynamic linking in Rust, which you can do. You won't get a stable ABI across different versions of rustc, but you can still get the size reduction if that's important to you in a specific use case.


I've previously suggested here that OSes and OS manufacturers should test and rate apps for tightness, and punish apps that aren't tight by handing them fewer resources - running them noticeably slower.


It's common thinking (often a misconception) that C programmers only grudgingly use C because it does some vital thing that all these other "managed" and "safe" languages cannot: if only those other languages added that feature, all C programmers, having no more reason to stay, would finally be able to abandon C! This is a good list of positive reasons to prefer C even if other languages are also suitable.


Here's a good quote: "Unless we can understand the real reasons programmers continue to use C, we risk researchers continuing to solve a set of problems that is incomplete and/or irrelevant, while practitioners continue to use flawed tools."

In other words, stop blithely claiming that everyone is stupid for using C/C++. Instead, find out why they use it. Then, if you continue to think that C/C++ needs to be replaced, find a better way for those people to do what they are doing that they currently find C/C++ to be the best way to do.


C is frequently praised (including in some of the posts here) for its suitability for real-time and embedded systems development, but the author appears to be proposing modifying the C runtime and code generation in ways that, when done in other languages, are claimed to render them unsuitable for these purposes.

I think researchers are justified in looking for solutions for common problems, even if many C programmers will be uninterested in them, so I will not reject his proposals peremptorily.


> C is frequently praised (including in some of the posts here) for its suitability for real-time and embedded systems development, but the author appears to be proposing modifying the C runtime and code generation in ways that, when done in other languages, are claimed to render them unsuitable for these purposes.

Don't people think there are things C could improve that wouldn't affect its suitability for these tasks? I mean getting namespaces doesn't strike me as being a hindrance for real-time systems for instance. A boolean type and true|false as keywords instead of macros? Tagged unions? Multiple return types to deal with errors more easily? More facilities in the language in general to avoid the use of macros to make up for its lack of polymorphism? To me macros always felt like a lazy cop-out.


Tagged unions? -- please no... I could be convinced to completely lose unions in C, though (pointer to member or base of struct can be cast to another struct anyway, so losing unions doesn't gain anything; for the same reason tagged unions just would not be useful)

Boolean type? Sure, but that would be dependent on use. What is wrong with a bitfield one bit wide instead? What may be useful is "packed bitfield" type ("packed" in Pascal). Then, array of packed bit could be expressed.

Multiple return types - yes, "return a b;" (or something) would be nice.

Lack of polymorphism - reference "void *". The main problem is that calls cannot be constructed in C (that is, the standard does not have a "C to C" FFI).

Anyway, just food for thought.


Unions would be nice if the syntax for accessing substructure members could be nominally short circuited. For example:

    struct ab { int a; int b; };

    union c { struct ab ab_short_circuit; int a; };

    union c c1; c1.a = 1; c1.b = 2;


That already exists, as a Microsoft extension, and if the struct is declared within the union, in standard C: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Unnamed-Fields....

(However, in your example, c1.a is ambiguous, so it won't compile.)


As a Linux user I did not realize this -- thanks.

(But that is my point: it should not be ambiguous.)


In fact, all of the things you propose are fairly simple. Leads one to wonder why C programmers haven't implemented these trivial things. Something about the C ecosystem seems to abhor standardization, modernization, and distribution. May have something to do with the kinds of people choosing to use C every day, despite the existence of an endless supply of potential replacements. I'm fairly sure the people who want all that stuff can already find it elsewhere, after all.


That is true for some things, though the preprocessor (which was not a "lazy cop-out" at the time) will be required for backwards compatibility. Some of the author's proposals, however, such as checked array access and garbage collection, would have runtime consequences.


You mean the best way isn't berating them for using an "unsafe" language, suggesting they're dinosaurs for not using something newer, or any of a number of other things some do? Who'd have thought.


> Language migration: all-at-once or not-at-all. Like any language, C persists partly because replacing code is costly. But perversely, the implementation technologies favoured by more modern languages offer especially unfavourable effort/reward curves for migration. Migration all at once is rarely economical; function-by-function is probably the desired granularity.

D's new "Better C" support allows for function-by-function granularity in building chimera programs that contain any mix of D and C. It's much more than having merely access from D to functions written in C.

https://dlang.org/blog/2017/08/23/d-as-a-better-c/


Started reading the article; once I reached the second page I lost interest and scrolled to the conclusion.

Based on the parts I read, the writing style is needlessly verbose, and the author is not saying anything which hasn't already been said.


I agree with the verbosity. But many of those in the demographic that would do away with C belong more to the current pop-culture of "coding!!1" instead of the carefully-considered, patiently-implemented ancient art of engineering.

The pop-culture "coding" collective as a whole is not generally known for its appreciation of terse explanations. I will admit that it does favor immediate gratification, though; there is that.

Ideally, this would do the rounds with different people excerpting different bits of it. That would spark many little conversations over time, and contribute to keeping the discussion going. That would be nice.


Personally, I appreciated how the author spent a decent amount of time "unpacking" what he meant -- for instance, in the "To manage or to mediate" section. Terseness is only useful when you already have a shared protocol for understanding the message and a guarantee that it won't be garbled along the way.

On the other hand, if you're trying to communicate to folks without that shared protocol (in this case, to people who aren't familiar with/haven't spent much time using C as a primary language) it's kinda necessary to go a little further to get the point across.

> Ideally, this would do the rounds with different people excerpting different bits of it. That would spark many little conversations over time, and contribute to keeping the discussion going. That would be nice.

Absolutely agree.


> More generally, C’s notion of memory, arranged in an address space, allows code to address (point to) and access (read, write, call) objects inhabiting that space. Unlike most other languages, those objects need not have been defined within our program. In fact they even need not behave in the same way as such objects. Despite this, in all cases we access them in the same, uniform way.

But can't a systems language like Rust do this, too?


Yes, it can and does. A common usecase for Rust is to write libraries that are then linked in to higher-level language VMs and interpreters, as a way of extending those languages, a process which inevitably involves accessing memory that wasn't allocated by Rust and that behaves according to whatever invariants the higher-level language imposes.


I like it. Whenever one of these C-shortcomings articles come up, we get the obligatory "rewrite it in Rust!" and "we already rewrote it in JavaScript!" comments.

Even so, there is A LOT of software already written in C/C++ that isn't going to be converted any time soon, and if you could tweak the compiler in such a way that makes those programs just 1% better, that is a REALLY BIG THING.

So, good on you Stephen Kell for this constructive paper!


You can pry gcc out of my cold, dead hands when your fancy type-safe high level languages will let me do things like:

* Fork a running program to enable analysis or serialisation of program state without blocking, or

* Use mmap to allocate all my datastructures on disk, or

* Have full control over what happens when my program receives a signal, or

* exec another program but have it to inherit all the open file descriptors and network connections, or...


I regularly do all of those things . . . in Perl, with either core features of the language, or ubiquitous, well-supported libraries, in readable, concise code that doesn't "fight" with the language/runtime.

Something like Java definitely makes some of those things very hard, nay impossible, but not all high level languages are the same in those regards.

(And yeah, I just called Perl "readable". Bite me.)


Half of the things you mention are related to system calls; any language that has a C FFI or provides syscalls in its stdlib will let you do that.


First-class communication as a feature. Notice that many of the more popular languages value the ability to link to C libraries. Most languages have a way to call (or even statically link) external C code. It's not as easy to do the same with other languages, because they lack this ease of interfacing. It's easy in C because it's low level: everything is plain old data and function pointers.


Low level backward compatibility with portability idioms is great for enduring software assets. Anyone not sweating speed and space between equivalent possible implementations is a different kind of useful software developer with different operative quality criteria and execution risks. Rust will hopefully enjoy a long and boring stable future.


While I am nowhere close to understanding the details, I still wonder how memory and pointer manipulation, especially in embedded systems and systems programming, could be replaced by anything other than assembler.

Of course, once you have the core in C and assembler, you try to move to a higher-level or domain-specific "language". Even Word, or this text box in the browser, is high-level software ultimately supported by that.


Hi,

I think C's success is also because it was designed while solving real problems when writing UNIX. Unfortunately the newer languages that claim to be "systems" languages were not designed while building operating systems. Here is Dennis Ritchie's assessment of reasons for C's popularity:

(Extract from http://csapp.cs.cmu.edu/3e/docs/chistory.html).

C has become successful to an extent far surpassing any early expectations. What qualities contributed to its widespread use?

Doubtless the success of Unix itself was the most important factor; it made the language available to hundreds of thousands of people. Conversely, of course, Unix's use of C and its consequent portability to a wide variety of machines was important in the system's success. But the language's invasion of other environments suggests more fundamental merits.

Despite some aspects mysterious to the beginner and occasionally even to the adept, C remains a simple and small language, translatable with simple and small compilers. Its types and operations are well-grounded in those provided by real machines, and for people used to how computers work, learning the idioms for generating time- and space-efficient programs is not difficult. At the same time the language is sufficiently abstracted from machine details that program portability can be achieved.

Equally important, C and its central library support always remained in touch with a real environment. It was not designed in isolation to prove a point, or to serve as an example, but as a tool to write programs that did useful things; it was always meant to interact with a larger operating system, and was regarded as a tool to build larger tools. A parsimonious, pragmatic approach influenced the things that went into C: it covers the essential needs of many programmers, but does not try to supply too much.

Finally, despite the changes that it has undergone since its first published description, which was admittedly informal and incomplete, the actual C language as seen by millions of users using many different compilers has remained remarkably stable and unified compared to those of similarly widespread currency, for example Pascal and Fortran. There are differing dialects of C—most noticeably, those described by the older K&R and the newer Standard C—but on the whole, C has remained freer of proprietary extensions than other languages. Perhaps the most significant extensions are the `far' and `near' pointer qualifications intended to deal with peculiarities of some Intel processors. Although C was not originally designed with portability as a prime goal, it succeeded in expressing programs, even including operating systems, on machines ranging from the smallest personal computers through the mightiest supercomputers.

C is quirky, flawed, and an enormous success. While accidents of history surely helped, it evidently satisfied a need for a system implementation language efficient enough to displace assembly language, yet sufficiently abstract and fluent to describe algorithms and interactions in a wide variety of environments.



