Coding in C is like camping. It's fun for a while, but eventually you really miss things like flushing toilets and grocery stores. I like using C, but I get frustrated every time I hit a wall trying to build features from other languages. C is a very WET language.
C is basically it for embedded development though. I've gotten so tired of recompiling and waiting to flash a chip with C, that I've started learning ANTLR and making my own language. The idea is to have a language which runs in a VM written in C, and allows you to easily access code which has to be in C. Sort of like uPython, except it can easily be used on any board with a C compiler with a small amount of setup, and C FFI is a first class citizen. Also coroutines masquerading as first class actors with message passing, since I always end up building that in C anyway.
In the modern world, any sufficiently complicated C or C++ program embeds a runtime of some higher-level language. More often than not, that language is formally specified, well-supported, and fast.
Sometimes it’s indeed LISP like in AutoCAD, but more popular choices are JavaScript (all web browsers), LUA (many video games and much more, see Wireshark), Python, VBScript or VBA (older Windows software), .NET runtime (current Windows software and some cross-platform as well, like Unity3D game engine). These days, it’s rarely a custom language runtime, but it still happens, like Matlab.
Which of those is actually formally specified? Also, calling Python or JavaScript at all efficient sounds a bit unwarranted. Consider any JavaScript desktop application, its power and memory usage... it's a mess.
Many of them are international standards, see ANSI INCITS 226-1994 for LISP, ECMA-262 and ISO/IEC 16262 for JavaScript, ECMA-334 and ISO/IEC 23270:2018 for C#. Lua and Python aren’t standardized, but even so, they both have a comprehensive formal spec.
> calling Python or JavaScript any efficient sounds a bit unwarranted
Both are much slower at number crunching compared to C or C++ but they aren’t that bad, either. Most JavaScript VMs feature a good JIT compiler. Some Python runtimes have JIT too.
> Consider any JavaScript desktop application, its power and memory usage.
There’re good ones, like VSCode. On average they’re indeed not great, but I think it’s just an unfortunate consequence of the low entry barrier. It’s easy for inexperienced people to get started with the technology. And the ecosystem is misjudged based on the output of these inexperienced developers.
Doesn't always mean it's particularly performant, unfortunately. The JSON lib in PHP is still C (part of Zend), but it's very susceptible to malloc failures (one big contiguous request for the whole shebang), and it's generally way faster to serialize arrays into a .PHP file you load than into JSON if you're storing it locally. I compared what I read there to the serde stuff for rust and the difference is stark.
Maybe having most of this stuff in c libs with scripts wrapped around them will make it easier to migrate. Keep the same py but swap out the lib from C to some rust that's still useful in a crate for pure rust projects.
> using less than 100mb of memory. Doesn't seem so bad to me
You know... any embedded programmer reading that comment is probably laughing hysterically. In most cheap or energy-efficient microcontrollers, more than 4 Mb is considered a luxury.
There is no virtue in using less memory for the sake of it. If a desktop application is taking 100mb, you can have a lot of those running before modern boxes start to struggle.
Whether 100Mb usage is unacceptably large is mostly dependent on the precise features in play:
* 32-bit vs 64-bit application (64-bit will bloat all pointers)
* Methods of loading and editing documents (modern editing attempts a lot of introspection based on highlighting syntax or project environment)
* Character encoding support (supporting current Unicode rendering would almost immediately take you beyond 4Mb)
Interactive editing is a pretty memory-hungry task compared to a simple viewer or batch processor. There's more reason to keep things cached. If you actually go back to old text editor versions from the days of 4Mb desktops, you'll find yourself missing stuff. Not so much that you can't get by, but enough to give pause and consider taking the hit.
Depends on the exact numbers. For instance, Allwinner R328 chip has 2 ARM cores running up to 1.2GHz, includes 64MB or 128MB RAM, and the price is around $3.
> I'm running VSCode right now with many tabs open and extensions running, and it's using less than 100mb of memory. Doesn't seem so bad to me
uh... is it? With no tabs open and fewer than 12 extensions, the Code process itself takes 120 megabytes, and there's also 1.2 gigabytes of Electron processes running along with it.
> In the modern world, any sufficiently complicated C or C++ program embeds a runtime of some higher-level language. More often than not, that language is formally specified, well-supported, and fast.
More often than not? Almost none of the languages mentioned are "formally specified", and most of them are hardly fast either...
> Almost none of the languages mentioned are "formally specified"
They all have written specs. Many of them even have these specs standardized, see ANSI INCITS 226-1994, ISO/IEC 16262 and ISO/IEC 23270:2018.
> most of them are hardly fast either
It’s borderline impossible to be fast compared to C or C++. By that statement I meant 2 things.
(1) They’re likely to be much faster compared to whatever ad-hoc equivalent is doable within reasonable budget. People spent tons of resources improving these runtimes and their JITs, it’s very expensive to do something comparable.
(2) On modern hardware, their performance is now adequate for many practical uses. This is true even for resource-constrained applications like videogames or embedded/mobile software, which were overwhelmingly dominated by C or C++ a couple of decades ago.
the first rule with no spin: "Any portable, formally-specified, bug-free, quick implementation of all of Common Lisp contains at its core a fair amount of nicely indented C."
That's exactly my complaint. I think it's missing things because if you added them in a way compatible with the C way of doing things then you would permanently cleave C from C++. Since C++ is a runaway train at this point detaching is probably a good idea now.
Me, I want range types like Ada. Real array types. I think I want blocks/coroutines.
There’s “mangling” where symbols have an underscore put in front of them. But I think the point is that namespaced functions would need to be mangled and thus be difficult to call.
> But I think the point is that namespaced functions would need to be mangled
Namespaces don't need to be mangled at all if they are interpreted as symbol prefixes.
I'd prefer that, say, namespace foo::bar resolved into foo_bar for all symbols (even that meant risking namespace naming collisions) to not having any support for namespaces in C.
In fact, this approach is already used to implement pseudo-namespaces, so that wouldn't be much of a stretch.
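A minimal sketch of what that existing convention looks like (names made up), and what a hypothetical foo::bar in C could simply lower to:

    /* Pseudo-namespaces in plain C: every public symbol carries its
       "namespace" as a prefix, exactly what foo::bar could resolve to. */
    #include <stdio.h>

    static int foo_add(int a, int b)     { return a + b; }      /* "foo" */
    static int foo_bar_add(int a, int b) { return a + b + 1; }  /* "foo::bar" */

    int main(void) {
        printf("%d %d\n", foo_add(1, 2), foo_bar_add(1, 2));
        return 0;
    }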
> What's the point? How is typing foo::bar better than typing foo_bar?
You're missing the whole point of namespaces. The goal is not to replace foo_bar with foo::bar. The whole point is that within a scope you can type bar instead of foo::bar, or bar instead of foo::baz::qux::bar, because you might have multiple identifiers that might share a name albeit they are expected to be distinct symbols.
And also because of function overloading in, say, C++. To link two functions that share an identifier but differ only in signature (e.g. the types of their arguments), we need to pass them to the linker as two different functions with different symbol names.
Pasted the quote into work chat and a waggish co-worker immediately came back with, "on the other hand, Python is like camping with a toilet, but that toilet may have a dangerous snake in it, but you won't know until you try it"
> I've started learning ANTLR and making my own language.
> a language which runs in a VM written in C [...] easily be used on any board with a C compiler with a small amount of setup, and C FFI is a first class citizen. Also coroutines
I've looked at LUA, and it's possible I missed a flavor, but I have a long list of things I want:
-Static types that match C for seamless interop.
-No GC
-First class actors
-Ahead of time declaration/allocation of actors and messages, with automatic async-like passing on a coroutine/actor waiting for an allocation, and automatic disposal of the actor after prolonged allocation failure.
-Hot swapping actors from a running system in the field
-Over the wire debugging and inspection of a running system in the field
AtomVM, an Erlang VM for small devices, is closer than LUA to what I'm looking for. However, I really want to create an environment made for embedded development, not try to tack a higher level runtime onto something like an ESP32 and have devices drop every 2 months from a memory leak.
I've had a similar list for embedded. Unfortunately I've yet to fit many of those criteria without relying on GC. Maybe Zim, or Nim. But then you won't get introspection.
Rust, using one of the actor libraries based on async, might suit embedded really well if cross-platform support matures a bit more. Rust cross-compiling requires a C linker & compiler for the target platform, but Rust doesn't respect the standard LD/CC/CXX environment variables for setting them. Though it'd be really interesting to see if anyone can set up Rust to run on an embedded WASM to allow introspection / real co-routines. IMHO, that'd be awesome.
Currently I've settled on a Forth (in Forth style, an implementation I wrote https://github.com/elcritch/forthwith/) to give me an interactive environment without GC and stable timing (important!) with ability to add C function calls readily. Interactivity is big for me in the area I'm currently working in. Forth can easily be extended to have tasks / co-routines too, though I never tried implementing them. And of course, Forth macros are like exercises in puzzle solving.
AtomVM does look interesting, but early stage. The Erlang VM actually matches embedded really well. The actor model in theory lets each process run its own GC, which makes the GC model much simpler than in other dynamic languages. Modern OTP is large and not suited for embedded as you mention, but paring it down to basic actors and processes would fit well on many modern embedded MCUs. Elixir and Nerves is great for SBC/Linux embedded computing and some boards like the Omega2/Licheepi Nano. There's also GRiSP (https://www.grisp.org/) that runs BEAM on top of RTEMS.
Some of those I'm iffy on the details, but Zig (Ziglang) has async/await, static typing, and very robust ability to intermingle with C. It's being designed for systems/embedded/games.
Just program an FPGA. You can get all those things with https://clash-lang.org/ cause basically what you want is first order FRP and hardware does that natively. (Actor model is junk cause the interesting thing is the data/information flow not the "actors" themselves. FRP puts the focus back where it should be.)
You can implement that for embedded systems too, but as there is no obvious best way to implement it (Von Neumann machine is so different) you'll constantly get annoyed if you are the type that likes to use the weakest hardware possible as different implementations have different costs and so you'll always be second-guessing the abstraction.
I'm looking at hot reloading more for debugging difficult to reproduce bugs in the field than for patching critical systems which can't go down. The goal will be to hot reload logic, but not types. I think that should be doable.
Yup, the Lua interpreter is written entirely in pure ANSI C.
OP should definitely see if it solves his problem before implementing his own language. Unless of course he just wants to learn how to implement a language as a project by itself, in which case, have fun :).
Why not FORTH? It may not have some features out of the box (no type checking - which isn't as bad as it sounds with FORTH) but newer runtimes tend to put those things in and/or are fairly easily implemented yourself.
FORTH is neat, but I think you need to be too smart to use it. I made a list of other things I want under the peer post about LUA, some of that applies to FORTH.
> C is basically it for embedded development though.
This is the entire reason for me. With the way IoT is blowing up C actually might be gaining market share. I've done my research and things like TinyGo exist, and you can sort-of compile Rust for microcontrollers if you use all of the right crates, but it feels pretty risky to do so.
> I've gotten so tired of recompiling and waiting to flash a chip with C, that I've started learning ANTLR and making my own language. The idea is to have a language which runs in a VM written in C, and allows you to easily access code which has to be in C.
You may want to check out Moddable's XS runtime[1], which is exactly that. It's written in portable C, and "XS in C"[2] makes it easy to interoperate with C code you may write. And even if you decide to roll your own runtime, there's tons to learn from the source code.
Besides embedded applications, C is important because it has a simple binary interface. Libraries written in C can be used in any other language. This is not true for higher level languages, whether compiled or virtualized.
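To make the "simple binary interface" point concrete, a hedged sketch (library and symbol names invented): a plain C function exported from a shared library is just a symbol with a fixed calling convention, which is why nearly every language ships an FFI that can call it.

    /* mylib.c -- build with e.g.: cc -shared -fPIC -o libmylib.so mylib.c
       The exported symbol is simply "my_add": no mangling, no runtime needed,
       so Python ctypes, Rust extern "C", Lua FFI, etc. can bind to it. */
    int my_add(int a, int b) {
        return a + b;
    }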
You can actually call Go from C, but you have to copy values over; otherwise the Go garbage collector will free them while the C side may still be using them.
I love writing C when I haven’t written C in several years. It feels nostalgic and simple and elegant in my mind, and then I remember all of the issues I run into and how none of them have good, clear solutions. Things like writing cross platform code or string manipulation or how often things I swear I understand actually have bizarre edge cases, things I thought were well defined behavior are actually undefined, etc.
I’m not talking about being clever, I’m talking about writing code that meets some baseline standard for security, correctness, and cross platform functionality. The issue with C is that standard, “unsurprising” code often behaves unexpectedly.
I didn't get that impression when I used it; if it does that, then most likely your code wasn't so standard or your understanding of the language was incorrect.
I personally only got in trouble in C when trying to be too clever.
You're missing the point: writing secure code is categorically impossible, no matter what the language. That's very much different from 'other languages allow some people to write insecure code occasionally'. Every piece of code, every system ever built and every piece of hardware ever designed had bugs in it and if that item was supposed to be secure over time it will turn out that it wasn't. So the best you can hope for is that an item is obsoleted before you (or an adversary) detects its flaws and you're going to be fighting a rearguard action.
The whole idea that many eyes or 'safe' languages or some other magic bullet are going to solve this is so far off-base that it makes productive discussion impossible because everybody starts to focus on the particular tool at hand whereas the real problem is an instance of of the class 'architecture', not of the class 'tool'.
I don’t think I’m the one missing the point. You seem to be rebutting points I didn’t make, like other languages (or magic bullets) afford a bug count of zero. You also seem to be ignoring my point, which is that C is in a different ballpark than other languages in its difficulty of implementing correct software.
you did say that other languages are more secure, and the parent post here is asserting the entire platform is insecure, therefore no languages are secure, which is true.
however i disagree with both of you that writing safe C is hard. most of the problems i see relate to incorrect pointer usage or memory management. use stack-allocated containers from very well tested libs and pass references. this prevents entire classes of bugs from popping up. one thing you can’t do with C is be lazy.
I absolutely guarantee that if you throw AFL at any modestly complex C program that you have written that it will find bugs. Guarantee. This isn't about laziness.
safe and bug free are different things. nobody writes bug free code but writing safe code isn’t as hard. also i’m not entirely sure how a program proves lack of laziness, if anything it proves laziness
Many of the most expert C programmers have reported an inability to write secure C code. But you’re right that my understanding of the language is incorrect; that’s largely my point. There are so many oddities and exceptions and surprises that the language is very hard to understand correctly. The people who profess to “not have these problems” haven’t written code that has run the gauntlet of security and widespread multiplatform distribution.
I misunderstood your question. I don't know of the specific reasons; presumably they're the same reasons that intermediate C programmers can't reliably write secure C.
I do recall that in 1991, I was the only student in my "Introduction to Programming Using C" 101 class (as a Freshman in college) who could understand pointer arithmetic. That class ended the careers of many aspiring Computer Science majors.
I eagerly learned C++ a few years later. I did not understand at the time what a shitpile OOP is. I was totally into OO because it was the "industry trend." Someday we'll fully recover from that entire folly.
FYI, zig is the language that the most impressive embedded programmer I know is currently interested in to use for his embedded work (he does a lot of real time hard deadline things, like audio processing). I don’t know really anything about the language, but his interest in the language is enough to suggest other embedded programmers look into it.
From my subjective perspective, Zig, of all the languages, feels the most like the modern successor of C.
I think it even does a better job than C++, which shares some of C's syntax.
The problem, I guess, is complexity and productivity. Zig here is a hit, while Rust is as much of a miss as C++.
The question Zig was trying to answer is: can we have a language as simple, free and powerful as C, but with a more modern approach?
Rust, meanwhile, in its crusade to be seen as a C++ competitor, was not aiming at simplicity and user ergonomics.
So my feeling is that Zig is much more compelling as a C substitute than any other language out there, and that's probably the reason why your friend likes it more than the others.
You can fancy-up your sleeping bag + tent with a huge motorhome/caravan by using C++. It's like you have this house on wheels, but if you just want to take a sleeping bag + tent with you on a trip - you can. Take whatever you need.
That sounds cool. Does ANTLR help with anything besides parsing your new language? It seems like the substantial effort is making the VM/runtime that the language runs within.
Haven't finished the ANTLR book yet, but that was the goal. I'd much rather define a grammar than hand write a(nother) lexer/parser. It also has some useful tools for debugging and visualizing.
V looks like it fits this bill even better than lua. It looks like a very useful and well designed language, but it is in an extremely rough state so it might be more worth keeping an eye on.
> I like using C, but I get frustrated every time I hit a wall trying to build features from other languages.
Well there's your problem. Instead of thinking about your problem and devising a C solution, you resort to pounding the square peg into the round hole. Your problem isn't C, it's your proficiency at it. What's worse, instead of correcting said ignorance your solution is to produce more code using a new language. This only adds to the gross pile of trash that is every language/framework that tries to solve programmer ignorance and laziness with what amounts to wishful thinking. This is why software really sucks nowadays. Everyone wants results without reading the manual or putting in real effort. They want languages with built in hand holding and magic inference. That's not how any of this works.
I get it, programming and C sucks because it's so damn tedious. I suck at C too. But I actively study books and similar problems people already solved using the language. Programming is very hard. The best programmers I know are also very good with math and puzzles (e.g. Ken Thompson was a chess geek). They're puzzle solvers. They don't cheat and try to smash or duct tape the pieces together. They thrive on studying strategy, meaning reading books, papers, and other people's code. Learn how to solve the puzzles strategically instead of fighting them.
When writing interrupt service routines for serial communications on an obscure architecture with multiple heaps (one 16 bit addressable and one 32 bit addressable), I could have written 2000 lines of assembly. Instead, I used a system of assembly macros and wrote about 300 lines of that.
When writing firmware in a C which lacked decent coroutines, I could have just made a giant while loop and put all of my logic in it. Instead, I wrote a quick implementation of coroutines, a scheduler, and a basic async/await implementation using the C preprocessor. This also saved thousands of lines of boilerplate code.
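For readers who haven't seen the trick, a minimal sketch of how stackless coroutines can be done with the C preprocessor (along the lines of Adam Dunkels' protothreads, not necessarily the poster's actual implementation):

    /* Stackless coroutine macros using the switch-on-__LINE__ trick.
       All state lives in a struct, so the function can resume where it left off. */
    #define CO_BEGIN(st)  switch ((st)->line) { case 0:
    #define CO_YIELD(st)  do { (st)->line = __LINE__; return 0; \
                               case __LINE__:; } while (0)
    #define CO_END(st)    } (st)->line = 0; return 1

    struct blink_state { int line; int count; };

    /* Returns 1 when finished, 0 when it has merely yielded. */
    static int blink_task(struct blink_state *st) {
        CO_BEGIN(st);
        for (st->count = 0; st->count < 3; st->count++) {
            /* toggle_led(); */
            CO_YIELD(st);
        }
        CO_END(st);
    }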
The fact that C has no good method of implementing a generic hashmap has nothing to do with my proficiency in the language. My proficiency in C also has nothing to do with my love of sweet, glorious type inference, or my deep felt appreciation for the conveniences modern IDEs and languages provide. Writing software has never been better!
Maybe I'm misinterpreting, but what I'm hearing is: "You don't need to use Vim to get more done, you just need to get better at typing. Not that I know anything about how fast you type." Or... you know, I could just use Vim.
That is not how your original post read to me. It sounded like you came from the dynamic/scripting language world and are trying to hammer those features into C. You sound quite proficient, I'm not trying to shit on you or anything. I'm tired, it's late, and no one cares but it sounds like you want something like an mbed that runs python or FreeRTOS? Why reinvent the wheel if someone already did this? Or is it simply that you're stuck using C on your "weird" platform? In that case, I'd say have a look at Nim, there's an embedded branch. It's a dynamic language that compiles to C89 so perhaps it's buildable on your platform. If it's a personal project then have fun.
If you just put together a linked list of structs which represent the routines, the time they last slept, and what they're last waiting on, you can just iterate through it and you've got most of the functionality of async/await. I never bothered to implement separate stacks and register restoration, opting to just use global variables for state that needed to be saved between awaits. It would have been neat to do that, but I think it would have turned an afternoon spent saving time into a multiday project, and left me permanently paranoid about corrupting the stack or registers. Would have been fun though.
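For flavour, a rough sketch of that kind of task list and scheduler loop (all names invented, and the resume mechanism is left abstract):

    #include <stddef.h>
    #include <stdint.h>

    /* One node per coroutine: when it last slept, how long it asked to sleep,
       and an optional flag it is waiting on. */
    struct task {
        int (*step)(struct task *t);   /* resumes the coroutine; 0 = still running */
        uint32_t slept_at;             /* tick at which the task last yielded */
        uint32_t sleep_ticks;          /* requested sleep duration */
        volatile int *wait_flag;       /* also wait until *wait_flag != 0, if set */
        struct task *next;
    };

    static uint32_t now_ticks(void) {  /* stand-in tick source for the sketch */
        static uint32_t t;
        return t++;
    }

    static void scheduler_run(struct task *head) {
        for (;;) {
            for (struct task *t = head; t != NULL; t = t->next) {
                int timer_done = (uint32_t)(now_ticks() - t->slept_at) >= t->sleep_ticks;
                int flag_done  = (t->wait_flag == NULL) || *t->wait_flag;
                if (timer_done && flag_done)
                    t->step(t);        /* the task re-arms its own waits before yielding */
            }
        }
    }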
That's actually how _all_ of this works. If we required everyone to handle the complexity of everything, we wouldn't be able to make technological progress. Yes, there's the tension (there's always a tension) between having a deep understanding of an underlying technology, and therefore be able to use it better, and trying to get to a point where you can get away with not understanding. But it's undeniably considered a success story in technology if the technology gets to a point where people can use it to basically its full potential, without needing to understand it.
A lot of people have a misconception that C is close to the way hardware works. The reality is that hardware engineers jump through lots of hoops so that C programmers can keep pretending that they're coding against a really fast PDP-11. Not only that, but these contortions directly lead to vulnerabilities like Meltdown and Spectre because memory representation in C doesn't map well to modern hardware. C family of languages is basically holding us back from utilizing current day hardware safely and efficiently. This [1] is an excellent essay on the topic.
They jump through hoops so that C programs and the programs of all the other languages designed to run in the same execution model run as fast as possible. It's not to indulge C programmers but to support the vast body of existing software.
Hardware engineers need optimization targets just like anyone else, and it's an eminently reasonable one.
Also, it's not like loads of improvements not related to the C execution model haven't been made. Demonstrably, the C model is not holding us back.
Anyway, if you want to move on from the C execution model, great. Now you need to introduce a practical transition plan, which should include such details as how and why we should rewrite all of the existing performance sensitive software designed to work in the old model.
the C execution model has demonstrably held us back because the practical transition plan for migrating existing C codebases would be automated translation (of the C Turing tarpit) into sane functional/formal models
Sure. Foreign function interfaces (FFIs), also known as ctypes, are probably the best thing to point to as glaring examples of how C holds us back. Once we start there, then a spidering network of technologies are highlighted, all surrounding C, from dynamically-linked code objects (DSOs and DLLs) to the required usage of '%' characters for printf-style string formatting via templating.
We need to move beyond memory-unsafe paradigms already. Assembly intrinsics aren't great but they're certainly better than C.
In a way this already happened, with GPU programming, which has a more realistic model of what modern hardware is (explicit memory hierarchy for example).
Because the speed advantage is so huge, people went through the pain of learning this new model and redesigning algorithms to better fit it.
Modern hardware is multiple different things. GPU programming does not have a realistic model of what CPU hardware is like, because despite CPUs having more cores and SIMD, they are also very cache-oriented and branch-prediction-oriented, while GPUs aren't that at all.
And the C++11 memory model was added so that the language would map to the execution model of the underlying hardware, not vice versa, refuting the parent.
Not really, C++11 initially got its memory model from the .NET and Java memory models, two mostly hardware-agnostic stacks, and then expanded on top of them.
I'd argue that modern GPUs pipelines are quite different from the API programming models they support, especially tile based renderers do lots of funny things while maintaining the illusion that you are running on the abstract OpenGL pipeline.
The computer history graveyard is full of architectures that were supposed to be superior by directly supporting the trending programming paradigm of the day.
The current microarchitectures for general purpose computing, that is OoO execution, cache coherent shared memory with a flat memory model, and mostly transparent and coherent caches seems to be optimal. In fact, far from being designed around C, C and siblings had to evolve to support the model well (a memory model, explicit SIMD builtins, support for vectorization and offloading etc).
It is entirely possible this is only a local optimum, but I have yet to see a plausible model for a better architecture.
It is more likely that, now that silicon is cheap and CPU designers are struggling to find ways to use it, extensions to support higher level languages might be added: more fine grained cache control for message passing, extra tags to help GCs, and more stuff that I can think of.
I’m sympathetic to the argument that C isn’t realistically a “high-level assembler” any more, but if there is low-hanging performance fruit available for a language free to dispense with C’s PDP-11-flavored abstractions, I haven’t seen practical examples of it.
If anything, more modern languages (eg Java) have to contort their data representations to build, say, “structs of arrays” that are highly efficient on modern CPUs (and trivial to express in C!)
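For anyone who hasn't run into the terminology, a quick sketch of the two layouts in C (field names invented):

    /* Array of structs: each element's fields are interleaved in memory. */
    struct particle { float x, y, z, mass; };
    struct particle aos[1024];

    /* Struct of arrays: each field is contiguous, which SIMD loops and
       cache lines tend to prefer in hot numeric code. */
    struct particles {
        float x[1024];
        float y[1024];
        float z[1024];
        float mass[1024];
    };
    struct particles soa;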
I’m very curious to hear of any recent languages that are designed expressly with “mechanical sympathy” in mind.
The ISPC language by Intel is designed to give enough information to the compiler so as to allow auto SIMD-vectorization of programs [1]. It opts for an explicitly parallel model of programming, unlike C [2]. It also has easier support for SOA and AOSOA [3].
The trouble is both processor instructions and compilers have been optimized for C (well C and FORTRAN). So no matter how good or bad C is, we're mostly "stuck" with its PDP-11 computing model unless someone does a lot more work than just making a new programming language.
I'm pretty certain the CPU engineers at Intel/AMD/etc aren't pulling out K&R 1978 and then telling themselves this is their new reality. All kinds of new safety, virtualization, and ML instructions have been added to CPUs without consent from the "C language". In fact, it's Google, Amazon, Apple, Microsoft, and Facebook who are controlling CPU roadmaps!
Without cache coherency and ILP, programming in any language would be insane. It's not like programming in x86_64 assembly suddenly opens a world of possibilities because its "truly low-level". You gain very little extra control over a given platform by switching to assembly over C, that's what we mean by "low-level".
C maps cleanly onto the instruction sets provided by chip manufacturers. It provides the option to the programmer to optimize structures for use in vectorization if they so choose, or to optimize for some other objective like size for a binary wire protocol or limited memory space.
The very nature of having a choice about memory layout of structures and the ability to cleanly link with the platform ABI is what makes C low-level. Obviously the inner-workings of a modern CPU don't map cleanly to the C virtual machine. However, there's no convincing evidence that greater control over cache invalidation is what's holding back performance of those CPUs.
If finer-grained cache control is needed, then it is no big deal to invoke the appropriate compiler intrinsics (if available) or write your own with inline assembly. My preference is to put these sorts of things in a separate .S file.
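For example (a hedged sketch; __builtin_prefetch is a GCC/Clang builtin and _mm_clflush is an x86 SSE2 intrinsic, so availability depends on compiler and target):

    #include <immintrin.h>   /* x86 intrinsics on GCC/Clang/MSVC */

    void scale(const float *in, float *out, int n) {
        for (int i = 0; i < n; i++) {
            /* Hint that a later element will be needed soon. */
            __builtin_prefetch(&in[i + 16], 0 /* read */, 3 /* keep in cache */);
            out[i] = in[i] * 2.0f;
        }
        _mm_clflush(out);    /* explicitly flush a line that won't be reused */
    }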
> It's not like programming in x86_64 assembly suddenly opens a world of possibilities because its "truly low-level"
It certainly opens a world of possibilities regarding the vector units. That's one place where inner loops hand-written in assembly still have an edge. On the other hand, with contemporary CPUs, high quality assembly coding is a highly specialized job skill by itself, so for general purpose engineers, learning it is most likely to yield a bad cost/benefit.
Right. Exact control of data structure layout is also practically essential when implementing a database system. It is needed for correctness as much as anything else.
The instruction sets provided by chip makers (except vector instructions) are tuned to match what compilers for ancient languages want to emit.
The chips move heaven and earth to maintain the fiction that those instructions actually direct what they do.
The number of fundamentally different kinds of cache, and specialized state machines not directly accessible by instructions, in a modern chip would boggle your mind.
In GPU programming (CUDA for example), every time you allocate memory you need to explicitly specify where (general memory, L3, L2, L1, register). Because C doesn't support this, they added language extensions.
So C is too high level for GPU programming, and needed to be extended.
While I also disagree that C is “holding things back”, I don’t think their comment is a “load of crap”. Talk like that is unhelpful and quite frankly non-technical and unprofessional.
The author seems to trivialize how hard it is to write a compiler that generates efficient code. Like designing a simpler language than C (less general architecture targets) would solve much.
It's not about designing a simpler language than C, but rather one that matches what hardware is actually doing better. The example in the article is how GPU programming works. To make the most out of modern hardware we need to change the way we program to embrace concurrency, parallelism, and eschew shared global state entirely. The way FP works is much closer to hardware than the imperative style.
I have a hard time figuring out how forbidding concurrent tasks from communicating would necessarily make them faster. Also, how is FP a more accurate model of e.g. AMD64 than imperative style? How would e.g. the "Clear Global Interrupt Flag" instruction be modeled in FP with no state?
I don't know if it is a more accurate model of AMD64/X86 but it may be a better match for the underlying silicon (gates, logic circuits). I can see function pipelines and composition of these in some ways analogous to circuit design. After all those instruction sets in AMD64 are merely an implementation of these. Pure functions and pipelines could model circuits quite well.
Shared state is not a problem with current hardware, and mutable state is also not a problem. The only issue is shared state that is mutated from multiple CPUs, and that's why the single-writer principle is a fundamental rule of multithreaded programming.
Similar reasons are given for Golang design in their comparison to C language.
e.g.
>Why is there no pointer arithmetic?
>Safety. Without pointer arithmetic it's possible to create a language that can never derive an illegal address that succeeds incorrectly. Compiler and hardware technology have advanced to the point where a loop using array indices can be as efficient as a loop using pointer arithmetic.[1]
But there is pointer arithmetic in Go, in the "unsafe" package. Like with generics the Go authors want to decide where and when these are used, but they actually made it possible for mere users of Go to use pointers freely, since it's badly needed from time to time, unlike generics which they hide fully.
>Like with generics the Go authors want to decide where and when these are used,
I wish this meme about Go and generics would die already. C allows you to define an array of int and an array of char and have those be two different types. Go allows you to do this with slices and maps as well, because slices and maps are also built-in types in Go. This is not "generics".
I think it's more useful to think of "generics" as applying to type systems that allow abstraction over types. C and Go crucially don't allow you to define a function that, say, appends an element of type T to an array of T. That's where things start to get complicated. C and Go don't have a restricted version of that [1] - they just don't have it at all.
[1] With the largely irrelevant exception of the C11 _Generic stuff.
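To illustrate the difference: the closest plain C gets to "append a T to an array of T" is erasing the type behind void* and an element size, qsort-style, where the compiler can no longer check that the element matches the array (a sketch, not anyone's real API):

    #include <stdlib.h>
    #include <string.h>

    /* Grows the array by one element and copies *elem in. The caller must pass
       the right elem_size; nothing stops them passing the wrong one. */
    void *append_any(void *arr, size_t *len, size_t elem_size, const void *elem) {
        void *grown = realloc(arr, (*len + 1) * elem_size);
        if (grown == NULL)
            return NULL;               /* old block is still owned by the caller */
        memcpy((char *)grown + *len * elem_size, elem, elem_size);
        (*len)++;
        return grown;
    }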
Surely it's better to confine it to a package labeled unsafe that one must import to use (and which by Go's rules one can't import and then not use) and to make a point of only using it where necessary and paying particular attention to it in testing and code reviews.
Good comment and article, thanks! One thing only, the phrase "The quest for high ILP was the direct cause of Spectre and Meltdown" is slightly exaggerated. One might say ILP was rather a remote indirect cause, not the direct cause. Blaming it on C and ILP maybe is kind of far fetched.
That isn't to say that the article doesn't have a point though, far from it.
Isn’t this abstraction presented at the ISA level, not by C? It’s not that C isn’t close to the way hardware works, it’s that the ISA interface presented to C compiler writers isn’t close to how instructions are actually executed.
I've been writing a large app in C again after a few years away. Originally wrote the app in C++ then again in Python and now in straight C. I enjoy the faster compile times of C (over C++) and the better type checking of C (over Python).
However, when I'm carefully using C++, I don't have to fiddle around with memory management and still get fast performance and better type checking.
Yesterday I wrote a Python script to generate C code test cases. String manipulation in Python is very easy.
> Yesterday I wrote a Python script to generate C code test cases.
I do this quite a bit. Basically, anytime there is something very boiler-platy (like HTTP APIs, configuration management, test cases, etc.) I write metadata in JSON that describes what I'm adding, and then at build time a python script will parse the metadata and generate all of the C or C++ which is then compiled. I even have the python generating comments in the generated C/C++!
I used to use Jinja2 for generating the C/C++ files, but now I just use f-strings in the latest version of vanilla Python3.
This is what Lisp/Scheme based languages use macros for: to reduce boilerplate code. A lot of programmers hate DSLs, but using Python to generate test code is the same as using a DSL. The difference is, Python is a much bigger and more complicated DSL than a test-macro will probably ever be.
There is nothing wrong with generating code in a pragmatic way. It is just strange to see all the time people using workarounds for things that a tiny DSL could probably solve better.
There's another advantage to using a high level language like python though - since the "source" is now JSON metadata which is decoupled from the code, I can use that metadata to generate other interesting things. For example, I can use python-docx to generate a Word document, for example (which my company requires for documentation tracking). Or, if the metadata is describing HTTP APIs, I can generate an OpenAPI document which can be fed into Swagger UI[1].
This is really interesting — nice work! Have you published any tooling as OSS or examples? If you haven’t seen it before, the Model Oriented Programming work by Pieter Hintjens is worth looking at.
Not yet. Most of the stuff I've done is internal to my company unfortunately. But just to give you an idea, if I were adding a new configuration parameter to our IoT device, I would add an object to a JSON array that looks like this:
{
"name": "backoff_mode",
"group": "telemetry",
"description": "We don't want to clog up networks with tons of failing attempts to send home telemetry. This parameter configures the device to back off attempts according to the specified mode",
"cpptype": "std::string",
"enum": ["none", "linear", "exponential"],
"default_value": "exponential"
}
At build time, a few things are generated, including a .h file with the corresponding declarations.
The boilerplate for getting and setting this parameter is also generated. Thus, just by adding that simple piece of metadata, all the boilerplate and documentation is generated and you as the developer can focus on the actual logic that needs to be implemented regarding this parameter.
Note that Python's official type checker (mypy) is pretty good (within the limits of type erasure); I started the project I've been leading up at work this past year in statically-typed Python, and it's been a great experience.
I tried to like mypy, really did. But it always ended up in some kind of circular import tailspin for me. C++ gets around the problem by allowing separation of interface and implementation, Haskell by allowing circular imports. Maybe I missed something, maybe it's fixed by now, but that was my experience a couple of years ago.
I've played with mypy a little, and it's certainly not bad, but it does suffer by comparison to typescript, which just has a lot of really great structural typing features that mypy still lacks.
mypy recently got structural typing. They call it Protocols, which isn't the first thing one would put into Google when looking for this feature. It's not as powerful and terse as TypeScript, unfortunately.
I haven't tried mypy yet but want to get to it. It sounds very interesting! After using Python so much the last several years, I've gotten more curmudgeonly about strong type checking in my programming languages.
Don't wait for a new project, just start using it (it's most beneficial if, for every function, you specify types for the parameters and the return value). If you use an IDE that understands it, like PyCharm, suddenly autocomplete will start working right, so will refactoring, and it will start highlighting potential errors.
IME, 'C with classes', or C++ but skipping the more esoteric features (OOP overuse, template metaprogramming), yields a fairly easy to understand language for most C folks. I'm a C guy and am most comfortable with it, but I like having the advanced data structures out of the box. Granted, most C codebases of any reasonable complexity have their own support for relevant data structures, but they're a lot cleaner to use via the STL.
Python is a great tool for a certain class of jobs. I saw a company doing systems programming in Python and originally thought it was a great idea, and then I learned of the hidden (or not) dangers of Python.
Thanks for the link! I just installed it on my i5 Thinkpad. I can't test filters and more complex stuff right now, but it loads super fast, like 150 ms uncached, and probably less than 50 ms when cached.
I'm curious if there is other software around rewritten in pure C. It would likely be a huge task not worth the effort, but I can't but imagine the speed and size gains by rewriting say Firefox or Libreoffice.
I think what squarefoot is looking for is lightweight (tiny?) C apps that have limited abstraction layers. For example, Gtk+, which is pure C, is no longer considered lightweight. It was with Gtk+ 1.2 because GTK was just a small wrapper around X11 in the 1.x days. That is what AzPainter is. AzPainter basically depends on xlib, xft/freetype, and some image libraries (libpng, libjpeg).
Modern C apps with a GUI typically build on GTK but that toolkit has become massive. Modern GTK3 apps are built on top of dbus, pango, atk, cairo. This is not necessarily a bad thing though and libraries like ATK provide accessibility which is important. At the time GTK+ was created the competing toolkit at the time, under Unix, was Motif and it had the nickname "bloatif". Now we've reached a point where GTK3 is much much larger and "bloated" than a 2019 Motif application.
If we include C++ the Qt and WxWidgets toolkits are also massive. FLTK is still a lightweight option though.
"I think what squarefoot is looking for is lightweight (tiny?) C apps that have limited abstraction layers. For example, Gtk+, which is pure C, is no longer considered lightweight."
Yes, I was also meaning the ui, which in this one is amazingly fast. I tried a few filters and they seem fast too, but being not a gfx expert I have no way to judge. But the ui is incredible; same speed on my home PC which has a mechanical disk. Making an external library of its gui primitives and functions could be an interesting project to be used on small embedded boards.
Since AzPainter v2.1.3, the `mlib` sources shipped with AzPainter are licensed under GPLv3 terms, but there is the AzPainter theme editor (an older `mlib` minimal demo app) whose sources are licensed under BSD terms:
In the mid-90's, before mod_perl, I used to generate PHP in minutely cron job using a Perl script from a constantly changing database. That came close to the best of both worlds at the time: the flexibility of Perl, fed from a database, and no CGI overhead when serving.
Code generation usually doesn’t get you any support from standard tooling. My autocomplete can see through macros, but it would have no idea about the C I was generating if I were working in Perl.
One large factor is that when somebody really dislikes the result of a code generator, they most commonly cope by editing the generated code by hand. With other metaprogramming techniques people can't do the same, so they fix the issues.
The end result is that code generators are usually a set of mostly-functioning tools that create bad code that is almost, but not exactly, impossible to understand, but that you will be required to change at some point.
I use them. The executable stack bit doesn't bother me since I'm on an ARM Cortex. Would be nice if they fixed that because they are kinda handy once you get used to them.
It's what it doesn't give me. Doesn't have a memory management unit, so no stack execution protection. So no protection against classic stack smashing attacks.
I saw a talk about weird machines. The upshot of it is that if you can clobber the return address on the stack, you can usually create a weird machine that you can then program[1]. Leads me to believe that if you are putting untrusted data and return addresses on the same stack, you're in a state of sin security-wise.
[1] Also saw an exploit on an embedded system where they leveraged that even though the memory was locked you could set the program counter and read/write registers via jtag. Using that they were able in short order to recover the devices security keys and re-flash it with their own code.
This [1] explains how to create them using nested functions. Near the end is a quick blurb about the implementation which is probably why c++ doesn’t do it that way.
I don’t like C++ anymore. I used to love it but now it’s just a bloated mess of a language. I get it, you don’t have to use every part of the language but I really don’t like any of the newer bits and that’s what everyone wants to use.
It's pretty simple. I'm decoding a structure into bit fields (an 802.11 Information Element). I have a hex dump that I decode into two structures with almost 100 fields (eyeroll for the 802.11 committee). Each field can only be 0, 1, or sometimes up to 15.
I have a list of the structure members. I generate an assert for each member being a certain value. The python script builds/runs the C code. On failure, I parse the assertion failure, get the actual value, change the C assert string, rebuild-rerun. Continue until the program succeeds.
I've visually verified the decode is correct once. I want to keep the decoder tested if I change the code again so I'm generating complete test coverage.
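For readers unfamiliar with the pattern, a toy sketch of what's being described (field names and values invented; real 802.11 IEs have far more members):

    #include <assert.h>
    #include <stdint.h>

    /* Stand-in for a capabilities word unpacked into bit fields. */
    struct ht_caps {
        unsigned ldpc        : 1;
        unsigned width_40mhz : 1;
        unsigned smps        : 2;   /* 0..3 */
    };

    static struct ht_caps decode_ht_caps(uint16_t raw) {
        struct ht_caps c;
        c.ldpc        = raw & 0x1;
        c.width_40mhz = (raw >> 1) & 0x1;
        c.smps        = (raw >> 2) & 0x3;
        return c;
    }

    int main(void) {
        struct ht_caps c = decode_ht_caps(0x000D);
        /* One generated assert per member; on failure the harness reads the
           actual value from the assertion message and patches the expected one. */
        assert(c.ldpc == 1);
        assert(c.width_40mhz == 0);
        assert(c.smps == 3);
        return 0;
    }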
It sounds like your use case is property based testing. Have you tried any of the QuickCheck ports to C++? My favorite is RapidCheck (https://github.com/emil-e/rapidcheck). You define generators for your struct fields and write a pure function to check whether the parsed output conforms to your expectations.
This way you can have the speed and memory tightness of C++ where it matters, but do the other stuff, like initial data parsing and munging or overall control and sequencing in Python, thus avoiding the general clumsiness and unproductivity of C++
In August this year I started a new project. In C. Why? Because I'm quite familiar with it and because it allowed me to focus on the problem rather than on the language that I was writing in (the problem is - for me at least - a very hard one). The codebase has steadily grown, at some point it was 7K lines, then I cleaned it up and refactored it back down to about 5500 right now. At some future date it will be - hopefully - finished and I may use it to power applications and/or services. It's super efficient both memory and speed wise compared to what the competition is doing but that's just a side effect. That side effect does allow for some applications that would otherwise be impossible.
The code takes one file and outputs another, it's a classical 'unix filter' and it relies on a couple of very high performance libraries that others provided, battle tested and ready to be plugged in. As the project matures I come across the occasional wart in the language, something that I know I could have done more elegantly in a language with richer constructs (lists, list comprehensions, that sort of thing).
What surprised me most after working on this project for a couple of months though is how well the language fits the problem, that's something that I did not really think would be the case when I started out with it. But as time passes and things 'find their place' it is indeed just like the author writes, I - still - love coding in C. Even if I know there are better alternatives out there and I should probably get invested in one of them one of these days at the 75% mark or so of my career I'm still very happy that I picked C at the start and never once did I imagine that it would still serve me this well 37 years later.
Most of the other 'hot' languages of the days that went by in the meantime have all been lost, and yet, C is still here, and likely will still be here for years to come. To me it's like an old knife. Not pretty, known to hurt you if you abuse it but still plenty sharp and well capable of doing the job if you treat it well.
Experience shows that almost every time someone writes a project like this in C, applying modern fuzzing tools (e.g. AFL) will uncover bugs that let a maliciously crafted input file hijack execution and act with the privileges of whatever's running the code. It is a lot of work to remove all such bugs (and not reintroduce any) and you never know when you're done. That is a very severe problem with C that is having devastating effects on society.
It is therefore irresponsible to use C for a new project like this, unless you know your code will never be exposed to malicious input. (Note that if you share the code with anyone else, it's hard to have such confidence.)
Passing things by reference is safe. Any other use is mostly just pointless; it's what reference types are for. If you are doing something else with them, it's easy to stop.
Passing parameters by reference is certainly not always safe. You can, for example, pass a reference to a deref'ed unique_ptr into a function call, while the function call does something that clears the unique_ptr.
If you want to avoid objects containing references (or raw pointers), then you have to avoid using lots of standard library stuff --- `string_view`, C++20 `span`, or even STL iterators.
> Even if I know there are better alternatives out there
I could have chosen Java, Python, Go or C++. Instead I chose the tool I'm most familiar with, because that keeps me focused on the problem rather than on the tool. It's a disadvantage of getting older: there is less time left to waste.
> To me it seems you know it can be dangerous, but like it too much not to use it.
It is mostly a familiarity thing, not a like or dislike.
Isn't Rust right now (1) still a niche language (2) terribly slow to compile (3) a bad fit in the first place since I don't actually know the language and (4) still immature?
As for libraries, the most important ones that I rely on deal with the loading of .wav files, writing of midi files, decoding of mp3s and fftw ( http://www.fftw.org/ ).
1) Define "niche language". Lots of people are using Rust in production. Rust isn't in the top 10 most popular languages, but it is in the top 100, and rising.
2) Compilation speed is a problem for some projects, but not for small projects like yours.
3) Understandable, but writing exploitable code because you don't know a suitable alternative language very well is not a good long-term situation. Maybe you should do something about that.
4) Define "immature". It's not one of the world's most mature languages, but it is definitely mature enough to write production code for many contexts. In many ways C is the language that needs to grow up.
There are Rust libraries for all those things (including Rust bindings for FFTW itself), though I can't speak to their quality. I can confidently say that using Rust libraries in a new Rust project is a lot easier than using C libraries in a new C project, especially if you're not on Linux.
> Lots of people are using Rust in production. Rust isn't in the top 10 most popular languages, but it is in the top 100, and rising.
I've yet to come across anybody outside of the HN crowd that knows about or uses Rust. Not a single company that I looked at in the last year used Rust; that's 41 of them, and I asked every one of them which programming languages are in use.
> Compilation speed is a problem for some projects, but not for small projects like yours
ECT cycle (edit, compile, test) is a pretty big factor during early stage development. Running a test in a second versus running a test in ten seconds would ruin my day. I'm not sure what the current speed of compilation is for Rust but the only time that I looked at it (very early on) it was so slow as to be unusable from my perspective. I'd hope that has substantially improved. Think of what I'm doing as explorative programming, I'm both trying to understand the problem and trying to solve it at once.
> Understandable, but writing exploitable code because you don't know a suitable alternative language very well is not a good long-term situation.
That may be true. At the same time, Rust may not be a good long term solution either, languages come and languages go, and before I invest a year or so into a new eco-system I'd like to see it has staying power. It is interesting that you recommended Rust and not say Java which has pretty good performance, is memory safe and is in production for well over a decade, for this particular use case I would choose that over Rust.
> Define "immature". It's not one of the world's most mature languages, but it is definitely mature enough to write production code for many contexts.
Hardly a month goes by without an announcement of the next point release of Rust on HN. I don't have the luxury of tracking a moving target next to working a full-time job and having a family, then doing this hobby project besides. If Rust wants to see mainstream adoption for stuff like this because you seriously believe that writing a new project in C is irresponsible (which smacks of FUD, and is one of the reasons why I think Rust is one of the more toxic language communities on the internet, I have never seen a Java proponent use that kind of language) then I hope you agree that getting Rust to some kind of long-term stable form in the very near future is a must. This whole 'the sky is falling' tactic is rather off-putting, at least it is to me. The same happened to Perl, which was supposed to be - and still is according to some - the best thing since sliced bread.
> There are Rust libraries for all those things (including Rust bindings for FFTW itself), though I can't speak to their quality.
That's good to hear.
> I can confidently say that using Rust libraries in a new Rust project is a lot easier than using C libraries in a new C project
For you, as a Rust user, yes. But you are forgetting that for me that is not at all easier, and that using C libraries in a new project is a lot easier than using Rust libraries for me because I happen to know how that works.
Vantage point is a bit of an issue here, I take it that you are fluent in both Rust and C and that you have decided to hitch your wagon to Rust. I'm fine with that and no doubt in the long run you will be proven right but to me it smacks of 'you should do as I do' rather than that it is the best for me.
Learning a new language just for some project is a very high degree of friction to add, it would slow me down tremendously and it might lead to the project being abandoned rather than progressing at a quite acceptable speed given my constraints in time.
> especially if you're not on Linux.
I wouldn't dream of developing software on anything else, but I totally respect other people's choices in their platforms.
> I've yet to come across anybody outside of the HN crowd that knows about or uses Rust.
Interesting. High school students and university students that my kids know are using Rust (in New Zealand, not some high-tech mecca).
> Think of what I'm doing as explorative programming, I'm both trying to understand the problem and trying to solve it at once.
OK. I might recommend Julia then.
> It is interesting that you recommended Rust and not say Java which has pretty good performance, is memory safe and is in production for well over a decade, for this particular use case I would choose that over Rust.
Sure, Java sounds like a fine choice. I didn't know much about your requirements when I suggested Rust. People often choose C because of strict performance or deployment constraints, which Java often can't meet, which makes Rust often a safer recommendation.
> Hardly a month goes by without an announcement of the next point release of Rust on HN.
That's because they release every six weeks. That doesn't mean the language is unstable or you will keep having to make changes to your code. It means there's a steady flow of incremental improvements that preserve backwards compatibility.
> you seriously believe that writing a new project in C is irresponsible (which smacks of FUD, and is one of the reasons why I think Rust is one of the more toxic language communities on the internet, I have never seen a Java proponent use that kind of language)
In hindsight I shouldn't have mentioned Rust at all. I do sincerely believe that propagating C code is irresponsible and the software industry needs to recognize this. It sounds harsh, but I think that's partly because software developers have historically taken too lightly the consequences of their choices (even for hobby projects).
People like me who are zealous about the industry moving away from unsafe code tend to be big Rust fans, because Rust finally makes that possible for the systems/embedded domain where C/C++ were for such a long time the only viable option. I guess we have skewed the Rust community to some extent. Mea culpa.
> it smacks of 'you should do as I do' rather than that it is the best for me.
When we write code and distribute it, we have an effect on the world, and then I think it behoves us to consider more than just "what is the best for me".
For hobby projects where the code is never distributed, these issues are mostly moot ... though it's amazing how often such projects escape sooner or later.
> I do sincerely believe that propagating C code is irresponsible and the software industry needs to recognize this.
I believe that you are sincere in this. At the same time I ask you to recognize that the way you - and many other proponents of 'safe' languages (safe in quotes, because computer programming will never be entirely safe) - approach this is combative and therefore ultimately unproductive.
The old proverb says that you catch more flies with honey than with vinegar; headbutting with people and calling them irresponsible is not going to get you where you want to be. It will have the exact opposite effect: people will dig in and their resolve will strengthen rather than weaken, because people tend to be invested in their tools and their creations. Going for a full-on frontal confrontation about this is counterproductive; that's just human psychology, which in cases like these is as important - if not more important - than having a technological edge or being right.
Zealotry is the last thing you want to take with you in your toolbox if you want to make change.
That's a good point, but it assumes the only audience for this conversation is developers already emotionally attached to C, which I don't think is true.
GC is a problem for OS/embedded programming. And before Rust, safety required GC (or an unpalatably restrictive programming model like MISRA). Even if Symbolics or the C# Windows Vista work had been more successful, Rust would still be needed.
I love the (seeming) resurgence of C, Rust, and C++ articles on HN. Not sure if it's robotics, IoT, webdevs longing for simpler and stricter frameworks or what. But I love it.
For me, it’s embedded. So I use C, mainly because as a professional company releasing a real world product my only other choice is C++ and I haven’t found a “need” for it yet.
I think we also choose our favorite programming languages based on our psychology, not only the technical features.
The ones who like C and C++ are like Toretto from the Fast and Furious franchise choosing a classic muscle car for a race. They are super fast and give you many more options, but because of that, they are not the safest.
A muscle car requires more skill to drive and win with than a modern fast, shiny, and safe car. But because you know it is more dangerous to drive, you are more likely to work on your skills to make an accident less likely.
That's one of the reasons why I'm still skeptical about jumping on the Rust bandwagon: once you become a good muscle car driver, it's hard to choose the safer but more limiting option, though you can totally see why someone without experience in faster cars would choose the safer version.
I write C full time and I love it (Varnish Cache). In our team of about 10 full time C engineers, we spend less than an hour a month dealing with things like “memory safety”. When you are writing C at a professional level, your time is spent on things like performance, algorithms, accuracy, hitting requirements, and delivering software. We have numerous safety and fuzzing systems humming in the background checking our work 24/7. The tooling ecosystem (Linux) is top notch.
(If you want to write C full time professionally too, contact me!)
C was my first love, and I still find myself using it often, especially in library code when I want versatility.
I've gotten into the habit of pushing memory allocations as far back into the user program as possible (where the user asks for the size of the internal struct, allocates and casts, and deals with ownership himself) to allow more flexibility to the users of my libraries, and also to remove entire classes of memory issues from the libraries themselves: https://github.com/kstenerud/c-cbe/blob/master/tests/src/rea...
As a bonus, it avoids the headaches of mixing multiple allocators (malloc, new, [NSObject alloc], JNI, etc) when used in cross-language codebases.
It does, unfortunately, complicate the API a little bit, but I find the tradeoff to be worth it in terms of safety.
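Roughly, the pattern looks like this (illustrative names only, not the actual c-cbe API): the library reports how big its private struct is, and the caller owns the allocation with whatever allocator it prefers.

    #include <stddef.h>
    #include <stdlib.h>

    /* "library" side: the struct layout stays private, only its size is exposed */
    struct decoder { int state; };
    size_t decoder_size(void)              { return sizeof(struct decoder); }
    void   decoder_init(struct decoder *d) { d->state = 0; }

    /* caller side: owns allocation and lifetime, so malloc, an arena or a
       static buffer all work, and the library never frees anything itself */
    int main(void)
    {
        struct decoder *d = malloc(decoder_size());
        if (!d) return 1;
        decoder_init(d);
        /* ... use d ... */
        free(d);
        return 0;
    }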
What does it take to become a professional C engineer? How did you practice all that? I've recently started looking into it, but I'm kind of just getting started (books/PDFs, exercises).
Time and experience. Learning the syntax and wrapping your head around pointers and memory is the first step. After that, just try and write as much C as possible. Help out with projects, start projects, write tools and APIs. Most importantly, study existing code bases to learn real world techniques. Best case, get a C job where you have to write a diverse amount of C code day after day.
I've spent roughly 20 years going back and forth between C and C++, trying to decide whether to add the missing pieces my way to C or strip away what I don't need from C++. I'm currently in a C phase after having spent quite some time writing interpreters in C++.
The most common issue I see in C codebases is excessive memory allocation. In most other languages, allocating a chunk per instance is the only way, though the implementation likely does some kind of pooling/slab allocation behind the scenes. In C you have the freedom to arrange your allocations any way you feel like, and to treat any segment of N bytes as a value of size N. As an example, most collections (vectors, sets, maps, etc.) tend to only work with pointers, which is a huge missed opportunity.
My two favorite kinds of collections in C are embedded linked lists [0] and deques. One reason is they both provide stable pointers (which is the main disadvantage of vectors), the other that they minimize the number of memory allocations.
I recently implemented a pool allocated deque library [1] in C that uses embedded linked lists to keep track of memory blocks and provides value semantics, which I hope will make my interpreters easier to implement as well as increase performance. Time will tell.
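For readers who haven't met the embedded variant, here is a rough sketch of what an intrusive list node looks like (names are mine, not from the linked library): the links live inside the user's struct, so inserting an element costs no extra allocation and the element's address stays stable.

    #include <stddef.h>
    #include <stdio.h>

    struct node { struct node *prev, *next; };

    struct task {
        int id;
        struct node link;   /* membership in the list is part of the object */
    };

    static void insert_after(struct node *pos, struct node *n)
    {
        n->prev = pos;
        n->next = pos->next;
        if (pos->next) pos->next->prev = n;
        pos->next = n;
    }

    /* recover the containing struct from the embedded node */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    int main(void)
    {
        struct node head = { NULL, NULL };
        struct task t = { 42, { NULL, NULL } };
        insert_after(&head, &t.link);       /* no malloc anywhere */
        printf("%d\n", container_of(head.next, struct task, link)->id);
        return 0;
    }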
I thought the OP was a little bit self-undermining, because as the author points out, C feels "close to the metal" but at this point it's actually virtual metal.
But I think "getting close to the lower levels" is still great. The only time I write C-like code is for an Arduino. Turns out that even if your electronics is rusty, you can do some very useful things in the real world by automating a simple motor and a few LEDs. And whenever you have to interface with individually numbered output pins, it feels pretty low level.
tldr: if you want to get close to the metal, it's really useful to start by switching to hardware that facilitates this.
Even assembly goes through a number of layers before it’s run. With microcode, virtual memory, and privilege rings there’s a number of things between the code and what it’s running on.
Yeah, that's why I used expressions like "closer to the lower levels" — you can descend in orders of abstraction/layers of reality, but it's more asymptotic or directional than anything else.
It's not like playing with an Arduino is getting me closer to understanding the underlying chemistry and physics. Which are also interesting to learn about...
But personally I think it's for the best that reality is inexhaustibly complex (from the perspective of human individuals).
> C feels "close to the metal" but at this point it's actually virtual metal.
Since you mention the Arduino, when you start out with it, you're most likely turning on digital pin e.g. D7 with digitalWrite(7, true), it's easy to understand and quick to implement.
But spend enough time on the Arduino (or C or Assembly or embedded in general) and you'll soon realize that all that function call is doing is basically the same as doing PORTD |= 0x80. That's a bit tougher to understand, but even quicker to implement, and once you've understood how that all works together, a lot of everything about computers and digital electronics becomes so much easier to understand.
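Roughly, and assuming an ATmega328P board (e.g. an Uno) where digital pin 7 is bit 7 of PORTD, the two forms sit side by side like this:

    void setup(void) {
        pinMode(7, OUTPUT);     /* equivalently: DDRD |= 0x80; */
    }

    void loop(void) {
        digitalWrite(7, HIGH);  /* portable and readable; looks up the pin mapping per call */
        PORTD |= 0x80;          /* same effect: one read-modify-write of the port register */
    }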
C has not really been close to the metal for a long time. The C abstract machine is actually very complicated, especially the rules around undefined behavior. John Regehr's blog explains this well, e.g. https://blog.regehr.org/archives/213. It's easy to write code that looks like it will compile to something simple and obvious but that current compilers will turn into code that does something else entirely, or worse, current compilers compile as you expect but future compilers will mangle.
If you want to be reliably close to the metal, I think the best options are either to actually write assembly, or use a language with a simpler abstract machine (preferably, no undefined behavior), that statically compiles to code that maps closely to what you wrote, and lets you generate "idiomatic machine code" (e.g. unboxed integers). Safe Rust fits the bill, but there are probably other more obscure options.
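One of the classic examples discussed in that vein (a sketch, but representative): because signed overflow is undefined, the compiler is allowed to fold this "overflow check" away entirely.

    int will_wrap(int x)
    {
        /* looks like an overflow test, but x + 1 can only be less than x if
           signed overflow occurs, which is UB; at -O2 GCC and Clang may
           therefore compile this function to "return 0" */
        return x + 1 < x;
    }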
This is timely! I never really learned C deeply (other than from a class in high school where pointers were considered 'too advanced'...so we didn't really get too far).
I decided to work on this year's Advent of Code in C and it's been a blast so far. I finally "get" pointers, malloc/free, etc. (and I have a newfound appreciation for Ruby things like CSV.each).
Since most of my coding work is at a much much higher level (web backend & frontend stuff), I probably won't spend too much time in C in the regular course of business, but it's massively helped me better have an intuitive understanding of previously vague things like "allocate on the stack vs. heap" and "pass by reference vs. value" -- I knew what those words all meant but didn't really have a feel for what they actually did. C helps with that.
Coming back to C always feel very refreshing, until I have to manipulate strings or open a socket. Currently golang is striking the right balance for me. Rust has a place as well, but it still slows me down too much. When writing Rust there's sort of this constant background anxiety that I won't be able to get the compiler to accept the thing I'm trying to do. I'm excited about async/await though, and when you must have safety+performance it's awesome.
I think it really depends on what you are writing. I built up a column-store database in C recently (not for production, just as a project) and while I can imagine rebuilding it in C++/Rust, I don't think it would be possible to do it in a performant way in Go.
Go is nice, but tends to be very verbose and also is garbage collected. It's for a different usecase.
I think the hanging-yourself-with-C comes from the syntactic complexity, not its high-levelness.
Several mistakes in C:
Using text-substitution preprocessor macros means that you have to 1) learn a whole second language to read other people's code (most portable code requires #defines) and 2) trace through code where you don't know where some values are coming from, which is very difficult.
Header/code segregation seems like a huge mistake. It's far more effective to have a single file with well-defined exports. Even the #include system is broken, because it's not explicit or obvious which directories one must search to find header files (they might not be local to your project).
Sigil order. The fact that there is a guide saying you should "spiral out" to understand how to read a type should be evidence enough that this is a huge mistake.
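A contrived but perfectly legal example of the kind of declaration the spiral rule exists for:

    /* fp is a pointer to a function taking no arguments and returning
       a pointer to an array of 8 ints */
    int (*(*fp)(void))[8];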
I get that there are professional C coders who will say that's part of knowing the language. I'm not a professional C coder; I program in another language and very rarely need to dip into something lower, say for performance or mutability. Thus the C code I would write is dangerous; I need safety and simplicity. Luckily, I found another language that hits those spots.
Syntactic complexity is just the surface; wait until you compile and run the program and watch buffer overflows, strict aliasing violations, signed overflow, other undefined behavior and the optimizations built on it, plus "you have direct access to the memory" cargo-culting, silently expose all kinds of vulnerabilities in your program.
When you hang yourself with C, at least you're hanging yourself with something real. It's more embarrassing to be done in by a misunderstood abstraction in a high-level language than a concrete memory overwrite in C.
You can hang yourself in C by having code that would dereference a NULL pointer that never actually would be executed, yet the compiler “optimises” your code based on it. Is that “real”?
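A sketch of the pattern being described: the dereference comes first, so the compiler is entitled to assume the pointer is non-NULL and may delete the later check, branch and all.

    int read_flag(int *p)
    {
        int v = *p;        /* UB if p is NULL, so the optimizer assumes it isn't... */
        if (p == NULL)     /* ...which makes this check dead code it may remove */
            return -1;
        return v;
    }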
And is now obsolete, for the most part. Unless you want to do something on hardware that only has C support, there are generally better options. The language has not evolved. There are now several good alternatives in this space.
Edit: Rather than responding to all the comments individually: There's obviously a lot of code already written in C. That doesn't mean it makes sense to write new code in C. Sometimes C is your only option, which I already said in my original comment.
None of that means there is any reason to start a new project in C if it's going to run on Windows or Linux, and it's certainly not the case that you have to write your program in C if you want to call it from another language. Sorry if this disappoints anyone, but it's true. For the traditional uses of C, such as writing a text editor or a library for numerical computing that you're going to call from another language, there's no longer a sensible argument for using C.
Except the entire world basically runs on top of C, and any library that wants to be usable from different languages is still written in it. Which is a pretty weird definition of obsolete. Don't use it if you don't want to for whatever reason, but spreading FUD to protect your bubble doesn't help anyone. C is the most successful programming language ever designed, and a fine tool in the right hands. It's not perfect, nothing is, anyone telling you otherwise is lying.
> any library that wants to be usable from different languages is still written in it.
This part isn't quite true. All that's needed is to be able to define an exported function with a specific name and calling convention. Libraries can be written in any language that supports these features.
That's just not true, because C++ won't even allow many of the techniques that make C the awesome language it is. It's different, and that's about as far as it goes.
Add name mangling, ABI issues and crazy compile times to that.
It was potentially safer until they added Exceptions, which tipped the scales so far the other way I can't believe we're still having this discussion.
There's Itanium ABI, you can use C type interfaces when needed and suppress name mangling. How do exceptions make it unsafe? They're part of idiomatic C++ and solve a problem without anything unsafe about them. And what are the many of the "awesome C techniques" that you can't do in C++?
But you won't when writing C++ unless you absolutely have to, because it's a major pita to encapsulate everything.
Entire books [0] have been written about exception safety in C++. Getting it right in combination with RAII, (copy)-constructors and assignment operators is very tricky. Which touches the main issue with C++ from my experience, mixing foot guns with higher level convenience.
Embedded linked lists [1] is a good example.
Seriously, live and let live. I'm fine with you enjoying and preferring C++, go for it. But consider taking a good look in the mirror and asking yourself why you feel the need to spread FUD about C to feel good about your choice.
I can honestly say that C++ exceptions have never, in decades, cost me a single hour's lost sleep. If you have any trouble with them, I promise that you are Doing It Wrong. You can tell you are Doing It Wrong if it is not super-easy and super-clean.
The key is to make sure all your cleanup code is in destructors where it will be exercised frequently as part of the normal operation of your program. In the event an exception is thrown, a bunch of destructors run, as usual, and your program ends up in a known clean state, with no extra effort.
Linux (like the BSDs) being C is an organizational phenomenon. There is no plausible technical reason for it not to be, or include parts in, C++. History is path-dependent; how we got here matters more than what is around today.
There is really no excuse for systemd being C. It is really a drag on its development.
UNIX and C are symbiotic, to get rid of C we need to get rid of both.
On the OSes that aren't just plain UNIX clones, C has a much less relevant role, and in some of them it is even being phased out, even if it might take a couple of decades yet.
At the time everyone was sure that Pascal was The Future of both OS and app development.
What happened instead was that everybody had their own Pascal dialect with the extensions it needed to be actually useful. Meanwhile, C had them all, already, and was about the same everywhere. Soon after, C++ came along, and there was again only one (although Microsoft delivered workable template support very late).
Lesson in Network Effect there. Also, shooting for usefulness out the gate.
Meanwhile, the Pascal crowd went off with Modula-2, Modula-3, Oberon, Apollo Pascal, UCSD Pascal, Object Pascal, Delphi, and countless others, none compelling. Apparently Delphi (still) has economic importance. The rest, not so much.
It used the Pascal calling convention; however, Lisa and Mac OS were implemented in a mix of Pascal and Assembly.
UCSD Pascal also used a mix of an interpreter and, later, an AOT compiler.
Most mainframes don't use C as their original implementation language, and the oldest still being sold (Unisys ClearPath MCP) was designed about 10 years before C was born, in an Algol-derived language that uses intrinsics instead of Assembly.
If one broadens their horizons beyond UNIX, and looks into the history of computing there are plenty of interesting languages and OS architectures to dive into.
The Internet is full of digitized versions of papers and computer manuals from those days.
The language hasn't evolved for the same reason a transistor hasn't evolved: because it didn't need to. Adding for example string manipulation, or graphics, etc. would make it a fantastic language, but it would lose being that minimalist tool which makes it often the best choice when writing low level stuff. To me, Crystal, Nim, Rust, Go etc. already fulfill that need.
About the electronics comparison: I can build a lot of complex stuff faster and more easily using opamps or logic gates compared to discrete transistors, but if I need to do a simple basic task such as inverting a logic signal or amplifying an analog one, I'm using a single transistor any day, because it not only spares all that wasted silicon in chips but in many cases will also be faster, cheaper, and quieter noise-wise.
Wrong. Transistors have evolved in dozens of directions. The old transistors are still used, but decreasingly so. It is likely an op amp does better what your transistor and a bunch of other stuff did, and equally as cheaply. The op amp might have bipolar transistors in, but just as likely not.
C has not evolved because C++ has, and because the computers got a million times faster, so even Python is usually good enough. If you need something that does what C does, but better, C++ is right there waiting. Change your makefile, or rename some files, and suddenly you have a C++ program, that is then easy to improve as much as you like.
C has barely changed since 1989. It already had void and function prototypes copied from C++, back then. Only atomics and restrict make any real difference.
I meant functionally, ie what a single part (bjt, jfet, mosfet etc) can do. of course there are better materials and ways to produce more advanced parts.
There is no good alternative for doing some low level thing where you must have fine control over IO or execution sequence.
Rust is the best candidate, but it operates on a completely different level. (That said, I don't think one ever needs that much control on a PC or a server. Not even in kernel programming.)
It's the only language your chip vendor will give you, but there is very little reason ever to use their compiler, anymore. And they have no say in what you use to generate your machine code.
Anyone who is open-minded can get Pascal, Basic, Java, Oberon, C++ and Ada compilers, although I admit that the Ada option is only available for deep pockets.
It's still a great language and still the industry standard for embedded. If you can't figure out C, an incredibly small and simple language, you probably shouldn't be programming.
I'm finishing up an embedded side project over the holidays. It's C, FreeRTOS and the STM USB host stack. I spend most of my time with the likes of React JS, Swift and Kotlin so it does give me a good, honest, fresh feeling. Also the feeling that my code could rise up and smite me at any moment! Great read, thanks for posting!
Got as far as playing around with a web site that used Rustler NIFs on a Phoenix website. Why? It was just fun to figure out how to get something from the web frontend through Elixir and into some Rust numerical code.
I don't think it had much practical use and knowing me, would be ridiculous to go back and touch it again (likely). I've been thinking about doing Rust to WASM but I'm still not sure if it's just throwing a bunch of complexity at something just because. It holds my interest a lot more than just doing everything in JS, though.
C was also more enjoyable to me than other languages. Something about compiled languages just feels really nice. But, it's also great to have a REPL available right there. Like with Elixir.
I LOVE coding in Delphi, really. And I start my new project (document-focused website generator @ https://docxmanager.com) in Delphi, while some people think it's dead :)
And I like C (although I don't code in it), because Delphi/Lazarus can interface with C very easily, you know, there is a lot of C libraries around the world :)
C is fantastic for small and very optimized components. If you want it portable, from microcontrollers to GPUs, it is the best choice too.
But if you need rapid prototyping, large-scale projects within an organization, or some libraries and tools to make the perfect, animated, templated and themed user interface, you should think twice before choosing C. Java and .NET are freakishly good at database integration and memory allocation/deallocation. Erlang is fantastic for multitasking and multithreading. Ruby on Rails or Django in Python are much better for prototyping a website babysitting a database.
You already know the saying: "if your only tool is a hammer..."
Came to make a similar point. There's a lot of love for C and a belief that it's somehow superior to modern languages. The reality is that C hasn't aged well as hardware has evolved. It's not that good at parallel programming, which matters more now that most CPUs are multi-core. The ACM did a great write-up on this, https://queue.acm.org/detail.cfm?id=3212479
The bottom line is to echo your first statement, C is a tool, good for some tasks, not so good for others. Choose accordingly.
That's one of the good points the article makes about using C today: unlike, say, Fortran (or any other language with high-level semantics), C can be hard for a compiler to optimize for a modern CPU.
I too, love coding in C. I love how you can control almost everything. There's nothing "mysterious" or abstracted away. Maybe I'm OCD, but I love understanding exactly what is going on in my programs at all times
#include <stdio.h>

int main(void) {
    char a = 3;
    char b = 8;
    char result = a * b;   /* a and b are promoted to int before the multiply */
    /* sizeof yields a size_t, so %zu rather than %u */
    printf("%zu %zu %zu %zu\n", sizeof(a), sizeof(b), sizeof(result), sizeof(a * b));
    return 0;
}
PRINTOUT/x86-64: 1 1 1 4
Surprising results on x86, probably not on Arduino or embedded platforms.
The phenomenon is integer promotion in C.
EDIT: To the objection "char isn't meant as an arithmetic type!" - operands narrower than int are promoted before arithmetic either way, and the same is true for short, too.
EDIT2: The consequences: you might need an explicit %= 0x100 instead of relying on the type's inherent wraparound in multi-step arithmetic, and rotation becomes a necessarily separate pre-pass for LE/BE conversion instead of an operation transparent to the process. I had a third one... hm. Also, maybe a speed loss?
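As a hypothetical illustration of that first consequence: an 8-bit rotate written on char-sized operands needs an explicit mask, because the intermediate result is an int, not a byte.

    unsigned char rotl8(unsigned char v)
    {
        /* v is promoted to int, so the bit shifted out of position 7 survives
           in bit 8; mask back down to 8 bits (the "%= 0x100" mentioned above) */
        return (unsigned char)(((v << 1) | (v >> 7)) & 0xFF);
    }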
If you want to write a library that can be used from many programming languages in many operating systems, exotic processors and application plugins, use C.
If you want to write a new programming language that does the same from the get-go, make C as one of the compilation targets.
I DO TOO! Warts and all, the simplicity is really refreshing, especially in an age of:
"Make sure you have homebrew installed and then `curl http://goodluckbro.io/doomed | sh` then go read the Medium blog post to write your first hello world"
And don't get me wrong, I love and use whatever-shiny-lang-v2 too, I just really like the simplicity of C (and yes I acknowledge that comes with the ambiguity and undefined behavior too).
For those who appreciate radical simplicity I can recommend the Oberon programming language. As opposed to C it is a real high-level language whose language specification contains no reference to any computing mechanism:
I think you can argue for both sides of the argument, but what is the obsession for "the real"? Is the JavaScript running in your browser "less real" than the x86 Assembly that is compiled once again to Intel Microcode?[1]
> When I saw a memory address outputted to stdout, I felt that I was as close to the hardware that I could be without literally opening up my machine and stroking my ram modules like a crazy person.
Sorry to shatter your enthusiasm but the memory address you are seeing is very likely to be "virtual" and not physical thanks to paging: yet another damn abstraction![2]
-----
> "If I have seen further, it is by standing on the shoulders of giants." --Isaac Newton
The Magic of Software: How I Learned to stop Worrying and Love Abstractions
> "I think you can argue for both sides of the argument, but what is the obsession for "the real"? Is the JavaScript running in your browser "less real" than the x86 Assembly that is compiled once again to Intel Microcode?"
Because when the shiny abstractions and bloated frameworks layered like a Guinness World Record stack of pancakes† that attempt to hide all the underlying complexity of the hardware break down and leave you with a problem, be it CPU, memory, or I/O, then anyone without an understanding of what's going on at least at the assembly and architecture level will be up the proverbial creek without a paddle. I've made good money bailing people out of these kind of problems.
I wish C had gone back and done a better implementation of strings that wasn't null terminated. Or at least an implementation of plain buffers with a richer libc for manipulating them.
I know you can find bolt ons, but they aren't terribly useful since no 3rd party libraries use them. You have to resort to stuff like antirez's sds.
The choice is natural for C, due to the fact that it is the only way to implement the string simply as an array of characters. (Doing anything else would go against C's view of the world.)
C's "inherent inner workings" appear to rely on the fact that every char array used as a string has to be null-terminated.
That implies, at minimum, sanitizing user and data input into char arrays, and maybe an absolute maximum overrun count check at execution; but once your default is to keep track of every string's length rather than only of specific strings, I'd tend to agree with the above that it's not really "C" any more... more like your own implementation of an (overrun-safe) string library?
Inherent to the concept of the char array is that a separate size value would be redundant, since the position of the null terminator already encodes it.
The string-concept analogue of a char array's missing null terminator is a wrong size value.
https://github.com/antirez/sds is heaps better than just using raw strings. Any language with a fat pointer variant out of the box is going to do it better than C as well.
That's not to say that the rest of C is bad, but there's nothing to be gained from defending C strings. They're bad, and you should probably use an alternative; that alternative should probably carry a length and a buffer, and it's even better if you get this out of the box in a language that otherwise works a lot like C.
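A minimal sketch of that "length + buffer" shape (sds itself hides the header in front of the char * it returns; this version just uses a plain struct, with made-up names):

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        size_t len;
        char  *buf;   /* still NUL-terminated so it can be handed to libc */
    } str;

    static str str_from(const char *s)
    {
        str r;
        r.len = strlen(s);
        r.buf = malloc(r.len + 1);
        if (r.buf) memcpy(r.buf, s, r.len + 1);
        return r;
    }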
I like C myself. I wish, though, that they would extend it with some features similar to those available in the Delphi/Free Pascal language. That would still leave complete low-level control, keep the amazing compile speed, and eliminate much of the need for C++. C++ is a heavy beast, and while I have to use it every once in a while, I do not particularly enjoy its compile times and some other things.
Zig has made some moves in that direction, but not yet enough, and it is not really ready for production anyway. I did some experiments and the initial impression was good, but it failed to compile the C libraries I use, even though it claims to be a C compiler as well.
Zig is fantastic, but it's still early days. If it doesn't compile all C code ever, I wouldn't mind taking the C code, wrapping it into a .a, and dropping it into a Zig program until you can untangle it.
We used C99 in some of my classes at college, despite attending University from 2010~2017. I liked C a lot - working that 'close' to the hardware and the OS really gives you an appreciation for how things work. These days I've moved on to things like Ruby, Python, and Go...but I'd like to give it another try with C18 to see what modern C looks like.
For as much as it seems comical, I enjoy languages like Java and C# so much because while I cut my teeth on lots of C (then C++) from the early 90's onward, once the web began to drive so much development, you needed to be able to sling strings (with first-class Unicode support) around.
When I was learning C++, a popular idiom was to use it as "a better C": don't make your own classes, but leverage good "string" impls, templates, namespaces, declaration semantics, RAII, etc. - basically stuff that just makes your C... better.
I can appreciate good nostalgia, but I can't help thinking I'd be pining for some of those betternesses that came along later.
Hardware has evolved a lot since C was made, but is there a way to actually make use of this that isn't available in C? It seems like modern CPUs do an awful lot to support programming like it's still the 1970s.
I do too, mainly C + Win32. Besides Asm (which I use a little bit of too), it's the only way I've found to write applications which are a single tiny executable with no dependencies nor installation, and will work from Win95 upwards (and likely WINE as well). I'm a little surprised that the author didn't mention efficiency.
The Ur-Language is probably verilog, or at least the raw instructions of the platform; and C isn't much closer to those instructions than many of its alternatives, thanks to optimizing compilers and the differences between the true form of the computational model of the platform and the imaginary one that the C standard reflects.
"I hope I am able to explain some of the psychological appeal of lower level programming in C."
Felt like I finished reading the intro here, only to realise I scrolled to the bottom with only 2 paragraphs to read more. Guess I was expecting a bigger read, perhaps with some code references, as it was interesting :)
C with valgrind and other analysis tools (clang scan-build/cppcheck etc) make for a powerful combination. There should be a dinky acronym. But, never gonna happen in my workplace.
I’ve been messing with yasm lately and I think I might like that more. I’ve written a lot of C (including a small browser) and definitely enjoy it (especially parsing, which is probably not great.) The more I’ve written and read though the more I’ve realized how much it really does abstract and that kind of sucks the joy out of it for me.
For actually getting things done there’s go and python of course.
I love coding in a "C+" subset of C++, meaning minimal use of OOP, and maximal use of RAII, smart pointers, and STL. It's hard to beat C++ when it's scoped down like this.
If it had native support for JSON and supported Go-like coroutines, then a lot of its utility would carry over to scenarios that currently use fancier languages. This is what I always find surprising: C's syntax is robust, so fill these missing gaps in the current language and we all win.
> C's syntax is robust, so fill these missing gaps in the current language and we all win.
If every feature everyone wanted was added to C, it would just turn into a bloated mess like C++, or like PHP, with tons of bad decisions baked into the language that can't really be avoided. Features like this belong in libraries - the stdlib should remain as minimalistic as possible.
If everything everyone wanted was added to C, it would be C++. :-)
The C++ evolution is essentially an iterative cycle where people hackily create new language features with insane preprocessor and template hackery, and then the Standards Committee comes in and takes the most popular hacks and tries to turn them into real language features. And then people create new hacks on top of that.
C says, you want object oriented polymorphism, you've got structs and function pointers, what more do you need?
This is of course implicit in the mindsets -- C++ is continually trying to support new idioms for the programmer to express themselves, C would prefer you stick as close as possible to a smaller set of idioms. Just because you can technically make use of a design pattern in C, doesn't mean the language (or the community) will ever encourage that.
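A rough sketch of the structs-and-function-pointers idiom being referred to (the names are invented for illustration):

    #include <stdio.h>

    struct shape;
    struct shape_ops { double (*area)(const struct shape *self); };
    struct shape     { const struct shape_ops *ops; };

    /* a concrete "subclass": embed the base as the first member */
    struct circle { struct shape base; double r; };

    static double circle_area(const struct shape *s)
    {
        const struct circle *c = (const struct circle *)s;
        return 3.141592653589793 * c->r * c->r;
    }
    static const struct shape_ops circle_ops = { circle_area };

    int main(void)
    {
        struct circle c = { { &circle_ops }, 2.0 };
        struct shape *s = &c.base;          /* "upcast" */
        printf("%f\n", s->ops->area(s));    /* dynamic dispatch through the ops table */
        return 0;
    }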
I participated in the coding of several arcade games in Z80 assembler in the late 80s. Not "a" large project but a collection of small ones.
These programs, by the time I came aboard and started sharing code with those to make the games I was involved in, had also "succumbed" to this force.
The shared bedrock of all these games was code that used certain Z80 registers as pointers to take advantage of instructions that read or wrote fixed offsets from those pointers. The first byte of the record pointed at by the pointer was the type and the next 4 bytes were the position. The rest of the record depended on the type.
That was us doing the best we could with 8-bit processors when our competition was using 68000.
Many C projects indeed have polymorphism (I believe io libraries do exactly that to implement the idea that "everything is a file"), but this is rather different from object-oriented polymorphism.
Isn't that basically Go's whole thing? It has syntax that is really close to C, Go style co-routines (obviously), and I believe it has good JSON support. You do have to use a GC and you lose some of the really low level bits but I would wager most C programs don't need to have no GC and be able to do the really low level stuff. The ones that do probably don't need the goroutines and JSON and stuff.
I have an embedded project with an RTOS for things you would do with coroutines, CBOR and JSON handling. I would not appreciate a garbage collector when it came time to debug a dumb pointer or overflow error.
Doing it in C obviously. I can’t really consider Go embedded because I’m maxing the chip out now in C without including a Go VM or whatever you call it.
I used to enjoy C. If you enjoy C for these reasons, you may enjoy Go and the benefits of living in this millennium. It feels very minimal, with a lot of control. Similar heritage. Definitely feels like C a lot of the time. And you get to use the & operator if that's your thing. It's got cleverness to infer types if you want, but you can specify types, and a lot of the cleverness is at compile time: still very lightweight to run. And it's got a better standard library, which makes it a lot more fun to throw something together quickly.
When it comes to simplicity, Go is definitely quite similar to C in many respects (and not all of them good). After I started working with Rust for a little bit, I think it better matches the feeling I get with C when it comes to being confident what your code will do when compiled. To be fair, there is still a fair amount of magic you have to get used to.
But the main issues that I've had with Go is when the Go runtime is causing your code to very much not act like a C program. To be fair, I work on container runtimes and similarly "lower level" tooling so this is probably not an experience that everyone else has.
I haven’t used it much, but I really dislike Go despite liking C. It adds new things but just barely, to the point that it makes me want to not want the watered-down features it has.
Undefined behaviour is a tool to generate faster code, by shifting some burden of correctness to the programmer. For example, accessing past the array size is undefined behaviour. This lets the compiler skip the bounds check, an expensive check+branch on each array access. The trade-off is that now the programmer has to ensure correctness manually (or use an alternative interface that wraps a bounds check) else bad (TM) things will happen.
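A tiny sketch of the "alternative interface that wraps a bounds check" (a hypothetical helper, not from any particular library): the caller has to carry the length around, which raw buf[i] lets you forget.

    #include <stddef.h>

    /* returns 0 on success, -1 if i is out of range */
    static int checked_get(const int *buf, size_t len, size_t i, int *out)
    {
        if (i >= len)      /* the check that a plain buf[i] omits */
            return -1;
        *out = buf[i];
        return 0;
    }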
Despite this, tools will often not warn of undefined behavior or these premature optimizations. Quite the opposite: the compilers are almost user-hostile toward anyone trying to ensure safety and correctness. I cannot see into the minds of the GCC and LLVM teams responsible for the C frontends, but to the extent that you'd think the compiler should do anything here, they seem to think it's largely not their job to improve safety or correctness. That's the job of static analysis tools (which can cost a lot of money), runtime analysis tools (which can cost a lot of time in writing tests that exercise all code paths), and fuzzers (yet more time).
Punting all of this extremely important reasoning to "the programmer" is ridiculous when it's been proven over and over again that even incredibly smart C programmers get this wrong.
Compilers could certainly do a better job when performing optimizations that “abuse” undefined behavior, but in general this isn’t a trivial problem. The way optimization passes are set up can often make it difficult to generate warnings that have a low false-positive rate.
Do you think it might be a problem that undefined behavior optimizations and/or pattern matches are so prolific in C that reducing the "false-positive rate" of ... actually bad undefined behavior (?) would be the hard thing?
Why not disallow undefined behavior and instead have the compiler emit: "Hey, this branch is never taken; if you want us not to compile it, remove the dead code"? Seems like that's exactly the same sort of putting the onus on the programmer that andrepd mentioned, but without the side effect that millions of device drivers are zero-day remote code execution vulnerabilities waiting to happen.
The array access example shows why this isn't generally possible -- the programmer hasn't written any bounds-checking code or assertions (they just wrote x[y]). Should the compiler emit bounds checks? If yes, your code is now slower and there isn't an easy way to disable this "feature" (if x is heap-allocated or passed as a pointer, how will the compiler know what kind of bounds check is sufficient?). If no, you have undefined behaviour and crashes.
Even Rust made a mistake with restricting undefined behaviour -- they accidentally allowed creating references to fields in packed structures (resulting in possibly unaligned references which is undefined behaviour in Rust) in safe code. These days you get a warning telling you to use unsafe or copy the value, but they have yet to make it an error. For comparison, C also gives you a warning for the same problem (or an error with -Werror).
I also want to point out that modern compilers have different sanitisers (-fsanitize=...) which cause your program to abort() on memory unsafety or other violations. It's effectively a better valgrind which is compiled into your program.
I would argue that most of the time, yes, the compiler should emit bounds checks and only in performance sensitive code should they be removed (or where the compiler can prove the bounds check holds for a range).
I don't know how anyone who has studied the history of vulnerabilities and security issues and code would think this is a hard tradeoff:
> If yes, your code is now slower but there isn't an easy way to disable this "feature" (if x is heap-allocated or passed as a pointers, how will the compiler know what kind of bounds check is sufficient?).
> If no, you have undefined behaviour and crashes.
The latter choice has resulted in how many millions or billions in losses, perhaps even lives lost?
The problem is that if you want to permit users to do things like dereference pointers and do pointer arithmetic (a requirement for a C-like language) there is no way for the compiler to ensure (even at runtime, because the pointer might not be to memory managed by the language) that the pointer dereference is valid.
Okay, maybe you can add a small overhead to array accesses (which is what Rust did), but you can't stop undefined behaviour entirely. Rust has plenty of undefined behaviour (most of it can only be triggered through unsafe -- which is the whole argument behind the language -- but as I mentioned there are some cases where even they made a mistake and made an operation safe when it shouldn't have been).
The benefit of "unsafe" blocks is that the surface area to audit is dramatically reduced, so that yes, if you want high performance pointer mangling and arithmetic, you can do it, but your code will require more attention.
In C/C++, undefined behavior and unsafe operations abound and it is nigh impossible to isolate what is "important" to review from a security standpoint.
> Why not disallow undefined behavior and instead have the compiler emit ...
You can't "disallow" all undefined behavior in C without fundamentally changing the language into something different (which would also likely make it useless for its intended use case). Some of the details related to undefined behavior are simply not possible for the compiler to detect without extra information, hence why they are undefined. Rust has the same thing: you can invoke undefined behavior anywhere in the program through use of `unsafe` anywhere else. For example, how is the compiler supposed to determine if a particular function is ever called with a `NULL` parameter? How does it know if a particular signed addition may overflow? How does it know if two parameters to a function alias in an invalid way? How does it know if a particular pointer lacks a `const` and is actually backed by read-only memory? And most importantly, how does it determine all of this at compile time?
With that said, not all undefined-behavior-based optimizations are unexpected or warrant a warning or error (in fact, I would say the majority of them don't). Undefined behavior is much more complicated than just dead-code removal. Even then, you don't always want dead-code removal warned about - imagine getting a warning about every inline function or macro that results in some dead code.
To be clear, I'm not saying there aren't obvious problems that have come from this approach, and I think some of the choices for undefined behavior are clear mistakes now even though they made some sense in the 80s or 90s (and some of those can usually be turned off, thankfully). I also think there are additions that could be made to the language that would solve a lot of the existing problems (like non-null pointers, or fat pointers, though I'm not exactly holding my breath...), but the existence of undefined behavior is not in and of itself the actual problem with the language, and "disallowing" it outright is simply not possible. I would actually wager undefined behavior exists in some form in a lot more languages than you'd expect, just less documented. Rust certainly still has it, despite its numerous safety-focused features. And even if you add things like fat pointers to C, which would clearly reduce possible out-of-bounds mistakes, the language would still need to provide some way to create one from a size and a pointer, and that still allows undefined-behavior cases to happen - my point being, while adding such a thing is still clearly better, it does not actually remove the possibility of undefined behavior.
It's not very expensive because the branch will never be taken except when it results in an out of bounds access. When that out of bound access happens you care more about the security benefits than performance.
I'm curious if this still stands in general today, given today's advanced branch predictors and the fact that the CPU tends to be memory-bandwidth-bound, thus you have more "free computation" while waiting for memory (what GPU programmers call compute to memory ratio).
So I have to hope for the best when some piece of hardware, coded in C, gets plugged into the network.
The Morris Worm is 30+ years old, and yet the best we can do is platforms like CHERI, Solaris SPARC ADI, and ARM MTE, which also happen to be quite specific.
C has a bad rep, and deservedly so in the context of anything related to security. We have not been able to sanitize the language because the ability to surprise the writer of some bit of code is almost a core concept in the language. This sucks. But at the same time the way newer languages are held up as the manna from heaven is rather tiresome as well. By the time they will be deployed in the quantities that C code is today there will be exploits in those languages because it is mostly the programmers that are fallible. Now, some languages - again, C - make it easier to shoot yourself in the foot than others. But I've seen plenty of code in Java that was exploitable even though memory safety in Java is arguably at a higher level than say Rust.
So you'll be hoping for the best for some time to come, no matter what the stuff you install was written in.
It is possible to foreclose on some kinds of error entirely, and if you compare like-for-like, those benefits become obvious. If, on the other hand, you take the productivity gains of a higher-level language as a reason to write more ambitious programs, well, new functionality can find new problems.
Very nice, but I'd nitpick memory corruption and expand it to hardware errors, or add that category.
Sending digital signals over analog medium (read: always) can fail, and may do so regularly depending on hardware itself and environment conditions.
Simple examples: digital 3.3V signals sent over 10m high-capacitance lines (lose a bit here and there), or an overclocked CPU undervolting at just the right time, etc...
EDIT: I'd also argue that UB is a type of logical_error. Not the fact that UB exists within the language, but in that the logic failed to account for the scenario?
Except that unless we are talking about hardware design languages, no programming language accounts for hardware errors as such.
UB is its own kind of error, especially since ISO C documents over 200 cases of it, and unless you are using static analysers you most likely won't find out that you are falling into such scenarios, as no human is able to know all of them by heart, whereas logical errors are relatively easy to track down, even by code review.
But every language that can reach the hardware will have the ability to wreck things in spectacular and hard to predict ways. Every new language ever was touted as the one that would finally solve all our problems. For Java and COBOL the historical record borders on the comical. I have no doubt that the same will go for every other language that we just haven't found the warts in yet. Two steps forward, one step back, that seems to be the way of the world in the programming kingdom.
Nearly every piece of networked hardware has critical software running written in C, and the consequences are not nearly as disastrous as reading HN would make us believe.
A possible outcome is that you would trade pointer bugs etc. for "Java bugs" if embedded Java were used everywhere. Embedding a complete runtime increases the attack surface a lot.
Many of those Java exploits are in the C- and C++-written code layer, yet another reason to get rid of them in security-critical code.
According to Microsoft Security Research Center and Google's driven Linux Kernel Self Protection project, the industry losses due to memory corruptions in C written software goes up to billions of dollars per year.
Then, unless we are speaking about a non-conformant ISO C implementation for bare-metal deployments, the C runtime provides the initialization before main() starts, floating-point emulation, handling of signals on non-UNIX OSes, and VLAs.
How Apple, Google, Microsoft, Sony, ARM are now steering the industry regarding their OS SDKs has already proven who is on the right side.
I have been at this since the BBS days, I don't care about Internet brownie points.
The only thing left is actually having some liability in place for business damages caused by exploits. I am fairly confident that it will eventually happen, even if it takes a couple more years or decades to arrive there.
Claiming that a little process startup code (that isn't really part of a particular language, but is much more part of the OS ABI) was easier to exploit than an entire JRE is just dishonest.
I would never think of "floating point emulation, handling of signals on non-UNIX OSes, VLAs" as anything resembling a "runtime". These are mostly irrelevant anyway, but apart from that they are just little library nuggets or a few assembly instructions that get inserted as part of the regular compilation.
By "runtime", I believe most people mean a runtime system (like the JRE), and that is an entirely different world. It runs your "compiled" byte code because that can't run on its own.
I care about computer science definitions, not what most people think.
Dishonest is selling C for writing any kind of quality software, especially anything connected to the Internet, unless one's metrics for quality are very low.