
Let’s stop copying C (2016) - bibyte
https://eev.ee/blog/2016/12/01/lets-stop-copying-c/
======
notacoward
As a C programmer for 20+ years, I find most "don't use C" articles banal and
snobbish (even when parts of them are correct) but this one's _really good_.
Well reasoned, well explained, not condescending at all. Kudos.

I like the fact that textual includes are at the top of the list. What an
abomination! Modules have been known and clearly superior since forever, both
for correctness and for compile times. Every time I see macro abuse in C, or
exploding compile times in C++ (which oddly doesn't even get mentioned in this
section) I cry a little for the future of my profession. Optional delimiters,
default fall-through, weak typing - check, check, check. Good points, well
argued. I've had to fix bugs caused by all of these. I happen to agree on type
first, disagree on single return (and it's a shame Zig doesn't seem to be on
the radar especially because of this), but these are certainly discussions
worth having and the OP provides a good starting point.

It's definitely a bit long, but well worth it.

~~~
linuxlizard
I work with a large vendor code base that is ~300k lines of C. There are 515
different #ifdef conditional symbols (thank you, grep). The code base started
probably >20 years ago and has had who knows how many engineers working on it.
There are 20+ years of standards' evolution in this code, across who knows how
many chipsets, the vendor changing hands, etc. And yet, in spite of the maze,
the code works very well!

I would love to see a new language, technology, _something_ , take on this
sort of problem. C works so well in this sort of problem because of the
preprocessor's ability to strip out bits & pieces that aren't necessary at the
time.

We need something that will allow us to fall into the pit of success rather
than succumb to the easy solution of using #ifdef for new, optional features.

~~~
MrBuddyCasino
I suppose you mean something besides dead code elimination? Because that looks
like it could be solved with proper encapsulation and abstraction mechanisms
in the language, which C lacks.

~~~
linuxlizard
It's a pretty amazing problem--a wifi chipset. The driver runs on Linux and
Windows, also supports 3x different CPU architectures (x86, mips, arm) and
multiple bus interfaces (SoC internal, usb, pci). Supports disabling optional
features so 2.4Ghz only chipsets have a smaller memory footprint. Can build
with/without different security options. And that's just starting in on the
huge number of IEEE 802.11 standards across the last 20 years. Some chips support
some of the features, some don't -- but the driver still has to work on all of
them.

The Linux kernel is an amazing example of how to do ^^^that and still produce
beautiful code. It uses a great number of function pointers to accomplish its
encapsulation goal.

~~~
MrBuddyCasino
I see where you're coming from. The description already gave me nightmares.

------
Steltek
Python's indentation for blocks isn't holding up to the test of time.

After suffering badly formatted code in my early career, Python's approach
seemed refreshing. But after suffering one too many bad merges where the
indentation was left mangled, I've concluded we need braces: with them you
can automatically reformat everything instantly, without even thinking about
it, eliminating merge ambiguity. gofmt is the better solution because it
declares a strict representation that can be automatically enforced.

You could retort that better rebase/merge practices could alleviate some of
these issues but if that discipline could be enforced on arbitrary groups of
humans, we wouldn't have cheered Python's forced indentation in the first
place.

~~~
naasking
> But after suffering one too many bad merges where indentation is left
> mangled, we need braces.

Or semantics-aware merging.

~~~
AnimalMuppet
I'm not an expert on Python by any means, but if I understand correctly, the
end of the indent level is the _only_ way you know that the block ended. That
seems to me to make it impossible to write semantics-aware merging (presuming
I understood what you meant by the term).

~~~
dooglius
Why does this make it impossible? The merger just needs to understand where
blocks begin and end, not sure I follow you

~~~
AnimalMuppet
What I thought you meant is that you need something that will straighten out
the mangled whitespace. I didn't think that was possible, because the _only_
thing that marks where blocks begin and end is whitespace. If the whitespace
gets mangled, there's no other syntax you can use to straighten it out.

But I'm beginning to think you meant that you want a merge tool that doesn't
mangle the whitespace in the first place. If the pre-merge whitespace is
unmangled, and the merge tool understands Python whitespace, then at a minimum
it should be able to flag ambiguous changes and ask for help.

~~~
dooglius
Right, a merge tool that understands Python semantics at some level should be
able to detect when a block's indentation has changed, and adjust the
indentation of any changes to that block correspondingly.

------
jcranmer
There are a few more faults in C that I would call out that make it actually a
pretty bad low-level language:

* No bitcast operator. Your options are a) use the union trick; b) use the memcpy trick; or c) take an address, cast it to a different pointer type, and hope that your compiler gives you a pass on technically violating the standard here. Even C++ didn't get bitcast until C++20.

* No SIMD vector types. Of course, SIMD vectors are even more type-punning heavy than integers and floats, so you do need a good bitcast operator to get anywhere.

* Volatile and atomic are type qualifiers. These ought to be properties of the memory accesses; making them qualifiers on types obfuscates which memory accesses they apply to. If you look at the Linux kernel, it doesn't use volatile but instead uses a READ_ONCE macro that acts much like a volatile load.

* Bitfields are a mistake. They combine especially poorly with the vague properties of volatile. How many memory accesses are required in this program:

        struct {
          volatile unsigned a : 3;
          volatile unsigned b : 2;
        } foo;

        foo.a = 3;
        foo.b = 0;
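
To make the first bullet concrete: a bitcast is bitwise reinterpretation, the
same thing the memcpy trick achieves. As a language-neutral illustration
(Python here, since it can show the bytes directly; `float_bits` is a made-up
helper, not a standard function):

```python
import struct

def float_bits(x):
    # Pack x as a 32-bit little-endian float, then unpack the same four
    # bytes as an unsigned 32-bit integer: in effect, a bitcast via memcpy.
    return struct.unpack('<I', struct.pack('<f', x))[0]

# IEEE 754 single precision: 1.0 is sign 0, biased exponent 127, mantissa 0.
print(hex(float_bits(1.0)))   # 0x3f800000
```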

~~~
dooglius
> No SIMD vector types

Huh? What architecture are you using that provides SIMD without corresponding
vector types?

------
scj
Point taken that newbies are going to be confused that "odd / even ==
integer", but they'd be equally confused about why odd / even yields something
close to what they expected but not quite (floating point errors).

In both cases, they're probably going to be confused about why an equality
test a few lines later fails every once in a while.

The concept of divide vs. div & mod is one that a programmer eventually needs
to understand. More importantly, not everything should be optimized for
newbies. The context driven / operator is appropriate in programming languages
designed for experienced programmers.

Upon further thought... Couldn't the argument be rephrased as "integers are
broken numbers"? It sounds silly, but from a newbie's perspective the same
problem exists with "int f = 38.2 * 25;".
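
The equality-test trap is easy to reproduce; a small Python illustration of
the floating-point side (a generic snippet, not from the article):

```python
# Float division looks right but is inexact, so an equality test a few
# lines later can fail "every once in a while":
total = 0.1 + 0.2
print(total)           # 0.30000000000000004
print(total == 0.3)    # False

# A tolerance-based comparison is what one eventually learns to write:
print(abs(total - 0.3) < 1e-9)   # True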

~~~
OskarS
The problem isn't necessarily that / does integer division: it's that it ONLY
SOMETIMES does integer division. It should either always do proper division,
or always do integer division, but not try and guess what the programmer
wants.

An example from C#: when I was a Unity developer, I frequently needed to
figure out what the aspect ratio of the screen is. You'd think this would
work:

        float aspect = Screen.width / Screen.height;

since 99% of all parameters in Unity are 32 bit floats (common in gamedev).
But no! Screen.width and Screen.height happen to be integers, so this
particular line silently returns that the screen is actually square. I've
literally run into this exact bug with Screen.width/height three or four
different times. Every time I feel like an idiot, even though it's not my
fault that C# inherited C's dumb division operator.

Python does it correctly, and C-like languages should as well. Division used
with two integers should always return a floating point number, and there
should be a separate operator for integer division. It makes no sense the way
it's done now.
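
Python 3 works the way this comment advocates: `/` is always true division
and `//` is always floor division, so the intent is explicit. A sketch using
stand-in values for Screen.width and Screen.height:

```python
width, height = 1920, 1080   # stand-ins for Screen.width and Screen.height

aspect = width / height      # / on two ints still yields a float
print(aspect)                # about 1.78, not a "square" screen

rows = height // 2           # integer division has its own operator
print(rows)                  # 540
```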

------
freeopinion
As far as the confusion between the assignment operator and the equality
operator goes, I took somebody's advice a long time ago and always put the
literal on the left of the equality operator:

if (3 == x)

That way, mistyping == as = gives "3 = x", which fails to compile. It doesn't
solve everything but it does help.

The real problem is using '=' as the assignment operator. I think this was a
serious design flaw. Of course some languages use ':=' which is better. I
prefer just ':'. I see many languages that use '=' in some contexts and ':' in
other contexts. Members/properties/fields quite often get assigned using ':'.
I say make it universal and reserve '=' for equality.

~~~
TheJoYo
We call them yoda conditions.

[https://en.wikipedia.org/wiki/Yoda_conditions](https://en.wikipedia.org/wiki/Yoda_conditions)

------
ikfmpwdsoz
Minor nits:

1. "Single return and out parameters" should have a special mention for
Haskell, since Haskell doesn't even have multiple input parameters!

2. Python has assignment expressions now.

Overall, it's a pretty good list of shortcomings of C, but I disagree with
several of the points: (a) special-casing subtraction lexing to be whitespace
sensitive is silly, and (b) integer division is essential whenever working
with arrays or modular arithmetic, and converting types explicitly, like Rust
mandates, is definitely the way to go. Who knows if I'd want a float, double,
rational or currency type to be the output, anyway?

~~~
upofadown
>Python has assignment expressions now.

PEP 572? If so, that doesn't support multiple assignment at all.

~~~
ikfmpwdsoz
Yes, but the post mentions assignment expressions as a bad thing, and says
Python "gets it right" in this regard, which would no longer be true.
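
For reference, the PEP 572 form looks like this (a hypothetical snippet; note
it binds a single name only):

```python
data = [3, 1, 4, 1, 5]

# The walrus operator := makes assignment an expression: it binds n
# and yields the value, so it can sit inside a condition.
if (n := len(data)) > 3:
    msg = f"long list ({n} items)"

print(msg)   # long list (5 items)
```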

------
rhn_mk1
There's only one nitpick I could find in the text: in Rust, semicolons are not
delimiters; instead they distinguish statements from expressions. It's
clearest when returning. Rust returns the last value of a block, so `3 + 5`,
which evaluates to `8` (an `i32` by default), is different from `3 + 5;`,
which evaluates to `()`, the unit value.

I'm not sure if that makes things better, but it could be worth a special
mention.

------
Sean1708
> APL and Julia both use ~

Julia doesn't use ~ for logical not, it's used for bitwise not (it does work
as logical not but only because Bool is a subtype of Integer, I've never seen
it explicitly recommended).

------
jwilk
> awk, Tcl, and Unix shells only have strings, so in a surprising twist, they
> have no concept of null whatsoever.

In shell a variable can be unset, which is different than set to empty string.

~~~
frou_dh
This table shows some ways that distinction can be used when expanding a
$variable

[http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3...](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_02)

Make extensive use of these for maximum job security! /s

------
jancsika
> Quick test: if you create a new namespace and import another file within
> that namespace, do its contents end up in that namespace?

Suppose in my program "hello" I import "foo" -- which depends on
foo/buggy.dll -- and "bar" -- which depends on bar/buggy.dll.

foo/buggy.dll is a helper library that computes a correct value for "foo" but
would crash if used with "bar".

bar/buggy.dll is foo/buggy.dll with a single late-night bug fix for the
crasher, which introduces a regression that will crash if used with "foo".

So does "contents ending up in that namespace" mean that "hello" will run
without crashing/clashing?

------
mcguire
" _In Lua, point would be the x value, and y would be silently discarded. I
don’t tend to be a fan of silently throwing data away, but I have to admit
that Lua makes pretty good use of this in several places for “optional” return
values that the caller can completely ignore if desired._ "

No. Just no.

This is several orders of magnitude worse than most of the rest of the list.
(Which is mostly personal opinions.)
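
For contrast: Python's unpacking is strict where Lua's is forgiving. With a
hypothetical min_max helper, ignoring a return value must be done explicitly:

```python
def min_max(xs):
    # Returns a single tuple; there are no "extra" return values to discard.
    return min(xs), max(xs)

lo, hi = min_max([3, 1, 4])    # arity must match exactly
print(lo, hi)                  # 1 4

try:
    (lo,) = min_max([3, 1, 4])  # silently dropping hi is not allowed
except ValueError:
    print("too many values to unpack")

lo, _ = min_max([3, 1, 4])     # discarding must be spelled out
```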

~~~
_blrj
How so? In Lua, pass-by-value is only done with numbers; for everything else
you're passing around a pointer under the hood.

Is there a reason I'm not thinking of?

------
ncmncm
I found this essay to be much like a lot of other essays comparing languages,
though its comparisons were broader. The author studiously avoided any
mention of C++, or how it differs from C and other languages. (Or is C++
supposed to have the same answers as C? If so, FAIL.)

Almost all the attention went to superficial details that really don't make
much difference. Your fingers and your eyes learn the gestures and signposts,
and they fade into the background. (But modules and namespaces matter.)

What is left is whether you can express what you need to, at all. Whether,
having expressed it, you can pack it up into a library that anybody can use
without knowing all about how it was put there. Whether, finding a library,
you can actually use it in your runtime environment without it costing more
overhead than if you wrote your own, or demanding runtime concessions
different from what you have committed to for your code or for other libraries
you want to use. Obligate GC is death for interoperability.

The only languages that excel there are C++, Rust, and D. (I include Rust
because it is well on its way, and will get there before long if its users can
wean themselves off of ARC boxing.) None of the other languages are really
even trying. It's tragic. Haskell and the MLs could be good at libraries, were
it not for their obligate-GC problem. The other languages with big library
ecosystems are slow, so overhead isn't noticed.

There has been more than enough time to come up with something to unseat C++.
Part of the problem is that the main incubator for new languages has been
academia, and academics won't even discuss a language that is not obligate-GC.
We need a language that will be equally good at copying register values to and
from ALUs and memory buses, driving vector pipelines, orchestrating legions of
GPU cores, and wiring up FPGA subunits. (I have not seen an FPGA compatible
with GC.) If we end up programming our FPGAs in C++, it will be the fault of
everyone who failed to unseat it by making a better language than it.

------
freeopinion
I think that assignment by destructuring gets a bad rap here. It's a great
convenience and an elegant way to accept multiple return values. Or rather, to
maintain a coherent type system where there is only one return value but it
can be of a compound type.

------
sn41
To me a big mistake in C is its multidimensional arrays. For example, before
C99's variably-sized array parameters it was not possible to write a function
which multiplies two rectangular matrices, since the sizes cannot vary in
that manner (m×n and n×k, with m, n, and k variable).

On the other hand, C has so many goodies which ought to be done right and
better in modern languages, but often are not:

1. Variadic functions like printf. It sucks to wrap arguments into a list
just for this.

2. Setjmp/longjmp and nonlocal returns

3. Union data types

4. Conditional macro directives to compile debug statement versions when
needed.

It's easy to criticise C, or to patronise it by saying it was good for its
time, but the reality is that many of its features (or what they attempt) are
futuristic even today.
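
On point 1, variadics without caller-side wrapping did survive into newer
languages; Python's *args is one sketch (log is a hypothetical helper, not a
library function):

```python
def log(fmt, *args):
    # The extra positional arguments arrive bundled in a tuple, but the
    # caller never has to build that tuple by hand, printf-style.
    return fmt % args

print(log("%s: %d retries", "net", 3))   # net: 3 retries
```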

~~~
ThrowawayR2
> _Setjmp /longjmp and nonlocal returns_

That has to be the first time I've seen those features of C described as a
positive. I'm genuinely curious; would you be willing to explain further?

~~~
Boulth
Maybe the parent uses jmps for coroutine style programming?

------
tabtab
A similar discussion raged on the c2 wiki a few years ago:

[http://wiki.c2.com/?ItsTimeToDumpCeeSyntax](http://wiki.c2.com/?ItsTimeToDumpCeeSyntax)

[http://wiki.c2.com/?ItsNotTimeToDumpCeeSyntax](http://wiki.c2.com/?ItsNotTimeToDumpCeeSyntax)

[http://wiki.c2.com/?AlternativesToCeeSyntax](http://wiki.c2.com/?AlternativesToCeeSyntax)

(The c2 wiki is half-defunct. Long story.)

------
serichsen
Lisps do NOT do textual inclusion. Lisp systems are created from code, not
text.

I guess that the author only saw the use of `use-package' or the `:use' option
of `defpackage', but this is not necessary (and not generally used) to refer
to other namespaces.

The actual use of `defpackage' is often quite close to how Clojure does it.

~~~
kazinator
Lisps in fact do textual inclusion. In ANSI Common Lisp, when you (load
"foo.lisp"), and foo.lisp isn't a compiled file, it gets read and processed as
text.

[http://clhs.lisp.se/Body/f_load.htm](http://clhs.lisp.se/Body/f_load.htm)

The symbols in the loaded file get read in the current package. The
*package* variable is dynamically rebound, over the lifetime of the _load_,
to its existing value, so that if the file happens to change it, that effect
is undone when the load finishes.

The loaded file source can arrange for its bulk to be read in its own
namespace, or it can be processed in the parent namespace, which is the best
of both worlds.

When a file is compiled, then it's no longer textual inclusion: best of both
worlds again.

This is all so reasonably designed that I copied the salient aspects of things
like _load_ and _compile-file_ and all that jazz nearly _as is_ into TXR Lisp,
which isn't an ANSI CL implementation and is free to do things differently.

------
chaoticmass
On Optional block delimiters, the author recommends programmers in C like
languages ALWAYS use braces.

I agree! I've stuck hard and fast to this rule since... I was programming
Qbasic as a kid. Back then it was for different reasons, but the practice
stuck with me as I learned new languages.

------
serichsen
Integer division: I think the best thing to do is to produce an exact
rational, like Clojure (which was mentioned) and most Lisps, which is where
Clojure got it from.

------
ngcc_hk
One comment I saw here is that C compiles to an abstract "C machine" model
that is too dated. I'm not sure whether I agree, but the article doesn't
mention it.

------
mar77i
I don't understand the modulo issue with negative numbers, since you can
always just use an unsigned number or write a representation layer without
much hassle.

Though maybe I'm too deep down the rabbithole already and simply got used to
it.
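
The issue the article points at is that C's % takes its sign from the
dividend (truncating division), while Python's takes it from the divisor
(floored division); a quick comparison:

```python
import math

# Python's % floors, so the result has the divisor's sign:
print(-7 % 3)            # 2
print(7 % -3)            # -2

# math.fmod matches C's truncating behaviour, for comparison:
print(math.fmod(-7, 3))  # -1.0
```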

------
chriswarbo
I like this list, although I want to nitpick the use of the word "monad",
since it seems to (a) put people off these ideas, either due to them seeming
scary or pompous/ivory-tower and (b) confuse those who are otherwise receptive
to the idea.

Rather than call this "monadic error handling", I'd just say that results are
wrapped up so that errors can be distinguished from successful results.
Usually that's done by wrapping in a list (or, if the language supports it, an
"Optional"/"Maybe" type, which is just a list truncated to 1 element).

Adding this extra structure lets us distinguish things like "the query died"
(an empty list) from "there was no match" (a list containing an empty list).
If we'd used NULL to indicate failure, we wouldn't be able to distinguish
between these situations (or indeed if there _was_ a match, whose value
happened to be NULL!).

Naively we might think this requires a lot of length-checking and unwrapping,
but we can avoid that by using list operations that are (hopefully) familiar
to every programmer, like "map", "concatenate" and "singleton".
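
A minimal sketch of that idea in Python (safe_div and fmap are made-up names,
not from any library):

```python
def safe_div(a, b):
    # A "Maybe" as a list of at most one element:
    # [] means failure, [x] means success with value x.
    return [] if b == 0 else [a / b]

def fmap(f, m):
    # "map" handles both cases with no unwrapping or None checks:
    return [f(x) for x in m]

print(fmap(lambda x: x + 1, safe_div(10, 2)))   # [6.0]
print(fmap(lambda x: x + 1, safe_div(10, 0)))   # []
```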

It turns out that those operations form a monad, but it seems overly dramatic
and confusing to name the approach using that terminology. Sure it's nice that
we can abstract out this interface, but we don't need that much abstraction
when our whole ecosystem is using a single, _specific_ implementation like
"Optional".

Incidentally, there's a really nice paper on this called "How to replace
failure by a list of successes" (
[https://rkrishnan.org/files/wadler-1985.pdf](https://rkrishnan.org/files/wadler-1985.pdf)
), which shows how normal, non-truncated lists actually implement backtracking
search (assuming our lists are lazily generated, e.g. like in Haskell or using
an iterator).

Note that being "monadic" specifically means we're able to 'collapse' these
lists, i.e. concatenate a list-of-lists into a list ("singleton" comes from a
weaker notion called 'applicative', "map" comes from a weaker notion called
'functor'). Collapsing lists _removes_ the distinctions that we introduced,
since "concat([[]])" and "concat([])" are both "[]", making our result act
more like NULL. So calling this approach "monadic" is actually emphasising the
wrong part!

"Do you struggle to track down the source of NULL values in your code? With
_monadic_ error handling you can struggle to track down 'Nothing' values
instead!"

The real improvement is from the _non-monadic_ interface, like "map", which
preserves these distinctions.

------
andrewmcwatters
When I was closer to my teenage years, I tried what I considered a sizable
variety of programming languages and over time gravitated more towards C and
Lua and less towards everything else. Lua in particular gets special mention
a few times in this article.

I found that they both shared a philosophical simplicity (even if it only
_seemed_ that way with C, considering how much complexity you later learn
about) and over a decade later I've not found the same philosophy in any other
programming languages.

They all tend to be written with the goal of adding features that are
supposed to make the programmer's life easier -- and here's the distinction --
rather than designing a language that is powerful but simple.

This of course has shortcomings of its own, but the trade-offs are ones that I
typically seek.

The grammar for Lua is delightfully short, which seems to be a significant
source of its beauty.
[https://www.lua.org/manual/5.1/manual.html#8](https://www.lua.org/manual/5.1/manual.html#8)

I'd love to be educated on similarly easy languages.

I'm not sure what the lingua franca of the future for software developers
should look like, but I suspect it probably should be slightly more
complicated than C or Lua in terms of looks. At least in terms of optional
standard libraries provided for things like cache levels and GPU support.

Perhaps that's outside of the scope of what a programming language should
provide to users, though? I'm not sure. It seems like we sit on a lot of
complexity and don't use it as efficiently as we could. Maybe the underlying
virtual machines or compilers should be doing some of these things for us, as
they currently do, but with extended reach.

------
rocqua
The complaint about null as a bottom type and then saying haskell fixes it
isn't quite right. Haskell has an explicit value ⊥ that is part of every type.
This is very close to Null with regards to the complaints of OP.

The difference is that ⊥ can't be observed or tested for in a Haskell
program: the moment you try to evaluate it, your program will either crash or
loop forever. (The reason it is useful has to do with lazy evaluation.)

------
enriquto
> #include is not a great basis for a module system

And that is perfectly OK! Any "module system" sucks big time, creates more
problems than it solves, and should be avoided whatever the cost. Textual
includes are great, but of course they should not be used for silly module
systems.

~~~
ikfmpwdsoz
What's wrong with module systems?

~~~
enriquto
In the context of C, nearly everything.

The C language is not designed for building huge programs by accretion of
modules. The idea is that you build many independent programs, and then you
glue them together using scripts.

~~~
coldtea
> _The C language is not designed for building huge programs by accretion of
> modules._

Can you give a single concrete example of a problem, because the above are all
noops semantically speaking...

> _The idea is that you build many independent programs, and then you glue
> them together using scripts._

This is a non-starter for most use cases outside pipeable shell commands
(which are not the only kind of programs people want to write).

People need, and write, and have written for decades, large programs in C, and
programs in C which have from 10s to 100s of header files included (including
recursively from included libs).

~~~
dTal
People need it, yes. And they use C to do it. But C wasn't designed for it. C
was designed to write programs for Unix, which fit this description.

~~~
coldtea
Like the etymology of a word doesn't necessarily convey its meaning, the
"original intent" of something is often meaningless as to its actual practical
use.

The idea you mention is valid, and is part of the Unix philosophy.

But it was never the idea that C should be used JUST for that.

In fact the first use of C was to write a whole operating system.

