
“C is how the computer works” is a dangerous mindset for C programmers - todsacerdoti
https://words.steveklabnik.com/c-is-how-the-computer-works-is-a-dangerous-mindset-for-c-programmers
======
oconnor663
My favorite take on the severity of the UB problem,
[https://blog.regehr.org/archives/1520](https://blog.regehr.org/archives/1520)
:

> Tools like [Valgrind] are exceptionally useful and they have helped us
> progress from a world where almost every nontrivial C and C++ program
> executed a continuous stream of UB to a world where quite a few important
> programs seem to be largely UB-free in their most common configurations and
> use cases...Be knowledgeable about what’s actually in the C and C++
> standards since these are what compiler writers are going by. Avoid
> repeating tired maxims like “C is a portable assembly language” and “trust
> the programmer.” Unfortunately, C and C++ are mostly taught the old way, as
> if programming in them isn’t like walking in a minefield.

~~~
shadowgovt
> Be knowledgeable about what’s actually in the C and C++ standards

... turns out to be an extremely tall order; the 2017-11-17 working draft of
the C++ standard is 1,448 pages in PDF format.

I wonder at what point programmers who are required to care about correctness
throw up their hands and say "This language isn't reasonably human-sized for a
user to know they're using it correctly." At which point the immediate next
question should be "Why don't you use something else?"

(In my experience, the only sane counter-answer to that last question is "I'm
programming on an architecture where the {C/C++} compiler is the only one with
any real expressive power that anyone has ported to this physical system."
This is an answer I understand; most other answers will raise an eyebrow from
me.)

~~~
z92
> most other answers will raise an eyebrow from me

Portability. A C library can be trivially linked with any other language. But
one will be hard pressed to use a Python library from Ruby, for example.

~~~
geofft
This is literally the reason I most recently wrote some C, but - do you really
need _linking_? You can pretty easily use a Python library from Ruby by
writing a little loop in Python that accepts JSON input and produces JSON
output, and calling that as a Ruby subprocess.

It's fairly rare that you actually need to be in the same process. (The thing
I wrote was a wrapper for unshare(), so it did strictly need to be in-process,
but that's an unusual use case.) Even if you want to share large amounts of
data between your process and the library, you can often use things like
memory-mapped files.

And there are some benefits from splitting up the languages, including the
ability to let each language's runtime handle parallelism (finding out that
the C library you want to use isn't actually thread-safe is no fun!), letting
each language do exception-handling on its own, increased testability, etc.

~~~
evil-olive
> You can pretty easily use a Python library from Ruby by writing a little
> loop in Python that accepts JSON input and produces JSON output, and calling
> that as a Ruby subprocess.

If you're a Ruby application, sure, you can use a Python library by forking
off a subprocess.

Just make sure to document the installation requirements - in addition to
having Ruby and the correct gems installed, you also need Python (which
version?) and the correct pip libraries installed.

If you're a Ruby _library_ , instead of an application, forking off a Python
process is a nonstarter, unless you want to propagate those requirements out
to every single application using your library.

~~~
geofft
Sure, but there's an equivalent problem in the C world - go back in time a
couple years before we figured out how to do precompiled binary-compatible C
extensions in Python and try running 'pip install numpy'.

I'm mostly thinking that if you already have a Python library you really like,
chances are you already have a Python environment capable of running it. (The
environment doesn't have to be related to your Ruby environment at all! If you
want to get Ruby through your OS and Python through a Docker container, or
whatever, that should work fine.)

------
jcranmer
One of the blog posts I keep meaning to write is in the vein of "Why C is not
portable assembly." Ironically, most of my points about C's failures aren't
related to UB at all, but rather to the fact that C's internal model doesn't
comport well with important details of real machines:

* There's no distinction between registers and memory in C. A function parameter that is "register volatile _Atomic int" is completely legal and also makes absolutely no sense if you want C to be a "thin" abstraction over assembly.

* There's no "bitcast" operator that changes the type of a value without affecting the value. The most common workaround involves going through memory and praying the compiler will optimize that away (while violating strict aliasing semantics to boot).

* No support for multiple return values (in registers). There's structs, but that means returning via memory because, well, see point 1.

* Missing machine state in the form of vector support (which is where the lack of the bitcast becomes particularly annoying). The floating-point environment is also commonly poorly supported.

* Missing operations such as popcount, cttz, simultaneous div/rem, or rotates (see the sketch after this list).

* Traps are UB instead of having more predictable semantics.

* No support for various kinds of coroutines; setjmp/longjmp is the limit. You can't write zero-cost exception handling in C code, for example. Even computed goto (a source-level equivalent of a jump to a location held in a register) has no C representation, though it is present in some compiler extensions.
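
For instance, here is a sketch (mine, not jcranmer's) of the usual workaround
for the missing rotate: portable C writes this shift-and-or idiom and relies
on the compiler to pattern-match it into a single rotate instruction.

    #include <stdint.h>
    
    /* rotate-left for 32-bit values; the (32 - n) & 31 dance avoids
       the undefined shift by 32 when n == 0 */
    uint32_t rotl32(uint32_t x, unsigned n) {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));
    }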

~~~
jeffdavis
"There's no "bitcast" operator that changes the type of a value without
affecting the value."

Technically, you are supposed to use unions for that, though it's a pain. The
PostgreSQL codebase is compiled with -fno-strict-aliasing so that a simple
cast will get the job done, but obviously that's technically out of spec.
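
For illustration, a minimal sketch of the union route (assuming float and
uint32_t are both 4 bytes):

    #include <stdint.h>
    
    /* write through one union member, read through the other; this is
       the type punning whose legality is debated below */
    static uint32_t float_bits(float f) {
        union { float f; uint32_t u; } pun = { .f = f };
        return pun.u;
    }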

~~~
jcranmer
Actually, the guideline I've heard is that memcpy should be used, since memcpy
has the magic property of copying the bytes without affecting the destination
type. Union type punning is only legal in C99 and newer (although C99
erroneously includes it in its list of undefined behavior--this was fixed in
C11); C89 and any version of C++ don't permit this behavior.
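
A minimal sketch of that guideline (same assumption of 4-byte float and
uint32_t):

    #include <stdint.h>
    #include <string.h>
    
    /* well-defined in both C and C++; compilers typically fold the
       memcpy into a single register move */
    static uint32_t float_bits(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }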

~~~
aidenn0
I keep hearing this rumor, but C99 did not erroneously include unions-as-type-
punning in a list of undefined behavior. The normative language is unclear,
but clarifications were added [1][2].

1: [http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_257.htm](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_257.htm)

2: [http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm)

~~~
quietbritishjim
The parent comment seems to be saying that C99 _does_ allow type punning with
a union, with an explicit clarification as you say. In C89 it was ambiguous
and in C++ it is explicitly disallowed (you are supposed to use memcpy).

~~~
quietbritishjim
Sorry, I misread the parent comment. Now it's too late to delete mine, please
ignore.

------
overgard
To be fair, even assembly language isn't how the computer works (it gets
translated into microcode). Not to mention other integral components, like
the GPU, that are coded with an entirely different model.

I think what's fair to say is: the "abstract C machine" tends to have a
minimal number of concepts on top of the machine's instruction set, compared
to other languages, and generally provides the least friction if you need to
poke a memory address or lay out a structure in a very specific way. It's
just easier to say it's "closer to the metal", because that's a mouthful.

~~~
dooglius
Not really; assembly language is extremely close to microcode, much much
closer than it is to C. C has a whole bunch of baggage - pointer provenance,
the inability to read uninitialized memory, no notion of cache aliasing, and
so on - that doesn't fit hardware well.

~~~
radarsat1
> pointer provenance, inability to read uninitialized memory

What do you mean by these two? I'm not sure about the former. As for the
latter, of course you can read uninitialized memory - unless you mean
something else?

~~~
chrisseaton
Can you write a well-defined C program that reads uninitialised memory?

~~~
saagarjha
Reads? I think so; I believe the bad part is when you try to use it.

~~~
chrisseaton
No, it is undefined behaviour. Check the spec.

~~~
saagarjha
I couldn't find the verbiage for that. The standard seems to have this under
undefined behavior:

> The value of an object with automatic storage duration is used while it is
> indeterminate.

But reading the value without using it seems fine?
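
For concreteness, a minimal sketch (mine, not from the standard) of the case
the quoted clause clearly does cover:

    int main(void) {
        int x;       /* automatic storage; its value is indeterminate */
        return x;    /* the indeterminate value is "used" here - UB per
                        the clause quoted above */
    }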

~~~
chrisseaton
The 'object' in this context is the storage location, not the value you read
from it. You're 'using' the 'object' when you read from it.

~~~
saagarjha
Oh, not that kind of read. I meant "read" in the context of "without actually
using its value in a way that affects your program execution" - not, in what
I believe is the C++ term, an "odr-use". As in here:
[https://news.ycombinator.com/item?id=22740582](https://news.ycombinator.com/item?id=22740582)

------
spicymaki
Wait until they learn that machine code is not exactly how the computer works
either. The hardware is doing things to your code you might not expect.

~~~
ThrowawayR2
Agreed. The example the author uses in his previous post on the topic talks
about cache-unaware code, but it's perfectly possible to write cache-unaware
code in machine language as well.

I'd say "C is not how the computer works ... but it's much, much closer than
nearly every other language."
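
As a concrete illustration of the cache-unaware point (my sketch, not from
the article): nothing in C - or in assembly - distinguishes these two loops,
yet the stride-1 version is typically several times faster on real hardware.

    #define N 1024
    static double m[N][N];
    
    /* row-major traversal: stride-1, each cache line is used fully */
    double sum_rows(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];
        return s;
    }
    
    /* column-major traversal: stride-N, a fresh cache line per access */
    double sum_cols(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];
        return s;
    }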

~~~
pdimitar
I feel your statement is actually much closer to the danger. C is pretty
close to the hardware indeed, and that misleads people into thinking that C
is _exactly_ how the hardware works. It's a very easy trap to fall into, and
I've seen many colleagues do just that.

~~~
dpc_pw
M............................................C

M..............................................Java

Look. C is much closer to how the machine (M) works.

~~~
scottLobster
Given that Java is written in C++, which is in turn an extended version of C,
I'm not sure that chart is entirely to scale.

~~~
TomMarius
That is no argument in this discussion, IMHO - what does it change? The
discussion is about the language (and, in the case of Java, the runtime) and
its machine abstraction, which is nearly the same for Java and C, even if the
JVM were written in Lisp.

------
asveikau
I think the biggest point where C is more "how the computer works" vs. a lot
of languages is when you write an allocator.

I've noticed, in, say, an interview context, that people without a lot of C
experience have a lot of trouble reasoning about how an allocator might work,
or where memory comes from - or even realizing that some constructs in their
preferred language result in memory allocation. With C you can't hide these
details.

Yes, there are abstractions beneath you, both in software (your OS has set up
the MMU) and in hardware (caches, out-of-order execution), and also in the C
standard itself (assumptions about pointer types, the precise meaning of the
stack). This doesn't take away from the point that reasoning about
allocations is more of a thing.
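
To make that concrete, here is the kind of minimal bump allocator such
reasoning might produce (entirely illustrative; names and sizes are mine):

    #include <stddef.h>
    #include <stdint.h>
    
    static uint8_t arena[1 << 16];   /* a fixed block of "memory" to hand out */
    static size_t  next;             /* offset of the first free byte */
    
    /* hand out the next 16-byte-aligned slice of the arena; no free() */
    void *bump_alloc(size_t size) {
        size_t aligned = (size + 15) & ~(size_t)15;
        if (aligned < size || aligned > sizeof arena - next)
            return NULL;             /* overflow or out of arena space */
        void *p = &arena[next];
        next += aligned;
        return p;
    }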

------
bluetomcat
"Real hardware" is also an abstract machine. Until you get to the level of
each individual transistor, it's hardware abstractions all the way down.
Caches, reorder buffers, load and store buffers, instruction decoding are all
abstract concepts sitting at a much higher level than the individual
transistor.

Those abstractions are there mostly for the purpose of speeding up code
execution, but also for enabling hardware designers to think about the machine
in terms of real-world concepts and not just as an incomprehensible mind-
blowing collection of a few billion transistors.

~~~
SuperscalarMeme
And I bet you physicists argue that transistors are just abstractions of
semiconductors

~~~
j9461701
Which is, in turn, merely an abstraction over the actual quantum physics that
governs what's really happening in the silicon. And quantum physics might
itself one day be found to be a higher-level abstraction of what's _really_
really going on, as happened with Newtonian physics before it.

It's abstractions all the way down. I don't know why they're so maligned by
programmers; they're the only way any work gets done.

~~~
diydsp
Heh, and it's even worse than _that_, because near the quantum level we're
mainly finding reflections of our own psychology, psyche, and societal norms.

This is why saying something is "not how the machine works" is a fallacy of
black-and-white thinking, and not useful unless one is ready to be very
specific about how it resembles the machine and how it doesn't.

With regard to C, the smartest thing to do is use it wisely and avoid the
dragons, the same way we e.g. use chemistry to make useful products without
performing operations in unstable environments. And surround it with an
appropriate layer of cultural expectations - keeping humans in the loop. You
don't just do what the computer tells you; you use it to help you make
judgments, but ultimately take responsibility yourself.

One can see this in Rust's efforts to not only develop a language, but a
conscious culture around it. This will be successful until, like a 51% attack,
it is outmaneuvered.

Anyway...

------
antirez
Actually, I look forward to enough momentum building in the community to
allow new C standards to change the way undefined behaviors are conceived:
make them as specified as possible, and even where they cannot reasonably be
given a single behavior, provide a set of reference behaviors to select from,
in order to have the least unexpected outcome for the programmer. Especially
now that low-level programming is being abstracted away, exactly but not only
because of Rust itself, we need a C that is more C than what it became in
recent years. After all, we don't really need to change the language, just
the new specifications and the compilers' implementations of UB.

~~~
rumanator
> allow new C standards to change the way undefined behaviors are conceived,

Of all the potential issues that might be attributed to C, undefined behavior
(UB) is the one that creates the fewest problems, if any.

To me, complaining about UB is like complaining about the quality of a
highway after intentionally breaking through the guard rails and racing
through the middle of the woods.

The reason some behavior is intentionally left undefined is to enable
implementations to do whatever they find best on a given platform. This is
not a problem; it's a solution to whole classes of problems.

[https://en.wikipedia.org/wiki/Undefined_behavior](https://en.wikipedia.org/wiki/Undefined_behavior)

~~~
rhinoceraptor
This is like saying there's nothing dangerous about guns, because it takes a
human to make them dangerous.

C is not unsafe if you only write safe code. But humans are fallible, and
"just don't write unsafe code" is not a solution.

~~~
rumanator
> This is like saying there's nothing dangerous about guns, because it takes a
> human to make them dangerous.

If you want to go with that analogy, UB is akin to you intentionally pointing
your gun at your foot, taking your time to aim precisely at your foot, and,
in spite of all the possible warnings and error messages that are shouted at
you... you still decide that yes, what you want to do is shoot yourself in
the foot, because that is exactly what you want to achieve.

And then complaining about the consequences.

~~~
masklinn
> If you want to go with that analogy, UB is akin to you intentionally
> pointing your gun at your foot, taking your time to aim precisely at your
> foot, and, in spite of all the possible warnings and error messages that
> are shouted at you…

Yeaaaah not really:

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
    
        static char a[128];
        static char b[128];
    
        int main(int argc, char *argv[]) {
          (void)argc;
          strcpy(a + atoi(argv[1]), "owned.");
          printf("%s\n", b);
          return 0;
        }
    

This compiles with no warnings using clang (8), even with -Weverything.

And that’s a super easy case; it’s not a use-after-free where a dynamic
callback somewhere freed a pointer it did not own, or a function 5 levels
deep (or worse, in a DLL) that didn’t check its input pointer, which only
started occasionally coming up null years later.

~~~
saagarjha
And many (myself included) think that the compiler _should not_ warn about
this code, even if it can potentially exhibit undefined behavior, because it
would make it impossible to write code at all without warnings. (A good static
analyzer or code reviewer should call it out, though.)

~~~
masklinn
That's completely fair. I'm not criticising the compiler for not complaining
about this code; I'm providing a counterpoint to rumanator's assertion that:

> UB is akin to you intentionally pointing your gun at your foot, taking your
> time to aim precisely at your foot, and, in spite of all the possible
> warnings and error messages that are shouted at you…

Because they could hardly be more wrong: the average UB is completely
invisible to static analysers, because it's a dynamic condition that the type
system is not smart enough to lift to static. That's why it's UB and not,
say, a compile error.

------
fit2rule
I've been coding since C was a new language, and have surfed multiple
computer revolutions in all their sundry phases; for personal interest I have
recently returned to the 80's era for a broader attack on the 8-bit platforms
of the time.

This has required a fair bit of assembly language for old machines with a
much more minimal - in fact, non-existent - profit motive, just so that I can
keep my machines alive. That's the reason I'm digging into things, as a bit
of a challenge to myself.

There is so much great code to read, when you can disassemble. And, even when
you can't.

I have a veritable fleet of old systems, many of which are still getting new
titles released at astonishing levels of proficiency, yearly now... CPC6128,
ZX (Next!), C64, Atmos, BBC, etc. all have new tools being written for them,
including C compilers.

One thing I note is that the tools are everything, because all tools get
'degraded' by new hardware releases. Hardware isn't driven by the compiler
designer - the compiler designer is nothing until the hardware guy prints
something.

Returning to older computers has meant understanding their architecture, and
surprisingly enough, after 30 years of mainstream programming, going back to
this world is rewarding. There is a lot of joy in writing C for the 6502, or
even the Z80, in real mode.

But, always disassemble.

And, in each case, it's quite easy to disassemble - to get an understanding
of registers, memory usage, code inefficiencies, and so on.

------
hyperpallium
Is the world ready for _lower-level languages_? E.g. cache-aware ones
(assembly isn't).

In a way, GPU shader languages "fit" the parallel GPU architecture (though
they're not cache-aware).

Maybe: limiting loop code length to fit in cache; no pointer chasing (though
you can work around anything in a TM).

Some Java subsets for very limited hardware might be instances.

~~~
Impossible
GPU languages have some cache-aware constraints, especially if you're working
with the graphics pipeline vs. compute, where you have more constrained
inputs and outputs. Shared memory is also often used as a user-managed cache.

Cell SPUs had an explicit user-managed cache and host DMA, and Intel ISPC and
Unity's Burst compiler/ECS (variants of C and C#) are good examples of
programming environments that try to more explicitly encourage parallelism
and/or optimal cache utilization.

~~~
gpderetta
> Cell SPUs had explicit user managed cache and host DMA

Just a nitpick, but explicitly user-managed caches aren't caches; they are
better known as scratchpads. A cache is supposed to be mostly transparent,
except for the performance implications.

------
diath
People simply have to come to terms that modern computers are nothing like
their beloved PDP-11.

------
SloopJon
There was an ACM article with a similar sentiment recently: "C Is Not a Low-
level Language - Your computer is not a fast PDP-11."

[https://queue.acm.org/detail.cfm?id=3212479](https://queue.acm.org/detail.cfm?id=3212479)

HN discussion:

[https://news.ycombinator.com/item?id=16967675](https://news.ycombinator.com/item?id=16967675)

------
SuperscalarMeme
I don't think anyone suggests using C as an alternative to microarchitecture
classes. C remains one of the best ways to access hardware _relative_ to
other software languages. No language is "how computers work" unless you're
writing Verilog, but C is the closest to the metal among software paradigms
while still being more convenient than assembly.

------
btilly
A good way to think of it is that C is how computers worked 40 years ago.

A lot has changed since then, but computers go to a lot of effort to pretend
that they still work similarly, and most other programming languages have
gone to a lot of effort to figure out how to make things work for a "C
computer".

~~~
JdeBP
40 years ago was the home computer revolution. A _lot_ of computers did _not_
work like that. They had things like segmentation, zero pages, workspace
register files, shadow registers, BCD arithmetic, and software interrupts. A
few years later they had things like heterogeneous multiple processors, PAD,
and trap/interrupt gates.

The big iron of the time (and a few years later) had some things that did not
work like C, too, such as single-level storage addressing, tagged
architectures, and capability-based addressing.

Thinking that the C language is a representative model for computer history,
or indeed for computer architecture, is a _bad_ way of thinking of this.
During the 1980s and 1990s there was a _lot_ of shoehorning all of that
architectural stuff into C compilers, with extra keywords and the like. (e.g.
__interrupt, _Far, _Huge, __based, _Decimal, _Packed, digitsof, precisionof,
and so forth)

~~~
btilly
Well, the original C was pretty much exactly how a PDP-11 worked. And the
PDP-11 was a source of design inspiration for the original Intel x86 and
Motorola 68000 chips.

So there were computer chips sold in the 1970s that worked a lot like C
thinks. You are right that there were also ones that didn't.

From about 1980 onward, all architectures added more and more features that
diverge from that model.

------
juped
Do C programmers really think this? I hear it a lot, but only from non-C
programmers who hold C in awe. I think they may have heard it from a different
cohort of C programmers, though.

------
xkriva11
Some time ago, I saw an article that described a very crazy hypothetical
architecture that still fits the C standard and exploits the undefined
behaviors in various insane ways. It was named something like "Computer
platform from hell". Unfortunately, I was not able to find it again. Does
anyone know about something like that?

I think that the hegemony of C for low-level programming is harming us
significantly, because it pushes the other models into obscurity. Try to
write C for something like the F21 chip or the J-Machine...

~~~
zlynx
The Death Station 9000, designed by the Top Men from highly rated government
contractors, built to meet and _exceed_ all mandated specification documents.

Undefined behavior or use of anything not strictly included in POSIX.1c may
result in the launch of nuclear weapons.

------
nwmcsween
A test for this is to write a relatively simple (but not too simple) program
in both C and Rust using idiomatic code (don't knowingly pick some corner
case for either language), compile with -O0 -g0, and see whether you can
_understand_ the resulting output with regard to the program written. I'm not
championing the use of either Rust or C; in my opinion, languages should be
for people, not for computers.

------
MivLives
Interesting that this is less a blog post and more an announcement that a
blog post will not be written.

~~~
steveklabnik
Yes, I did not expect this to get posted here, for this reason. The first post
got a ton of attention, the second one not so much.

The self-quote for the thesis for this article comes from a conversation with
tptacek in the HN thread from the first post.

~~~
jstrong
I consider myself a Steve Klabnik fan and even I was surprised to see this at
the top of HN haha. Just goes to show you can never really predict how these
things will go.

~~~
steveklabnik
Hehe :)

I mean, I guess you could argue that it's produced an interesting discussion.
There's actually a pretty interesting question here: is the purpose of a site
like this to distribute interesting news, or foster interesting discussion? I
personally feel like it's more the former, but maybe the latter matters too.

------
strategarius
Is anyone really thinking that? THIS is how the computer works:
[https://software.intel.com/en-us/articles/intel-sdm](https://software.intel.com/en-us/articles/intel-sdm)

I remember we started our university course on low-level programming by
writing in machine code. No joke: we took the binary representations of
assembly instructions from that manual and wrote them in a hex editor. Then
we disassembled simple C applications, understanding how our high-level
instructions unfold into x86 assembly. Passing parameters through registers,
through the stack, the two models of stack restoration (yeah, cdecl and
stdcall - that's the moment I realized what they mean), and other stuff you
won't ever be bothered to think about while writing in C/C++. And only after
that did we start writing in actual assembly. THIS is knowledge of how the
computer works. If you know only C, you know nothing.

------
wooby
"C is JQuery for the computer."

-- Micha Niskin

------
AndrewOMartin
Which is why God(bolt) gave us Compiler Explorer.

------
bjourne
I know this article has been up a few times already on HN. What I think Steve
is missing is that all modern computer hardware is designed around _how C
works_! 40 years ago you might have had hardware with a real mismatch between
how it worked and how C worked.

Not so much these days. How C works and how x86-64 assembly works are very
similar. Of course, that is not the whole story, and you could argue that how
x86-64 assembly works is not how the "computer" works, because it is a high-
level abstraction of the CPU... But then what does it even mean to talk about
how the computer works?

------
rs23296008n1
Which language, other than assembly, most accurately represents how modern
computers work?

What high-level language could I use that gives me the most performance
because it accurately matches the underlying platform?

------
gens
C _is_ a high level language.

C _is_ close to how a computer works.

Assembly is the next step down, and is not nearly as hard as the internet
would lead you to believe.

Step down from that: read a book like "Computer Architecture: A Quantitative
Approach" and learn about CPUs and memory (also Agner Fog's site, the
official AMD and Intel docs, and many others).

Down from that is electronics (logic gates, 8-bit adders, etc., down to
transistors and capacitors).

And beyond is physics.

Everything else is either computing theory or buzzwordy bullshit.

To repeat: C is a high level language, no matter what anyone says.

------
Diesel555
Assembly is how the computer works. Wait no - Binary is how the computer
works. Wait no, NAND gates and ALUs are how the computer works. Every level of
abstraction is just that. A layer of abstraction. You can pick any layer, then
go down and say "learn that".

------
dirtydroog
The presence of UB is annoying but necessary; the real issue is that compiler
writers use it as a means of performing optimisations instead of emitting
warnings. So you end up with a really efficient, but broken, program.

------
zelly
Reposting this classic, "C Is Not a Low Level Language"

[https://queue.acm.org/detail.cfm?id=3212479](https://queue.acm.org/detail.cfm?id=3212479)

------
Tomte
C has never been "how the computer works", its abstract model is way too…
well, abstract.

But there is an interesting catch to it: because C is so important, and
programmers at large know so little "real C", vendors do their damnedest to
make it work for them.

A well-known microcontroller forum in Germany is constantly pushing that you
need to make your code safe in the case of concurrent access to a variable by
a main loop and an interrupt. By making it volatile.

Strictly speaking that's plain wrong. You need some kind of mutex.

Practically speaking, no embedded C compiler vendor would dare use that
"optimizing opportunity" and surprise their customers there.
~~~
scottLobster
"Concurrent access" isn't actually an accurate description of what's going on.
Microcontrollers are generally single-core, single-threaded devices. The most
common pipeline goes

main thread -> interrupt processing -> resume main thread.

If you're modifying a variable in the interrupt, you should declare it
volatile so as to keep the compiler from optimizing accesses to it and
introducing unexpected behavior. Adding a mutex would just be worthless
overhead.
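
A minimal sketch of the pattern in question (names are hypothetical):

    #include <stdint.h>
    
    /* set in the ISR, polled in the main loop. volatile forces the
       compiler to re-read the flag on every iteration instead of
       hoisting the load; it does NOT provide atomicity or ordering. */
    static volatile uint8_t tick_pending;
    
    void timer_isr(void) {           /* hypothetical interrupt handler */
        tick_pending = 1;
    }
    
    int main(void) {
        for (;;) {
            if (tick_pending) {
                tick_pending = 0;
                /* handle the tick */
            }
        }
    }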

~~~
Tomte
The C language until very recently didn't know about threads or other kinds
of concurrency, so the compiler was technically free to assume that the
interrupt is never called: it's nowhere in the call tree that originates at
main().

Of course, it didn't do that, because vendors imbued their compilers with
additional knowledge about common programming patterns and uses _that aren't
standard C_.

~~~
scottLobster
So there are vendors out there that alter the behavior of volatile to act like
a mutex in multi-threaded environments but don't tell anyone? Ugh.

Well, TI doesn't do that thankfully. At least not for the MSP430.

~~~
gpderetta
For a while, Microsoft gave volatile acquire/release semantics, which implied
a barrier on non-x86 CPUs. They recognized it was a mistake and backtracked
on that around the time they started their effort to run Windows on ARM.

------
alekratz
Someone told me back in high school that "C is just shorthand for assembly",
which I think about a lot. Does anyone else feel that way?

~~~
ThrowawayR2
> " _Someone told me back in high school that 'C is just shorthand for
> assembly'_"

That was true for the PDP-11 that the article series mentions, and it was
true for the early home computers of the '70s and '80s, but it became
increasingly less true as processors grew more and more complicated under the
hood. That's kind of the point of the article series, but it's not as big a
deal as the author makes it out to be, because it was already something
people writing in C were aware of.

~~~
6keZbCECT2uB
As other people have pointed out, rather than C having become more unlike
assembly, processors have become more unlike assembly. As such, the
abstractions of C map pretty well onto a theoretical assembly, even if
optimization changes the precise mapping. A possible exception is that the
multicore semantics that are a part of each architecture aren't particularly
well represented.

------
proninyaroslav
Does this mean that Rust is more suitable for the study of low-level
programming?

------
calebm
It's always dangerous to mistake the abstraction for the underlying Reality.

------
GnarfGnarf
UB = Undefined Behaviour

------
ken
Should I learn Verilog to understand “how the computer works”?

~~~
krupan
In short, no. Verilog is very nearly a general purpose programming language,
with some syntactic sugar for doing userspace/green/lightweight threads (that
it confusingly calls processes). It is used to model digital hardware, but the
language itself doesn't really give you any insight into how to construct
those models. Creating the model is still up to you.

------
0xdeadbeefbabe
About as dangerous as swimming with piranhas. Programming is so exciting, if
not self-indulgent.

Edit: Oh, I get it - your manager, who doesn't believe in leaky abstractions
or UB, will beat you if the program fails.

------
killmov3s
UB is bullshit so this is all wrong.

------
xllao
What is the output of Klabnik? Has he written anything substantial or does he
just evangelize?

~~~
myrrlyn
[https://eleroth.files.wordpress.com/2012/01/hadvar.png](https://eleroth.files.wordpress.com/2012/01/hadvar.png)

------
csours
There's the computer, and then there's the computer, and then there's the
computer and then there's the computer:

The computer is the thing that has the web browser running on it.

The computer is the thing that has the code editor running on it.

The computer is the thing that has the compiler running on it.

The computer is the thing that provides a virtual hardware interface for
running programs.

The computer is the thing that the virtual hardware interface interacts with.

The computer is the thing inside the hard drive/graphics card/other component
that the computer talks to.

The computer is the thing inside the processor that pre-processes, jits, or
otherwise transforms commands from the computer for the computer so the
computer ... ... ...

[https://xkcd.com/722/](https://xkcd.com/722/)

~~~
fctorial
There's the c1(physical computer), and then there's the c2(os runtime), and
then there's the c3(silicon) and then there's the c4(turing machine):

The c1 is the thing that has the web browser running on it.

The c1 is the thing that has the code editor running on it.

The c1 is the thing that has the compiler running on it.

The c2 is the thing that provides a virtual hardware interface for running
programs.

The c3 is the thing that the virtual hardware interface interacts with.

The c3 is the thing inside the hard drive/graphics card/other component that
the c2 talks to.

The c4 is the thing inside the processor that pre-processes, jits, or
otherwise transforms commands from the c2 for the c3 so the c1 ... ... ...

Is this correct?

~~~
csours
Sure, but you can also run non-trivial code in the web browser.

I wasn't being super precise with my comment; my point is that there is no
one thing that "is the computer". There are many computers in your computer,
and many definitions of what a computer is.

I've never had to go below the HAL, or inside the JIT, but to some people
that's where "the" computer really is. If that's where the computer really
is, then I'm not a computer programmer. (HINT: I'm not, but I do write code
sometimes and get paid for it.)

So if I write code, but it doesn't tell the computer what to do, what am I
even doing?

Well, I write code that gets consumed by pre-processors and lexers and
parsers and compilers and linkers and build agents, and I don't know how the
soup works; you probably don't either. You may know more or less of the
alpha-bits in the soup, but it's still soup. The soup is turned into jello
and fed to a virtual machine, which is a fake computer, but it works well
enough.

---

If I were to be more precise, I would say that there's a von Neumann machine
C0 that is represented by some hardware abstraction layer C1 provided by an
operating system kernel. There may be a virtual machine C2 running in the C1
context.

But inside that C0 von Neumann machine there are a bunch of microprocessors
that may be called computers and that may have Turing-complete instruction
sets.

---

Or you could say, anything that can be Turing complete is a computer, in which
case your web browser is a computer (among many others).

~~~
shadowgovt
The trick is to expand the definition of "tell the computer what to do."

You can tell the computer what to do by writing a program that passes through
six levels of abstraction before the fate of any electrons is affected by what
you wrote.

You can also tell the computer what to do by pushing a lot of keys on a
keyboard. We're lucky; our keyboards have like a hundred keys on them. Some of
our predecessors had eight switches and a ninth key or switch to "ACCEPT" the
current switch-bank state.

You can also tell the computer what to do by causing patterns to be stored on
magnetic or optical media and read back later.

You can also tell the computer what to do by plugging a wire into the back of
it and varying voltages on that wire using another machine a hundred miles
away.

Part of the art of programming is knowing that there are no bright, solid
lines between these ways to tell a computer what to do (but for a given task,
some are clearly more applicable than others ;) ).

------
dirtydroog
Rust this, Rust that. Tiresome.

~~~
sgift
What do you mean? The post doesn't contain anything about Rust.

------
xwdv
This may be a little off topic, but it is something that until now I've never
questioned in my entire career: is the C in C supposed to stand for
"computer"?

If not, what is it?

~~~
CDSlice
According to Wikipedia, C is named C because it resulted from work done by
Dennis Ritchie to improve the language B, and C comes after B. B came from Ken
Thompson making a cut down version of the language BCPL, so he just kept the
first letter.

[https://en.wikipedia.org/wiki/C_(programming_language)](https://en.wikipedia.org/wiki/C_\(programming_language\))

~~~
slfnflctd
It almost implies that at the time, they realized a plethora of languages
would emerge but were hoping everyone would stick with their alphabetic
convention. Instead, someone skipped straight to S, someone else stepped back
to R, and then everyone said screw it and started making up their own funny
names & acronyms.

I've read some intriguing things about D, though (which came along more
recently, interestingly enough). Apparently it's in production at several
large companies.

~~~
JdeBP
That creation myth rather falters on the fact that a fair number of languages
had already emerged. This was the 1970s, not the 1950s. No-one expected
language namers to follow some universal alphabetic convention beginning with
BCPL.

------
lidHanteyk
It's far past time to put C to bed. D, Go, Java, Zig, etc. are all languages
that one can use to do approximately C-like things without as much danger. I
am tired of seeing a literal Twitter feed of memory unsafety security bugs
[0]. We can do better.

[0] [https://twitter.com/LazyFishBarrel](https://twitter.com/LazyFishBarrel)

~~~
adrianN
Good luck rewriting all the programs that are written in C without introducing
new security problems.

~~~
lidHanteyk
Feel free to peruse the Twitter feed that I linked; if most bugs are due to
memory corruption, then even a literal in-place rewrite may still be a net
negative in terms of total bug count! This is the main argument of Zig [0]:
they offer technology to incrementally rewrite C projects in Zig without
losing compatibility. Their compiler can compile C code [1] just like GCC or
Clang can.

[0] [https://ziglang.org/](https://ziglang.org/)

[1] [https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace...](https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html)

------
sys_64738
C is like water. Atomically it is the DNA of all other programming languages.
There's a reason why C is generally the fastest language with smaller
binaries, and there's a reason why language like Python fallback to C for
doing anything processor intensive. C will still be here when the latest fad
languages of today are long forgotten.

~~~
myrrlyn
Not a word of what you just wrote is true.

~~~
sys_64738
I think you know better.

~~~
myrrlyn
than to write something like "Atomically [C] is the DNA of all other
programming languages"? I do, yes

------
pizlonator
The problem with the “C is an abstract machine” nonsense is that it’s just
not how things work in practice. I get that it’s what the spec says. But C
compilers are super careful to compile structured-assembly-style code without
breaking it, even though they don’t formally promise to do so. That’s because
lots of large C code bases assume structured-assembly semantics and probably
always will.

~~~
jcranmer
> But C compilers are super careful to compile structured assembly style code
> without breaking it, even though they don’t formally promise to do so.

You have never worked on a C compiler, I take it. All production compilers
will at some point turn even straight-line code into a DAG and relinearize it
without regard for the original structure of the code. Any semantics beyond
those given in the C specification (which are purely dynamic, mind you) are
not considered for preservation. In particular, static intuitions about the
relative positions (or even the number!) of call statements with respect to
if statements or other control flow are completely and totally unreliable.

This is the main reason why people keep harping on "C is an abstract
machine." When people think of machine code, there is an assumption about the
set of semantics that are relevant to it, and that set is often much, much
broader than the set of semantics actually preserved by the C specification.

~~~
pizlonator
I work on compilers for a living.

Structured assembly semantics are what C users in a lot of domains expect and
production compilers like clang at -O3 will obey the programmer in all of the
cases that are necessary to make that code work:

- Aliasing rules, even under strict aliasing, give a must-alias escape hatch
to allow arbitrary pointer casts.

- Pointer math is treated as integer math. LLVM internally considers typed
pointer math to be equivalent to casting to int and doing the math that way.

- Effects are treated with super high levels of conservatism. A C compiler is
way more conservative about the possible outcomes of a function call or a
memory store than a JS JIT, for example.

- Lots of other stuff.

It doesn’t matter that the compiler turns the program into a DAG or any other
representation. Structured assembly semantics are obeyed because of the
combination of conservative constraints that the compiler places on itself.

But bottom line: lots of C code expects structured assembly and gets it.
WebKit’s JavaScript engine, any malloc implementation, and probably most (if
not all) kernels are examples. That code isn’t wrong. It works in optimizing C
compilers. The only thing wrong is the spec.

~~~
saagarjha
> I work on compilers for a living.

…as I take it, for languages without undefined behavior.

> Structured assembly semantics are what C users in a lot of domains expect
> and production compilers like clang at -O3 will obey the programmer in all
> of the cases that are necessary to make that code work

No, they will not, and the people working on Clang will tell you that they
won't.

> Llvm internally considers typed pointer math to be equivalent to casting to
> int and doing math that way.

 _LLVM IR is not C_, but I am fairly sure that it is still illegal to access
memory outside the bounds of an object, even in IR.

> A C compiler is way more conservative about the possible outcomes of a
> function call or a memory store than a JS JIT for example.

Well, yes, because JavaScript JITs own the entire ABI and have full insight
into both sides of the function call.

> That code isn’t wrong. It works in optimizing C compilers. The only thing
> wrong is the spec.

The code is wrong, and the fact that it works today doesn't mean it won't
break tomorrow. If you think that all numbers ending in "3" are prime because
you've only looked at "3", "13", and "23", would you say that "math is wrong"
because it tells you that this isn't true in general?

~~~
pizlonator
There is a broad pattern of software that uses undefined behavior that gets
compiled exactly as the authors of that software want. That kind of code isn’t
going anywhere.

You’re kind of glossing over the fact that it’s rare for the compiler to
perform an optimization that is correct under C semantics but not under
structured assembly semantics, because under both sets of rules you have to
assume that memory accesses have profound effects (stores have super weak
may-alias rules) and that calls clobber everything. Put those things together
and it’s unusual that a programmer expecting proper structured assembly
behavior from their illegally (according to the spec) computed pointer would
get anything other than correct structured assembly behavior. Like, except
for obvious cases, the C compiler has no clue what a pointer points at.
That’s great news for professional programmers who don’t have time for
bullshit about abstract machines and just need to write structured assembly.
You can do it because either way C has semantics that are not very amenable
to analysis of the state of the heap.

Partly it’s also because if the optimizations went further, they would break
too much code. So it’s just not going to happen.

~~~
saagarjha
You say that this doesn't happen, and yet we have patches like this in
JavaScriptCore:
[https://trac.webkit.org/changeset/195906/webkit](https://trac.webkit.org/changeset/195906/webkit).
Pointers are hard to reason about, but 1. undefined behavior extends to a lot
of things that aren't pointers, and 2. compilers keep getting better at this.
For example, it used to be that you could "hide" code inside a function and
the compiler would have no idea what you were doing, but today's compilers
inline aggressively and are better at finding this sort of thing. And it
isn't just WebKit: other large projects have struggled with this as well. The
compiler discarded a NULL pointer check in the Linux kernel (which I can't
find a good link to, so I'll let Chris Lattner paraphrase the issue for me):
[http://blog.llvm.org/2011/05/what-every-c-programmer-should-...](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html)
; here's one where it crippled sanitization of return addresses in NaCl:
[https://bugs.chromium.org/p/nativeclient/issues/detail?id=24...](https://bugs.chromium.org/p/nativeclient/issues/detail?id=245)
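
(For reference, the kernel bug has roughly this shape - paraphrasing the
example from Lattner's post:)

    /* the dereference lets the compiler infer that p is non-null, so it
       may delete the later check as dead code; in kernel mode, where
       page 0 can be mapped, the dereference itself need not trap */
    void contains_null_check(int *p) {
        int dead = *p;   /* UB if p is NULL */
        if (p == 0)      /* optimized away */
            return;
        *p = 4;
    }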

