
Understanding C by learning assembly - luu
https://www.recurse.com/blog/7-understanding-c-by-learning-assembly
======
haberman
There are two levels of deepening your understanding of C.

Level 1 is what this article describes: your eyes are opened when you see how
various features of C map naturally onto things that are simple/efficient in
hardware. You see a pointer dereference and in your head you think "aha! load
instruction." You understand that there is a real difference between "char
ch[10]" and "char ch[10] = {0}" -- the latter emits extra instructions to
zero out the memory.

Level 2 is when you realize that the optimizer is free to screw with all your
intuitive ideas about C -> hardware mapping that you learned about at level 1.
You learn that writing _correct_ C means that you need to think in terms of
the abstract machine that C defines, and not assume that any specific C
statement will translate to the assembly code you expect. You can't think
"pointer dereference means load instruction", you have to think "pointer
dereference reads an object."
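
A minimal sketch of the two readings, assuming a typical optimizing compiler
(the function name is made up for illustration, and what actually gets emitted
depends on the compiler and flags):

```c
#include <stddef.h>

/* Level 1 reading: one load instruction per iteration through p.
   Level 2 reading: the function returns the sum of n int objects. How many
   loads that takes (vectorized, unrolled, or none at all after inlining
   with constant data) is the compiler's business. */
int sum(const int *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];   /* reads the object p[i] */
    return total;
}
```

The only thing the abstract machine promises here is the returned value, not
the instruction sequence that produces it.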

There's nothing wrong with level 1, as long as you don't stop there!

~~~
nickpsecurity
Great point. It's also worth remembering that optimizations such as floating
point at O3 will trade away accuracy for speed. C compilers are willing to
give you... the wrong result... but faster than ever! Undefined behavior is
another fun line of inquiry. Best to learn about the dark corners and avoid
them where possible.

~~~
tspiteri
No, you don't get inaccurate floating-point results just by using a higher
level of optimization. For example, gcc requires an explicit -ffast-math
option for that kind of optimization.
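
A small sketch of why that kind of optimization has to be opt-in, assuming
IEEE 754 doubles (the function names are made up for illustration):
floating-point addition is not associative, so regrouping a sum can change
the result.

```c
/* Same three operands, different grouping. A compiler that keeps IEEE 754
   semantics may not turn one of these into the other on its own. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left gives 1.0 while sum_right
gives 0.0, because 1.0 is lost when it is added to -1e20 first.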

------
akkartik
Assembly has one huge advantage over C: no undefined behavior. What you see is
what you get. There's none of the it-might-work-for-years-before-betraying-you
worry when you deal with assembly. Even if you need to think about multiple
machine types, you can think about the finite set of machine types you care
about, and not worry about some abstract ideal of portability that will cause
the C compiler to do strange things
([http://blog.regehr.org/archives/213](http://blog.regehr.org/archives/213)).

Unfortunately, this advantage means you can't understand _all_ of C just by
understanding assembly, even though C programs turn into just assembly at
bottom.

_Edit_: after the responses below I've been (re)reading
[https://www.scss.tcd.ie/John.Waldron/3d1/arm_arm.pdf](https://www.scss.tcd.ie/John.Waldron/3d1/arm_arm.pdf)
and
[http://www.intel.com/content/dam/www/public/us/en/documents/...](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf).
And marvelling.

~~~
gecko
> Assembly has one huge advantage over C: no undefined behavior.

In the olden days, maybe, but nowadays? No way. As the most extreme example:
forget to use memory barriers when you should? Congratulations, the OOO nature
of modern day CPUs means your code may not execute identically on all
platforms. But even lesser things can impact you: instructions that were cheap
become expensive, or registers that you knew you weren't supposed to touch but
did (and that happened to work) suddenly stop working because a library got
changed, or half a dozen other things.

With assembly on a modern CPU, congratulations: you have eliminated one level
of abstraction. But there are still many, many turtles to go before you're
really all the way down.

~~~
ndesaulniers
> instructions that were cheap become expensive

Agree.

At which point, C gives you a huge advantage. Sure, you can hand write
assembly, but I bet the libc implementation you're linking against has been
optimized to death with intimate knowledge of timing sequences and inner state
of the processor from the processor vendor. I would expect a company like
Intel to contribute patches to glibc (GNU libc) based on running functions
from the standard library on a very, very expensive FPGA (or just an expensive
software simulation of hardware).

Memory ordering operations are an advanced topic; I have a copy of C++
Concurrency in Action, currently sitting on my bookshelf, that has a chapter
explaining the concepts. Should be a fascinating read!

~~~
pcwalton
Agreed. I think there's no clearer example of C giving you an advantage than
integer division by a constant. Even on architectures where it's supported
(x86), the divide instruction is really slow. The standard approach is to use
the Hacker's Delight technique to convert the division into a fixed-point
multiplication and shift. All production-quality compilers will do this for
you by generating the magic number, but if you're doing it yourself in
assembler you'd have to look up the magic number and shift. Lots of assembly
language programmers won't even bother doing this and will just write "idiv"
in x86 (or "bl __divsi3" on ARM), which slows down their programs.
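
As a sketch of the multiply-and-shift idea, here is the unsigned 32-bit case
for a divisor of 10 (the signed case that "idiv" handles needs an extra
correction step, and the function name is made up for illustration):

```c
#include <stdint.h>

/* Unsigned 32-bit division by 10 without a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10); multiplying in 64 bits and shifting
   right by 35 yields floor(n / 10) for every 32-bit n. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

This is roughly what a compiler generates for "n / 10" on an unsigned int; in
C you just write the division and let it pick the constant.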

------
kazinator
I don't think the article is aptly titled, which is too bad, because literally
"understanding C by learning assembly" is actually a terribly wrong-headed
idea.

The article is rather about exploring a particular C run-time at the assembly
language level, and a tutorial about stepping through programs at that level.

This is something useful that would-be developers need to learn, the main
reason being that programs sometimes fail in ways such that uncovering the
root cause requires a trip through that level. Other reasons are that in some
situations we need to _verify_ that programs believed to be correct at the
source level are actually being translated in the way we expect.

~~~
vmarsy
I agree that the article's title is wrong.

But why do you think that understanding C by learning assembly is a terribly
wrong-headed idea?

Some professors tend to disagree and prefer a bottom-up approach to learning.
The main advantage is that you get rid of the "magic" of computing.

[http://www.ncsu.edu/wcae/WCAE2/patt.pdf](http://www.ncsu.edu/wcae/WCAE2/patt.pdf)

[https://www.youtube.com/watch?v=J926i6MwRpo](https://www.youtube.com/watch?v=J926i6MwRpo)

~~~
0x09
The thing is, C does contain quite a bit of "magic" built into the
transformations a compiler is allowed to do to your constructs. A mental model
of C built on the mapping to assembly is appealing, but leads to false
intuition when reasoning about the language's rules, which occasionally has
disastrous effects.

Just as an anecdote, I have seen a surprising number of C programmers assume
that the language's type aliasing rules are related to or somehow governed by
pointer alignment, when these are entirely separate. The former exists to
facilitate analysis and optimization at the compiler level -- there is no
"ground" reason. It's important not to get trapped in the portable assembler
mindset and lose sight of an entire level of transformations that might not
have an intuitive relationship with hardware.
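
A minimal sketch of the distinction, assuming a 4-byte IEEE 754 float (the
function name is made up for illustration). Reading a float through a
uint32_t pointer breaks the aliasing rules even when the pointer is perfectly
aligned; copying the bytes does not.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 4-byte float");

/* Not allowed by the aliasing rules, regardless of alignment:
       uint32_t u = *(uint32_t *)&f;
   Copying the object representation is well defined, and compilers
   typically reduce the memcpy to a plain register move. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On such a platform, float_bits(1.0f) is 0x3F800000.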

------
stcredzero
Alternatively: Understanding C by writing a compiler.

~~~
tptacek
Strong agree. Everyone should do this at least once in their career.

~~~
jwdunne
What resources would you recommend to get started with?

~~~
daj40
The "Dragon Book" is pretty solid and a widely accepted resource:
[http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniqu...](http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools)

~~~
wglb
One of the authors of the Roslyn compiler has a different opinion:
[https://news.ycombinator.com/item?id=5149315](https://news.ycombinator.com/item?id=5149315).
And he gives some examples of much better books.

It used to be considered good, but it is no longer thought of that way.

~~~
tptacek
I think you'll learn more from porting a simple C compiler than you will from
any compiler book, at least given a fixed amount of time to do either task in.

~~~
wglb
This book,
[http://www.amazon.com/Compiler-Generator-Automatic-Computati...](http://www.amazon.com/Compiler-Generator-Automatic-Computation-McKeeman/dp/0131550772),
which by today's standards is not a very good book, got me started in my
compiler career.

And I agree with your basic point that building one is the only way to gain
true understanding.

------
nickpsecurity
I agree with kazinator: the title doesn't reflect the work. I bet I got
something that does: Write Great Code [1]. Hyde helps a programmer understand
how the compiler converts most imperative language features to assembler if
it's doing it in a direct, lightly optimized way. It often gives C code next
to assembler code along with optimizations you can do in assembler but not C.
He also wisely created a High Level Assembler that embeds 3GL-style constructs
and libraries in assembler code to help beginners learn it piece by piece
instead of all at once. Enjoy!

[1] [http://www.plantation-productions.com/Webster/www.writegreat...](http://www.plantation-productions.com/Webster/www.writegreatcode.com/index.html)

------
dmix
Another article by the same blog titled "Learning C with gdb" was posted 3
days ago with 24 comments:

[https://news.ycombinator.com/item?id=9560708](https://news.ycombinator.com/item?id=9560708)

------
chisleu
Learning NIOS ASM was great for learning how to code functional languages too.
When you see how much overhead is involved in languages like C for recursive
functions that have a simple algorithm, then you start to appreciate the power
of functional languages!

------
skibz
Did anyone encounter this error when trying to run gdb with the example
programs written in this article?

The error: "Unable to find Mach task port for process-id 13942: (os/kern)
failure (0x5). (please check gdb is codesigned - see taskgated(8))"

------
AndyKelley
One nitpick here, if you want a generator style function, you need a struct to
hold the state, not a static local variable. Otherwise you're limited to one
global function. I can't think of a valid use case for static local variables.
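
A rough sketch of the difference (all names here are made up for
illustration):

```c
/* Static-local version: one hidden counter shared by every caller. */
int next_static(void)
{
    static int n = 0;
    return n++;
}

/* Struct version: each caller owns its own state, so any number of
   independent generators can exist at once. */
struct counter {
    int n;
};

int counter_next(struct counter *c)
{
    return c->n++;
}
```

Two "struct counter" values advance independently, while every call to
next_static() advances the same sequence.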

~~~
blt
they are nice if you want to switch the behavior of a function at runtime with
a debugger. I use them a lot when tweaking the constants in GUI layout code.
But I have them behind a macro that switches "static" to "const" in optimized
builds.
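
Something along these lines, as a sketch: the macro name, the function, and
the use of NDEBUG to detect an optimized build are all assumptions for
illustration.

```c
#ifdef NDEBUG
#define TWEAKABLE const    /* optimized build: a plain constant */
#else
#define TWEAKABLE static   /* debug build: lives in memory, so a debugger
                              can change it while the program runs */
#endif

int row_height(int font_px)
{
    TWEAKABLE int padding = 4;
    return font_px + 2 * padding;
}
```

In a debug build the value of padding can be changed from the debugger and
the next call picks it up; in an optimized build it folds away.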

------
akhilcacharya
This is actually how my school's ECE department teaches C and lower level
hardware. It's quite nice, actually.

------
ChrisCinelli
I started with assembly when I was 10. Self taught C at the age of 12. I
learned that speed of development is more often the bottleneck than speed of
execution...

------
101914
Appreciating C by learning assembly

