The book needs more work, but I still believe it's a great resource. For example, on page 11 it says "Note that when the lower 32-bit eax portion of the 64-bit rax register is set, the upper 32-bits are unaffected." In reality, the high-order bits are zeroed to avoid a data dependency. I'm going through the entire book hunting for typos :-)
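For anyone who wants to see the zeroing behavior for themselves, here's a minimal sketch using GCC/Clang extended asm (assumes an x86-64 target; the function name is mine):

```c
#include <stdint.h>

/* Writing a 32-bit register zeroes bits 63:32 of the full 64-bit
 * register, contrary to what the book says.  %k0 is the 32-bit view
 * of whichever register the compiler picks for r. */
uint64_t write_low32(void) {
    uint64_t r = 0xFFFFFFFFFFFFFFFFULL;          /* all 64 bits set */
    __asm__("movl $0x12345678, %k0" : "+r"(r));  /* 32-bit write    */
    return r;                                    /* upper half is now zero */
}
```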
I also found some issues in the discussion of Unicode, but the class only requires use of the ASCII character set.
That sounds like wishful thinking, and indeed it's what would be expected by someone familiar with the 16-to-32-bit transition of the 386 (modifying AX doesn't change the upper 16 bits of EAX). Instead of getting even more 32-bit registers, or even 64-bit registers accessible in 32-bit mode, AMD64 gave us a weird not-quite-fully-64-bit extension.
I've heard the "partial register stall" excuse multiple times; it's ostensibly valid, but only if you insist on thinking in terms of "partial registers" instead of simply treating them as more 32-bit registers. For example, some variants of the divide instruction use EDX:EAX (or RDX:RAX) for their input.
That would mean you have to double the amount of state you track. The hardware cost of doing this is roughly the cost of doubling the number of 64-bit registers. The transistors used for storing register data are negligible compared to the cost of the "metadata" and handling around them. Why not just have more registers, then?
"Just allow us to partially update the upper halves of the registers" is the sort of thing someone who understands software but not hardware would ask. It's 99% of the cost of just having twice the registers, but not nearly as useful, and it would introduce a lot of potential performance pitfalls. (Any instruction that might update only partially now has to wait for all the previous results on the register.)
Of the times I've used asm, there have been far more situations where an extra 32-bit register would be more useful than a 64-bit one, and having them combine automatically into high and low halves is more useful than you think. Tight loops with non-parallelisable bit/byte manipulations of this sort occur quite often in things like data compression and emulation.
> Any instruction that might update only partially now has to wait for all the previous results on the register
Does it? Once again, you seem to be thinking in "partial registers" rather than "just another one" --- and I argue that this conceptual difference is very important. E.g. you can work with both AL and AH independently, then use them together as AX --- at which point, yes, the processor will need to wait for the results from both, but then it can combine them implicitly without having to waste time and space decoding and executing the instructions to do it.
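The AL/AH case is easy to demo. A minimal sketch with GCC/Clang extended asm (assumes x86-64; the "Q" constraint forces a register with an addressable high byte, and the function name is mine):

```c
#include <stdint.h>

/* Write the low and high byte registers independently, then read
 * them back together as one 16-bit value -- no explicit combining
 * instructions needed. */
uint16_t combine_al_ah(void) {
    uint16_t ax = 0;
    __asm__("movb $0x34, %b0\n\t"   /* write the AL-style low byte  */
            "movb $0x12, %h0"       /* write the AH-style high byte */
            : "+Q"(ax));
    return ax;                      /* read AL and AH together as AX */
}
```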
You need a 64-bit register any time you want to store a pointer, though, unless you want to use some kind of segmented memory model. I don't think anybody wants to go back to that, although I'm not one to criticize weird fetishes.
Clearly, when you look at the fine details of AMD64, it looks like a weird Frankenstein's monster of an instruction set. The REX prefix holds the MSBs of the register numbers, since the 32-bit opcodes only allowed three bits to encode 8 registers. The same prefix is used to set the width of memory target operands, except that some instructions default to 32 bits while others default to 64. R12 has weird encoding quirks because it matches RSP in the "low" register set, and that register has specific semantics when used as a base register...
I wonder how much die area is used on a modern CPU just to deal with all this cruft and translate it into a saner RISC instruction set internally.
It's easy to say "think of them as separate registers" just as it's easy to say "think of them as partial registers" - but the ISA definition is such that they have to appear as partial registers in the scenarios where it would be visible.
So sure, you could make hardware that would rename and treat them as different registers (as has been done on some x86 versions), but then when you read a wider portion you need to combine them, which won't be free (yes, it happens "implicitly", but that doesn't somehow make it much easier for the hardware).
There are also plenty of cases where you want zero-extension for functional reasons, especially in compiled code, where things like casts to a larger size become free. Cleverly using both halves of a register and relying on the implicit combination into a full 64-bit value is much rarer than just wanting to store a 32-bit value and sometimes wanting to use it as a zero-extended 64-bit value.
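The common case looks like this in C; the cast costs nothing on x86-64 precisely because of the zero-extension rule (a sketch, any C compiler; the function name is mine):

```c
#include <stdint.h>

/* Widening an unsigned 32-bit value to 64 bits is free on x86-64:
 * the 32-bit mov that produced the value already zeroed the upper
 * half, so this compiles to no extra instruction (unlike, say, a
 * 16-to-32-bit widen, which needs a movzx). */
uint64_t widen(uint32_t x) {
    return (uint64_t)x;
}
```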
Can you elaborate on the last behavior, that "ah today works very differently from al"? How do they differ? I hadn't heard this before.
That's only true when compiling for x86 (32 bits). When targeting x86-64 (the subject of the book), long is 64 bits in gcc and g++.
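A quick check, assuming an LP64 target like gcc on x86-64 Linux (the assertion would fail under `-m32`, where long is 4 bytes):

```c
/* On LP64 targets (gcc/g++ on x86-64 Linux), long is 8 bytes;
 * the ILP32 x86 target keeps it at 4. */
unsigned long_size(void) {
    return (unsigned)sizeof(long);
}
```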
EDIT: I found out he is providing extra credit for textbook errors.
1.3 Why Learn Assembly Language
1.3.1 Gain a Better Understanding of Architecture Issues
1.3.1 Understanding the Tool Chain
1.3.1 Improves Understanding of Functions/Procedures
Well, there is one case where the upper 32-bits are not zeroed. It turns out that xor eax, eax is assigned to opcode 0x90, which is better known to most people as NOP.
If you want real fun, read up on what happens with AVX registers. Whether the upper bits are left untouched or zeroed depends on whether or not you use VEX encoding.
`xor eax, eax` actually generates 0x31 0xc0, and `xor rax, rax` generates 0x48 0x31 0xc0. 0x90 decodes to xchg eax, eax, which has no effect, in all modes except long mode. In long mode, the opcode 0x90 still has no effect, but it is no longer equivalent to xchg eax, eax, since that instruction would zero the upper 32 bits of rax.
NOP Idioms
The NOP instruction is often used for padding or alignment purposes. Goldmont and later microarchitectures have hardware support for NOP handling, marking the NOP as completed without allocating it into the reservation station. This saves execution resources and bandwidth. Retirement resources are still needed for the eliminated NOP.
This can't be true, since xoring a register with itself zeroes that register, and zeroing a register can't possibly be a general NOP instruction.
XOR also sets flags, another thing NOPs can't do.
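Easy to verify with a couple of lines of GCC/Clang extended asm (assumes x86-64; the function name is mine). Capturing ZF with setz shows that xor both zeroes the register and sets the flag:

```c
#include <stdint.h>

/* xor reg,reg zeroes the register AND sets ZF, so it cannot be a NOP. */
uint8_t zf_after_xor(void) {
    uint64_t r = 123;
    uint8_t zf;
    __asm__("xorl %k0, %k0\n\t"  /* zeroes r, sets ZF=1      */
            "setz %1"            /* materialize ZF as a byte */
            : "+r"(r), "=q"(zf));
    return (uint8_t)(zf == 1 && r == 0);
}
```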
But you're programming "with Ubuntu", not Windows. IMHO you could safely assume/recommend UTF-8.
Windows used to use the UCS-2 encoding scheme, which indeed used 2 bytes for each character, but since Windows 2000 it uses UTF-16 instead, which, like UTF-8, uses a variable number of bytes per character.
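A quick C11 sketch of the variable length: a character outside the BMP, such as U+1D11E (musical G clef), takes two UTF-16 code units (a surrogate pair). The function name is mine:

```c
#include <uchar.h>

/* Count the UTF-16 code units needed for U+1D11E.
 * char16_t string literals require C11. */
size_t utf16_units_for_g_clef(void) {
    char16_t s[] = u"\U0001D11E";
    return sizeof(s) / sizeof(s[0]) - 1;  /* code units, minus the NUL */
}
```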
If you value your time and sanity, consider learning something smaller and more reasonable such as AVR assembly (the kind of controller you find on Arduinos). It's a lot smaller, and you don't even need an OS; you can truly do everything from scratch. If you want something a little more advanced, ARM is an obvious target: it's got all the features you'd expect from a modern CPU (SIMD, floating point, etc.) and it's not nearly as crazy as x86 assembly.
I can't make sense of this. Is your logic "there are more ARM CPUs than x86 CPUs => there are more programmers dealing with ARM assembly than with x86 assembly"?
> but the number of times I've needed asm in x86 is way less than I've had with those embedded platforms
Sure, this is your situation. But are you claiming your situation is typical? In your mind do the majority of programmers who deal with assembly deal with embedded platforms as much as you do?
You suggested x86 assembly was more useful to learn than ARM or AVR ("AVR or ARM assembly are far less handy to know than x86"), but provided no justification for that claim - and yet seem to be extremely demanding on similar claims of others.
So what's your situation? Are you claiming that it's typical?
The logic of: "desktop CPUs are rarely coded in assembly, embedded CPUs are absolutely everywhere and often coded in assembly, the latter assembly languages are more useful to know" is extremely obvious and straightforward. I can't make sense of your opposition to it, especially since you've given absolutely no substance to back up your contrarian position.
2. Optimization: Some critical code sections or cryptographic operations can be massively sped up with hand-crafted assembly.
3. Obfuscation: Sometimes, you have to drop to assembly for obfuscation. This is especially the case if you need to/want to fool the Hex-Rays Decompiler. Applications include malware and digital rights management (some might argue the latter is just a sub-species of malware). This also includes writing your own custom assembly language for a custom virtual ISA for the purpose of code obfuscation.
4. Embedded platforms: Some very obscure microcontrollers may only have support for pure assembly, though the norm seems to be that at least a C89 compiler is available.
5. Education: It can be very enlightening to understand how things work below your "home" layer of abstraction. Some parts of C seem to be much easier to understand once you have a firm grasp on a common assembly language.
6. Compilers: Self-evident, you kind of have to understand the code you emit.
See e.g. this benchmark https://monocypher.org/speed -- authenticated encryption in libsodium gets an improvement of over 300% from hand-crafted assembly over the portable C implementation.
Outside of system code, the main use for assembly is trying to maximize performance on hot loops. Your optimized matrix-multiply routine, or media decoding kernel, could well be written in assembly. I've seen a few cases where people do things such as manual calling conventions in assembly as well.
"ontogeny recapitulates phylogeny"
1.3.1 Gain a Better Understanding of Architecture Issues
1.3.1 Understanding the Tool Chain
1.3.1 Improve Algorithm Development Skills
1.3.1 Improves Understanding of Functions/Procedures
1.3.1 Gain an Understanding of I/O Buffering
1.3.1 Understand Compiler Scope
1.3.1 Introduction to Multi-processing Concepts
1.3.1 Introduction to Interrupt Processing Concepts
The book uses the 1.3.1 heading for each and I'm too lazy to change them.
Reasons to (maybe/arguably) write assembly:
1) You're bringing up a new board, your bootloader is partially written, but you need some customization for the real-time OS you're using. It can be advantageous to do this in assembly.
2) You're dealing with some particularly old hardware and (ab)using it for some commercial purpose
Of course, I can imagine that for each you'll have someone obstinately state there's no need to use assembly because of some gcc feature. More than one way to get things done, and most use the tools they're comfortable with.
Additionally, knowing assembly can help a lot when debugging weird crashes in code written in higher-level languages. gdb's asm layout is an awesome resource if you know how to use it, but someone who's never used assembly before will probably not even consider using it.
See e.g. (from 2013, but at a glance they seem to outline the ideas well):
For Julia - it's fairly easy to work "high up" most of the time - and drop down to inspect the code (and profile) - and unlike many other languages, even high-perf libraries will often be Julia all the way down (unlike tcl/perl/ruby/python + c/fortran etc).
Similar for sbcl (common lisp) or various languages that target/use llvm. And obviously for looking at output of optimizing compilers for c, c++, rust, d and similar "medium to high" level languages (Pascal, Ada, crystal, etc).
It's popular partially because people have codebases that they started writing in the 70's or 80's in assembler that they maintain to this day because it's cheaper than switching it all over to a new language. Pretty much the same reason that COBOL is still around.
z/OS (the OS that runs on IBM mainframes) also exposes a lot of its functionality through HLASM, so it's far more convenient to use than x86 assembly.
For whatever reason, C also never really caught on as ubiquitously as it did in the PC world. Probably because IBM themselves generally used their proprietary PL/S language instead back in the 70's and 80's.
Naturally it didn't catch on on mainframes, which were already using better systems languages.
It is also barely used on Unisys ClearPath MCP.
The problem with a high level assembly language is that it really isn't very high level; your program still rests right on the hardware for a reason, and usually that reason is a concern about using registers and instructions very carefully for performance or interacting with hardware at ring 0 level where you are managing the virtual memory page table or handling network device interrupts or system IPC and so forth.
In my experience (as an IBM AIX kernel architect, virtual memory architect, and distributed file system designer), sometimes one needs assembly language, but it was always a relief to get up to the level of C programming, where the programming teams were much more productive. Much OS development has been done with C, and it really was the best choice for most of the kernel work going on back then, in my opinion.
AIX was an interesting project. The hardware didn't exist in final form while AIX was being developed. The challenge for our group was developing/porting a whole OS, the kernel and user space code, that would run on hardware being developed at the same time. IBM's language PL/1 was an important mainframe language, but seemed a poor fit for systems programming. However, IBM had state of the art compilers for it and a strong research interest in compilers for RISC machines (like the POWER processors, the first of which outside of IBM's research processors would run AIX 1); so they took the 80% of PL/1 that seemed useful to systems programming and wrote a compiler for PL.8 (.8 of PL/1) to run on the hypothetical RISC system my group was developing.
We were developing a Unix system on the RISC hardware, but we didn't have a stable target (page table sizes, floating point hardware traps, etc.) and couldn't afford to wait for the hardware before starting development. The approach my group took was to write the lowest-level parts of the kernel in PL.8, so that as the hardware changed the compiler could be tweaked to take advantage of it more easily than rewriting low-level assembly language code. The high-level parts of the kernel (coming from licensed Unix code) could then be mated to the low-level code and wouldn't be affected by the changes in the hardware that happened over time.
I wasn't in charge of these decisions, so I don't really know enough about them to say that this was better or worse than just using C and assembly language as is normally done in most OS development, but I do see some of the trade offs that had to be made.
An aside on higher-level systems programming languages: I know that some on HN say that C is a terrible choice for OS development. Perhaps there are better choices (now), but I see things a bit differently. At the time there were no obvious choices that were better. We didn't have Rust or even C++. We had C, Pascal, MODULA, PL/1, and a few other unlikely choices (e.g. ALGOL-68, LISP, JOVIAL). C is a big improvement over assembly language, but it isn't clear to me that Pascal or MODULA, or LISP or the others available back then, were better choices than C. Unix became a kind of proof of C's suitability as an OS development language. Before that, PL/1 had been used to develop Multics, but Multics failed as a commercial OS (despite its subsequent influence on OS design). C was simpler than PL/1. Algol had been used by Burroughs, but it was a non-standard version of Algol specially designed to work with the rather novel hardware.
C is flawed, but none of the other candidates for a language higher level than assembly for systems programming were without flaws, and they hadn't produced something like Unix. The C used in the Unix kernel was the real K&R C; it was the same language that ran on many platforms. Other attempts at a high-level systems programming language based on Lisp, Smalltalk, Pascal, Algol, and IBM's proprietary subsets of PL/1 were all languages modified for the hardware they ran on. C seemed to be just low enough to work for most of the kernel's requirements without special extensions.
I always appreciate pjmlp's comments reminding HN readers about Pascal or Modula. I liked those languages; I'm very familiar with them. I still think C was the correct language for system programming in the past. Today, I'm more interested in seeing what happens with Rust for kernel development and Go for non-kernel systems programming.
Interesting to learn that PL.8 also had a shot at the AIX kernel. I got all the PL.8 papers I could get my hands on.
Regarding UNIX and C's adoption, I think that had Bell Labs been allowed to go commercial with UNIX from day one, the history of C's adoption would have been quite different.
If you're curious what a simple program ends up looking like, I've got one I wrote that copies the contents of one file into another file up on GitLab. Lots of loading registers and what not.
MS-DOS and Amiga assemblers (TASM, MASM, DevPac) had quite powerful macro capabilities, many of which gas still doesn't support.
Here is the documentation for MASM high level constructs, it is still distributed with Visual Studio.
Basically: somebody has to provide the infrastructure between assembly and high level languages, so that everyone else can write in HLL.
Optimization of games or HPC workloads is also a valid use-case, but today you probably want to use intrinsics in C instead.
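For example, a horizontal sum with SSE intrinsics, the kind of kernel one might otherwise hand-write in asm (a sketch; assumes an x86 target with SSE, which is baseline on x86-64, and the function name is mine):

```c
#include <immintrin.h>

/* Sum 4 floats using SSE intrinsics instead of hand-written assembly. */
float sum4(const float *p) {
    __m128 v = _mm_loadu_ps(p);                     /* [p0 p1 p2 p3]        */
    __m128 t = _mm_add_ps(v, _mm_movehl_ps(v, v));  /* [p0+p2, p1+p3, ...]  */
    t = _mm_add_ss(t, _mm_shuffle_ps(t, t, 1));     /* lane0 += lane1       */
    return _mm_cvtss_f32(t);                        /* p0+p1+p2+p3          */
}
```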
Burroughs, Xerox PARC workstations, ETHZ workstations as a couple of examples.
You can't really understand programming without having built a compiler. Or at the very least having done assembly programming. I don't think you can truly understand any basic programming concept like variable, pointer, reference, etc without learning compilers or assembly. Yes, a pointer "points" to something. But what does that really mean? When you realize that it's all just higher level concepts of specific assembly mechanisms of accessing data, you'll have a eureka moment. Don't see how you do that without digging into compilers/assembly.
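For what it's worth, the whole eureka fits in a few lines of C: a pointer is just an integer holding an address, and dereferencing is just a memory load at that address (a sketch; the round-trip through uintptr_t is fine on mainstream platforms, and the function name is mine):

```c
#include <stdint.h>

/* A pointer "pointing" to something means it holds that thing's address. */
int deref_demo(void) {
    int x = 42;
    int *p = &x;                    /* p holds the address of x       */
    uintptr_t addr = (uintptr_t)p;  /* ...which is just a number      */
    return *(int *)addr;            /* load from that address: 42     */
}
```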
And ultimately, it'll make you a better overall programmer since compilers/assembly is fundamental to every programming language.
This is just elitism with a little bit of "back in my day" thrown in.
I can assure you that I fully understood the concepts of pointers, variables, and references without knowing anything about assembly language or compilers. The concepts are not that difficult to grasp.
That wasn't my intention, and I apologize if it came off that way. Also, I'm not that old, and I certainly didn't start programming with assembly. My programming experience began with OOP languages (Java, C#, C++). I didn't really touch assembly until I went to college. Also, compilers is a basic core course you have to take to get a CS degree. How is something everyone has to learn elitist?
> I can assure you that I fully understood the concepts of pointers, variables, and references without knowing anything about assembly language or compilers. The concepts are not that difficult to grasp.
The concepts were fairly difficult for me. Especially pointers. I thought I understood it but then later on realized I didn't. Maybe other people have their eureka moments sooner. For me it happened when I built a compiler and did some assembly programming.
I just think knowing how things work under the hood will make you are better programmer. Especially something like assembly since all programming languages gets translated to assembly whether you are using C# or Haskell or Lisp or ML. I wasn't trying to offend anyone. But I'm sorry if others were offended by my comment. That was not my intention.
I'm not sure if this is true. It wasn't required for my degree.
I wasn't offended by your comment, though. I just disagree with you. No hard feelings about that.
Anyways, I agree that knowing how things work under the hood will make you a better programmer. I just disagree with how you phrased that sentiment in your original comment.
> Also, compilers is a basic core course you have to take to get a CS degree. How is something everyone has to learn elitist?
Not everyone is lucky enough to get a CS degree.
Some people come from backgrounds where CS was not in their worldview growing up. Or they were discouraged from pursuing CS despite the subject being interesting to them. Or they tried taking CS, but they had to deal with external factors that prevented them from continuing.
To discount these individuals from making meaningful contributions to a programming endeavor is short-sighted. Their particular experience may allow them to be better at implementing the correct solution, even if they are not as strong of a programmer. Or they may be a very fast learner, and given more time and mentorship, they will learn about compilers, but they can be strong contributors until then.
Knowing assembly also helps out when doing capture the flag challenges, specifically reverse engineering and binary exploitation.
OpenSSL, for example, has a fair amount of ASM in it for various processors to speed things up.
See https://software.intel.com/en-us/articles/improving-openssl-... for some examples of what they specifically did for x86/64.
But other than highly optimized graphics manipulation, I can't really think of any other good application for assembly.
Up until the early 2000s-ish, Randall Hyde used to develop a ton of teaching materials and libraries for Intel.
You can find most of it here: http://www.plantation-productions.com/Webster/index.html