
x86-64 Assembly Language Programming with Ubuntu - lainon
http://www.egr.unlv.edu/~ed/x86.html
======
dsamarin
I am currently taking this class, and am happily surprised this made it here.

The book needs more work, but I still believe it's a great resource. For
example, on page 11 it says "Note that when the lower 32-bit eax portion of
the 64-bit rax register is set, the upper 32-bits are unaffected." In reality,
the high order bits are zeroed to avoid a data dependency. I'm going through
the entire book hunting for typos :-)

Also I found some issues while discussing Unicode, but the class only requires
use of the ASCII character set.

~~~
jcranmer
> For example, on page 11 it says "Note that when the lower 32-bit eax portion
> of the 64-bit rax register is set, the upper 32-bits are unaffected." In
> reality, the high order bits are zeroed to avoid a data dependency.

Well, there is one case where the upper 32-bits are not zeroed. It turns out
that xor eax, eax is assigned to opcode 0x90, which is better known to most
people as NOP.

If you want real fun, read up on what happens with AVX registers. Whether the
upper bits are left untouched or zeroed depends on whether or not you use VEX
encoding.

~~~
msla
> It turns out that xor eax, eax is assigned to opcode 0x90, which is better
> known to most people as NOP.

This can't be true, since xoring a register with itself zeroes that register,
and zeroing a register can't possibly be a general NOP instruction.

XOR also sets flags, another thing NOPs can't do.

~~~
simcop2387
That's because it's actually xchg eax, eax not xor eax, eax.

------
wybiral
I recently did a video series on x86 using nasm and GCC. It only covers
32-bit, but I think that's a better way to start since the conventions are
simpler (especially when interfacing with C code).

[https://www.youtube.com/playlist?list=PLmxT2pVYo5LB5EzTPZGfF...](https://www.youtube.com/playlist?list=PLmxT2pVYo5LB5EzTPZGfFN0c2GDiSXgQe)

------
aruggirello
> It should be noted that Unicode uses 2 bytes for each character.

But you're programming "with Ubuntu", not Windows. IMHO you could safely
assume/recommend UTF-8.

~~~
Arkanosis
Just a reminder BTW that since version 2.0 (1996), Unicode is not an encoding
scheme but a character set (I avoid the confusing “charset” word on purpose).
Therefore, _Unicode_ does not use any number of bytes: it only assigns code
points to characters.

Windows used to use the UCS-2 encoding scheme which indeed used 2 bytes for
each character, but since Windows 2000, it uses UTF-16 instead, which like
UTF-8 uses a variable number of bytes per character.

~~~
dunpeal
Indeed. "Unicode" is an abstract character set, it doesn't "use" any bytes. A
specific encoding does.

------
simias
If you have a particular need to learn x86 assembly, this is great. However, I
want to point out that if you just want to learn an assembly language for the
sake of learning very low-level development and understanding how CPUs work at
the lowest software level, I would not advise picking x86. It's crufty, messy,
and overcomplicated, plagued with decades of shifting paradigms in CPU ISA
design and still maintained for backward compatibility's sake.

If you value your time and sanity, consider learning something smaller and
more reasonable such as AVR assembly (the kind of controller you find on
Arduinos). It's a lot smaller and you don't even need an OS; you can truly do
everything from scratch. If you want something a little more advanced, ARM is
an obvious target: it's got all the features you'd expect from a modern CPU
(SIMD, floating point, etc.) and it's not nearly as crazy as x86 assembly.

~~~
mehrdadn
Learning doesn't happen in a vacuum though. People generally try to learn
things that will be handy for them. AVR or ARM assembly are far less handy to
know than x86, so telling people to ditch x86 and learn those instead kind of
misses the point.

~~~
makapuf
Sure x86 is more ubiquitous (or is it really? Many arm cores and embedded AVR
are released embedded) but the number of times I've needed asm in x86 is way
less than I've had with those embedded platforms where a byte is a byte ...
(ditto for cycles) edit:typo

~~~
mehrdadn
> Sure x86 is more ubiquitous (or is it really? Many arm cores and embedded
> AVR are released embedded)

I can't make sense of this. Is your logic "there are more ARM CPUs than x86
CPUs => there are more programmers dealing with ARM assembly than with x86
assembly"?

> but the number of times I've needed asm in x86 is way less than I've had
> with those embedded platforms

Sure, this is your situation. But are you claiming your situation is typical?
In your mind do the majority of programmers who deal with assembly deal with
embedded platforms as much as you do?

~~~
wk_end
You're asking leading questions but they don't really...seem to lead anywhere.

You suggested x86 assembly was more useful to learn than ARM or AVR ("AVR or
ARM assembly are far less handy to know than x86"), but provided no
justification for that claim - and yet seem to be extremely demanding on
similar claims of others.

So what's _your_ situation? Are you claiming that it's typical?

The logic of: "desktop CPUs are rarely coded in assembly, embedded CPUs are
absolutely everywhere and often coded in assembly, the latter assembly
languages are more useful to know" is extremely obvious and straightforward. I
can't make sense of your opposition to it, especially since you've given
absolutely no substance to back up your contrarian position.

------
emily-c
These types of resources are awesome, though I've always wished that they at
least briefly touched on the x86 memory model and its consistency guarantees.
Understanding the concept of memory barriers should be considered fundamental.

------
Areading314
Out of curiosity, what are the main reasons to need to actually write
assembler in 2018? Compilers? Games? Genuinely curious

~~~
wilsonnb3
Assembler is still popular for IBM mainframes. The current version has been
around since '92 and is called High Level Assembler.

It's popular partially because people have codebases that they started writing
in the 70's or 80's in assembler that they maintain to this day because it's
cheaper than switching it all over to a new language. Pretty much the same
reason that COBOL is still around.

z/OS (the OS that runs on IBM mainframes) also exposes a lot of its
functionality through HLASM, so it's far more convenient to use than x86
assembly.

For whatever reason, C also never really caught on as ubiquitously as it did
in the PC world. Probably because IBM themselves generally used their
proprietary PL/S language instead back in the 70's and 80's.

[https://en.wikipedia.org/wiki/IBM_High_Level_Assembler](https://en.wikipedia.org/wiki/IBM_High_Level_Assembler)

~~~
__sdegutis
This is fascinating, I didn't know there was such a thing as a high-level
assembly language, but IBM High Level Assembler has IF/ELSE/ENDIF and several
types of built-in loops. I wonder how similar it is to writing in C. One thing
this page doesn't mention is structured data types; I suppose these would
still have to be implicit, like in other assembly languages.

~~~
todd8
I used to write assembly language programs back in the 70s while working on
process control computers (Texas Instruments 990, TI 980, TI 960 etc). At one
point I was using an assembler that supported complex macros (macros that
could be expanded into other macro definitions and supported counters and so
forth), so I developed a library of macros that supported nested if-then-else
and loops. They made the code a bit easier to read, but it was probably not
worth the trouble.

The problem with a high level assembly language is that it really isn't very
high level; your program still rests right on the hardware for a reason, and
usually that reason is a concern about using registers and instructions very
carefully for performance or interacting with hardware at ring 0 level where
you are managing the virtual memory page table or handling network device
interrupts or system IPC and so forth.

In my experience (as an IBM AIX kernel architect, virtual memory architect,
and distributed file system designer), sometimes one needs assembly language,
but it was always a relief to get up to the level of C programming where the
programming teams were much more productive. Much OS development has been done
with C and it really was the best choice for most of the kernel work going on
back then in my opinion.

AIX was an interesting project. The hardware didn't exist in final form while
AIX was being developed. The challenge for our group was developing/porting a
whole OS, the kernel and user space code, that would run on hardware being
developed at the same time. IBM's language PL/1 was an important mainframe
language, but seemed a poor fit for systems programming. However, IBM had
state of the art compilers for it and a strong research interest in compilers
for RISC machines (like the POWER processors, the first of which outside of
IBM's research processors would run AIX 1); so they took the 80% of PL/1 that
seemed useful to systems programming and wrote a compiler for PL.8 (.8 of
PL/1) to run on the hypothetical RISC system my group was developing.

We were developing a Unix system on the RISC hardware, but we didn't have a
stable target (page table sizes, floating point hardware traps, etc.) and
couldn't afford to wait for the hardware before starting development. The
approach my group took was to write the lowest-level parts of the kernel in
PL.8 so that, as the hardware changed, the compiler could be tweaked to take
advantage of it more easily than rewriting low-level assembly code. The
high-level parts of the kernel (coming from licensed Unix code) could then be
mated to the low level code and wouldn't be affected by the changes in the
hardware that happened over time.

I wasn't in charge of these decisions, so I don't really know enough about
them to say that this was better or worse than just using C and assembly
language as is normally done in most OS development, but I do see some of the
trade offs that had to be made.

An aside on higher level system programming languages, I know that some on HN
say that C is a terrible choice for OS development. Perhaps there are better
choices (now), but I see things a bit differently. At the time there were not
obvious choices that were better. We didn't have Rust or even C++. We had C,
Pascal, MODULA, PL/1, and a few other unlikely choices (e.g. ALGOL-68, LISP,
JOVIAL). C is a _big_ improvement over assembly language, but it isn't clear
to me that PASCAL or MODULA, or LISP or the others available back then were
better choices than C. Unix became a kind of proof of C's suitability as an OS
development language. Before that, PL/1 had been used to develop Multics, but
Multics failed as a commercial OS (despite its subsequent influence on OS
design). C was simpler than PL/1. Algol had been used by Burroughs, but it
was a non-standard version of Algol specially designed to work with the rather
novel hardware.

C is flawed, but none of the other candidates for a language higher level than
assembly for system programming was without flaws, and none of them had
produced something like Unix. The C used in the Unix kernel was the real K&R
C; it was the same language that ran on many platforms. Other attempts at a
high level systems programming language based on Lisp, Smalltalk, Pascal,
Algol, and IBM's proprietary subsets of PL/1 were all languages modified for
the hardware they ran on. C seemed to be just low enough to work for most of
the kernel's requirements without special extensions.

I always appreciate pjmlp's comments reminding HN readers about Pascal or
Modula. I liked those languages; I'm very familiar with them. I still think C
was the correct language for system programming in the past. Today, I'm more
interested in seeing what happens with Rust for kernel development and Go for
non-kernel systems programming.

~~~
pjmlp
Thanks very much for the insightful comment.

Also interesting to learn that PL.8 had a shot at the AIX kernel. I got
all the PL.8 papers I could get my hands on.

Regarding UNIX and C's adoption, I think that had Bell Labs been allowed to go
commercial from day one with UNIX, the history of C's adoption would have been
quite different.

------
fredsanford
For those interested...

Up until the early 2000s or so, Randall Hyde used to develop a ton of teaching
materials and libraries for Intel.

You can find most of it here: [http://www.plantation-productions.com/Webster/index.html](http://www.plantation-productions.com/Webster/index.html)

------
pjmlp
Thankfully it uses Intel syntax.

------
new_age_garbage
I was skeptical but this looks like an amazing resource!

------
markhahn
Why Ubuntu? IE why is any of it distro-specific?

~~~
Keyframe
syscalls for OS maybe, but why Ubuntu... who knows

~~~
harry8
Likely because it's the most popular among the target group of students at the
time of writing. Choosing a distribution allows one to discuss the compiler,
assembler, gdb, and text editor with known versions and known availability,
reducing admin overhead so students can get on with it. It's not a terrible
way to go, whatever one thinks of the most popular distribution at any given
point in time.

------
agumonkey
The TOC is very tempting. Thanks for the link

------
jmsmistral
Thanks for sharing!

------
lepasana
Thanks for this!

