> I'd say learning assembly by starting with x86 is indeed a bit crazy.

I've heard this said before. Having had basic exposure to both MIPS and x86 assembly in school, I'm not really clear on WHY people say this. Assembly seems equally unforgiving and obnoxious either way.

My personal reason is that x86 just handles memory segmentation in an awkward manner.

Assembly is unforgiving, but some processors are gentler on the programmer than others. If I were trying to teach someone the basics of register indexing and indirect indexing, I'd rather do it on a 6809 than an x86.

You don't have to care about segmentation if your program is less than 64K (fits in a single segment), and 64K is far more than enough for doing a lot of interesting things in Asm. A basic "Hello world" that runs in DOS is only about 20 bytes.

(Contrast this with using a C compiler on Windows, where "Hello world" compiled with default settings is usually beyond 10KB already.)
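As a rough check on that size figure, the sketch below hand-assembles the classic DOS .COM approach: put 09h in AH, point DS:DX at a `$`-terminated string, call INT 21h, and RET back to DOS. It assumes the standard .COM load offset of 0x100; `build_hello_com` is a hypothetical helper, and Python is used only as a convenient way to emit the bytes.

```python
# Hand-assembled 8086 machine code for a DOS .COM "Hello world".
# Assumption: a .COM file is loaded at offset 0x100 in its segment,
# so a message placed right after 8 bytes of code sits at 0x108.
COM_ORIGIN = 0x100


def build_hello_com(message: str = "Hello world") -> bytes:
    code_len = 8
    msg_addr = COM_ORIGIN + code_len
    code = bytes([
        0xB4, 0x09,                            # mov ah, 09h  ; DOS "print $-terminated string"
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg  ; DS:DX -> message (imm16, little-endian)
        0xCD, 0x21,                            # int 21h      ; call DOS
        0xC3,                                  # ret          ; DOS pushed 0000h, and PSP:0000 holds INT 20h
    ])
    assert len(code) == code_len
    return code + message.encode("ascii") + b"$"


if __name__ == "__main__":
    program = build_hello_com()
    print(len(program), "bytes:", program.hex())
```

With the message "Hello world" this comes to 20 bytes: 8 bytes of code plus 12 bytes of string including the `$`. Under the assumptions above, writing those bytes to a file such as `hello.com` should give a program DOS can run directly.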

Starting with the 8086, it has 14 registers. Four of them are actually eight more registers (AX is AH and AL; BX is BH and BL; CX is CH and CL; DX is DH and DL). Each of these registers has a special purpose (BX can be used to index memory, the rest can't; CX is used as a count by some instructions; AX is your accumulator; and DX is ... mostly just there, apart from holding the high half of MUL/DIV results and I/O port numbers). There are four more special-purpose index registers: SI, DI, BP and SP. SI and DI support post-increment and post-decrement addressing through the string instructions, but you need to set the direction flag (DF) in the FLAGS register, with CLD or STD, to choose post-increment OR post-decrement. SP is the stack pointer, but you can't use it as an index register (you can starting with the 80386). You can use BP as an index register, but it always takes an offset (there is no special mode for a 0 offset).
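A toy Python model may help picture two of these details: the byte registers AH and AL are just views of AX, and a string store such as STOSB writes AL and then moves DI by one byte in the direction chosen by the direction flag (DF, cleared by CLD and set by STD). This is a sketch, not an emulator: `Regs` and `stosb` are hypothetical names, and ES is ignored by treating memory as a single flat 64K segment.

```python
# Toy model of two 8086 details (not a full emulator):
#  1. AH/AL are the high/low bytes of AX (BX, CX, DX work the same way).
#  2. STOSB stores AL at DI, then steps DI by +1 or -1 depending on DF.
class Regs:
    def __init__(self) -> None:
        self.ax = 0
        self.di = 0
        self.df = 0  # direction flag: 0 after CLD (increment), 1 after STD (decrement)

    @property
    def ah(self) -> int:
        return (self.ax >> 8) & 0xFF

    @ah.setter
    def ah(self, value: int) -> None:
        self.ax = ((value & 0xFF) << 8) | (self.ax & 0x00FF)

    @property
    def al(self) -> int:
        return self.ax & 0xFF

    @al.setter
    def al(self, value: int) -> None:
        self.ax = (self.ax & 0xFF00) | (value & 0xFF)


def stosb(regs: Regs, memory: bytearray) -> None:
    """Store AL at DI, then post-increment or post-decrement DI per DF.
    Simplification: ES is ignored; memory is one flat 64K segment."""
    memory[regs.di] = regs.al
    step = -1 if regs.df else 1
    regs.di = (regs.di + step) & 0xFFFF  # DI is 16 bits, so it wraps
```

For example, after `r.ax = 0x1234`, `r.ah` is `0x12` and `r.al` is `0x34`; setting `r.al = 0xFF` leaves AX as `0x12FF`. Calling `stosb` repeatedly with `df = 0` fills memory upward, and with `df = 1` it fills downward.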

Then you get to the segment registers. As if things weren't bad enough. There are four: CS to point to code, DS and ES for data, and SS for the stack. You see, the 8086 can address 1MB of memory (20 bits), but it's a 16-bit instruction set. So the segment registers contain not the upper 4 bits of the address, but the segment's starting physical address divided by 16 (that is, shifted right four bits). A physical address is formed as SEGMENT*16 + OFFSET, giving you 20 bits of address. Most instructions and addressing modes use DS, except if you use BP, which defaults to SS, and the string instructions that store with post-increment/decrement, which must use ES:DI (DI is incremented; ES isn't). You can override the segment register for most instructions, but not all: the ES:DI destination of the string instructions (the ones that give you the post-increment/decrement addressing modes) is the exception.
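The segment arithmetic is easy to check in a few lines. This sketch (the helper names are hypothetical) computes SEGMENT*16 + OFFSET and truncates the result to 20 bits, as the 8086's 20 address lines do; it also shows that many different segment:offset pairs name the same physical byte.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset.
    The 8086 has only 20 address lines, so the sum is truncated to 20 bits."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


def format_segoff(segment: int, offset: int) -> str:
    return f"{segment:04X}:{offset:04X} -> {physical_address(segment, offset):05X}"


if __name__ == "__main__":
    # Three different segment:offset pairs that all name physical address B8000.
    for seg, off in [(0xB800, 0x0000), (0xB000, 0x8000), (0xB7FF, 0x0010)]:
        print(format_segoff(seg, off))
```

Because a segment only moves the base in 16-byte steps, a pointer has no single canonical segment:offset form, which is one reason comparing segmented pointers is awkward.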

And because of these segment registers, you can have 16-bit (near) or 32-bit (far) function pointers, 16- or 32-bit data pointers, and 16- or 32-bit stack pointers.
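To make the near/far distinction concrete: a near pointer is just a 16-bit offset into the current segment, while a far pointer is a 32-bit segment:offset pair. In memory, x86 stores a far pointer offset first, then segment, each little-endian (the m16:16 layout that LDS/LES and indirect far CALL/JMP read). The helper names below are hypothetical; `struct` is used only to show the byte layout.

```python
import struct
from typing import Tuple


def pack_near(offset: int) -> bytes:
    """Near pointer: a 16-bit offset within the current segment."""
    return struct.pack("<H", offset)


def pack_far(segment: int, offset: int) -> bytes:
    """Far pointer: offset in the low word, segment in the high word."""
    return struct.pack("<HH", offset, segment)


def unpack_far(raw: bytes) -> Tuple[int, int]:
    """Return (segment, offset) from a 4-byte far pointer."""
    offset, segment = struct.unpack("<HH", raw)
    return segment, offset
```

So `pack_far(0xB800, 0x0010)` yields the bytes `10 00 00 B8`, twice the size of any near pointer, and code mixing the two must know which kind every pointer is.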

So what you have is an instruction set that is almost, but not entirely, consistently inconsistent. The 80186 gives you a few more instructions. The 80286 adds protected mode (with four levels of protection) and changes how the segment registers work: in protected mode they hold selectors, indexes into a table of descriptors giving each segment's physical base address, and each segment can still be at most 64K in size. The 80386 extends all the registers to 32 bits (except the segment registers, which are still only 16 bits long), adds new registers available only in the most privileged ring, and adds paged memory while keeping segmented memory. It also makes the registers more general purpose, all while keeping the old instruction set.
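The 80286 change can be sketched by decoding a selector. In protected mode a segment register value splits into a 13-bit index into a descriptor table, one table-indicator bit choosing the GDT or the LDT, and a 2-bit requested privilege level matching the four protection levels. Descriptors are 8 bytes each, so the index times 8 is the entry's byte offset in its table. `Selector` and `decode_selector` are hypothetical names for this sketch.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # entry number in the descriptor table (13 bits)
    table: str  # "GDT" or "LDT", from the table-indicator (TI) bit
    rpl: int    # requested privilege level: 0 (most privileged) to 3

    @property
    def table_offset(self) -> int:
        """Byte offset of the 8-byte descriptor within its table."""
        return self.index * 8


def decode_selector(value: int) -> Selector:
    """Split a 16-bit protected-mode segment register value into its fields."""
    return Selector(
        index=(value >> 3) & 0x1FFF,
        table="LDT" if value & 0b100 else "GDT",
        rpl=value & 0b11,
    )
```

For example, `decode_selector(0x1B)` gives index 3 in the GDT with RPL 3, whose descriptor starts 24 bytes into the GDT.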

I'm really hard pressed to come up with a more convoluted architecture than the x86.

> I'm really hard pressed to come up with a more convoluted architecture than the x86.

At the risk of coming off a little more contrary than I intend: seriously, who cares? joezydeco was talking about teaching an assembly language with "training wheels." 8086 assembly without any doodads works for that. It's useless for normal people but no more useless than 6502, and the student will have development tools that are actually friendly to play with.

The nice thing about learning x86 assembly is that it embodies the entire learning curve. If you want to inflict all the horrors of assembly language on yourself - a pointless pursuit for anyone not writing a compiler while simultaneously eating paste, but hey, whatever - x86 lets you start small and work your way up.

I'm just not clear on who needs elegant assembly language programming (outside of compiler writers, etc.). Doing it at all represents failure. The point of an assembly language class is to be very quick and make the student appreciate what C does for them, IMO.

These days, you are right, there isn't much use for assembly, other than compiler writers and the occasional kernel code. And most of what I described was the 8086, warts and all. By comparison, MIPS is relatively straightforward, but I see by your attitude towards assembly in general that you'd rather spend no time with assembly.

Fair enough, you can certainly go your entire career without resorting to assembly language (heck, the last time I got paid for programming in assembly was the early to mid 90s---I'm still amused by the "C is too low level for programming" arguments these days, when back in the early 90s, people were bitching about C being too high level and inefficient).

Thanks. Yeah, the pain points I remember from learning x86 asm have to do with screwing up dealing with both 16-bit and 32-bit ints and pointers in the same program. Those were probably pretty contrived assignments, and you're right, most of the x86 bizarreness is there even if you're just doing 8086 stuff.

I just don't know if it's wrong to jump in the deep end with x86. Obviously the argument can be made either way, and x86 is ugly, but I'm pretty prejudiced toward learning something that will potentially be useful right off the bat. That would make me lean towards x86 or ARM for teaching or for self study, I think, and I have no idea what the ARM tools look like right now.

One of my more masochistic long-term goals is to make a compiler (I've made an assembler of sorts) so I might be revisiting this question at some point. Joy.

Besides, I was answering the question about why learning the x86 was considered crazy, because compared to other CPUs, it is crazy (not that you care about that).

I've been working with x86 for a long time and it's not as bad as it seems once you see the patterns. The 16-bit memory addressing modes can basically be summed up as "at most one each of {BP,BX}, {SI,DI}, {disp8,disp16}", and the fact that [BP] is actually [BP+0] is just an encoding detail; the assembler will handle it for you. The encoding that would otherwise have been [BP] is where the direct [disp16] mode goes.
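That pattern falls straight out of the ModR/M byte: the 3-bit r/m field picks one of eight base/index combinations, and the 2-bit mod field adds no displacement, a disp8, or a disp16 (mod=11 means a register operand instead). The sketch below decodes those two fields into assembler-style text; `decode_mem16` is a hypothetical helper, and it covers only 16-bit addressing (32-bit addressing with SIB bytes works differently).

```python
# r/m field -> base/index combination for 16-bit memory operands.
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def decode_mem16(mod: int, rm: int) -> str:
    """Return the operand form for 16-bit mod (0-2) and r/m (0-7).
    mod=3 selects a register, not memory, so it is rejected."""
    if not 0 <= rm <= 7:
        raise ValueError("r/m is a 3-bit field")
    if mod == 0b00:
        if rm == 0b110:
            return "[disp16]"  # the slot that would otherwise be plain [BP]
        return f"[{RM16[rm]}]"
    if mod == 0b01:
        return f"[{RM16[rm]}+disp8]"
    if mod == 0b10:
        return f"[{RM16[rm]}+disp16]"
    raise ValueError("mod=3 selects a register operand, not memory")


if __name__ == "__main__":
    for mod in range(3):
        print(mod, [decode_mem16(mod, rm) for rm in range(8)])
```

Since `decode_mem16(0, 0b110)` is `[disp16]`, an assembler encodes `[BP]` as mod=01, r/m=110 with a zero disp8, which is the `[BP+0]` detail.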

You don't have to worry about segmentation unless you're working with more than 64K of code or data, which is plenty when you're writing in Asm. And when you do, it's not hard to learn the few extra rules that come with that.

Protected mode and the 32-bit extensions are two separate (but related) topics; you don't need to learn the former to use the latter, and you can already do a lot without knowing either. Unfortunately, the 64-bit extensions have gone in a very different direction, and you will need protected mode to use them. But for learning the basics of Asm, I don't think you need 64 bits.

The 8080/8085/Z80 are a little simpler, and learning them could be useful if you're studying x86, since the 8080 is its ancestor and the 8085 and Z80 are close relatives.

It's no harder (and in many ways easier) than learning the irregularities of English or any other language. One side-effect of learning x86 is that all the RISCs then become really boring and straightforward. x86 has character. :)

I got used to it. The 8086 was the second CPU I learned assembly for (the first was the 6809). While I never did like it as much as the 6809 or 68000, it still had its charms (and I liked it more than SPARC assembly).

> I liked it more than SPARC assembly

Now that's interesting. What is the deal there?

Not him, but I'm guessing it's because SPARC looks a lot like the GAS/AT&T x86 syntax: lots of 'noise' in the form of % and $ everywhere. It also has branch delay slots, not-so-intuitive operand ordering, register windows, and all the other RISC features that seem to discourage writing Asm by hand, since a big part of the philosophy was "use a compiler".

Is x86 so convoluted because with each new iteration (8086 -> 80186 -> 80286), Intel tried to maintain backward compatibility?

Yep, quite so. There is a lot of legacy cruft. There are even bugs in prior versions which have now become features[1]!

[1]: https://en.wikipedia.org/wiki/A20_line
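The A20 quirk comes down to one bit. The largest real-mode address, FFFF:FFFF, is 0xFFFF0 + 0xFFFF = 0x10FFEF, which needs 21 bits. The 8086 simply lost bit 20, so such addresses wrapped around to the bottom of memory, and some software came to depend on that. Later PCs added a gate on address line 20 so they could reproduce the wrap. A minimal sketch, with the hypothetical parameter `a20_enabled` standing in for the gate's state:

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Compute segment * 16 + offset as a 286-or-later PC would in real mode.

    With the A20 gate disabled, address line 20 is forced low, reproducing
    the 8086's wrap at 1MB. With it enabled, FFFF:0010..FFFF:FFFF reach just
    past 1MB (the region known as the High Memory Area).
    """
    linear = (segment << 4) + offset  # at most 0x10FFEF
    return linear if a20_enabled else linear & ~(1 << 20)


if __name__ == "__main__":
    for a20 in (False, True):
        state = "on " if a20 else "off"
        print(f"A20 {state}: FFFF:0010 -> {real_mode_address(0xFFFF, 0x0010, a20):06X}")
```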

Sure, the Z80 was simple, and the 68000 on the Amiga was nicer, but at home the only 16/32-bit processors I had were x86.

Eventually one learned to live with those quirks, the same way young developers nowadays learn C's quirks, JavaScript/PHP WTFs, and so on.

Sadly the best technology is not the one that usually wins mass adoption.

Thank you for taking the time to write that up. Great overview.

A nice tidy orthogonal instruction set is a joy to work with. The x86 ISA unfortunately isn't one of those, by virtue of its history in the marketplace.
