I cannot agree that the simplicity of the 6502 wins over the 68000.
The 68000 has more registers and wider data types, but the registers are all uniformly the same. It's really just two registers, A and D, copied a bunch of times into D0 to D7 and A0 to A7. Whatever you can do with D0 can be done with D3, and so on. Some of the A registers are dedicated, like A7 serving as the stack pointer.
Simplicity of structure has to be balanced with simplicity of programming.
It's not that easy to program the 6502, because in moderately complex programs, you're constantly running into its limitations.
The best way to learn how to hack around the limitations of a tiny machine is to completely ignore them and become a seasoned software engineer. A seasoned engineer will read the instruction set manual in one sitting, along with the other pertinent documents, and the clever hacks will start brewing in their head.
You don't have to start with tiny systems to get this skill. When you're new, you don't have the maturity for that; get something that's a bit easier to program, with more addressing modes, easier handling of larger arrays, more registers, and wider integers.
I come from a similar but different angle. I've been at 6502, 68k, Z80, and some x86/64 over the past few decades (mostly demos). I also wonder if this article/post was written by AI - he argues for the 6502 that it has 6 registers, and thus it's simple. ANYONE who has touched the 6502 for more than a week knows you're pretty much dealing with three, and that's where the "semaphore CPU" nickname comes from (three registers).
6502 is fun and neat, especially if you want to program those machines. If you want a more modern approach, it's not really all that suitable. Do what we did in the 90s then and start with MIPS if you want to follow the path of the glory days, or just start out with NEON. I'd even argue the Z80, with all its registers and complexities, is more similar to what you'll find today (far from it, but let's entertain the connection between past and present as OP does).
There's nothing inherently complicated about modern asm. Take FASM, start. It's not complicated, but it gets complex rather quickly, which was and always will be true for such a low-level approach. Then you start macroing and sooner or later you have your own poor man's C on top.
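To give a flavour of the macro part, here's a minimal sketch in roughly FASM syntax (the macro itself is made up and untested):

    ; hypothetical convenience macro
    macro zero reg
    {
        xor reg, reg    ; idiomatic way to clear a register
    }

    zero eax            ; expands to: xor eax, eax

Pile up enough of these and you've basically reinvented statements and calling conventions, i.e. the poor man's C.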
I think one of the values the 6502 provides is that it has so many retro platforms. Apple ][, NES, Atari 2600, BBC, C64, they all use the 6502. If you want to do cool retro stuff that a lot of people today still enjoy, that's the best bet.
But again the Z80 looks like a very good option too because it has the GB/GBC plus the ZX Spectrum. The same goes for the 68K -- Mac 68K, Sega Genesis and Neo Geo are popular hits.
When I was writing my reply to another question (https://news.ycombinator.com/item?id=42891200), I was thinking about a new software engineering education system which starts from a retro platform and goes up from there. Maybe all three can fit the bill. I'm very much against the "elegant/pure" way of teaching.
BTW totally agree 6502 essentially has 3 registers. You don't get to touch the others.
> If you want to do cool retro stuff that a lot of people today still enjoy, that's the best bet.
This shouldn't be discounted, since it has great pedagogical value in itself. On the other hand (personal opinion here), not many things are transferable to modern ISAs. Even back in the 90s, we were taught on MIPS instead. 68k is very nice to read and write, almost feels like a higher-level language, but for ROI on newer stuff maybe just jump into NEON, x86_64, or, as someone said, RISC-V directly. It's not _THAT_ hard. It gets complex as programs grow though.
Yeah, maybe jumping directly into x86-64 is not a bad idea. E.g. on Windows you just have to understand the calling conventions, syscalls, and such to get some windows on the screen.
I need to look into my SDL2/C++ game's assembly code and see how it is done.
I get what you're saying but I wouldn't personally call zeropage bytes _registers_, even though they are physically connected to internal ones. I treat them as a special zone of memory that has special properties. There are many of these things and just like you would start with any of the old machines you'd first take a look at the memory map to see where's what and how to use it. Registers in a traditional sense would be those few mentioned of which you'd only ever really touch A, X, and Y.
One reason to consider them nearly equivalent to registers is the cycle counts: register and zero-page instructions run in 2-3 cycles, while the full 16-bit ("absolute") forms take at least one extra cycle, and the indexed ones another if the access crosses a page boundary.
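Roughly, from the standard 6502 cycle tables (a sketch, not exhaustive):

    INX          ; 2 cycles (register only)
    LDA $80      ; 3 cycles (zero page)
    LDA $1234    ; 4 cycles (absolute)
    LDA $1234,X  ; 4 cycles, 5 if the indexed access crosses a page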
Of course there is the Atari 2600, where only 128 of those bytes are backed by RAM ($80-$FF) and the stack is set to page 0 instead of 1 (not that you'll use the stack a lot in an Atari 2600 program).
You're right, I meant $0000 and $0001, which _are_ registers. The other 254, yeah. Zero page is a must on the C64 at least. I don't know much about other platforms.
So 0 and 1 are connected to the 6510 I/O data direction and I/O data registers, but there is still RAM there too. It seems the 6510 won't issue write signals to the bus for addresses 0 and 1, but the VIC can see that RAM and it's readable through sprite collisions.
> The best way to learn how to hack around the limitations of a tiny machine is to completely ignore them and become a seasoned software engineer
Learning to hack around the limitations of a tiny machine is a good step toward becoming a seasoned software (and possibly hardware) engineer!
The advantage of tiny systems is that you can understand them completely, from silicon to OS to software.
The 6809 is simpler than the 68K, and more powerful and orthogonal than the 6502, but there isn't the same software base for it. I think Motorola was onto something with the 6809 and 68K instruction sets and programmer-facing architecture. IIRC PDP-11/VAX and NS32K are similarly orthogonal.
The 6809 is the Chrysler Cordoba of 8 bit microprocessors, the personal sized luxury processor with styling in timeless good taste, rich Corinthian leather, and an efficient sophisticated multiply instruction.
I learned assembly on a 6809 (TRS-80 CoCo) platform. It was only later that I really appreciated how cool of a CPU it really was.
It’s a shame that Tandy missed the boat on including coprocessors for game support in their computers, especially that one. If they’d just included decent audio and maybe something for sprite management it would’ve been highly competitive.
Yeah, once OS-9 came out we got some decent game ports too. That’s where I discovered Epyx Rogue! It was very late in the lifespan of the system though.
C64/128 was what I was thinking of more than anything re 8-bit competition, keeping in mind I’m talking mid-late 80s by this point. I do also remember Atari 800 (and later) doing considerably better than you imply. But you’re right, Apple captured the early-mid 80s gaming market nicely.
6502 is a better "toy" asm than the Z80, but that's not saying much. It's not even obviously better than the AVR 8-bit insn set. As far as more modern platforms go, I think there is a strong case for teaching RISC-V over something like MC68k. RISC-V can be very simple and elegant (in the base integer insn set) while still being very similar to other modern architectures like ARM, AArch64 and MIPS. It's also available in both 32 and 64-bit varieties, and the official documentation is very accessible.
MC68k just has tons of quirks which aren't really going to be relevant in the modern day and age. (About the only thing it has going for it is the huge variety of hardware platforms that happened to make use of it, many of which still have thriving retro communities around them. But that's more of a curiosity as opposed to something genuinely relevant.)
Some people believe that RISC-V is simple and elegant, other people believe that RISC-V is one of the worst and ugliest instruction-set architectures that have ever been conceived. Usually the first class of people consists of people with little experience in assembly programming and the second consists of those who had experience in using multiple ISAs for assembly programming.
Using RISC-V for executing programs written in high-level programming languages is fine. It should be noted that in the research papers that have coined the name RISC, where there was a list of properties defining what RISC means, one of them was that RISC ISAs were intended to be programmed exclusively in high-level languages and never in assembly language. From this point of view, RISC-V has certainly followed the original definition of the term RISC, by making the programming in assembly language difficult and error prone.
On the other hand, learning assembly programming with RISC-V is a great handicap, because it does not expose the programmer to some of the most fundamental features of programming in an assembly language; RISC-V lacks even features that existed in the Zilog Z80 or in vacuum-tube computers from 70 years ago, like integer overflow detection and indexed addressing.
Someone who has learned only RISC-V has an extremely incomplete picture of what a normal instruction set architecture looks like. When faced with almost any one of the several thousand different ISAs that have been used during the last three quarters of a century, a RISC-V assembly programmer will be puzzled.
6502 is also too quirky and unusual. I agree that among the old ISAs DEC PDP-11 or Motorola MC68000 are better ISAs for someone learning to program in an assembly language from scratch. Writing assembly programs for any of these 2 is much simpler than writing good assembly programs for RISC-V, where things as simple as doing a correct addition are quite difficult (by correct operation I mean an operation where any errors are handled appropriately).
This is the first time I’ve heard of this, but RISC-V not providing access to carry and overflow status seems insane. E.g. for BigNum implementations and constant-time (branchless) cryptography.
RISC-V does not have carry and overflow flags in the traditional sense. Which is actually great, because it means you don't have to specify the complicated details of how any single instruction might affect the flags. (And yes, it uses compare-and-branch instructions instead.) It does provide suggested insn sequences to check for overflow, which might be executed as a single instruction in a more full-featured implementation.
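For the curious, the sequences suggested in the ISA manual for checked addition look roughly like this (register names arbitrary):

    # signed add with overflow check
    add  t0, t1, t2
    slti t3, t2, 0          # t3 = (t2 < 0)
    slt  t4, t0, t1         # t4 = (sum < t1)
    bne  t3, t4, overflow   # overflow iff the two disagree

    # unsigned add with carry-out check
    add  t0, t1, t2
    bltu t0, t1, overflow   # wrapped around iff sum < an addend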
Those sequences of instructions transform the cheapest arithmetic operations into very expensive operations, and they make it impossible to execute up to 8 arithmetic instructions concurrently, which other modern CPU cores can do.
Using instruction fusion of a pair of instructions to achieve the effect of indexed addressing, but in a many times more complex and more inefficient way, is believable even if ugly.
Using instruction fusion to fuse the long sequence needed for checking overflow into a single operation is unbelievable.
Even if that were done, the immense discrepancy in complexity between detecting overflow with one extra XOR gate in a 64-bit adder that may have hundreds of gates, depending on its speed, and an instruction decoder capable of fetching ahead and decoding the corresponding long instruction sequence into one micro-operation is ridiculous.
I'm surprised RISC-V doesn't expose carry. Thanks for pointing this out. For embedded programming with AVR and ARM I often prefer ASM over C because I have access to the carry flag and I don't have to worry about C/C++ overflow undefined behavior.
I also agree the 6502 is not a simple ISA. However after learning 6502 machine code, MIPS, AVR, and ARMv6-M were all easy to learn.
Lol. I've been programming assembly language since 1980, on at least (that I can remember) 6502, 6800, 6809, 680x0, z80, 8086, PDP-11, VAX, Z8000, MIPS, SPARC, PA-RISC, PowerPC, Arm32, Arm64, AVR, PIC, MSP430, SuperH, and some proprietary ISAs on custom chips.
RISC-V is simply the best ISA I've ever used. It's got everything you need, without unnecessary complexity.
> RISC ISAs were intended to be programmed exclusively in high-level languages and never in assembly language.
That's a misunderstanding. RISC was designed to include only the instructions that high level language compilers found useful. There was never any intention to make assembly language programming difficult.
Some early RISC ISAs did make some of the housekeeping difficult for assembly language programmers, for example by having branch delay slots, or no hardware interlocks between loads or long-running instructions such as multiply or divide and the instructions that used their result. So if you counted wrong and tried to access the result register too soon, you probably silently got the previous value.
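The original MIPS I load delay slot is the classic example; a rough sketch of the hazard:

    lw   $t0, 0($a0)     # load from memory
    nop                  # MIPS I: $t0 is not yet valid in this slot
    addu $t1, $t0, $t2   # safe to use $t0 here

Drop the nop (or fill the slot with something that reads $t0) and you silently got whatever was in $t0 before the load.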
That all went completely out the window as soon as there was a second implementation of the same ISA with a different number of pipeline stages, or more or less latency to cache, or a faster or slower divide instruction. And it was just completely untenable as soon as you got CPUs executing 2 or 3 instructions in each clock cycle instead of 1. The compiler could not calculate when it was safe to use a result because it didn't know what CPU version the code would be running on.
Modern RISC -- anything designed since 1990 -- is completely fine to program in assembly language.
> RISC ISAs were intended to be programmed exclusively in high-level languages and never in assembly language.
> That's a misunderstanding. RISC was designed to include only the instructions that high level language compilers found useful. There was never any intention to make assembly language programming difficult.
That is no misunderstanding, but almost a quotation of the 4th point in the definition of the term "RISC", in “RISC I: A Reduced Instruction Set VLSI Computer”, David A. Patterson & Carlo H. Sequin (University of California, Berkeley), presented at ISCA '81.
Of course they did not try to make assembly language programming difficult as a goal, but the theory was that no ISA feature should be provided with the purpose of making the assembly programming easy, because it was expected that any inconvenience shall be handled by a compiler for a high-level language.
In practice, only the academic RISC ISAs from Berkeley and Stanford were less convenient to program in assembly language, while those from the industry, like IBM 801 and ARM and their successors did not have any disadvantage for programming in assembly language, on the contrary they were simpler to program than many less orthogonal older processors.
RISC-V has everything needed only in the same way as a UNIVAC from 1950 has everything needed.
As an ISA for didactic hardware implementation, RISC-V is very good. As a target for writing programs it is annoying whenever you are writing non-toy programs.
RISC-V has only one good feature in comparison with x86 or ARM, the combined compare-and-branch instructions, which saves a lot of instruction words in comparison with separate instructions, because branches are very frequent, e.g. one branch to every 6 to 8 instructions.
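That is, the comparison and the branch are a single instruction, where a flags-based ISA needs two (a rough sketch):

    # RISC-V
    blt  a0, a1, loop      # branch if a0 < a1 (signed)

    # x86, for comparison
    cmp  eax, ebx
    jl   loop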
Except for that, there are too many important features that are missing without absolutely any justification and without any benefit.
Detecting integer overflow in hardware, like in almost all CPUs ever made, with the notable exceptions of the Intel 8080, RISC-V and a few others that have never pretended to be suitable for general-purpose computing, is exceedingly simple and cheap. Detecting integer overflow in software is complicated and very expensive.
Implementing indexed addressing in hardware is extremely simple in comparison with implementing instruction fusion or with increasing the number of functional units and increasing the width of all structures to be able to execute more instructions concurrently, in order to reach the same performance for an ISA without indexed addressing.
While the loops of RISC-V usually save an instruction at the loop test, they waste at least one instruction at each memory access, so except for very simple loops RISC-V needs many more instructions to execute an iteration than most other ISAs.
The RISC-V proponents have bogus excuses, i.e. the availability of the compressed mode and the possibility of using instruction fusion. These 2 are just workarounds for a bad instruction encoding that requires an excessive number of instructions for encoding any program. They can never be as good as a well designed instruction encoding.
All competing ISAs that are better encoded can also implement a compressed encoding (like ARM, MIPS and POWER also have) and they can implement instruction fusion. They seldom do this only because they seldom need this, unlike RISC-V.
I have programmed in about the same list of ISAs as enumerated by you, over about the same time interval. I agree that for most controller tasks RISC-V is acceptable, even if sub-optimal. When it is cheap enough (due to no royalties), that can overcome any other defects. On the other hand, I have also written various programs dominated by complex computations, where the deficiencies in arithmetic and array addressing of RISC-V would have made it clearly unacceptable.
"4. Support high-level languages (HLL). An explanation of the degree of support follows. Our intention is always to use high-level languages with RISC I."
That doesn't say anything about making it difficult to use assembly language. It speaks only to making it UNNECESSARY, to the largest extent practical.
Maybe you weren't around at the time, but I remember very clearly the literature on the software crisis and the VAX being designed to make assembly language programming as productive as possible, EVEN at the expense of raw machine speed (the VAX 11/780 was slower than the PDP 11/70), because no one trusted the performance of high level languages.
The above aim of RISC was to make a machine fast enough and easy enough for compilers to use effectively that there would seldom be a reason to move out of the more productive high level language.
Great flamebait ;) what makes riscv suitable for teaching is that the ISA is trivial to map to hardware. This is because it is so regular. Students with zero experience can design simple riscv cpus in their first asm course. More experienced students can design cpus with pipelining, branch prediction, ooo execution, etc. Old ISAs from the 1980s are way more difficult.
RISC-V has been designed with the exact goal of being an ISA that is easy to implement in hardware, so that students are able to do this.
For this purpose, RISC-V is excellent.
The RISC-V ISA has not been designed as an efficient method for encoding computer programs and even less for being easy to program in assembly language.
When programming in assembly language, a complex ISA is bad, because one cannot hold in mind all its peculiarities, but a too simple ISA is even worse, because every simple operation must be implemented by a complex sequence of instructions that is easy to forget and to get wrong.
I wonder if any of those who claim that the RISC-V ISA is simple can remember without searching the documentation how to implement the checks for integer overflow after each arithmetic operation.
The didactic examples of using RISC-V usually do not check for any errors, which is not acceptable in any critical application. I find it funny that sometimes the same people who claim that the unsafe RISC-V ISA is good also claim that C and C++ are bad because, with default compilation options, they are unsafe in comparison with e.g. Rust.
> I wonder if any of those who claim that the RISC-V ISA is simple can remember without searching the documentation how to implement the checks for integer overflow after each arithmetic operation.
I mean, I can't give you the Hacker's Delight magic numbers for expressing integer division by a constant via multiplication on AArch64 or x86-64 off the top of my head either, but that's what we have compilers for. The fact that you sometimes have to look up how to do simple things is just part of programming.
The 68K is still a tiny bit awkward, with its 24-bit bus and alignment restrictions and middling support for indexing into arrays (no scale factor). The 68020 is about as close to C as an ISA can get - it’s extraordinarily pleasant.
While I agree that 68020 felt like a great improvement over 68000 and 68010, scaled indexed addressing is an unnecessary feature in any ISA that has good support for post-incremented and pre-decremented addressing.
Scaled indexed addressing is useful in 80386 and successors only because they lack general support for addressing modes with register update, and also because they have INC/DEC instructions that are one-byte shorter than ADD/SUB, so it is preferable to add/subtract 1 to an index register, instead of adding/subtracting the operand size.
Scaled indexed addressing allows the writing with a minimum number of instructions of some loops where multiple arrays are accessed, even when those arrays have elements with different sizes. When all array elements have the same size, non-scaled indexed addressing is sufficient (because you increment the index register with the common operand size, not with 1).
However there are many loops where scaled index addressing is not enough for executing them with a minimum number of instructions, while using post-incremented/pre-decremented addressing still allows the minimum number of instructions (e.g. for arrays of structures or for multi-dimensional arrays).
Unfortunately not even MC68020 has complete support for auto-incremented addressing modes, because besides auto-incrementing with the operand size there are cases when one needs auto-incrementing with the increment in a register, like provided by CDC 6600, IBM 801, ARM, HP PA-RISC, IBM POWER and their successors (i.e. when the increment is an array stride that is unknown at compile-time).
On x86-64, using scaled indexed addressing is a necessity for efficient programs. On the other hand on ISAs like ARM, which have both scaled indexed addressing and auto-indexed addressing, it is possible to never use scaled indexed addressing without losing anything, so in such ISAs scaled indexed addressing is superfluous.
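For reference, the two modes side by side in A32 syntax (registers arbitrary):

    ldr r0, [r1, r2, lsl #2]   @ scaled indexed: load from r1 + r2*4
    ldr r0, [r3], #4           @ post-indexed: load from r3, then r3 += 4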
The 24-bit bus means that you can use the top bits of a pointer as a tag. In a small system that doesn't need that much memory, this can actually be a great advantage. We are rediscovering the value of tag bits in 64-bit systems.
While I learned programming with the 6510, I agree that the 68000 instruction set was much nicer and easier to read and learn. I would also choose the 68000.
This said, 65XX on an early Commodore computers was extremely rewarding to use because there was no memory protection and you could write code altering video memory, mess with sprites, fonts, borders, interrupts, write self modifying code etc etc. 68000 assembly on Amiga was more safe and controlled.
When I was in college, they taught the Computer Architecture course using the 68000. Coded GUI stuff on the Mac 128k with assembler, and it was surprisingly easy, especially compared to doing anything with the 8086.
For anyone interested, learn the PDP-11 instruction set from the first model, the PDP-11/20, before DEC added protection modes, memory management hardware, etc. A summary of the instruction set is in Appendix A, and it's worth taking a look at chapters 3 and 4 of the PDP-11(/20) Handbook to get more detail on each instruction and the addressing modes.
My school started with a 6502 lab and then followed it with a 68k lab. That still seems like the right order to me all these years later. Starting with the smaller, more cramped chip and learning its limitations makes it all the more interesting when you can "spread your wings" into the larger, more capable one.
You scale up the complexity of the programs you need to build with the complexity of the chips. In the 6502 lab it was a lot about learning the basics of how things like an MMU work and building basic ones out of TTL logic gates. In the 68k lab you take an MMU for granted and do more ambitious things with that. There are useful skills both from knowing the low-level hardware as intimately as a 6502 lets you, and from working where it is easier to program because you have more modern advantages and fewer limitations.
The other thing about that order was that breadboarding the 6502 hardware was a lot more complex, but it made up for it in that writing and debugging 6502 assembly was a lot easier. There are a ton of useful software emulators for the 6502 and you can debug your code easily before ever testing it on lab hardware. At one point in that lab I even just wrote my own mini-6502 emulator specific to our (intended) breadboard hardware design. On the other side, there are a lot fewer software emulators for the 68k and the debug cycle was a lot rougher. The 68k breadboard hardware was a bit more "off the shelf" so it was easier to find an existing emulator that matched our (intended) hardware design, but the emulator itself was buggier and more painful to use, and ultimately testing on the real hardware was the only trustworthy way to deal with it. I also wasn't going to try to write my own 68k emulator. (Though some of the complexity there was the realities of hardware labs in that the hardware itself starts to pick up its own localized quirks from being run only on breadboards for years.)
Yeah. If you want to teach someone to implement a CPU, then the sort of simplicity the 6502 brings to the table is useful, but it's definitely not the most straightforward CPU to write code for.
The 68000 instruction set is very C-like (probably because it draws a lot of inspiration from the PDP-11 which was C's first target). Implementing it sucks (I know from experience) because the ISA is on the large side and all the little pragmatic exceptions add up. But because these exceptions are fairly pragmatic, they're not really a big issue when writing code.
One of the downsides of having learned to code close to the metal on old CPUs is that you understood how that system worked, and it's easy to assume assembler and the resulting machine code are the truth, but on modern CPUs that's a lie. A modern beefy out-of-order CPU will do insane optimization dynamically at runtime, and the instructions you carefully scheduled to avoid register spills are nothing but a suggestion to the actual hardware. Sigh.
That was never about the instruction set, it was more about the operating system -- or lack of one.
As for modern CPUs and OoO etc, that's only about performance. The CPU, no matter sophisticated, must produce exactly the same results as the simplest in-order CPU.
Hardware is never going to spill a register to RAM when you didn't write that. The maximum that is going to happen -- and this is pretty recent -- is that memory at small offsets from the SP might be treated as if it was extra registers, and this renaming can be tracked over smallish adjustments to the SP.
"You can't understand what modern CPUs do" is much overblown. Just write good clean code and it will work well on everything.
The performance is the point. 8-bit CPUs are so slow assembler could be - often had to be - hand-optimised for speed.
You can't do that on modern CPUs, because the nominal ISA has a very distant relationship to what happens inside the hardware.
The code you write gets optimised for you dynamically, and the machine is better at it than you are.
You may as well write in a high-level language, because the combination of optimising compilers with optimising hardware is probably going to be faster than any hand-written assembler. It's certainly going to be quicker to write.
The 68000 is a very nice, clean machine to code for, but the main point of doing it today is historical curiosity. It's quite distant from modern computing.
The original 68K instruction set is distant from modern computing only in these points:
- 32 bits
- lack of vectorization and such.
It's still perfect for most embedded stuff, by my estimation.
Well .... there are certain points like: wasn't there some issue with branch displacements on MC68K being confined to 16 bit ranges? If you have large functions, it can be a problem.
I dimly remember a project I was on circa 2001 to port a protocol stack (that we were developing on x86) to a system with some Freescale processor with a 68K instruction set.
I remember having to chop the source file into several translation units, because when it was all in one file, the inlining or static function calls or whatever, were generating PC relative branches that were too large for the opcode.
With today's hugeware, you'd be running into that left and right.
> I remember having to chop the source file into several translation units, because when it was all in one file, the inlining or static function calls or whatever, were generating PC relative branches that were too large for the opcode.
That's just inadequate tools.
With GNU as for RISC-V if I write `beq a1,a2,target` and target is more than 4k away then the assembler just silently emits `bne a1,a2,.+4; j target` instead.
The MC68K has a BRA instruction whose opcode byte is 0110 0000 (0x60). The low byte of the 16-bit opcode word is a displacement. If it is 0, then the next 16-bit word holds a 16-bit displacement. If that low byte is 0xFF, then (on the 68020 and later) the next two 16-bit words hold a 32-bit displacement.
The displacement is PC relative.
This unconditional 0x60 opcode is just a special case of Bcc, which is 0x6N, where N selects the condition to check for a conditional branch. Bcc also takes an 8, 16 or 32-bit displacement.
So, yeah; that's not a problem. Not sure what the issue was with that GCC target; it somehow just didn't generate the bigger displacements.
Not sure why you were downvoted. What you say is true. You could count the cycles of the instructions to figure out exactly how long a loop would take at a given clock frequency. The entities manipulated by the program literally existed. If the ISA said there are 8 registers, you could find 8 areas of the silicon die corresponding to those. The ISA features on a modern processor are a fiction. Especially if it something riddled with 50 years of backwards compatibility, like Intel.
I would argue that 6502 is a bad first ISA to learn if you want learn assembly. You‘ll spend most of your time fighting the quirks of this clever, but deeply flawed architecture. The idioms you learn to work around them don‘t translate to any better designed architecture that wasn‘t constrained by the tools and budget available to MOS at the time.
If you want to learn a small, yet powerful instruction set with a few quirks, go for ARMv6-M. It's still in meaningful production (no, the MOnSter 6502 doesn't count), and has good platform support in the latest open source toolchains (debuggers, compilers, assemblers, linkers, etc.).
If you value openness of the architecture enough to deal with a less mature platform (as of early 2025) then pick a RISC-V MCU instead. If you can‘t decide pick a RP2350 :-).
The ARMv6-M instruction set is small, and no, loading constants doesn't require long-winded instruction sequences if you do it as documented (a PC-relative load instead of shifting in immediate data). You don't have to deal with self-modifying code and/or the zero page to index memory. Your registers are the same width as the address space. Yes, it's 32 bits, but that makes it simpler to learn, use and teach than all the 8-bit and most 16-bit instruction sets I've seen, because you don't have to work around too-narrow registers for common operations. To anyone who thinks this sounds boring: don't worry, ARMv6-M still has enough quirks you can use for code golfing.
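The documented way is just a pseudo-instruction with GNU as (the constant here is arbitrary):

    ldr  r0, =0x12345678   @ assembler drops the constant into a literal
                           @ pool and emits a PC-relative LDR to fetch it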
I wrote a lot of 6502 assembly once, and you spend a lot of time dealing with the 8-bittedness of the architecture. Multiplying two 16 bit numbers is a whole blob of code. That doesn't seem useful for new programmers to struggle with.
It depends on what you are trying to teach I guess. If you want to impress upon students that everything is made of bytes, that could be useful.
I don't think the 6502 would be a good fit because zero-page doesn't really translate to any useful modern concept. Quite a lot of 6502 coding is Zero-page management.
I went with the 8-bit AVR as the instruction set for my silly fantasy console project. It has an in-browser editor and assembler to let people write 8-bit code. The AVR has the best 8-bit instruction set I have found; it's still not perfect (constants can only be loaded into some registers) but it was definitely built with the hindsight provided by its predecessors.
If you wanted to avoid the management of data types, I would suggest an instruction set with floating point registers. The same management of bytes into words and dwords, signed and unsigned etc. has to happen on a CPU without floating point support. It's an added complication, which you may or may not want to expose students to.
If the intent is to use Asm to teach from the point of view of "every instruction is a clearly defined action" I would use something with 32-bit ints and 32-bit floats.
If you wanted people to feel our pain, go with 6502, Z80, or PIC depending how sadistic you are.
I believe you should be able to sort of simulate higher level languages at the editor level. Writing c = a * b could just be a representation for the large blob of code or a subroutine.*
What is particularly funny about your example is that 40 years later you still can't do multiplication or addition in JS. You have to install packages (that are multiple computers in size!) after you make up your mind which of the 20+ different modules (read: "dependencies") has the right kind of multiplication for you! I'm not bitter, it's objectively ridiculous :)
* It should actually be c = a × b. Without the asterisk, we did actually have arithmetic reasonably standardized before IT de-standardized it.
I think this is true for any bus/register size. Various high-level language "bignum" libraries generalize to any size.
But it's true that with only 8 bits, the problem happens very soon. However, if one doesn't want to solve basic problems "the hard way", one probably won't be interested in assembly programming anyway.
Multiplying (or even adding) two 16-bit numbers is a good learning experience for assembler. I think it really depends on your goal. I learned x86 assembly way back when it was relevant but now I do 6502 assembly for fun. The simplicity and the limitations are what make it interesting.
ARM does have a few pain points for newcomers related to how immediates are embedded within instructions. It can be non-obvious why a certain immediate or offset can be used but not another.
Granted, this extends across a wide range of assembly languages, but it'd be nice to start with an instruction set without this issue.
The 6502 is designed to be a simple, minimal, low-cost CPU. From that perspective, it's a brilliant design, and it's fun to write small programs for.
Where it becomes "deeply flawed" is if you're trying to develop large, complex programs for it, mainly because there's no way to efficiently implement pointers or local variables -- you only have 3 registers, none of which are truly "general purpose", and none of which are wide enough to hold a pointer; and stack operations are severely limited. (This also means that writing a C compiler that targets 6502 and generates efficient code is almost impossible).
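The usual workaround is to park a 16-bit pointer in two zero-page bytes and go through indirect indexed addressing; a rough sketch ($FB/$FC is just a commonly free spot on the C64, and "buffer" is a made-up label):

    ptr = $FB          ; two zero-page bytes hold the 16-bit pointer
        lda #<buffer   ; low byte of the address
        sta ptr
        lda #>buffer   ; high byte
        sta ptr+1
        ldy #0
        lda (ptr),y    ; A = byte the pointer points to

It works, but every pointer costs you zero-page real estate plus an index register.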
So, in idiomatic 6502 code, all variables are global; and if you need dynamically managed objects, you keep track of them using indices into statically-allocated arrays. This is difficult to scale as your programs get large and complex, because at some point you're going to waste an afternoon finding out that your "tmp7" variable in one routine is getting clobbered by another routine 5 levels deep down the call chain that also uses "tmp7" for some unrelated purpose.
It was a perfect processor for its time and its target market, but it made heavy design compromises to achieve its goals, and Moore's law quickly made those compromises obsolete.
>Moore's law quickly made those compromises obsolete.
Your overall argument is perfect, but I'd question the use of "quickly" since people were selling essentially 6502-based stuff for how long, 20 years at least? From the Apple II in 1977 to the SNES which was released in 1990 and still getting games in 1995, so that's 18 years of that CPU being highly relevant and worth learning.
Of course, you're 100% right that, assuming money was no object and you could buy and use whatever hardware you wanted to, it wasn't long before you could escape those compromises since the x86 and the 68000 (and probably others I don't know) brought much better architectures to those who could afford them.
It's not "deeply flawed" at all, OP is being overly dramatic with that statement.
I've coded on a bunch of embedded 8-bit platforms over the decades, and 6502 is great. A, X, Y registers - it's really quite simple. It has various standard and useful addressing modes. It has pretty much the same status register that exist in modern 8-bit MCUs. There's nothing "deeply flawed" about it.
32-bit MCUs are probably a bit too complex for a beginner, 8-bit MCUs will teach a newcomer a lot about the basics of computing in an easy to learn way. It will teach them the significance of "a byte" and working with raw data that maybe you don't exactly get with 32-bit MCUs. There isn't that much to master with a 6502, it's pretty simple, but amazing things can still be done with it.
It wouldn't surprise me if you could do this with a RP2350, connecting GPIOs to the same location as the 65C02 does on the breadboard and even running emulated 6502 code.
It's totally not the same thing of course. A whole lot of transistors and clock speed go to making that feat possible.
As someone who has been teaching assembly to undergrads for many years, I have a couple of things to say about this. First of all, I agree. The 6502 is great for beginners but that is not just merit of the 6502 language and I want to explain why.
I have taught 68K, MIPS, ARM, x86, etc., and the overall good student feedback I got by teaching 6502 is mostly because of the surrounding context that comes with the CPU. The reason 6502 clicked better than other modern alternatives (MIPS, ARM, x86, etc.) was because we used it to program a real machine that is simple to understand (i.e. the Nintendo Entertainment System). Rudimentary memory-mapped IO, no operating system, no pipelined instructions, no delay slots, no network, no extra noise... it's just a simple box with a clock inside, a CPU, some memory addresses, some helper chips, IO mapped to memory addresses, and that's pretty much it!!! So, even though I agree that the 6502 is not the simplest instruction set out there, THIS simplicity of the system helped a lot.
And about the limitations of the 6502 CPU, these limitations were also important for students to understand that these instructions have a reason to be the way they are. CPUs were designed and wired given the constraints of that time, and that reflects on how we programmed for them.
So, even though this was mostly empirical, I have to say picking the 6502 and the NES to teach beginners was successful. Once again, not really because it was the 6502, but because the 6502 forced us to go simple in terms of the system in which we were moving bits left and right.
Once students played around with the 6502 and saw NES tiles moving on the screen, then it was super cool to evolve and show them how the 68000 did things differently, and then evolve more and show how MIPS came, show how pipelining works, how to take advantage of delay slots, and being able to compare the differences of RISC and CISC. It's super simple to evolve once the basics are there.
I'm slightly surprised that no one has suggested PDP-11 assembler as a good starting point if you're not going to learn a current instruction set. Perhaps it's because it was the first one I learnt properly, but all the early microprocessors felt like a step backwards. I did spend a few years writing Z80 assembler but I wouldn't recommend it nowadays, as it's not a very orthogonal instruction set, and the 6502 doesn't have enough registers to give you a proper feel for writing assembler.
It's kind of hard to get hold of a PDP-11 these days. Even getting an OS, compiler etc is not that easy.
If you like the PDP-11 then you get the same qualities slightly restricted in the MSP430 and slightly enhanced in the 68000.
But, really, just forget all those relics and learn either RISC-V (the best answer) or else one of the half-dozen Arm variations. I'm partial to ARM7TDMI myself for sentimental reasons doing a lot with it in the mid 2000s. The Thumb mode is probably slightly easier to learn than the original Arm mode, but neither is as satisfactory as RISC-V.
> Even getting an OS, compiler etc is not that easy.
There's a GCC fork [0], macro11 [1] (GCC and clang also both have macro11 backends), ack [2] and more.
Getting hold of a modern compiler is trivial.
> It's kind of hard to get hold of a PDP-11 these days.
The PiDP-11 [3] emulator that runs on a Pi, is fairly popular among the retro crowd. So sourcing something hardware wise that behaves that way is easily possible.
The Computer History Simulation Project [4] will give you easy access to simulating a PDP-11 on just about anything that you own.
But if you want the original hardware, then they're in the $400-500 range, in my area. Easy to source.
Through a few of the computer-oriented antique dealers, yes. Not so much online. Part of sourcing locally means the device has been sitting on their shelves for over a decade without anyone even making a bid. Several would take even less.
But for a hobbyist, I'd still probably recommend something like the PiDP. Run the system and learn it, without having to take the power bill of an early machine.
I mean you can grab SIMH and trivially have a working PDP11 emulator on pretty much any system that has a C compiler.
Then just get 2.11 BSD (or V7 Unix if you're a minimalist/masochist), install it, and you're free to write/run/debug all the PDP11 assembly you want.
But I do question the value of doing this. I'm a big believer in the notion that you can't train skills by proxy, so if your goal is to become proficient in writing assembly for modern architectures, then you might as well do so by learning a modern architecture.
I agree. PDP-11 is so much more pleasant, it's not even close. One could make the argument that the addressing modes are conceptually more complicated than what you'd have on a RISC, but the 6502 addressing modes are probably actually harder to understand than the PDP-11's.
Plus, learning PDP-11 ASM explains some of the idioms from C as they map directly onto the architecture! "Pointer to a pointer" is just a native addressing mode, for instance.
Yeah, C's pre-increment and post-increment map right onto the -11 addressing modes. It's a brilliantly conceived minimal instruction set. Just a joy to code in.
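For instance (MACRO-11 syntax, registers arbitrary):

    MOV (R0)+,(R1)+    ; roughly *dst++ = *src++
    MOV @(R2)+,R3      ; autoincrement deferred: R3 = **p, then advance p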
I really don't see how. When students are first exposed to computer programming, it might make sense to start with toy / compact languages that don't have any real-world use. But assembly is not the first language you're supposed to learn!
It's very utilitarian and most commonly just used for debugging and reverse engineering. So why would you waste time on the assembly language of a long-obsolete platform?
Plus, the best way to learn assembly is to experiment. Write some code in your favorite language and look at the intermediate assembler output, or peek under the hood with objdump or gdb. Try to make changes and see what happens. Yes, you can do that with an emulator of a vintage computer, but it's going to be harder. You need to learn the architecture of that computer, including all the hardware and ROM facilities the assembly is interacting with to do something as simple as putting text on the screen... and none of this is going to be even remotely applicable to Linux (or Windows) on x86-64.
I struggled with C until I learned to hex and assembly program on the 68HC11. Maybe I'm just a moron, but things like pointers seemed so abstract and obtuse to a complete beginner; once I learned how to do indirect addressing in assembly, it was suddenly painfully obvious why C had pointers and how they worked. Before that I had mostly used Python, where it's far more abstracted away. People forget that many features like pointers exist to address hardware/performance limitations; the "why and what" of what you're actually doing inside the CPU is not immediately obvious to new devs, and that limits your intuitive understanding.
That's why some people see the C language as "portable assembly". I think C is at its best when you want Assembly-like memory addressing flexibility but don't want to deal with instruction-set specific idiosyncrasies.
This is my experience as well, for a lot of features that seem like magic in high level languages. I could somewhat accept pointers even without understanding them, but object-oriented programming with its classes and all was such a mystery to me that I was scared to even try using it.
It's not until I learned assembly language that I understood pointers, and from there, I could implement a basic OOP system in C and finally understand what objects are all about. It only clicked when I learned how to do it from scratch.
I have a hypothesis that to move beyond being a consumer of technology to being a producer of technology, modern education in software development should be built on building fundamentals through first principles — especially with kids and young adults. It should be analogous to how we learn counting, arithmetic, and higher level maths.
One of the best features of obsolete, constrained architectures was their simplicity. Recovery for something wrong is quick and there is (generally) no permanent damage. All of this makes it much easier to understand what is happening at a lower level.
Once you have a basic understanding, then you are ready for the next level of abstraction.
I assume that openness and availability of IP is the biggest challenge to putting some kind of curriculum together. I would be highly interested in whether anyone has curated this into a cohesive educational approach, especially one that targets childhood development.
Spending a couple of months tinkering with hardware and programming assembly will make you understand the basics of how a computer works in a totally different way than other high level languages will. I haven't programmed a single line of assembly since high school (back in the 80s...), but the fundamental understanding of how operations are executed and how registers work makes it so much easier to understand the whys, the ifs and buts of programming and optimization. And you'll certainly start to appreciate clean, effective code.
I wish there were books like this today, that could lead a kid from knowing virtually nothing to a graduate level understanding of computer architecture, digital storage, logic and set theory, graphics, mathematical modelling, networking, numerical methods, signal processing, electrical engineering, software design, ...
Wow, so many memories there! Yeah, BASIC (TI-99), then Atari Basic, then Basic XL (along with reading a 6502 book), then GFA BASIC, then Megamax / Laser C, then Pascal and 68K and Ansi C at uni. RISC was part of the 4th year digital architecture/design course.
There are roughly two ways to learn programming: top down (start with abstract concept and go down to actual implementation, e.g. SICP) and bottom up (start with the concrete low-level code and let the abstractions naturally emerge).
I studied electronics, so naturally we began with assembly (Motorola HC11). By the end of the course everyone had independently developed their macros to do things like for-loops, so it was a natural progression from there to C. By the end of the C course "C-style OOP" had also emerged naturally, which led to the next course in C++.
The downside of this approach is that there is no gradual route from there to functional paradigms (or non-imperative in general). Also, one develops the habit of always thinking of how the language works under the hood, which can be counterproductive. E.g. when I was trying to learn Haskell, my mind was trying to understand how the interpreter worked.
Learning assembler is not just about the language, but understanding how the machine works (buses, memory-mapped peripherals, etc). In older platforms this is much simpler, so while ARM instructions can be easier to learn than the CISC instructions of the HC11, everything else is much friendlier for the beginner in the HC11.
With the dmd compiler, compiling with -vasm will show the generated assembly as it compiles. It's been poo-pooed because why not use objdump or -S? But once you try it, you'll know why it's so convenient, as it just emits the assembler, and not the huge pile of boilerplate needed to make an object file.
For example, I'm working on an AArch64 code generator, more specifically, generating floating point code. I have a function:
You might think that the compiler was generating assembler code, and then assembling the code to binary. Nope. It generates the binary instructions directly. The compiler has a builtin disassembler for the display. This makes for a fantastic way to make sure the correct binary is generated. It has saved me an enormous amount of debugging time.
Learning assembly is what finally made programming "click" for me. With a solid intuition for instruction sets, pointers and addressing modes I could suddenly reason about programs on another level.
I have found good results with this model of teaching since then and wish that more people tried it.
I started programming in AppleSoft BASIC in 1986, but quickly became fascinated with 65C02 assembly. By the time I got to college and started taking different programming language classes, I quickly fell in love with C. Knowing assembly helped me understand C, and even though I haven't learned any other assembly language since then, when I read articles about the low-level architecture of processors, I can follow along.
Let me take that back, I did learn 16 bit x86 assembly and the int21 DOS commands in college.
I went from BASIC straight to Assembly. I wish they had taught me Assembly when I was 9 years old (in the 1970s) instead of BASIC. Once I learned Assembly I never used BASIC again.
If you had a 16 kB memory card (a.k.a. "Language Card") you could overlay the ROM memory with RAM, and load the Integer Basic ROM over the AppleSoft Basic ROM.
AppleSoft did "everything" with floating-point variables, like loops indexing into arrays. It's amazing that programs ran at usable speeds on a 1 Mhz machine.
My C64 didn't come with a monitor. I typed one in from a magazine, then learned assembly by entering instructions directly into it. I was so thrilled to discover assemblers later.
I thought I was in heaven when I got the Action Replay cartridge (not the genuine, but a clone with the same software) that came with a monitor, a "freezer", fastloader, and various disk utilities.
> The era in which there was nothing but assembly language was very short-lived.
Hell, I'm not even sure the era existed.
Grace Hopper was creating the first few high level languages for UNIVAC I. A-0 was complete in May 1952. A-2 (the first which saw extensive use) was created in August 1953.
As far as I can tell, UNIVAC I never had an assembler. If you weren't using A-0, programmers were expected to just type in raw machine code. Here is a UNIVAC programming manual from 1953 [1], and there is no mention of an assembler. Oh, and if you think you see instruction mnemonics... no, those are just letters which the CPU instruction maps onto.
Over in the IBM world, at least the 701 launched with a proper symbolic assembler in April 1953. But it also launched with Speedcoding [2], a somewhat higher level language halfway between non-symbolic assembler (it decodes mnemonics, but the programmer has to specify all addresses as absolute numbers) and an interpreter.
None of the other early computers seem to have had assemblers (though some, like the Manchester Mark 1, had high level languages).
There might have been some programmers at IBM who had access to the prototype 701 and the symbolic assembler before Speedcoding existed [3]. But for everyone else, they seem to have gotten access to high-level languages at the same time as, or before, they got access to assemblers.
[3] It's also possible that Speedcoding development was largely finished before the first 701 was operational. I'm finding it hard to find exact dates for that.
Part of the problem is that "Assembly language era" is ill-defined. Personally, I don't think it counts as Assembly language unless you are using a symbolic assembler, because that's what modern programmers think about when you say "assembly".
There is a reasonably common interpretation that includes the whole machine code era as "Assembly language", since you are writing it out and then hand-assembling it. Which means the UNIVAC's C-10 machine code counts as "Assembly language", even if they weren't using that terminology. With this interpretation, the "assembly language era" lasted a few years, but I think that interpretation is very misleading to any programmer exposed to a proper assembler.
Anyway, even with my stricter definition, there was an assembly-only era, but it only seems to have existed inside IBM's research labs. They had their first symbolic assemblers running on the "test assembly" by October 1950.
There is very little information about this "test assembly" computer on the internet; it doesn't even have a Wikipedia page (same with the "Tape Processing Machine" or TPM that followed). But "IBM's Early Computers" by Charles J. Bashe documents it. This computer had not one but two symbolic assemblers running by autumn 1950, which does actually seem to beat most high level languages.
Long before the first 701 was even installed (in April 1953, at IBM's HQ), programmers had already gotten sick of assembly programming, which is why speedcoding was created.
Though, this wasn't just internal IBM programmers. Customers who bought the 701 were given documentation about both the computer and the assembler as early as 1951. These customers had hired programmers, who had started writing assembly code, months before the 701 assembler was even debugged and running on the first prototype 701, and years before they received their computers. So maybe there was also an "Assembly language only era" in the offices of these early 701 customers. But it's kind of an edge case if they didn't have a computer to run the assembler on, or test their programs.
I assume these early programmers were occasionally visiting the prototype 701 to assemble and test their code.
> I believe 6502 instruction set is a good first assembly language
It was for me. In 1977 I lived in a little cabin in Oregon. I bought an Apple II as a diversion. Within a year I was working on a program that later became "Apple Writer" (https://en.wikipedia.org/wiki/Apple_Writer), written entirely in assembly.
Some in this conversation identify 6502 assembly as rather clunky and difficult to use. In retrospect I have to agree, but in 1977 I didn't have a basis for comparison.
There were no fast, high-level languages available for the Apple II, so my little program became an Apple product, primarily from the lack of alternatives.
Consider this -- Apple Writer lived in 8 kilobytes of RAM, but actually did things. It even had a macro language people used to process address lists.
I just boosted my main system to 96 gigabytes of RAM to be able to more easily host DeepSeek locally (I have an RTX 4090). I just realized that's enough RAM to hold almost 12 million copies of Apple Writer.
This is all pretty surreal ... but I've had occasion to say that a number of times since 1977.
I prefer RISC-V as a starting assembly language because it has a good design, it's more intuitive, it has modern language and tool support (GCC, LLVM, Rust, etc.), and it runs on QEMU and on real, available hardware.
It's $8 and can't run without external ROM and RAM and a clock circuit and some glue logic.
A RISC-V CH32V003 32 bit processor with 2k RAM and 16k of flash running at 48 MHz costs $0.10 for an 8 pin package (6 available I/O pins) or $0.20 for the 20 pin package. Once programmed, it needs only electricity between 2.8V and 5.5V to be applied to start running.
The recommended dev board, with a USB programmer/debugger interface for your PC, plus 5 of the 20-pin chips, costs $8.35, about the same as a bare 65C02 chip.
For a "first dive" into a programming paradigm, I could see the appeal of something more "PC" shaped than "MCU shaped", just because it offers recognizable, easy ways to deal with primitive debugging.
If you have a memory-mapped frame buffer, you can write a single byte to it if you need a status checkpoint or want to track a variable. If you have a keyboard, you can probably read its buffer or use it for triggers.
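For example, on a C64-style machine with its default text screen at $0400, a one-byte checkpoint is just this (a minimal sketch; the marker value and location are arbitrary):

LDA #$01       ; arbitrary marker value
STA $0400      ; top-left corner of the C64 text screen - visible the moment this code runs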
Maybe modern debugging environments and tools make it easier, but I tend to think of my university assembly language class which featured original Intel SDK-86 boards with LEDs and hex keypads to interact with.
Yes I'm well aware. I have a 65C02 on a breadboard. I have a 6809 on a breadboard. I've seen all Ben's videos.
The *only* advantage the 6502 has is the exposed and non-multiplexed address and data busses, which you can single-step and examine with an Arduino or something.
The whole setup is going to cost you the thick end of $100.
But you can do that just as easily in an emulator.
And meanwhile with RISC-V you can get an ESP32-C3 board with 400k RAM for a dollar or two, or the 1 GHz 64 MB RAM 64 bit Linux Milk-V Duo for $5. And if you wish, run exactly the same programs on a $2500 64 core 128 GB RAM Pioneer. And bunches of $10, $30, $70, $100, $200 boards (and now $400 laptops) in between.
Only ARM gives you similar breadth, but the instruction sets of a Cortex-M0 and an M4 Mac are so different they might as well be from different companies.
With an 8088 you could port DOS (which resembles CP/M-86) though games would require more memory and a video chip. But we've kind of left true 8-bit territory at this point.
> It's $8 and can't run without external ROM and RAM and a clock circuit and some glue logic.
True but I've always found that there was a special 'charm' to these old CPUs because you have to build the circuit you described as a bare minimum, which is not difficult and makes you learn a range of skills.
The topic of the 6502 ISA simplicity is a pet peeve of mine, because to me it's clear that anybody thinking that such simplicity is a good thing, never progressed past a hello world.
Programming anything of moderate complexity on the 6502 is hard. 8 bits are way too restrictive (e.g. screen addressing on the Commodore 64). Multiplications/divisions need to be hand-rolled. And even 16 bit sums/subtractions are simple but still not trivial to perform efficiently.
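For instance, even a plain 16-bit addition has to be spelled out byte by byte (a sketch with arbitrary zero-page operands):

CLC
LDA $F0        ; low byte of the first operand
ADC $F2        ; add the low byte of the second
STA $F4
LDA $F1        ; high byte of the first operand
ADC $F3        ; add the high byte plus the carry from below
STA $F5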
The 8086+DOS platform is way easier to work on, in comparison (if one wants to learn basic assembly).
> Multiplications/divisions need to be hand-rolled.
In the context of someone's first contact with assembly language, that's a good thing! Translating various multiplication and division algorithms to assembly is a great way to learn it.
I learned to implement my own multiplication and division as part of learning bignum with a high level language (in my case Pascal). By the time I learned assembly language I simply focused on how to translate high level language to assembler; basically be a manual compiler and compare my output against a real compiler.
It seems unnecessary to learn any algorithm including the multiplication algorithm in assembly language.
The restrictiveness is part of the fun. 8086 assembly isn't as much fun. If you're already going for something irrelevant (not x86-64 or RISC-V or ARM) then I don't see the advantage of 8086.
I probably wrote more 8086 (really 80286 in real mode) assembly than any other assembly. I loved every minute of it, I didn't even mind the segment registers.
I go to a compsci-centered technical institute and my informatics professor (a retro nerd who loves to show us his arcade machines) went against the set program for our class, which is learning 8088 assembly language, and instead focused on the 6502. It was honestly one of the best learning experiences of my life and I wouldn't have it any other way. He even got us to build the Ben Eater breadboard computer, so it was particularly hands-on and really interesting!
My first exposure to assembly language was the PDP-10. All I had was the DEC-10 processor manual. I was completely baffled by it. It had hundreds of instructions, with completely opaque descriptions. What was a register? What was an accumulator? What was an address? What was a stack? I had no idea. David Rolfe wrote some subroutines I needed for my Fortran version of Empire, and that helped a little, but I was lost.
One day, I asked my friend Shal Farley, what was a stack? He said "Imagine a stack of plates. You added a plate (pushed it on the stack), and took off a plate (popped it off the stack)." Suddenly, Shal had turned the lights on! I instantly understood it.
Then, I began working with a 6800 microprocessor on a little board. It had something like 40 instructions, that all fit on a card. 40 instructions were simple to learn, and suddenly, I got it.
I went back to the -10 manual, and it all made sense.
I’ve only ever learned two assembly languages: 6502 (when I was in elementary school in the early 80s) and 370 (my senior year of high school when I had access to the UIC mainframe thanks to taking night classes there). I can roughly follow the output of godbolt on those rare occasions that I’m curious enough to bother, but the complexity of modern CPUs and computer architectures is such that I don’t know that I really need to deal with assembly language now. That said, my mental model of how a computer works at a low level is very much based on how the Apple II was set up.
That's what did it for me, too. The magazines at the time said that if you wanted to make your game run faster, you had to write parts of it in assembly. I was fortunate that no one told me that was supposed to be harder. I just knew that it was different, and it never occurred to me to be leery of it.
6502 was my first assembly language (back in the 80s). It was fine. However,
when I later had to do some Z80 assembly programming I considered that instruction set a lot nicer. It had more registers. It had a swappable register set. It even had a few 16 bit instructions. Really nice.
I guess most people just give up once they see the ugliness of what they got from Intel.
I guess, it's a matter of perspective. The Z80 is derived from the 8008 (actually the 8080), which was an on-a-chip implementation of the processor of the Datapoint 2200. Notably, the DP2200 was designed around an early Intel shift-register for memory (meaning, sequential memory), hence the multitude of internal registers in order to minimize memory access. (It's probably only for the 8080, which provided better support for direct memory access, that there's this notion of luxury about these internal registers. Before this, it had been a necessity.)
The 6502, on the other hand, is a derivative of the 6800, a genuine microprocessor design that takes fast random access into account. And thus, taking fast memory for granted, it became viable to outsource some of the internals. From this perspective, the 6502 provides a plenitude of 256 slightly slower registers in the zeropage. – However, if you're using the 6502 like this, you may discover that you lose some of the much acclaimed advantage in cycle count per instruction.
Zilog Z80 had very significant improvements over Intel 8080. From many points of view it can be considered midway between Intel 8080/8085 and Intel 8086/8088.
Besides increasing the number of registers, Z80 has added many features that were standard in any general-purpose computer (e.g. signed integer support and indexed addressing), but which were missing in Datapoint 2200/Intel 8008/Intel 8080, simply because Datapoint 2200 had not been designed to be used in general-purpose computers but only for implementing the logic required inside a serial terminal.
However many programs for Z80 did not make good use of its additional functions, because they were intended to remain compatible with the legacy Intel 8080 systems.
Intel 8086 did not have this problem, because it was only source-code compatible with 8080, not binary compatible, so any application had to be recompiled for it, and fully using the new ISA did not have any additional disadvantage.
Unlike 6502, Intel 8080 and Z80 had a few 16-bit operations, which were intended for address computations, while data operations were expected to be handled using the 8-bit accumulator.
Despite the intended use and the limited set of 16-bit operations, implementing complicated arithmetic operations, e.g. floating-point arithmetic, was still faster using the 16-bit address registers and operations for handling data. With properly optimized programs, Z80 and even 8080 could be much faster than 6502 for number crunching. (Though faster is only relative, because FP64 floating-point operations took many milliseconds per operation on any 8-bit microprocessor, many billions of times slower than on a modern laptop or desktop CPU.)
I think, this is much about another major difference in the history of both designs: while originally for a different architecture, the DP2200 processor / Intel 8008 was meant to be a CPU for a small computer system from the beginning. The 6800 and in consequence even more so the 6502 was more about a microcontroller for implementing in software what wasn't economically viable to be put in silicon. Notably, it was not meant to be a computer CPU. Thus, the 6502 falls short on many things like support for large stacks, as required for higher languages, or efficient 16-bit operations. (Its philosophy may be better described as, "if it runs, it's good enough, even better so, if it runs cost effectively.")
PS: regarding the DP2200 not being meant as a small system, I'm not so sure about this based on my own reading. But, certainly, it wasn't marketed as such.
And, regarding the educational merits of the 6502, it may be a good second language, as it requires you to think about your implementation. (Personally, I'm more for the PDP-1, which Ed Fredkin – "world's best programmer", no less – once claimed to have inspired IBM's RISC architecture. ;-) )
I love 68K. So nice to read with the separated data/address registers, and between the instruction set and addressing modes you can easily translate idiomatic C into reasonable 68K assembly.
Main downside is that with 68K you're more likely to be loading your program into an OS rather than bare metal hardware, and have to deal with things like relocation.
68000 is much more pleasant from my experience (~40 years ago, so I might be wrong). Personally I also like the Z80 better than the 6502, but that is just taste. The biggest benefit of the 6502 was the zero page; I loved the idea. My favorite computer book of all time is "Z80 Assembly Language Subroutines".
Can anyone speak on how it is to move from an older assembly, to a modern CPU? I asked to take an assembly class in local/public college, and was told they wouldn't hold the class because not enough students were interested. This was in 1998, and I truly couldn't believe my ears.
I feel like learning modern assembly would be more useful, but maybe 6502 assembly is far easier to pick up? The first language I learned was Atari BASIC around 1991, and it was enough to get me to keep on moving to better languages and more powerful CPUs, but I wonder where I would have ended up if I learned assembly either before or after Atari BASIC. I try to keep up with technology, and have read quite a bit on different assembly dialects. I still haven't learned an assembly that I can program in, and I suppose it's because there are so many other high-level languages now and I feel like I need to keep up to what is used in "industry" rather than what is more obscure (but might help in debugging).
I moved from 6502 assembly to PowerPC assembly. I found 6502 assembly easy to understand because of the limited number of registers, instructions and addressing modes. You didn't have to learn very many.
And as it turned out, I didn't have any problems adapting to the expanded PPC ISA. Being on the 6502 already taught me how to chop problems down into tiny bits and now the bits didn't have to be so tiny. I certainly wasn't trying to do 8-bit chained adds for bigger quantities anymore; it didn't make sense to when I had 32-bit registers and an FPU.
I started with BASIC on a Vic 20 in 1979 in a computer class for kids. Then I got my own Atari 400 at home, so I learned BASIC on that. Then around 1984 I got a Commodore 64 and it was amazing. I quickly got past the BASIC part of the C64 manual and in the back of the manual they had a memory map, all the hardware registers for all the chips, and then all the 6502/6510 opcodes. There was even an entire schematic for the C64 in the back of the book as a fold-out poster. 14 year old me was hooked. I got into assembly when I was about 15 and never looked back to BASIC. I wrote a lot of assembly code for many years, creating "crack intros" for cracking groups and demos on the C64. I then moved on to Amiga and 68000 assembly, and I even got into SNES 65C816 (16 bit version of 6502) assembly coding using a 30th generation photocopy of the pirated SNES developer manual and a floppy disk game copier, which also had a parallel port that I managed to connect to my Amiga and use that as the development system uploading the compiled program to the SNES via parallel port. I would make "crack intros" for SNES games being traded on pirate BBS's. It was a lot of fun. Many years later, I'm still doing some 8-bit assembly code for microcontroller projects where I'm trying to use the smallest possible CPU to do interesting things. Everything I learned about Assembly on the C64 still applies to the modern microcontrollers (PIC/AVR) that I use today.
In so many of these years I had not learned C. I went from assembly to scripting languages, and was a very early adopter of Javascript since the same month it first came out. Only in the last 10 years have I learned C, and it was so easy! After learning Assembly and Javascript, C was right in the middle.
I've programmed in over a dozen languages through the years, but I'm really happy with Assembly, C, and Javascript as my stack depending on the project.
Thank you for this, I was born in 1979, but got introduced to the Atari 8-bit computers VERY early, and connected immediately. I have a Raspberry Pi v4, and it sounds like learning assembly on nearly anything right now will pay off. I don't have a problem learning a new language, but the C-style of languages are easiest for me. I'm sure them being imperative is a huge plus for me learning new things about any imperative language. I actually find JavaScript to be quite satisfying to write code in, mostly because I can test out functional features more easily than adopting an entirely different language to do it in.
I spent my teenage years grabbing cracks and texts from BBS', and enjoyed the whole scene. It's really what sparked my interest into writing software, and I always wished I had learned some lower-level skills beyond running a debugger and poking at memory. I felt like if I knew Assembly, I could get into deeper debugging (cracking was initially my "exciting" topic to cover, but now I feel like it would just be understanding memory better when I run into a strange error).
Was the 400 as much of a pain to type on as it looks? I was lucky to inherit an 800 when my uncle bought his first IBM clone, and those had nice keyboards.
Before I got the C64 I got an Atari 600XL because the keyboard was a real keyboard. I had really hacked a lot on the Atari 400 keyboard, so much that it was becoming pretty worn in places, so the better keyboard was really an upgrade.
I noted in a different comment in this thread that we teach all first-year compsci and software engineering students at my university MIPS assembly. They may then specialise into other areas, security, operating systems, embedded, etc., and in those specialties may need assembly for more modern CPUs.
We have found that when needed, students pick up the newer/more advanced assembly languages (e.g. ARM, x86) fairly well, so we believe the early and universal introduction to MIPS does provide benefits.
ARM borrowed most of its design from 6502. So if you learn 6502, you've learned many of the tricky parts of ARM (such as how the carry flag works with subtraction).
Many of the instructions have similar mnemonics, such as the conditional branch instructions (BEQ, BNE, BCC, BCS, BMI, BPL, BVS, BVC), as well as the arithmetic instructions like ADC, SBC, EOR.
Not really (perhaps leaving mnemonics aside). The 6502 has little in common with the first ARM. The ARM's designers liked the simplicity and speed of the 6502 and it was their favourite 8-bit ISA. And a visit to the Western Design Center convinced them to design their own CPU.
However, the early RISC papers were the biggest influence on the design of the ARM. There is even a clue in the name 'Acorn RISC Machine'.
ARM designers liked and had extensive experience with 6502 - it WAS their bread and butter for a long time - and this might be why the mnemonics are so similar (and carry in subtraction - that might have been for making porting easy, but I won't risk assuming that without a primary source).
They obviously also studied foundational papers on RISC and understood the possibility of having a simple design (such as the 6502, which was very powerful considering its small transistor count) applied to a simple, more regular, instruction set.
Assigning weight to these factors might be a futile exercise, as the designers themselves might not agree.
They were not blind to RISC and, therefore, it made sense to put it in the name of the architecture.
Completely agree with all these points and that without the 6502 the ARM1 would not have looked like it did. One might say ARM1 was inspired by the 6502 and it prevented them going down the CISCy 68k route.
But I don't think that 'ARM borrowed most of its design from 6502' or that (to my mind at least) it looks like a refreshed 6502. There are just too many fundamental differences:
- ARM1 was a load/store architecture / 6502 wasn't.
- 6502 had a few special purpose registers / ARM1 had loads of general purpose registers.
Plus there are lots of key innovations in ARM1 that weren't in either 6502 or RISC 1, such as conditional execution. Furber and Wilson were really quite innovative and didn't just borrow ideas from other ISAs.
Things like conditional execution being always available, and the barrel shifter being always available feel a lot more like ideas from VLIW architectures. And when you come from 8-bit instructions and suddenly have 32 bits available, your instructions are a very long word.
Modern CPUs are more difficult to program in assembly.
The simplicity of RISC-V is illusory. Because it lacks many features of normal ISAs, like ARM or Intel/AMD x86-64, writing programs that are both efficient and robust, i.e. which handle safely any errors, is quite difficult in assembly language.
For a simpler programming in assembly language it is hard to beat DEC PDP-11 and Motorola 68000 derivatives.
However those are no longer directly useful in practice. For something useful, the best would be to learn assembly programming using a development board for some Cortex-M microcontroller, preferably a less ancient core, e.g. with Cortex-M23 or Cortex-M33 or Cortex-M55 or Cortex-M85, i.e. cores implementing the Armv8-M ISA (the latter 2 also implement the Helium vector extension).
Probably some development board with a microcontroller using Cortex-M33 would be the easiest to find and it should cost no more than $20 to $30. I would not recommend for learning today any of the development boards with obsolete cores, like Cortex-M0+, Cortex-M3, Cortex-M4 or Cortex-M7, even if those boards can be found e.g. at $10 or even less.
Such a development board can be connected to any PC with a USB cable. All the development tools are free (there are paid alternatives, but those are not better than the free GNU tools).
You can compile or assemble your program on the PC, then load and run it on the development board. You can have a serial console window connecting to your program, by using a serial port on the development board and a USB serial interface. All such development boards have LEDs and various connectors for peripherals, allowing you to see what your program does.
I think that learning to program in assembly such an Armv8-M microcontroller is more useful than learning something like 6502. Armv8-M is less quirky than 6502 or RISC-V and it is something that you can use for implementing some useful home device or even professionally.
Otherwise, the best is to learn the assembly language of your personal computer, e.g. x86-64 or Aarch64, but that is much more difficult than starting with a microcontroller development board from ST (e.g. a cheap STM32 Nucleo variant), NXP, Infineon, Renesas, Microchip, etc.
The most important are the lack of integer overflow detection and indexed addressing. Integer overflow detection is required for any arithmetic operation unless it is possible to prove at compile time that overflow is impossible (which is possible mostly for operations with some counters or indices, whose values are confined inside known ranges), while indexed addressing is needed in all loops that access arrays, i.e. extremely frequently from the point of view of the number of actually executed instructions.
There are absolutely no reasons for omitting these essential features, because their hardware implementation is many times simpler and cheaper than either the software workarounds for their absence or than the hardware workarounds that can be implemented in other parts of the CPU core, e.g. instruction fusion.
6502 is much more similar to a normal CPU than RISC-V, because it has both integer overflow detection and indexed addressing.
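A rough sketch of both features on the 6502 (the labels and zero-page addresses here are made up):

CLC
LDA $F0        ; first signed operand
ADC $F1        ; add the second; V is set on signed overflow
BVS overflow   ; hardware overflow detection, one branch
LDA table,X    ; indexed addressing: base address plus X in a single instruction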
While I believe that other ISAs are better than 6502 for learning assembly language for the first time, I consider 6502 as a much better choice than RISC-V.
RISC-V could be used for teaching if that would be done in the right way, i.e. by showing how to implement with the existing RISC-V instructions everything that is missing in RISC-V. In that case the students would still learn how to write real programs instead of toy didactic programs.
However I have not seen any source teaching RISC-V in this way and it would be tedious for newbies, in the same way as if they were first taught how to implement a floating-point library on a CPU without floating-point instructions, instead of being allowed to use the floating-point instructions that any CPU has today.
So it's on par for unsigned, and takes two additional independent instructions for signed 64-bit and one for signed 32-bit.
For teaching, using unsigned XLEN-bit values by default is probably a good idea anyway.
> indexed addressing
I'm not sure how much this actually matters in practice.
It's nice when you access multiple arrays at the same index, such that you only need to increment one index instead of every pointer.
Such loops are often vectorized, and the indexed loads become useless, once you read two values from an array index, e.g. an array of structs.
Edit: removed measurements, because I'm not sure they are correct, might add back later.
The cost of providing carry and overflow is absolutely negligible in any CPU and even more so in an OoO CPU, which is many times more complex.
If you mean the case where the flags are not stored in a general-purpose register (a possible choice, but one that requires an extra register file write port) but in a dedicated flags register, like in x86 or ARM, then yes, the flags register must also be renamed to allow concurrent operations, like any other register; but this is a minor complication over having register renaming for all the other registers.
What is extremely expensive is not having overflow and carry in hardware and having to implement software workarounds that require several times more instructions.
When loops are vectorized or you have an array of structures, this does not change anything, you still use the same indexed addressing (or auto-incremented addressing in ISAs that have it). Perhaps you think about scaled indexed addressing, which may not always work for an array of structures, but in such cases you just use simple indexed addressing, with the scale factor 1.
Without either indexed addressing or auto-incremented addressing you need an extra addition instruction for each memory access, which increases the code size and it limits the execution speed by occupying an extra execution port.
Because of this, the highest-performing RISC-V implementations have added non-standard ISA extensions for indexed addressing, but such extensions are still rather awkward because the base instruction encoding was not designed with indexed addressing in mind, so the extensions must use a quite different encoding that has to be squeezed into a limited encoding space.
- learn assembly as subroutines inside your "daily driver" language on your regular PC, which is going to be x64
- microcontroller with in-circuit debugger where you can connect it to flashing lights (obvious candidate is Arduino, which is AVR, or STM32 which is ARM)
Back in the day the first of these was easier in environments like BBC Basic inline assembler or MS-DOS COM files (which are headerless executables that can be extremely small.) You could also learn either of those in emulators.
One thing to consider is that there's a whole world of 8, 16, and 32-bit microcontrollers out there being used in new products every day. They are all readily programmable in assembly language, and if you have the basics down then the learning curve isn't terribly steep.
> Can anyone speak on how it is to move from an older assembly, to a modern CPU?
The modern CPU with orthogonal instruction set is just as easy (possibly easier because you have more and larger registers, making it easier to get ‘real’ results) until you start to look at performance.
Then, you hit the problem of having to understand how your target CPU does out-of-order execution, how large its caches are, how they are organized, etc.
There are modern CPUs out there that have very little of that, though.
My first program was 6800 Machine Language. My second was 6502 (VIC-20), my first ASM was 8085. With that, I actually wrote an embedded OS. The byte order seemed strange to me, after the Motorola chips. Also, the two-part memory addressing.
Eventually, I ended up doing ASM with 68000 (Mac Plus), but by then, I was well on my way with higher-level languages.
Starting from Machine Language helped me a lot in understanding some of the more fundamental concepts of software. It was also an excellent troubleshooting teacher, which has probably been the most valuable software development skill I have.
These days, though, with the current complexity of CPUs, I don’t think it would be reasonable to start folks off with assembly/machine languages, anymore. I’m not sure that “retro” CPUs would really represent some of the difficulties experienced with highly parallel processors. I would worry that it might encourage an “idealized” view of the underlying architecture.
In addition to being a good starting point for programming, the 8-bit micros are also a good way to begin learning about the hardware side of microprocessor systems, since the external stuff (memory chips, etc.) was correspondingly simpler as well. And without pipelining, you can form a more straightforward mental model of what the system is actually doing.
For me, it was Z80, just because Radio Shack had a book on it. I never actually built or programmed a Z80 system, but I still fall back on that knowledge when working with microcontrollers.
I think the Z-80 is sadly out of production at this point, but the eZ80 SoC is still available, so hopefully you can still build a hardware CP/M system!
Of course there are also synthesizable FPGA implementations, cycle-accurate software implementations, etc.
The hairy part of the 6502 instruction set is Subtract With Carry, and the confusion about how the carry flag works as a result of subtracting or comparing.
SBC is implemented by adding the ones-complement (XOR FF) of the number or register. And the carry flag is backwards compared to other architectures, such as the Z80. On input, Carry Set means that you don't want an additional one subtracted, and Carry Clear means you do want an additional one subtracted. Then on output, Carry Set means that no borrow took place, and Carry Clear means that borrow took place. When borrow takes place, you get an additional one subtracted, and that lets you chain the low bytes up to the high bytes and subtract bigger numbers.
CMP is like SBC, except it always adds the extra one whether your input carry is set or not, making it act like you'd expect comparison to act.
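So a 16-bit subtraction ends up looking like this (a sketch with arbitrary zero-page operands); note the SEC before the first SBC, where most other CPUs would clear the borrow:

SEC            ; carry set = no borrow pending
LDA $F0        ; low byte of the minuend
SBC $F2        ; subtract the low byte of the subtrahend
STA $F4
LDA $F1        ; high byte
SBC $F3        ; a clear carry from the low bytes subtracts the extra one here
STA $F5        ; carry is still set afterwards if no borrow left the high byte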
Another way to put it is that in the SBC context the carry provides the bit required to turn ones' complement into two's complement (all bits flipped + 1). If it's clear, we fall short by one (so we remain in ones' complement, missing the "+1" part), which makes for a perfect implementation of a borrow. Since SBC is essentially ADC with the inverse of the operand (ones' complement) this applies in general (not just for negative values).
(So, technically, the carry is like a ones' complement / two's complement switch. And this is also all we need to implement a borrow in two's complement math.)
Because there aren't any real-world implementations. Why bother with MMIX when you already know it'll never go beyond a toy "hello world" equivalent in an emulator?
MMIX might be good for educational use in an academic setting, but literally anything else would be better if you want to have fun working with it. Nothing beats the magic of seeing your first program run on a real computer.
Then you can move to typical RISC features like full pipeline, cache, etc...
Then you can move to actual hardware etc..
But I am probably bitter, I learned on a Heathkit ET 3400, and I find that a lot of the quirks of the 6800 are something I had to get around to understand modern processors.
MMIX may not be 'useful' for real applications, but being intentionally free of those quirks is very useful for understanding the core concepts, at least for the few people I have had try it.
I started by reading disassembled real-mode x86 code. Not bad, not great. The thing that took me the longest to “get” was how much was implicitly coded into each instruction. Once I figured out how the accumulator worked, plus the way the flags influenced jump instructions, I found over time that most other architectures are more similar than they are different.
The 6502 was a great CPU for its time and price point. I wrote many programs in its assembly language. However, if you're going to work on modern systems, there are too many differences for the 6502 to be a good first assembly language (unless the 6502 is your focus).
The 6502 was designed to be easily implemented with relatively few transistors. For that it was amazing. There is a reason it was popular! But its CPU registers are only 8-bit, its call stack is 256 bytes, and for real work you need to use the zero page (zpage) well. None of these are likely to be relevant to a modern software developer using assembly. Its design encourages the use of global locations, again, an approach that doesn't make sense on modern systems.
I say this as someone who admires what the 6502 was able to achieve in its time, and I even have a long-running page dedicated to the 6502: https://dwheeler.com/6502/
If you want retro and easy, the 68000 has it beat in terms of simplicity of development. The 68K is quite regular & a little more like modern systems (sort of).
However, I think starting retro is often a disservice. Most of the time the reason to use assembly is performance. If you're running code on chips more than a dollar or so, getting performance out of modern architectures is MUCH different than on retro computers. For example, today a lot depends on memory cache latency, yet that is typically not a serious concern on most retro computers. So learners will learn to focus on the wrong problems.
If you want a modern simple architecture, may I suggest RISC V? That's under a royalty-free open source license, and real computers use it today. It's pretty simple (your basic load-store architecture) and you can pick only the modules you care about. Full disclosure: I work for the Linux Foundation, but I'd say that anyway.
Plausible alternatives are ARM and x86. If you subset them, they're okay.
The reality is that assembly languages reflect the underlying system, and that will change over time.
I do not know. I learned it when I was 12, but it was quite challenging for me.
Pros:
It is very orthogonal and has no memory-to-memory instructions.
Zero page is like having 256 generic RISC registers (a bit faster than accessing other memory areas).
Very easy to understand how the assembly is micro-coded
Cons:
It is impossible to create a "generic" memory pointer with an index >255 without self-modifying code.
It is a true 8-bit-only system and cannot manipulate 16-bit words easily.
Steve Wozniak created a small VM just to manage 16-bit integers (see his Byte magazine articles).
Oh, nice. I wished for a similar addressing mode in the 1980s. I see they also added instructions to increment and decrement the accumulator.
(Actually, what I wished for was slightly different: only the high byte of the effective address would be fetched from zero page, the low byte being the Y register. In this way, one could keep both bytes of the pointer in zero page, LDY the low byte and use my zp indirect mode. The first use of a pointer would cost an extra instruction / 2 cycles, but further uses of the same pointer would be cheaper.)
Notably, the Motorola 6800 has decrement and increment for the accumulator, both of them. It also features an indexed address mode that is somewhat reverse to how the 6502 does it: there's only one index register (X), but this is of 16-bit length and a single-byte operand is added to this. So, while not exactly a zeropage address, with a bit of courage for self-modification we get there…
The 6502 intentionally did away with quite a number of features, but, as it had since become evident that it was being used as a computer CPU, WDC added (back) in the CMOS version some of what was missing for that application.
(The thing I'm probably missing most on the 6502 is a sequential-shift or barrel-shifter, shifting/rotating by multiple bit positions at once. This would be so great for games, graphics, encoding and decoding… Sophie Wilson had the wisdom to opt this into the ARM architecture.)
The S in RISC does not stand for simple. What the? If someone's going to write about assembly programming, they should at least know what CISC and RISC are.
I'm not saying 6502 assembly can't be an interesting endeavor, but it's a bit restrictive for a first language. Maybe if you already know some low-level programming like C. But it's still easier to do things with a few more registers. And it's not like you have to use every register and instruction available on the CPU to write some assembly.
It's also a fun one to write an emulator for. There are plenty of example programs out there to test, and they draw graphics by writing bytes to a frame buffer at some offset in memory. You can read from that region in memory, interpret each byte as a color from a preset palette, and use it to display visual output / graphics.
I wrote one in Swift a few years back, and then ended up developing it further into an NES emulator capable of playing Donkey Kong. It was a great learning experience.
In university we were taught a version of MIPS that was implemented in a decent simulator with an IDE showing the state of memory and registers, with fine-grained debug stepping. Some quirks of processor pipelines (i.e. nops after branch) could be enabled for realism or turned off to make things easier for new students.
That was pretty great for learning assembly, I can't comment on any other approach - no doubt there are other good options.
Sounds like CTU FEE in Prague. I learned a bunch about registers, branch prediction and cache, but most of the assembly went out of my head once the class was over.
6502 assembly was fairly easy to learn, but it didn't translate to being a productive assembly programmer on the Apple //e I had. In particular, I wanted to do fast 2D and 3D graphics (basically what Ultima III did) to write my own games. My goal was simply to implement SetPixel(x, y, c) which turns out to be tricky because the Apple screen was 280 x 192 so you had to deal with 16-bit math. See https://nicole.express/2024/phasing-in-and-out-of-existence.... for the weird nature of the Apple HIRES graphics mode, and https://github.com/fadden/fdraw/blob/master/docs/manual.md#n... for a project that implements fast drawing on the Apple today.
What is a "good" assembly language? The best I've ever seen was the VAX instruction set. The second best, probably m68k. These were big, CISC CPUs that were as nice as possible to program in assembly language.
The 6502 wasn't. Everything about the 6502 was "do more with less" and you had to be really clever to eke out performance on it. But that's a good thing! A clean, big CISC architecture like the former ones is also a good compiler target. The 6502 was deeply weird, but programming it in assembly was necessary to get it to fly. That's an era. If you want to relive that era, then go for it. It's much more quirkily fun than, say, the 8080 (the Z80 is another beast, though backwards compatible) and much more mainstream than the otherwise better 6809, all of which are also from the "assembly language required for performance" era.
Arguably PDP11 is probably the best. Dates back to the assembly language programming era, but nice clean instruction set. But what are you going to run your PDP11 code on? A simulator? Sure. But no real, consumer facing machines with graphics chips.
If you like the PDP11 instruction set, then you'll probably enjoy 68000 as well. It drew a lot of inspiration from the former. You lose the "deferred" addressing modes (though the 68020 adds something similar), but gain access to a few other useful ones. Conveniently it was used in a lot of consumer devices (Mac, Amiga, Sega Genesis, etc.)
I program in 6502 assembly all day. It's certainly lovely as beginning assembly languages go, though it can also be limiting. The addressing modes encourage using the "Zero Page" (the memory from $0000 - $00FF) as something of an extended register set, which basically means you can have up to 128 pointers if you really want them, and lots of quick-access scratch space. If you're used to other ISAs with lots of in-CPU registers, the first thing to do is allocate about 16 bytes of "scratch" space on zeropage and use those instead. I call mine R0 - R15. Once that clicks, the rest falls into place.
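In practice that's just a few equates at the top of the file (a sketch; which zero-page bytes are actually free depends on the platform):

R0 = $10       ; claim some free zero-page bytes as scratch "registers"
R1 = $11
R2 = $12
LDA R0         ; zero-page operands are a byte shorter and a cycle faster
CLC            ; than the same code against absolute addresses
ADC R2
STA R1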
Problems with the ISA become obvious the moment you want to scale your program beyond toy complexity. The stack is absurdly tiny, and trying to use it as a calling convention quickly becomes both a performance and a nesting depth bottleneck. I work around this for my larger projects by using zeropage scratch as a sort of manually-managed local scope for routines, but it requires carefully tracking which otherwise global bytes my programs are using at various nesting levels. Recursion is off the table both with this technique (difficult to manage) and with the stack (256 bytes will be exhausted very quickly). The C compilers I'm aware of work around this (slowly!) by using a separate software stack for arguments to functions.
It's pretty fun though! I'm working on a full modern indie game for the NES, which is 100% programmed in 6502 assembly for performance reasons. The existing C toolchains for the NES are either very slow (cc65) or not yet mature (llvm-mos). Most of the true complexity in the project is algorithmic stuff, like resource allocation and deciding how much work to do on a given frame. The 6502 ISA is a bit slower to write than higher level code, but it's not really a major limiter now that I've got my process down.
I got into 6502 assembly when trying to do bit manipulations on an Apple IIc back in the day.
The problem was that Applesoft basic set the high bit on data going out a serial or parallel port which was a problem for printing and sending data out through a modem. So if you wanted to keep the high bit one had to use 6502 assembly.
It's down to either the KIM-1 or the JOLT. The JOLT was advertised in December 1975 ( http://retro.hansotten.nl/6502-sbc/jolt-and-super-jolt/jolt/ ), compared with the 6502 which was being sampled around June or July. The KIM-1 was variously noted as early as winter 1975 but some sources say it wasn't available until 1976, but then the JOLT probably didn't get into people's hands until then either. I think it's logical that MOS would have been the first one to use their own chip but that's not a given. I have a ROR-less KIM-1, though, so there are systems dating from that early.
According to Wikipedia, the Microcomputer Associates Inc. (MAI) JOLT came out in July 1975, the Apple 1 came out July 1976, and the KIM-1 came out in 1976 but no month is specified.
Also, the JOLT was used in the Atari VCS prototype.
MAI was later acquired by Synertek and the company was rebranded as Synertek Systems and then created the SYM-1, a 6502-based SBC, in 1976.
IDK - I guess you are mentally modelling it wrong. Think of the zero page as a bunch of registers - that's kind of how I modelled it, and getting over that hump was also what helped someone building a 6502 compiler backend.
See the slide on imaginary registers.
https://llvm.org/devmtg/2022-05/slides/2022EuroLLVM-LLVM-MOS...
But all that being said, 68000 or modern ARM or RISCV are also good targets. I am probably just a little nostalgic for the 6502.
If you could do something like LD[$0f] #65 - then yeah, you have registers. But you can't do that sort of thing with zero page. You'd still need to do:
LDA #65
STA $0f
Sure, but the point is that you can't treat a zero page address like a register. There's a LDA, LDX, LDY, but no way to load directly to a zero page address. You have to load a real register, then store it in the ZP address.
Zero page isn't a bunch of registers because register operations by and large don't work on them directly.
I'm not trying to convince you that zero page locations are hardware registers, just that they can be thought of as register-like. If you were coding on a 68000 you might do some memory operation indirect via an address register like (a0)+, but on the 6502 you do it via a zp pointer with the increment handled by incrementing Y. You can have many of these pointers. They fulfill what you might use registers for on another CPU. So no, they are not registers, but they are register-like in that they can be used as pointers, for holding constants and temporaries, and have the advantage of being a cycle faster than regular memory access. I notice that some people who programmed the Z80 struggle with this way of looking at the 6502, which can be as efficient as the Z80 if coded properly.
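Roughly, the 6502 counterpart to a 68000-style (a0)+ walk, assuming the pointer lives in two free zero-page bytes (the labels and destination address here are made up):

ptr = $FA      ; two free zero-page bytes holding a 16-bit address
LDY #0
loop:
LDA (ptr),Y    ; fetch the byte at pointer + Y
STA $0400,Y    ; do something with it
INY            ; "advance the pointer" by bumping Y
BNE loop       ; good for up to 256 bytes before ptr itself needs adjusting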
From your POV this is probably a specious argument, since there is a very definite view of what a register is in a CPU, and because of the Church-Turing thesis a 6502 can pretend to be anything, including a 68000 or RISC-V - but all I will say is that it's still a worthwhile abstraction to hold when you're designing code for the 6502.
Going off track, a further abstraction that I find amusing is to consider the zero page a register file and 6502 assembly microcode. For sure you could then come up with your own ISA for the 6502 that would be nicer to program and more directly amenable to compilers. Steve Wozniak's Sweet16 was a fun virtual 16-bit CPU that ran on the 6502.
I have mixed feelings here.
On the one hand, yes. It's a good introduction because the simple patterns and instructions mean you can only do X a specific way. But when you start to write real programs, especially from the mindset of a post-2010 programmer, it starts to feel weird because the patterns become harder to grasp. So I'm not 100% sure myself here. I love the 6502 and the instruction set. But on the other hand I feel that basic C and unoptimized assembly give you a more real-life introduction to assembly? The issue is that it's hard to judge for someone who already got introduced and grasps the concepts.
I think that the first question to ask oneself is "why do you want to learn assembly language?"
If you want to learn to write programs in assembly language or to debug disassembly of programs written in higher languages, then learn the language for the platform that you want to target. This also covers the "I want to write/understand retro stuff" case.
If you want to understand how high-level programs are compiled into assembly language, then read Aho[1].
If you want to understand how algorithms are implemented in assembly, read Knuth[2]. Knuth invented a hypothetical computer "MIX" with its own assembly level instruction set and I believe that MIX has actually been implemented in software several times.
If you just want to play a little bit with assembly, "dipping your toes in the water", then I agree with the OP that 6502 is simple and easy to learn, but it is not practical for many modern use cases and has sharp edges.
I would recommend against learning MIX, and instead learn the successor, MMIX. MIX was designed in the 1960s, and as a result includes things that simply don't make sense today. For example, the number of bits in a byte is not specified. In fact, MIX is agnostic to whether or not it is a binary computer or a decimal computer! I believe for binary machines MIX says there are 6 bits in a byte, and for decimal machines there are two digits in a byte. Words are not uniform: a word consists of 5 "bytes" and a single bit, used to indicate the sign of the number. MIX has no stack manipulation instructions, and subroutines are implemented using self-modifying code. As in, in order to call a subroutine you take your return address and write it into a jump instruction in the subroutine so it will jump back to where you want.
Back in the 70s I was fully aware that learning computers was the key to the universe. And if you wanted to learn how computers worked, you have to learn assembler. And if you want to be more than a beta programmer, you had to learn assembler.
> Knuth invented a hypothetical computer "MIX" with its own assembly level instruction set
I never bothered with that. Why, when real computers are available? MIX is a waste of time.
> it is not practical for many modern use cases and has sharp edges
All CPUs have their sharp edges. Like learning to use a milling machine, you don't want to drive the cutter into the vise. Sure, the 6502 doesn't have a divide instruction, but writing a divide routine is a good start to learning how to make the 6502 dance.
Strong disagree. The RV32E subset of the RISC-V spec is even simpler than 6502 and has good modern support in cheap microcontrollers (CH32V003 etcetera) and widely available simulators.
But mnemonics are arbitrary. Are BCC, BCS, BEQ, BMI, BNE, BPL, BVC, BVS really different instructions, or just one instruction with a field specifying the condition?
If you look at 8080 and Z80 assembly languages you see the same binary instructions represented with different mnemonics and some things that have different mnemonics in one are the same mnemonic with different arguments in the other.
The original 6502 has 151 of the 256 eight-bit opcodes in use, but when you add in the immediate data or address it has millions of possible instructions:
with ABS ABS,X ABS,Y modes: ADC, AND, CMP, EOR, LDA, ORA, SBC, STA
Maybe it's just me, but I consider data "data" and addresses "addresses" rather than instructions. I do understand your point, and I'm fine with saying there are 151 opcodes/instructions.
Do you consider binary code patterns that do exactly the same thing except for using different register(s) to be the same or different instructions?
For example, on RISC-V `sub Rd,Rs1,Rs2` has no immediate data or addresses but can have 32768 different results on the machine state (less the 1024 with Rd = x0, and I guess the same number with Rs1 = Rs2)
6502 was my first assembly language; it's good from the point of view of learning basics like carry flags and how to do multi-byte maths, but that's about all.
If I was to think of the most approachable assembler it would be the ARM assembler for the original ARM2. A real thing of beauty. Which is not what anyone would normally say about any assembly language! I remember going from ARM to MIPS and being utterly horrified by its ugliness compared to ARM!
Biased here (it was my first language period), but I agree. Simple addressing modes. Getting started with x86 seems like it would warp the thinking a little bit, you can always pick its peculiarities up later.
Having worked with the 6809 series as well, it wouldn't be a bad choice either. The Freescale 68HC12 is still in production.
Little systems with limited-yet-real I/O give the programmer very useful feedback.
The 6809 seems like a more powerful (16-bit accumulator and index registers, multiply instruction) and orthogonal 6502 (itself a cheaper/simpler 6800) with fewer (possibly charming) quirks?
The 6502 won in the marketplace though, apparently due to its lower cost (the 6809 had nearly triple the transistors.)
It's nice that it's still in production if you want to build a breadboard system.
Indeed the 6809 has several 16 bit stack registers as well as 16 bit X and Y index registers which mean you can use proper structures and it even has a decent mul instruction.
I wrote some 6502 as my final project for school (An NES game, so a specific environment) and I found it fun and extremely challenging. I'm sure there is a lot to be said for writing ASM in a modern context, but I think it's a good exercise in learning. One of the benefits you get out of working with an 8-bit processor is dealing with data bigger than 8 bits.
I learned some 6502 last year in service of NES ROM Hacking. If you're interested in doing the same, NesHacker's Final Fantasy guide was instrumental in getting me started: https://www.youtube.com/watch?v=HLzEhyvHBos
This discussion contains well argued points about the 6502 vs other instruction sets. Does anyone know of resources/books that I could read that would help me have a more informed opinion on the pros/cons of each instruction set?
My take is that any instruction set that has "centralized" instruction flags is an obsolete design, as it creates dependencies between instructions that could be otherwise executed out of order. So neither 6502 or 68000 is ideal.
6502 is a bit limited, so you'd have to resort to self-modifying code to get things done — and self-modifying code is a big no-no on any processor with cache, and therefore not a good habit to acquire.
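The classic trick looks something like this (a sketch, with the termination test left out): the operand bytes of an instruction are rewritten at run time, which is exactly the habit that stops working once instruction caches enter the picture.

loop:
LDA $1234      ; dummy operand, rewritten by the code below
STA $0400      ; do something with the fetched byte
INC loop+1     ; bump the low byte of the address embedded in the LDA above
BNE loop
INC loop+2     ; carry into the high byte
JMP loop       ; (termination test omitted in this sketch)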
For teaching, I would instead go with RISC-V. In particular RISC-V with the bitmanip extensions, which make some things much easier and gives it feature-parity with the base set on x86 and ARM.
The Raspberry Pico 2 has two RISC-V CPUs with those. The board is easy to use, widely available and has a lot of support. There are also alternative boards with the same MCU but other form factors and features.
When I learned assembly at university, we were required to learn Intel 8086. I remember hating every second of it, because it was so useless.
Around the same time I had to learn AVR8 family assembly (for the ATMega328P), which was a joy in comparison.
I don't think AVR assembly is any better or easier than 16-bit Intel, but just the fact that it's a usable and actually triable thing makes a world of difference.
I guess what I'm trying to say is that you shouldn't underestimate the importance of learning something useful instead of something antiquated, even when it is not the simplest thing to do.
6502 is a good start. But IMO, ARM assembly is where it’s at as a starting point. Much more useful and it spans so many vendors and classes of devices. It’s not harder than 6502 IMO.
The best beginner assembly hands down is Z80. It may be a bit harder to start with, but if I could write games in it at 14 years old, an adult should have no issue at all.
The important thing to consider is WHY you want to learn assembly language.
I learned M68K in a college class in 1995. When I saw x86 asm 2 years later it looked like a mess. But I've never needed to write asm for a job.
I'm thinking about learning 6502 because my first computer (Atari 800) and first dedicated game console (original Nintendo NES) both used the 6502 and it would be fun to write games for them when I retire.
What about the modern 68K subset still manufactured as Coldfire? Much more orthogonal, modern (as of the 1980s) architecture.
On the other hand, the cleaner, more modern architectures are easily targeted by compilers. If you really want to live the "assembly language required for performance" era, the 6502 is fine. Equally quirky is MCS-51. I've written nontrivial assembly language projects for both "back in the day".
This is like making a big deal of the flaws of the first automobile. OF COURSE there are more appropriate ways to teach ASM today .... it's been 50 years. As designer of the first sets of computer cards (yes, way before Apple) ... the Jolt, Super Jolt, and SYM-1 ... we sold over 75K units, mostly used to teach early college students to program.
Ray Holt
FirstMicroprocessor.com
I learned 6502 as a teenager in the 70s (after learning 6800 first). When 68k came along, I immediately recognized it was a superior architecture.
Nevertheless, when I went to teach a short intro to CPU class to high school students recently, I chose the venerable 6502 for programming. Why? Because performing the 8-bit arithmetic and performing hand assembly are all very manageable on the 6502. Any 8-bit CPU would do, frankly, but the connection between 6502 and early microcomputer history is intriguing (and I was familiar with it).
Everyone here is talking about how there are better assembly languages to learn first, but I wonder how many of them are practical for hand assembly. I still maintain that learning (and debugging software for) any CPU architecture beyond the 6502 was easy because of the skills I learned as a teenager, hand assembling 6502 code. That experience put me miles ahead of my colleagues who didn't have that experience when it came to working with low-level coding in the decades to follow.
I first went deep on programming learning Z80 assembly for my TI-83. It has a lot of idiosyncrasies, such as 8-bit registers that can be combined for 16-bit operations. But it was a lot of fun learning from tutorials and from reading other people's code. It put me way ahead when I got to college, as I had already learned a lot about computer architecture.
I always advocate for people to learn 6502 because it's easy and small. Not because they are going to write assembly but because they should have a basic idea of what their code is going to turn into. It doesn't matter if someone writes in Rust, Java, .Net, Python, Lua or C, it's all mov and cmp at the end of the day.
Having programmed the 6502 and struggled with 8-bit arithmetic, I can only say that I don't believe it is a good first language at all. 32-bit ARM2 was so much more intuitive, where each and every instruction could be conditionally executed and there were 12? general purpose 32 bit registers in user mode.
6502 was my first assembly, though we didn't have an assembler - we entered raw opcodes into the Epyx FastLoad monitor on the C64. You kids with your symbol tables and your fancy mnemonics.
a9 01 8d 20 d0 a9 93 20 d2 ff ...
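(Hand-disassembled, if I'm decoding it right, that's roughly:)

LDA #$01       ; a9 01
STA $D020      ; 8d 20 d0 - the C64 border colour register
LDA #$93       ; a9 93    - PETSCII "clear screen"
JSR $FFD2      ; 20 d2 ff - KERNAL CHROUT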
When we got a KIM-1, we were able to write meaningful programs on it in a weekend, since we already had memorized all the hex.
Funny. I still remember it: $FDED on the Apple // series. But printing text out like that was really slow. For fast text printing you had to write to memory locations between $400-$7FF and deal with the non-contiguous memory block -> screen location mapping.
It was even more of a pain with 80 columns when half of the text was in the main memory and the other half was the bank switched memory.
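For the curious, the slow-but-easy route versus the fast route in the 40-column case looked roughly like this (a sketch; Apple II text characters want the high bit set):

LDA #$C1       ; 'A' with the high bit set
JSR $FDED      ; COUT: the Monitor handles the cursor and scrolling, slowly
LDA #$C1
STA $0400      ; or poke row 0, column 0 of text page 1 directly - fast, but the row layout is yours to manage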
Ouch. That sounds painful. Yeah, I usually ended up writing to screen RAM at $400. It was convenient to let the kernal manage cursor position and all that, though.
Yeah, in 40 column mode, each row was separated by 320 bytes. In 80 column mode, each even character on a line was in main memory and each odd column was in the second 64K block in extended memory.
I agree with the article. The 6502 was the first CPU I wrote a lot of machine code for, both on the Acorn Atom, where the BASIC had a built-in assembler, and on the Atari 800XL, for which some roommates and I wrote a compiler in BASIC and then made it self-hosting.
i have to agree with a lot of sibling comments: i don't think 6502 is a good pick. first, it's highly atypical and limited. second, it's borderline RISC and i'd start with a simpler to use CISC ISA. CISC makes sense and was designed exactly for hand-written assembler.
VAX or 68K would be cleaner CISC ISA to learn first.
8086 (16-bit) x86 has the advantage of being ubiquitous, and you can run it on "hardware" everywhere, but the disadvantage of being a little weird with respect to segment registers (see the small sketch after these suggestions). 32-bit x86 is complicated, but at least you get a flat memory space and can mostly ignore the segments.
MIPS or one of the RISC-V variants is a good second one to learn, to contrast RISC load/store with CISC.
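To make the segment weirdness concrete, a tiny real-mode sketch in NASM-style syntax (assuming the standard colour text buffer at segment 0xB800):

    mov ax, 0xB800              ; a physical address is segment*16 + offset
    mov es, ax
    mov byte [es:0x0000], 'A'   ; writes to physical 0xB8000, top-left of the screen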
Both reasons apply. There's probably orders of magnitude more learning material for the 6502, and the 65816 is strictly more complicated than the 6502 because it has the entire feature set of a 6502 plus modifications like register size flags and new addressing modes.
The 65816 is a pain to code for. It has what's basically a segmented memory architecture with only two bank registers: DBR (data bank) and PBR (program bank).
You can't just form a 24-bit pointer without going through the direct page, which is admittedly a bigger problem for a C compiler than for an assembly programmer.
You always have to know which mode the A register is in (8-bit or 16-bit), and likewise for the index registers; they are separate switches. Even disassembling code on the '816 is tainted by this: for any section of code you look at, you have to know what mode the processor is in to disassemble it accurately.
So basically it's a 16-bit processor, sometimes, which is a bit maddening.
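A tiny illustration of why that's maddening, in 65816 syntax (the byte sequences are what a disassembler actually has to make sense of):

    REP #$20        ; clear the M flag: the accumulator is now 16-bit
    LDA #$1234      ; assembles to A9 34 12 -- a two-byte immediate
    SEP #$20        ; set the M flag: the accumulator is 8-bit again
    LDA #$12        ; assembles to A9 12 -- same opcode, one-byte immediate

A disassembler (or a programmer) that loses track of M, or of the X flag for the index registers, mis-decodes everything after the wrong guess.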
The accumulator A is your mouth with 8 teeth, that can bite and chew and taste, the status register is your tongue, and the X and Y registers are your hands, that can help you shove stuff into your mouth and spit it out.
> modern RISC architectures such as ARM, MIPS or RISC-V. At some point, a serious assembly programmer should definitely learn some of them.
Yes.
> However, they are not ideal to start with: the “S” in RISC stands for “simple”, but the simplicity is more about internal implementation of the chips than the instruction set.
That is in fact a much better description of the 6502, which is not even a RISC but rather something almost prehistoric, where you figured out how much circuitry you could fit on a chip (or board) and then worked out what instruction set you could map onto that hardware.
Note that 6502 was the first machine code I learned, as a 17-year-old in 1980, after I got bored with BASIC after two days. I learned Z80 on a ZX81 the year after, and PDP-11 the same year. VAX in 1982, 6809 in 1983, and 68000 in 1984.
> Modern microprocessors are almost exclusively programmed with high-level languages and the direct usage of assembly instruction is not high on the list of priorities for CPU designers nowadays.
That's just absolutely not true.
They might be in some ways "harder" than a CISC instruction set such as the VAX's, but any of the above are far simpler to program than the 6502, especially the base RISC-V RV32I instruction set.
> To illustrate this point, loading a 64-bit constant to a register on ARM64 can take 4 instructions with bit shifting.
Whereas you can't do it at all on a 6502! Loading a 64-bit constant into 8 zero-page locations takes 16 instructions and 32 bytes of code on a 6502.
Let's not even talk about adding or comparing or multiplying two such values.
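For concreteness, here is roughly what that looks like on the 6502, using the same constant as the RISC-V example below (a minimal sketch; VAL is an arbitrary zero-page address and the value is stored little-endian):

    VAL = $F0          ; eight zero-page bytes at $F0-$F7 (assumption)

        LDA #$10       ; 0xFEDCBA9876543210, least significant byte first
        STA VAL+0
        LDA #$32
        STA VAL+1
        LDA #$54
        STA VAL+2
        LDA #$76
        STA VAL+3
        LDA #$98
        STA VAL+4
        LDA #$BA
        STA VAL+5
        LDA #$DC
        STA VAL+6
        LDA #$FE
        STA VAL+7      ; 16 instructions, 32 bytes, as noted above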
The GNU assembler for ARM64 doesn't help you out, but on RISC-V you can just type ...
li x10,0xfedcba9876543210
... and the assembler will generate the multiple instructions for you.
AVR-8 boards are affordable and have good tooling. Memory capacity is tiny but it is fast, and there is a lot of I/O: you can write console programs that use the serial ports, or better, programs that do physical things and also communicate over the serial ports -- in some ways AVR-8 boards are more capable than conventional computers, phones, etc. So far as I can tell, AVR-8 was the last 8-bit architecture; if it has a real weakness it is that, compared to other microcontrollers, it's a dead end, because you're going to upgrade to ESP-32 or ARM if you need more.
The 6502 has more registers than the PDP-8, but just barely. Writing compilers for the 6502 is difficult because there just aren't enough registers, never mind the limited set of addressing modes. If you want to do anything interesting with the 6502 you are likely to use virtual machine [1] techniques such as Wozniak's SWEET16 or the atrocious UCSD p-System [3] or Infocom's Z Machine [4]. That kind of stuff is really fun, but so is writing straightforward AVR-8 code.
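The core of such an interpreter is tiny. A minimal sketch of the fetch/dispatch loop in the SWEET16 spirit (labels are made up; IP and VEC are assumed to live in zero page, and HANDLO/HANDHI are assumed tables of per-opcode handler addresses, low and high bytes):

    IP  = $FA          ; 16-bit bytecode pointer (assumption)
    VEC = $FC          ; scratch dispatch vector (assumption)

    NEXT:  LDY #$00
           LDA (IP),Y  ; fetch the next bytecode
           INC IP      ; advance the 16-bit pointer
           BNE NOHI
           INC IP+1
    NOHI:  TAX
           LDA HANDLO,X
           STA VEC
           LDA HANDHI,X
           STA VEC+1
           JMP (VEC)   ; each handler does its work and ends with JMP NEXT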
I have fond memories of the 6809, which supported really advanced software engineering techniques for its time [5], but if I had to pick a classic 8-bit architecture that is easy to find hardware for today it would have to be the Z80, or its successor the eZ80, which I think is the only 24-bit micro that didn't suck (it has real 24-bit index registers!). Boards are available [6].
6809 is a good point. It's as if someone said: let's split the difference between the 6502 and the 68000! I started with the 6502 and graduated to the 68000 on the ST, the Amiga, and the Sega Genesis. Definitely two of my all-time favourites. Only recently have I started to try my hand at the 6809, and I can see why you recommend it; it's a very nice in-between step from the 6502 to the 68000.
It's hard to fathom that the 6502 came out in '75, the 6809 in '78, and the 68000 in '79.
The 6502 was the first CPU that I programmed in machine code - on my VIC-20 and on BBC Micros at school in the 80s. Later I wrote some code for the 6809. And later still I used 68000 dev boards as an undergraduate.
It's been a while now, but I remember feeling that the 68k was clearly powerful and elegant and a proper CPU, but also that I had little hope of understanding all the details. The 6502 was my first, and with hindsight was far less elegant, with plenty of warts, but I felt at the time that I understood it pretty well. The 6809 was somewhere in between, I guess, and a very nice CPU to work with.
I've never thought about it before, but I suppose I benefited from that progression 6502->6809->68k. Whether or not it makes sense now, I couldn't say.
(Never got on with the Z80. That brash upstart /s)