This looks awesome. If people are interested in this and haven't checked it out yet, http://microcorruption.com is still running: 19 levels of security challenges on an emulated MSP430 board.
Your comment did it. I've been avoiding it, since I know I won't stop until I either finish all the levels or give up after spending too many hours, but I finally decided to give it a try. And it is amazing. As an emulator author myself ( http://www.ubercomp.com/jslm32/src/ ), I'm in awe of the care that was put into the CTF environment. Very nice.
To the people who are wondering if they'll be able to finish the CTF, here are my two cents: go ahead and try it for a few hours. I'm sure you won't regret it. I haven't done any assembly since the beginning of 2012, and before I started I knew absolutely nothing about MSP430 assembly, yet I was able to do a couple of levels in less than an hour. The tutorial and manual are so good that I believe even someone who doesn't know assembly at all might be able to finish at least some levels.
To the same group: I've finished 18/19 with no previous assembly experience, so it's certainly possible. If this seems interesting you should try it, even if you've never written or really read assembly code.
I love this! It would've been nice to have something like this when I was learning assembly the first time. After learning to develop with Eclipse, Sublime Text, the Chrome debugger, etc., using something like QtSpim for assembly just set off signals in my brain saying 'this is outdated; assembly is probably not what you want to learn'. But of course assembly is important to anyone who wants intimate knowledge of what's going on inside the CPU when you run your programs. Great idea to put a modern, beautiful interface on top of it!
Given that this looks more like a microcontroller, I'd say AVR or 8051 would make a better "real thing" recommendation (YASP looks somewhat like a mix of AVR and 8051 to me; the upward-growing stack in particular is not common, and the first MCU that comes to mind with one is the 8051.)
This is a project I'd definitely be interested in.
I'm a web developer by day (Python/Django) but recently started learning x86 assembly at home (after which I can pick up where I left off with C in college).
Some of my colleagues think I'm crazy, but I find I'm learning a lot about how a computer really works (and I'm also beginning to understand why the great old 'real programmers' spent hours upon hours on the damn machine :-))
> I'd say learning assembly by starting with x86 is indeed a bit crazy.
I've heard this said before. Having had basic exposure to both MIPS and x86 assembly in school, I'm not really clear on WHY people say this. Assembly seems equally unforgiving and obnoxious either way.
My personal reason is that x86 handles memory segmentation in an awkward manner.
Assembly is unforgiving, but some processors are gentler on the programmer than others. If I was trying to teach someone the basics of register indexing and indirect indexing, I'd rather do it on a 6809 than an x86.
You don't have to care about segmentation if your program is less than 64K (fits in a single segment), and 64K is far more than enough for doing a lot of interesting things in Asm. A basic "Hello world" that runs in DOS is less than 20 bytes.
(Contrast this with using a C compiler on Windows, where "Hello world" compiled with default settings is usually beyond 10KB already.)
Starting with the 8086, it has 14 registers. Four of them are actually eight more registers (AX is AH and AL; BX is BH and BL; CX is CH and CL; DX is DH and DL). Each of these registers has a special purpose (BX can be used to index memory, the rest can't; CX is used in some instructions as a count; AX is your accumulator, and DX is ... just there). There are four more special-purpose index registers: SI, DI, BP and SP. SI and DI support post-increment and post-decrement indexing modes, but you need to set a flag in another register (the direction flag, DF, in the FLAGS register) to indicate whether you want post-increment OR post-decrement. SP is the stack pointer, but you can't use it as an index register (you can starting with the 80386). You can use the BP register as an index register, but it always takes a displacement (there is no special mode for a zero displacement).
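To make the register aliasing and the DF-controlled string stores a bit more concrete, here is a toy Python model. It is only a sketch: Registers8086 and stosb are hypothetical helpers written for illustration, not an emulator, and they cover just AX/AH/AL, DI and DF. Writing AL changes AX, and STOSB moves DI up or down depending on DF while ES is never touched.

```python
class Registers8086:
    """Toy model of a few 8086 registers; a hypothetical helper, not an emulator."""

    def __init__(self):
        self.ax = 0  # 16-bit accumulator; AH and AL are views into it
        self.di = 0  # destination index used by the string instructions
        self.df = 0  # direction flag: 0 = post-increment, 1 = post-decrement

    @property
    def ah(self):
        return (self.ax >> 8) & 0xFF  # high byte of AX

    @ah.setter
    def ah(self, value):
        self.ax = ((value & 0xFF) << 8) | (self.ax & 0x00FF)

    @property
    def al(self):
        return self.ax & 0xFF  # low byte of AX

    @al.setter
    def al(self, value):
        self.ax = (self.ax & 0xFF00) | (value & 0xFF)


def stosb(regs, memory, es):
    """STOSB: store AL at ES:DI, then move DI by +1 (DF=0) or -1 (DF=1)."""
    memory[((es << 4) + regs.di) & 0xFFFFF] = regs.al
    step = -1 if regs.df else 1
    regs.di = (regs.di + step) & 0xFFFF  # DI wraps within its 64K segment; ES is untouched


regs = Registers8086()
regs.ax = 0x1234
print(hex(regs.ah), hex(regs.al))  # 0x12 0x34
regs.al = 0xFF                     # writing AL changes AX to 0x12FF
memory = bytearray(1 << 20)        # 1MB real-mode address space
regs.di = 0x0010
stosb(regs, memory, es=0x1000)     # DF=0: stores at 0x10010, DI becomes 0x11
regs.df = 1
stosb(regs, memory, es=0x1000)     # DF=1: stores at 0x10011, DI goes back to 0x10
```

On real hardware you would set and clear DF with the CLD and STD instructions; here it is just an attribute.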
Then you get to the segment registers, as if things weren't bad enough. There are four: CS to point to code, DS and ES for data, and SS for the stack. You see, the 8086 can address 1MB of memory (20 bits), but it's a 16-bit instruction set. So the segment registers contain not the upper 4 bits of the address, but the segment's starting physical address divided by 16 (that is, shifted right four bits). A physical address is formed as SEGMENT*16 + OFFSET, giving you 20 bits of address. Most instructions and addressing modes use DS, except those that use BP, which default to SS, and the store-with-post-increment/decrement string instructions, which must use ES:DI (DI is incremented or decremented; ES isn't). You can override the segment register for most instructions, but not all (the string instructions that give you the post-increment/decrement addressing modes are the exception).
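The SEGMENT*16 + OFFSET rule fits in one line of Python. This is a minimal sketch (physical_address is a name made up for illustration); the 20-bit mask models the original 8086's 20 address lines, so the sum wraps around past 1MB. Later CPUs with the A20 line enabled would reach 0x100000 instead.

```python
def physical_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, cut to the 20-bit bus."""
    return ((segment << 4) + offset) & 0xFFFFF


# Different segment:offset pairs can name the same byte, because segments
# start every 16 bytes and overlap.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
# On the original 8086 the sum wraps around past 1MB.
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```

The overlap is also why comparing two far pointers for equality is not as simple as comparing their bits.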
And because of these segment registers, you can have near (16-bit offset) or far (32-bit segment:offset) function pointers, data pointers, and stack pointers.
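A near pointer is just a 16-bit offset into whatever segment is current; a far pointer carries the segment too. The sketch below (the helper names are hypothetical) uses Python's struct module to show the in-memory layout of each: a far pointer is stored offset word first, then segment word, which is the order the LDS and LES instructions load them in.

```python
import struct


def near_pointer(offset):
    """Near pointer: just a 16-bit offset into the current segment."""
    return struct.pack("<H", offset)


def far_pointer(segment, offset):
    """Far pointer as stored in memory: offset word first, then segment word."""
    return struct.pack("<HH", offset, segment)


def unpack_far_pointer(raw):
    """Inverse of far_pointer: returns (segment, offset)."""
    offset, segment = struct.unpack("<HH", raw)
    return segment, offset


p = far_pointer(0xB800, 0x0000)
print(len(near_pointer(0x0000)), len(p))          # 2 4
print(p.hex())                                    # 000000b8
print(unpack_far_pointer(p) == (0xB800, 0x0000))  # True
```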
So what you have is an instruction set that is almost, but not entirely, consistently inconsistent. The 80186 gives you a few more instructions. The 80286 adds protected mode (with four privilege levels) and changes how the segment registers work: in protected mode they're now indexes into a table of segment descriptors, each giving a segment's base address and limit, and each segment can still only be 64K in size. The 80386 extends the registers to 32 bits (except for the segment registers, which are still only 16 bits long), adds new registers that are only accessible in the most privileged ring (ring 0), and adds paged memory while keeping segmented memory. It also makes the registers more general purpose, all while keeping the old instruction set.
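In protected mode the 16-bit value in a segment register is a selector rather than a paragraph address. A minimal sketch of its layout, assuming the standard x86 format (bits 0-1 requested privilege level, bit 2 choosing the GDT or LDT, bits 3-15 the descriptor index); decode_selector is a hypothetical name. Since each descriptor is 8 bytes, the index times 8 is the byte offset into the table.

```python
def decode_selector(selector):
    """Split a 16-bit protected-mode segment selector into its fields."""
    rpl = selector & 0b11                          # bits 0-1: requested privilege level (0-3)
    table = "LDT" if selector & 0b100 else "GDT"   # bit 2: table indicator
    index = selector >> 3                          # bits 3-15: descriptor number in that table
    return index, table, rpl


print(decode_selector(0x0008))  # (1, 'GDT', 0)
print(decode_selector(0x001B))  # (3, 'GDT', 3)
print(decode_selector(0x000F))  # (1, 'LDT', 3)
```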
I'm really hard pressed to come up with a more convoluted architecture than the x86.
> I'm really hard pressed to come up with a more convoluted architecture than the x86.
At the risk of coming off a little more contrary than I intend: seriously, who cares? joezydeco was talking about teaching an assembly language with "training wheels." 8086 assembly without any doodads works for that. It's useless for normal people but no more useless than 6502, and the student will have development tools that are actually friendly to play with.
The nice thing about learning x86 assembly is that it embodies the entire learning curve. If you want to inflict all the horrors of assembly language on yourself - a pointless pursuit for anyone not writing a compiler while simultaneously eating paste, but hey, whatever - x86 lets you start small and work your way up.
I'm just not clear on who needs elegant assembly language programming (outside of compiler writers, etc.). Doing it at all represents failure. The point of an assembly language class is to be very quick and make the student appreciate what C does for them, IMO.
These days, you are right, there isn't much use for assembly, other than compiler writers and the occasional kernel code. And most of what I described was the 8086, warts and all. By comparison, MIPS is relatively straightforward, but I see by your attitude towards assembly in general that you'd rather spend no time with assembly.
Fair enough, you can certainly go your entire career without resorting to assembly language (heck, the last time I got paid for programming in assembly was the early to mid 90s; I'm still amused by the "C is too low level for programming" arguments these days, when back in the early 90s, people were bitching about C being too high level and inefficient).
Thanks. Yeah, the pain points I remember from learning x86 asm have to do with screwing up dealing with both 16-bit and 32-bit ints and pointers in the same program. Those were probably pretty contrived assignments, and you're right, most of the x86 bizarreness is there even if you're just doing 8086 stuff.
I just don't know if it's wrong to jump in the deep end with x86. Obviously the argument can be made either way, and x86 is ugly, but I'm pretty prejudiced toward learning something that will potentially be useful right off the bat. That would make me lean towards x86 or ARM for teaching or for self study, I think, and I have no idea what the ARM tools look like right now.
One of my more masochistic long-term goals is to make a compiler (I've made an assembler of sorts) so I might be revisiting this question at some point. Joy.
Besides, I was answering the question about why learning the x86 was considered crazy, because compared to other CPUs, it is crazy (not that you care about that).
I've been working with x86 for a long time and it's not as bad as it seems once you see the patterns. The 16-bit memory addressing modes can basically be summed up as "at most one each of {BP,BX}, {SI,DI}, {disp8,disp16}", and the fact that [BP] is actually [BP+0] is just an encoding detail; the assembler will handle it for you. What would be [BP] is where the [disp16] mode goes.
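Those patterns are visible in the 16-bit ModR/M byte (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0). The following is a rough Python sketch, not a full disassembler: decode_modrm16 is a hypothetical helper that ignores the reg field, takes the displacement as an already-decoded integer, and only describes memory operands. It shows the [disp16] mode sitting in the slot plain [BP] would occupy, so "[BP]" has to be encoded as [BP+0].

```python
# 16-bit ModR/M r/m field (when mod != 11): which registers form the address.
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def decode_modrm16(modrm, disp=0):
    """Describe the memory operand of a 16-bit ModR/M byte (reg field ignored)."""
    mod = modrm >> 6        # bits 7-6
    rm = modrm & 0b111      # bits 2-0
    if mod == 0b11:
        raise ValueError("mod=11 selects a register, not memory")
    if mod == 0b00 and rm == 0b110:
        return f"[{disp:#06x}]"       # the slot plain [BP] would use is [disp16]
    if mod == 0b00:
        return f"[{RM16[rm]}]"        # no displacement
    return f"[{RM16[rm]}{disp:+#x}]"  # mod=01: sign-extended disp8, mod=10: disp16


print(decode_modrm16(0b00_000_000))          # [BX+SI]
print(decode_modrm16(0b00_000_110, 0x1234))  # [0x1234]
print(decode_modrm16(0b01_000_110, 0))       # [BP+0x0]  (how "[BP]" is encoded)
print(decode_modrm16(0b01_000_111, -2))      # [BX-0x2]
```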
You don't have to worry about segmentation unless you're working with more than 64K of code or data, which is plenty when you're writing in Asm. And when you do, it's not hard to learn the few extra rules that come with that.
Protected mode and the 32-bit extensions are two separate (but related) topics; you don't need to learn the former to use the latter, and you can already do a lot without knowing either. Unfortunately the 64-bit extensions have gone in a very different direction, and you will need protected mode to use them. But for learning the basics of Asm, I don't think you need 64 bits.
The 8080/8085/Z80 are a little simpler, but learning them could be useful if studying x86 since they're its ancestors.
It's no harder (and in many ways easier) than learning the irregularities of English or any other language. One side-effect of learning x86 is that all the RISCs then become really boring and straightforward. x86 has character. :)
I got used to it. The 8086 was the second CPU I learned assembly for (the first was the 6809). While I never did like it as much as the 6809 or 68000, it still had its charms (and I liked it more than SPARC assembly).
Not him, but I'm guessing it's because SPARC looks a lot like the AT&T syntax that GAS uses for x86: lots of 'noise' in the form of % and $ everywhere. It also has branch delay slots, not-so-intuitive operand ordering, register windows, and all the other RISC features that seem to discourage writing Asm by hand, as a big part of the philosophy was "use a compiler".
Thank you for taking the time to write that up. Great overview.
A nice, tidy, orthogonal instruction set is a joy to work with. The x86 ISA unfortunately isn't one of those, by virtue of its history in the marketplace.
Yesterday I was working with the built-in debugging and memory search tools in FCEUX to track down a memory location I wanted to script against (a game mode, for a Twitch Plays stream). There are some really great tools in emulators now that let you explore the code and memory in a more interactive fashion. I was even able to script my own tool to poll which code addresses accessed a specific memory location over a given timeframe.
It has an emulator, debugger and simulated hardware built in. The dialect itself is very simple to learn, so it's perfect for teaching assembler. Also, compared to any desktop tool, the setup time is zero.