The 6502 CPU's overflow flag explained at the silicon level (arcfn.com)
169 points by unwind on Jan 15, 2013 | 33 comments

I loved that chip!

My first real programming job entailed writing a blazingly fast macro assembler from scratch for the 6502, using a much slower (and non-macro) assembler from the vendor (Ohio Scientific, for those of a certain age).

I simulated the proposed hashing algorithm in FORTRAN at the community college I was attending, and found that it led to a lot of collisions for the base 6502 opcode set. When I pointed this out to my manager (who's still my friend 32+ years on), he made sure that I got my first raise.

I still have those listings and the original design documents around somewhere.

Very interesting

And this is 30-year-old tech. The processor in your cell phone is about 1000x faster than the 6502 (and much more capable).

To me the hardest part (apparently) is converting the electronic circuit into the actual chip drawings. I'm not sure how this is done (how do you route it?). And for the 6502 this was done by hand: the drawings were made the size of a desk and reduced photographically (IIRC).

Almost 38 years, even... If you're interested in the story around MOS and the 6502, the early parts of "Commodore: A Company on the Edge" by Brian Bagnall cover it quite a bit, including parts about the manual layout work that Bill Mensch did.

Here's an article about it too: http://research.swtch.com/6502

Not only did they route / layout by hand-drawing, and then cut the Rubylith photomask by hand (it's not a drawing they use to reduce it photographically - they cut out holes..), but Bill Mensch got it right the first time. 3,510 transistors according to the article.

Bill Mensch left MOS to found Western Design Center quite early, and so is covered less in that book than Chuck Peddle, who was instrumental in Commodore for a long time after they acquired MOS. The amazing thing is WDC still sell variations of the 6502 design, including various replacements such as a 16 bit version.

Peddle (the other main designer) and Mensch deserve a lot more attention than they've gotten.

(EDIT: From WDC's website: "Annual volumes in the hundreds (100’s) of millions of units keep adding in a significant way to the estimated shipped volumes of five (5) to ten (10) billion units. " Yikes...)

I'm glad to see the 6502 getting so much attention on HN. I like the article title Unwind used here, so I've changed the article's original title to match. (Note to self: try to come up with better titles.)

I've seen that story before that the 6502 worked perfectly the first time, but I think there's some mythologizing going on. The ROR instruction was totally broken on the first release of the 6502 and wasn't fixed until months later. [1]

The quote above says that the 6502 has 3510 transistors, and this number appears many other places. It turns out that the 6502 has 3510 enhancement transistors and 1018 depletion transistors, for a total of 4528 transistors, according to the visual6502 analysis.

And if you're interested in the inner workings of the 6502, you should definitely check out the huge transistor-level schematic at: http://www.downloads.reactivemicro.com/Public/Electronics/CP...

[1] http://en.wikipedia.org/wiki/6502#Bugs_and_quirks and details at http://www.pagetable.com/?p=406

Have you checked out the documentary that Jason Scott is working on? He ran a Kickstarter to fund the development of three films. One of them is specifically about the 6502:


I suspect that the "worked perfectly the first time" basically implies (whether that is accurate or not) "they only had to create one mask before their tests passed", not that it was entirely bug free. Possibly that the mask faithfully replicated the design, but that the revision of the design they first manufactured still had bugs.

In any case, the main thing is that a lot of MOS' early success came from saving a tremendous amount of money being able to get to market quickly and cheaply compared to the competition thanks to actually having guys like Mensch and Peddle coupled with superior process (though that did not last all that long).

You might also want to check out someone building a computer based off a 6502, very fun to see someone going through the steps to design and build something out of older hardware.


Part of the appeal to me of this hardware is that it is so simple that even without much electronics experience you can get away with a lot more trial and error while learning.

I did all kinds of crazy things with my C64 and got away with it without breaking stuff, like powering it off batteries (the C64 takes multiple different voltages in, but you can get away with just one - I don't remember if it was 5V or 9V - just that some things like the user port and realtime clock won't work) and attaching LEDs and relays to the user port without a clue what I was doing, replacing the IO chips (CIA) with a different version from my 1541 disk drive when one of the ones in the C64 broke (standard troubleshooting: if the CIA chips were hot right after turning the machine on, they had short-circuited - they were the source of breakage on C64s and Amigas...), and at a later point with one from my Amiga (they're all pin compatible, but some functionality is different, e.g. the Amiga version has a 32 bit timer instead of the realtime clock nobody used...), or replacing the 6510 in my C64 with a 6502 from a 1541 just to see what would happen (the 6510 has 8 general purpose IO lines, mapped to the tape drive IO and I think bank switching - I believe the machine will still start but...)

On my Amiga I at one point made a pause switch by soldering stuff straight onto a pin on the CPU...

You can watch it work here! (warning: epic javascript) http://www.visual6502.org/JSSim/expert.html

A great tale about its history was posted some time ago on HN:


How did you come up with 1000x? With Moore's law we are only 3 orders of magnitude better than a processor from the late 1970s?

As others have pointed out, modern processors are three orders of magnitude faster in terms of clockspeed than processors from back then. There's also the fact that modern processors do a lot more per clock than this guy: using 64-bit wide datapaths, executing multiple instructions every clock cycle rather than taking multiple clock cycles to execute an instruction, and having multiple independent cores. One rule of thumb is that performance tends to increase with the square root of the number of transistors you use, so you'd expect another 3 orders of magnitude increase in performance from the 6 orders of magnitude more transistors. That makes a modern chip about 6 orders of magnitude faster at executing some algorithm overall (if there's a normal amount of parallelism to extract).
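The estimate in that comment can be checked with a few lines of arithmetic. This is only a rough sketch: the two ratios are the round figures quoted in the comment, the square-root rule is the rule of thumb it mentions, and the variable names are made up for illustration.

```python
import math

# Round figures taken from the comment; both are rough estimates.
clock_ratio = 10**3        # about 3 orders of magnitude in clock speed
transistor_ratio = 10**6   # about 6 orders of magnitude in transistor count

# Rule of thumb: performance grows with the square root of transistor count.
per_clock_gain = math.isqrt(transistor_ratio)

# Combine the clock-speed gain with the per-clock gain.
overall_gain = clock_ratio * per_clock_gain
print(f"per-clock gain ~{per_clock_gain}x, overall gain ~{overall_gain:,}x")
```

The square root of six orders of magnitude is three, and adding the three from clock speed gives the six orders of magnitude overall that the comment arrives at.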

Now, normally you have to worry about increasing clockspeeds having diminishing returns, since memory latency remains constant despite a faster CPU clock. But anything that could run on the amount of RAM the 6502 could handle would fit in a modern processor's L1 cache, and the scheduler is perfectly able to hide L1 latency so I think ignoring this factor is fair in this case.

The x1000 is a huge understatement. For example, these days CPUs are much more efficient in terms of cycles per instruction (and, inversely, instructions per cycle). Back then, multiplication of two word-sized (8 bits back then) values took 24 cycles; these days we can do that in 12 cycles for 64-bit values. Because of superscalar processing, and thus instruction-level parallelism, we can typically do 2-4 ALU operations in parallel (given that there are no data dependencies) and thus increase instruction throughput 2-4 fold. Then, because of SIMD features and data-level parallelism, we can do the same operation on multiple data (say, operate on a vector of 4 elements in a single cycle) and thus eliminate the need for repeated instructions.

This all gets a bit complicated in modern days because of memory access costs and the caches which try to alleviate them, but the idea is that modern CPUs are likely to be around 10 times as fast per clock as the 6502, and because of multiple cores and threads that value goes to something like 40-60. Add the huge increase in clock speed and you're a bit south of x100_000 in the optimal case.
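As a back-of-the-envelope sketch of how those factors multiply out: the per-clock and core figures come from the comment, while the 1000x clock ratio is an assumption borrowed from elsewhere in this thread (the comment itself does not give one), and the core counts of 4 to 6 are inferred from the "40-60" figure.

```python
per_clock_speedup = 10    # "around 10 times as fast per-clock"
cores_range = (4, 6)      # inferred: 10x per clock times 4-6 cores gives 40-60
clock_ratio = 1000        # assumption: ~1 MHz then vs ~1 GHz now

# Multiply the three factors for the low and high ends of the range.
low, high = (per_clock_speedup * cores * clock_ratio for cores in cores_range)
print(f"estimated speedup: {low:,}x to {high:,}x")
```

With these inputs the result stays below 100,000x, which matches "a bit south" of that figure. A higher assumed clock ratio (say 2 to 4 GHz against 1 MHz) would push it past 100,000x.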

I would hope every programmer would write some code on a C64 to really learn how much RAM 64 KB really is. You can actually waste some of it, and in some cases it really is "enough so that I don't have to optimize". :) Real hard-core people would go with the VIC-20, which has only 5120 bytes of RAM, or the Atari 2600 with 128 bytes of RAM. One could imagine there's nothing you can do with them, but oh boy how wrong one would be! Heck, a single tweet is 140 characters. And you can fit that in 128 bytes. You really can... :)
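One way the "140 characters in 128 bytes" claim can hold is 7-bit packing. The sketch below assumes the tweet is plain 7-bit ASCII (real tweets may not be), and `pack7` / `unpack7` are hypothetical helper names, not anything from the thread.

```python
def pack7(text: str) -> bytes:
    """Pack 7-bit ASCII characters back to back, 7 bits per character."""
    bits = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("this sketch only handles 7-bit ASCII")
        bits = (bits << 7) | code
    nbits = 7 * len(text)
    nbytes = (nbits + 7) // 8          # round up to whole bytes
    bits <<= nbytes * 8 - nbits        # left-align; padding bits go at the end
    return bits.to_bytes(nbytes, "big")


def unpack7(data: bytes, length: int) -> str:
    """Reverse pack7, given the original character count."""
    bits = int.from_bytes(data, "big")
    bits >>= len(data) * 8 - 7 * length   # drop the padding bits
    chars = []
    for i in range(length):
        shift = 7 * (length - 1 - i)
        chars.append(chr((bits >> shift) & 0x7F))
    return "".join(chars)


tweet = "x" * 140
packed = pack7(tweet)
print(len(packed))  # 123
```

140 characters at 7 bits each is 980 bits, which rounds up to 123 bytes, so the text fits in 128 bytes with 5 to spare.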

These days we have kilobytes of RAM on Arduino and other AVR boards.

Moore's law is about the number of transistors. The 6502 had ~3500. Modern desktop CPUs have ~2,500,000,000 (see also http://en.wikipedia.org/wiki/File:Transistor_Count_and_Moore... ). So that's more like 1,000,000x (thus six orders of magnitude). Then again, the grandparent was talking about speed, and speed doesn't scale linearly with the number of transistors.
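To see how close that is to six orders of magnitude, here is a quick check using the approximate transistor counts quoted in the comment (variable names are illustrative only).

```python
import math

transistors_6502 = 3_500               # approximate figure from the comment
transistors_modern = 2_500_000_000     # approximate figure from the comment

ratio = transistors_modern / transistors_6502
orders = math.log10(ratio)             # orders of magnitude = log10 of the ratio
print(f"ratio ~{ratio:,.0f}x, about {orders:.1f} orders of magnitude")
```

The ratio comes out a little over 700,000x, so "six orders of magnitude" is a slight round-up.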

The 6502 ran at 1 MHz in the Apple ][. Modern processors run at 1 GHz (1000 MHz) and above. That's 1000x, which is 3 orders of magnitude.

Not to mention further performance advancements in processor design since then (pipelining, SIMD, etc...), further increasing throughput above the 1000x threshold. One should also consider the increases in word length, adding the ability to process more data in less time.

If you want to add 2 32-bit integers, on 6502 you'll need something like the following, assuming this is a 32-bit integer you're actively working with and are probably about to use again fairly soon:

    CLC                  ; 2
    LDA&70 ADC&74 STA&70 ; 3 3 3 = 9
    LDA&71 ADC&75 STA&71 ; 3 3 3 = 9
    LDA&72 ADC&76 STA&72 ; 3 3 3 = 9
    LDA&73 ADC&77 STA&73 ; 3 3 3 = 9
That's for a total of 38 cycles. So on the computer I started programming on, you could do ~52,000 32-bit adds per second.

By comparison, for a modern Pentium, according to Intel's docs, a 32-bit add (again, on data you're using) takes 1 cycle, end to end.

So on the laptop that's in front of me, which is a crap one, you could do 2,530,000,000 32-bit adds per second. A 48,000-fold performance increase. Maybe 96,000 times, if you have no dependency chain (ADD throughput is 2 per cycle).

This ignores the fact my modern computer has 2 cores.
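The arithmetic behind those figures can be reproduced as follows. This is a sketch under one loud assumption: the comment does not name the machine, and the 2 MHz clock used here is simply what the ~52,000 adds per second figure implies. The cycle counts are the ones annotated in the listing.

```python
# Zero-page cycle counts, as annotated in the listing.
CLC, LDA_ZP, ADC_ZP, STA_ZP = 2, 3, 3, 3

# One CLC, then load/add/store for each of the 4 bytes of a 32-bit value.
cycles_per_add = CLC + 4 * (LDA_ZP + ADC_ZP + STA_ZP)

clock_6502_hz = 2_000_000          # assumption: a 2 MHz 6502
clock_modern_hz = 2_530_000_000    # laptop clock implied by the comment

adds_6502 = clock_6502_hz // cycles_per_add
adds_modern = clock_modern_hz      # one 32-bit add per cycle, per the comment

speedup = adds_modern / adds_6502
print(cycles_per_add, adds_6502, round(speedup))
```

This gives 38 cycles per add and a speedup of roughly 48,000x, in line with the comment. Doubling it for two independent adds per cycle gives the ~96,000x figure.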

And that's loading/storing to/from the zero page (the first 256 bytes of memory). Loading/storing from higher addresses requires 4 cycles.

But, "ADD ESI,EDX" is adding two registers isn't it? So I think you need to include the loading/storing of those registers back to memory for a more fair comparison.

I haven't touched 6502 assembly in over 20 years. Brings back memories. :-)

This is working data, so you'd keep it in a register if possible. Sadly that just happens not to be possible on the 6502 :)

Two bytes can be kept in the X and Y registers. Immediate load and add instructions only use two cycles.

    CLC                    ;         2
            ADC #b1 STA&70 ; 0 2 3 = 5
    LDA #a2 ADC #b2 TAY    ; 2 2 2 = 6
    LDA #a3 ADC #b3 TAX    ; 2 2 2 = 6
    LDA #a4 ADC #b4        ; 2 2   = 4
                           ; total  23 cycles
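The 23-cycle total can be tallied mechanically. The sketch below uses the cycle costs annotated in the listing (2 cycles for immediate loads, immediate adds and register transfers, 3 for a zero-page store); the dictionary keys are ad hoc labels, not assembler syntax.

```python
# Cycle cost per instruction, matching the annotations in the listing.
CYCLES = {"CLC": 2, "LDA#": 2, "ADC#": 2, "STAzp": 3, "TAY": 2, "TAX": 2}

sequence = [
    "CLC",
    "ADC#", "STAzp",         # byte 1: result stored to zero page
    "LDA#", "ADC#", "TAY",   # byte 2: result kept in Y
    "LDA#", "ADC#", "TAX",   # byte 3: result kept in X
    "LDA#", "ADC#",          # byte 4: result left in A
]

total = sum(CYCLES[op] for op in sequence)
print(total)  # 23
```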

If you're adding constants, you might as well load each byte of the result directly, when you need it. (I can't tell where the LSB is coming from in this code - perhaps it isn't a constant? - this example doesn't resemble any code I've ever had to write.)

Perhaps the code is intended to be modified at runtime, but then you'd still want one of the operands loaded from memory, I think (otherwise why not just precalculate the results?), and I've generally found the (fairly substantial) fixed expense not to be worth it anyway.

Anyway, overall I think you're being a bit unfair to the x86 with this comparison.

Yes, the comparison is relative, it may as well be 10000x better (in orders of magnitude)

You're even playing nice against the 6502, you're using a simple add, now compare with SIMD instructions

Yet it's a shame that we seem to piss that extra performance away instantly in software.

Things felt faster in the 80s than they do now, even doing the same tasks.

I recall waiting a couple of minutes for my computer to just boot in the 80s. When I want to use my phone, it becomes usable in well under a second.

Waiting a few seconds every time I hit save was fun. Didn't stop me from developing a ferocious ^S reflex. Fortunately, save is fast enough not to be noticeable these days, to the extent that it usually happens automatically now.

Watching a WYSIWYG font menu draw each individual entry was fun. We certainly don't get that pleasure now.

But yes, things certainly felt faster in the 80s.... /s

I don't know what you were using but I was booted and operational in under 3-4 seconds on everything I used in the 80s (BBC Master, Acorn A310)

My Apple IIGS could be "operational" pretty fast if all you were after was a BASIC prompt with no disk access. If you wanted a BASIC prompt with disk access, that took some seconds. If you want to actually load useful software, it took quite a while.

Those are (generally accepted to be) the same thing...

One order of magnitude: 10x

Two orders of magnitude: 100x

Three orders of magnitude: 1000x

In terms of clockspeed, the 6502 ran at 1 to 2 MHz. Today's processors run at around 2 to 4 GHz at most, so in terms of "order of magnitude" 1000x is spot on. Of course, on a clock-for-clock basis modern architectures are a lot wider too, which also accounts for better performance. But clockspeed is simple enough.

Wikipedia lists the 6502 as running at between 1 MHz and 2 MHz and the Samsung Galaxy SII as running at around 1.2 GHz.[0] I'm guessing that's where the 1000x comes from... Or, you know, it's just a nice big number. ;P

[0] Of course, that's ignoring multiple cores, better microcode, caches, etc.

The Apple IIc Plus had an accelerator running the 6502 at 4 MHz.

This is great. If you enjoyed it, you'll probably really like the (oft recommended here) "Code: The Hidden Language of Computer Hardware and Software" by Charles Petzold.


Also a great relevant read:

How MOS 6502 Illegal Opcodes really work http://www.pagetable.com/?p=39

I just finished a 6502 emulator, and I recommend it to everyone. Lots of fun and very interesting.
