Hacker News
Dirty tricks 6502 programmers use (nurpax.github.io)
305 points by nurpax 32 days ago | 60 comments

Not to belittle the article, because it's definitely interesting. But as an ex-BBC 6502 programmer, my nitpick here would be that the title should really be named "Dirty tricks C64 6502 programmers use".

On the beeb we had our own set of tricks specific to the memory layout and ROM of our beloved beige and black machines.

> title should really be named "Dirty tricks C64 6502 programmers use".

In that case, probably Dirty Tricks 6510 Programmers Use would be even better.

While that may be technically correct, the programmer-visible difference between the MOS 6502 and the MOS 6510 is totally incidental in this case: the 6510 has a built-in 6/8-pin I/O port (partially used for bank switching the ROMs in the C64). Unless you touch the I/O port, the two should behave identically, down to the cycle timings of instructions and the behavior of undocumented opcodes.

In this case, the real C64-specific tricks are not 6510-specific; they depend on the specific initialization done by the C64 ROM and on calling C64 ROM routines.

A real "dirty trick" is reading the actual RAM in memory locations 0 and 1 on the 6510 (and not the I/O port values).

Yeah the moment they introduced a ROM call to optimize the routine, I kind of lost interest.

Why? The memory layout of the text page would have also been c64 specific.

Because the article was titled "dirty 6502 tricks" and not "dirty C64 tricks".

If the article was more about hidden opcodes or instruction set side effects, that's great. But if the key to shaving 8 bytes out of the routine was calling a 120-byte function that's free in the ROM, there's no real trick to 6502 programming being revealed.

It's the equivalent of a blog post titled "I wrote a web server in 3 lines of code!" and the author calls .start() in some imported framework that contains 250,000 lines of code.

True. But in his defense, once you want to put text on the screen, you’re already using implementation defined behavior. Like I said in another reply, the character to memory location mapping is different between for instance the C64 and the Apple //e.

The Apple //e would have had a text scrolling function in ROM.

It's a grid of 24 rows (or is it 25?) of 40 bytes, top to bottom, left to right, one byte per char... not exactly uncommon for character mapped displays.

That wasn’t how the Apple II stored characters. The memory didn’t map contiguously to screen positions where you could calculate the character position using $400+x+y*40.
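To make the difference concrete, here is a minimal Python sketch of the two layouts. It assumes the default C64 screen RAM at $0400 and Apple II text page 1 (also at $0400, with rows interleaved in three groups of eight); the function names are made up for illustration.

```python
C64_SCREEN_BASE = 0x0400    # default C64 screen RAM
APPLE2_TEXT_BASE = 0x0400   # Apple II text page 1


def c64_screen_addr(x: int, y: int) -> int:
    """Linear layout: 25 rows of 40 bytes stored one after another."""
    if not (0 <= x < 40 and 0 <= y < 25):
        raise ValueError("position is off screen")
    return C64_SCREEN_BASE + y * 40 + x


def apple2_text_addr(x: int, y: int) -> int:
    """Interleaved layout: 24 rows, where consecutive rows are $80 apart
    within a group of eight and each group starts $28 after the previous."""
    if not (0 <= x < 40 and 0 <= y < 24):
        raise ValueError("position is off screen")
    return APPLE2_TEXT_BASE + (y % 8) * 0x80 + (y // 8) * 0x28 + x


for row in (0, 1, 8):
    print(row, hex(c64_screen_addr(0, row)), hex(apple2_text_addr(0, row)))
```

On the C64, row 1 starts 40 bytes after row 0; on the Apple II it starts 128 bytes later, and row 8 is the one that lands at $0428.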

>Entries were posted as Twitter replies and DMs, containing only the PRG byte-length and an MD5 hash of the PRG file.

This is clever. So basically rather than getting bogged down reviewing submissions you just pick a winner and then validate post-hoc! (because when you win the hash of your code has to match the one you submitted)

I wonder if you could brute force a particular solution with that information.

I would say no. 34 bytes is 256^34=7588550360256754183279148073529370729071901715047420004889892225542594864082845696 combinations, and even if you could easily narrow it down to only valid programs, you would still need to simulate each one, which is way slower than computing a hash.
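The size of that search space is easy to check with Python's arbitrary-precision integers. The hash rate below is purely an assumed figure for illustration, not a measurement of any real hardware.

```python
# Number of distinct 34-byte programs: 256 choices for each byte.
KEYSPACE = 256 ** 34

# The same count expressed as a key length in bits (34 * 8 = 272).
assert KEYSPACE == 2 ** 272

# Assumption: a hypothetical attacker testing 10^12 candidates per second.
HASHES_PER_SECOND = 10 ** 12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years = KEYSPACE // (HASHES_PER_SECOND * SECONDS_PER_YEAR)

print("decimal digits in keyspace:", len(str(KEYSPACE)))
print(f"years to enumerate at the assumed rate: {years:.3e}")
```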

Only a fraction of those would be reasonable programs, and you can test almost all of them immediately by computing an MD5 hash.

34 bytes is equivalent to brute-forcing a 272-bit key. It's already physically impossible to do that for a 256-bit key even if you ignore everything other than incrementing the key counter itself:


But as I said, you’re not brute forcing the entire key space because you likely have an idea of at least some of the bits.

Or, you could evaluate whether said program draws the crossed lines in an emulator. Might not take that much longer than calculating the hash... That makes this kind of an interesting "Genetic Programming" challenge... The solution space is "only" 29 bytes long...

I don't think that would be this quick either. Since the code might/will mess up zero page and/or other data areas in use by C64 BASIC, you would have to wipe it to a known state for each test, which almost means "boot up the KERNAL and let it run complete INIT".

Not that even this has to take a long while on a 3GHz computer running at full speed, but doing it 2^272 times ...

> you would have to wipe it to a known state for each test, which almost means "boot up the KERNAL and let it run complete INIT"

FWIW, if I were actually doing this I’d run it in an emulator and “save” the initial state to restore for every test.
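A rough Python sketch of that save-and-restore idea follows. `MachineState` is a hypothetical stand-in for whatever state a real emulator exposes (here just 64 KiB of RAM and the CPU registers); no particular emulator's API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class MachineState:
    """Hypothetical emulator state: 64 KiB of RAM plus the 6502 registers."""
    ram: bytearray = field(default_factory=lambda: bytearray(0x10000))
    a: int = 0
    x: int = 0
    y: int = 0
    sp: int = 0xFF
    pc: int = 0

    def snapshot(self) -> "MachineState":
        # Copy the RAM so later writes do not leak into the saved state.
        return MachineState(bytearray(self.ram), self.a, self.x, self.y,
                            self.sp, self.pc)

    def restore(self, saved: "MachineState") -> None:
        # Overwrite in place; the saved copy stays pristine for the next test.
        self.ram[:] = saved.ram
        self.a, self.x, self.y = saved.a, saved.x, saved.y
        self.sp, self.pc = saved.sp, saved.pc


machine = MachineState()
machine.ram[0x0002] = 0x4C          # pretend this was set up during boot
booted = machine.snapshot()         # taken once, after the slow init

machine.ram[0x0002] = 0x00          # a candidate program trashes zero page
machine.restore(booted)             # cheap reset before the next candidate
```

Restoring is a 64 KiB memory copy, which is far cheaper than re-running the boot sequence for every candidate.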

One way to work around that is to allow authors to include comments in the code that gets hashed.

(I'm not sure if this particular competition did that)

Are you talking about padding it out to provoke a collision?

Remember that at the end of the day, you still have to have actual working code …

No - what I mean is that if someone is worried that their code will be derived from their published hash by someone doing a brute-force attack, they can prevent (or make much harder) this brute-force attack by having comments in their source code.

I'm assuming here that the hashed string is the source code rather than the machine code.

Ahhh - a salt

I guess one should use something more expensive to compute, like a KDF.

What, if any, modern products still use the 6502?

EDIT: That’s not a dig against the 6502. I still fondly remember learning BASIC on my C64 and wish now I had ventured into Assembly with it. By today’s standards it seems to have a simpler and more approachable instruction set so I’m wondering if there aren’t products I could hack on to learn Assembly with it. Or maybe I should just break out my old Commie.

You can still buy them at https://www.westerndesigncenter.com/wdc/chips.cfm / https://wdc65xx.com/chips/ ( https://en.wikipedia.org/wiki/WDC_65C02 )

From the wdc65xx link:

> The W65C02S is a low power 8–bit microprocessor utilized in a vast array of products for the Automotive, Consumer, Industrial, and Medical markets. This chip features a full external data (8–bit) and address (16–bit) bus for easy integration with 8–bit peripherals and memory.

Digging in the about page:

> Through the last 30+ years as one of the most popular microprocessor architectures of all time the 65xx brand is estimated to have over six billion embedded 65xx processors shipped and is growing by hundreds of millions of units per year, provided by WDC and its licensees. The following is a partial list of high volume applications that have been successful in using 65xx processors:

> ...

> · Toys

> · Automobile dashboard

> · Appliance controllers

> · Industrial controllers

> · Embedded heart defibrillator’s

> · Pacemakers

They are still found as embedded cores in computer mouses and keyboards, monitors (OSD processor/scaler), digital picture frames (https://spritesmods.com/?art=picframe&page=1), MP3 players, Furby (https://news.ycombinator.com/item?id=17751599 , actually a 6502-subset) etc.

C64 and many Atari, Apple, and Nintendo products used it, including arcade machines. People still make new stuff, with new interesting hacks, for all of them, every day. I suspect anything new wouldn't be as interesting as playing with the old systems because if anything needed more interesting features these days, they'd use something much more modern. Back then they had to squeeze all the "interesting" they could out of the 6502 specifically. You can easily play with it in the browser [1].

[1] https://8bitworkshop.com/v3.4.0/?=&file=examples%2Fbrickgame...

Specifically the 6502, or its successors like the 65C816?

The 65C816 is still being made, so someone must use it for something.

There is still a very active 6502 hacking forum at


(No SSL/TLS, unfortunately).

Lots of people build homebrew computers out of these things, there are a few open source OS and build chains available, etc.

I'm unsure, but I think 6502s are available as cores for semi- and full-custom ICs, where the processor core and memory are fully laid out. Bonus: it runs with a GHz clock, which gives you the ability to twiddle bits like mad.

So outside of retro computing you won't see a 6502 IC in the wild. But they are likely buried deep in nondescript ICs.

http://www.6502.org/commercial lists some of these.

I believe the Tamagotchi toys used the 6502.


Toys - and some of those have minimal RAM.

They were used a lot in cars. I'm not sure if that is still the case.


6502 is still my favorite architecture, even though I've done assembly language programming (professionally!) on many platforms in the past 35 years.

Why is it your favorite?

I am not the OP either, but 6502 and 6809 are my faves.

6502 was first. It is simple. And that makes it a lot of fun.

6809 is beautiful. I think it is the most powerful and elegant of the 8 bit CPUs. But that spoils a person too.

6502 is like whittling computing down to some useful nubs. There are enough subtleties to make it interesting too.

6502 on a Vic20 was where I really learned to program. As a 12 year old in 1982, I quickly outgrew Basic (with 3583 bytes of free memory) and its 22x23 char screen. I saved my pocket money to buy the assembly language cartridge.

My two most memorable 6502 assembly projects were:

- Text-to-speech - GUI for entering rules to generate phonemes for a text-to-speech system

- 3D graphics - Switching the Vic-20's characters set from ROM to RAM so I could do high-resolution pixel-addressable graphics. I wrote a full set of 3D primitives to draw lines, circles, do perspective and rotations from 3D to 2D, all in 6502 assembly.

One summer holidays I transcribed the entire Vic-20 ROM disassembly into old exercise books, so I could learn how it worked. I remember a sense of victory after reverse-engineering the floating point format and how the transcendental math functions worked.

Happy days!

I'm not the OP, but I always liked its simplicity. There's just enough space to do something interesting without getting bogged down in too much complexity.

I think it's because it was my first. And it is simple. Yes there are many addressing modes (like zero-page, and "absolute indirect", and "indexed indirect") but there's only 56 instructions and you can learn it in one afternoon.

And doing something useful with 56 instructions, 8-bits at a time, is like solving a puzzle.

> 8-bits at a time

It's a detail that doesn't always come up in these threads but it's worth remembering how belligerently 8-bit a 6502 is. Not only are there next to no general-purpose registers but they're 8 bit and there are no pretend-two-registers-are-one-16-bit-register instructions at all. You can't put an address in a register. Compared to even other popular 8 bit CPUs of the time, that's a bit metal.
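As an illustration of what "8 bits at a time" means in practice, here is a small Python model of adding two 16-bit values the way 6502 code has to: low bytes first, then high bytes plus the carry (CLC, ADC, ADC). The helper names are invented, and only binary mode is modeled (no decimal mode, no other flags).

```python
def adc(a: int, operand: int, carry: int) -> tuple:
    """Model of ADC in binary mode: 8-bit add with carry in and carry out."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def add16(lo1: int, hi1: int, lo2: int, hi2: int) -> tuple:
    """Add two 16-bit numbers held as separate low/high bytes."""
    carry = 0                          # CLC
    lo, carry = adc(lo1, lo2, carry)   # LDA lo1 / ADC lo2 / STA result
    hi, carry = adc(hi1, hi2, carry)   # LDA hi1 / ADC hi2 / STA result+1
    return lo, hi, carry


lo, hi, carry = add16(0xFF, 0x12, 0x01, 0x00)   # $12FF + $0001
print(f"${hi:02X}{lo:02X} carry={carry}")
```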

It is. There is always 6809 for a bit more civilized fun, IMHO.

Personally I preferred the 6800 family (6800, 6802, 6805) over the 6502, but the 6809 always felt a little too far.

The 6809 is an amazing little processor, you can run multi-tasking and relocatable code on it with relative ease. And with some bank switching magic you can even do that with appreciable amounts of RAM for each task. It is also one of the few instruction sets that is very predictable, if you know some base formats then you can 'compose' instructions and they usually exist as a valid opcode.

“UniFLEX is a Unix-like operating system ... for the Motorola 6809”:


I haven't used the 6800 family, but I'd expect most people would have preferred them - the motivation for the 6502 was to drastically cut cost, and while a lot was achieved by an amazing design, it was also a number of feature trade-offs. The main designers of the 6502 (Mensch and Peddle) were both on the team that worked on the 6800, and Peddle pitched the 65xx proposal to Motorola first, before taking it to MOS when Motorola was more concerned about protecting their margins. Before the 6501 and 6502 hit the market, a 6800 cost $175. 6501 and 6502 were introduced for $20 and $25 respectively. A year later the 6800 cost $35.

The price adjustments are so fast (and, checking the Wikipedia page, it looks like they managed to squeeze in a lawsuit and a settlement in the same time period!), it makes you wonder what it was beside price that made the 6502 so popular.

Commodore bought MOS, and managed to squeeze margins out of MOS that were essential to their ability to get the PET out as cheaply as they did.

Combine that with the Apple I and II, and the 6502 was a major player, and the 6800 didn't have any massively compelling benefits.

The 6809 did have benefits over the 6502, and did get some design wins, including Commodore's SuperPET (which bizarrely had both a 6809 and a 6502), but I think the 6809 was too late - on the low end cheap machines like the VIC-20 and then C64 completely trounced the 6809 based machines, and so it just didn't get enough mindshare, and just a few years later it was effectively too late for it to get any traction in the home computer market.

I like the 6800. Two accumulators and a 16 bit index register is a good alternative. Totally understand your preference. The 6805 seems too cut down.

I have not written much 6800 code, but have read a fair bit. Had it been more available to me, I would definitely have enjoyed it.

The 6809 is the Cadillac of 8 bitters. That is what makes it fun. One can pack a ton of features into small spaces and doing reentrant, relocatable code is very well supported.

Stack abuse gets one a really fast memory to memory move too.

Simplicity? It has so many addressing modes. You're pretty much forced into zero page+Y, but zero page is so limited. Direct code modification, i.e. counters within the code, was the next most common.

Of course division was an art. That being said, I enjoyed it a lot and still know quite a few of the opcodes (hex) by heart, plus the clock counts of many instructions.

I only know 6502 in the Atari 2600 programming context. It's such a great pair because the 2600 hardware is very unique, so it gives you fun problems in that domain as well.

My first 6502 program was self-modifying; I wrote it just before reading the book chapter on using registers for indexing relative to a base address. That book was Programming the 6502 by Rodnay Zaks.

I have some 1986-dated 6502 assembly code of mine in hard copy (on dot matrix paper with the "holes" intact). I'm going to scan it one day and post.

A lot more 6502 code was self modifying than necessary - I know a lot of people (myself included) did not pick up zero-page indexed indirect/indirect indexed address modes and instead kept using absolute x/y indexed and modified the absolute part for larger loops. A large part of the reason why I didn't learn about it until fairly late was that I mostly saw absolute x/y indexing in the code I looked at to learn. It's interesting how many bad habits you'd see in code like that, given e.g. the C64 ROMs were extensively dissected and documented and published, and they used zero page all over the place.
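For readers who never picked up the indirect indexed mode, the following Python sketch models how `STA (zp),Y` walks a region larger than 256 bytes by bumping the pointer's high byte in zero page, instead of patching the operand of an absolute-indexed instruction. The zero-page location $FB and the function name are arbitrary choices for the example.

```python
def fill_pages(mem: bytearray, zp: int, start: int, pages: int, value: int) -> None:
    """Fill `pages` * 256 bytes from `start` using a zero-page pointer."""
    mem[zp] = start & 0xFF                 # pointer low byte
    mem[(zp + 1) & 0xFF] = start >> 8      # pointer high byte
    for _ in range(pages):
        for y in range(256):               # Y register counts through the page
            base = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
            mem[(base + y) & 0xFFFF] = value    # STA (zp),Y
        # INC zp+1: move the pointer to the next page; the code is untouched.
        mem[(zp + 1) & 0xFF] = (mem[(zp + 1) & 0xFF] + 1) & 0xFF


memory = bytearray(0x10000)
fill_pages(memory, 0xFB, 0x0400, 4, 0x20)   # four pages starting at $0400
```

The self-modifying alternative would instead increment the high byte of the address embedded in a `STA $0400,X` instruction, which does the same job but rewrites the code itself.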

Raises hand. Yeah, me too.

When I first got zero page, I thought it looked like up to 128 address registers, with only a couple cycle penalty.

But, like you, a lot of code self-modified the easier absolute indexed address mode instructions.

And it was right there, easy to see.

Sorta related, some 6502-based demos:


Apple II, but in the comments are links to a C64 demo, a IIGS demo, and a ZX spectrum demo (which is Z80, not 6502, but same era).

There can be a significant trade-off between size and speed: the more tricks you use to shave down bytes, the more complexity you usually add to each iteration.

So assembly programmers may go for the kludgier-looking code when it executes far faster than the version optimized for byte count. I've heard of such things in video timing and game loops.

This really depends on the specific architecture and the application. In some cases, you will want to optimize mostly for size, so that your hotspots fit entirely into I-cache. Modern CPUs spend most of their time waiting for data (or instructions) to become available, so often computations are essentially free.

The average IPC over a variety of loads is, IIRC, estimated to be ~1. So no, most modern CPUs do not spend most of their time waiting for data.

Screen blitting is a frequent unroll-for-speed case, often combined with self-modifying code to make compiled sprites.
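A compiled sprite replaces the copy loop with straight-line code that stores each byte directly. The Python sketch below generates such code as 6502 assembly text; it is a simplified illustration that assumes a fixed screen position and treats byte value 0 as transparent, whereas real implementations typically patch or index the addresses to move the sprite.

```python
def compile_sprite(rows: list, screen_base: int, stride: int) -> str:
    """Emit unrolled LDA/STA pairs for every non-transparent sprite byte."""
    lines = []
    loaded = None                      # value currently held in A
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if value == 0:             # assumption: 0 means transparent, skip it
                continue
            if value != loaded:        # reload A only when the value changes
                lines.append(f"    LDA #${value:02X}")
                loaded = value
            lines.append(f"    STA ${screen_base + r * stride + c:04X}")
    lines.append("    RTS")
    return "\n".join(lines)


sprite = [
    [0x51, 0x00],
    [0x51, 0x51],
]
listing = compile_sprite(sprite, 0x0400, 40)
print(listing)
```

There is no loop counter, no index register update, and no branch: the cost is code size, which grows with every byte of the sprite.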
