

Assembly Routine for Realtime Decoding of Floppy Data - NateLawson
http://linusakesson.net/programming/gcr-decoding/index.php

======
NateLawson
LFT has come up with a clever routine for decoding Commodore floppy data in
realtime. This is quite an achievement that mixes bit twiddling hacks,
self-modifying code, cycle counting, and undocumented opcodes.

While the Commodore serial transfer routines are well-known for being terribly
slow (300 bytes/sec), various fastloaders have been used since the mid-1980s.
The 1541 drive has its own 6502 CPU, so the computer could download code into
the drive's RAM and then talk to it via an optimized protocol. These routines
have been reasonably fast and Krill's loader is one of the more modern
variants.

However, there's another aspect that hadn't been optimized before. Data is
stored on magnetic media in an expanded form (5 bits for every 4 bits of data
for Commodore's GCR coding) so that the clock signal can be regenerated from
the data. Similar encodings are used for CDs and DVDs these days as well.
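
To make the 4-to-5 expansion concrete, here is a minimal sketch in Python. The table is the 4-bit to 5-bit mapping commonly documented for the 1541; the function names are my own, and high-nibble-first packing is assumed. Four data bytes (8 nibbles) become 40 bits, i.e. 5 bytes on disk.

```python
# Commodore GCR: each 4-bit nibble maps to a 5-bit code chosen so that
# the encoded stream never contains more than two 0 bits in a row.
GCR_ENCODE = [0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
              0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15]
GCR_DECODE = {code: nibble for nibble, code in enumerate(GCR_ENCODE)}


def gcr_encode(data: bytes) -> bytes:
    """Encode groups of 4 data bytes into groups of 5 GCR bytes."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(data), 4):
        bits = 0
        for byte in data[i:i + 4]:
            bits = (bits << 5) | GCR_ENCODE[byte >> 4]    # high nibble first
            bits = (bits << 5) | GCR_ENCODE[byte & 0x0F]  # then low nibble
        out += bits.to_bytes(5, "big")                    # 8 codes * 5 bits = 40 bits
    return bytes(out)


def gcr_decode(raw: bytes) -> bytes:
    """Decode groups of 5 GCR bytes back into groups of 4 data bytes."""
    if len(raw) % 5:
        raise ValueError("raw length must be a multiple of 5")
    out = bytearray()
    for i in range(0, len(raw), 5):
        bits = int.from_bytes(raw[i:i + 5], "big")
        quintets = [(bits >> shift) & 0x1F for shift in range(35, -1, -5)]
        nibbles = [GCR_DECODE[q] for q in quintets]       # KeyError on an invalid code
        for hi, lo in zip(nibbles[0::2], nibbles[1::2]):
            out.append((hi << 4) | lo)
    return bytes(out)
```

Note that none of the eight 5-bit fields in a group lines up with a byte boundary except the first, which is what makes the decoding step expensive on a 1 MHz CPU.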

The data needs to be decoded from its media format into the original data.
Normally, the 1541 drive does this decoding since it has plenty of time. It
reads a sector, decodes it, and leisurely transmits the results back. Then it
repeats the process for the next sector. Meanwhile, the disk is still spinning
underneath, rotating several times per sector read.

Some fastloaders speed this up by avoiding decoding on the drive. They
transfer the raw data to the C64 and decode it there, letting the drive just
read sectors and transmit the data. But decoding in realtime has been the holy
grail.

LFT's routine is very clever and has lessons for data packing/unpacking for
other platforms as well. It's a classic time/memory tradeoff where some
expansion during the unpacking process saves CPU cycles. In short, the
standard approach of masking and shifting is not the only way to deal with
non-byte-aligned fields. Without a more clever approach, realtime decoding is
not possible.
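
As a rough illustration of that tradeoff (not LFT's actual routine, which is considerably more intricate), the sketch below decodes the first 5-bit field of a raw byte in two ways: by shifting and masking before a small lookup, and by indexing a 256-entry table with the raw byte directly. The larger table ignores the three unrelated low bits and stores the result already shifted into the high nibble, so no shift or mask is needed at decode time. All names here are hypothetical.

```python
# Commonly documented Commodore 4-bit -> 5-bit GCR codes.
GCR_ENCODE = [0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
              0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15]
QUINTET_TO_NIBBLE = {code: n for n, code in enumerate(GCR_ENCODE)}


def high_nibble_shift_mask(b0: int) -> int:
    """Classic approach: isolate the top 5 bits, look up, shift into place."""
    quintet = (b0 >> 3) & 0x1F
    return QUINTET_TO_NIBBLE[quintet] << 4


# Expanded table: 256 entries instead of 32. Every raw byte value maps
# straight to the finished high nibble; invalid codes map to None.
HIGH_NIBBLE_TABLE = [
    (QUINTET_TO_NIBBLE[b >> 3] << 4) if (b >> 3) in QUINTET_TO_NIBBLE else None
    for b in range(256)
]


def high_nibble_lookup(b0: int) -> int:
    """Table approach: one indexed load, no shifting or masking."""
    return HIGH_NIBBLE_TABLE[b0]
```

On a 6502 the second form is a single indexed load, whereas the first costs several shift instructions per field, which adds up quickly when only a few dozen cycles are available per byte.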

The result is code that can decode from raw GCR to data bytes on the fly,
transmitting data to the C64 at almost the rate of the floppy spinning
(roughly 25 microseconds/byte).

I say "almost" since checksums still need to be validated. This means that an
interleave of 2 (every other sector) can be introduced when writing the data
in order to have the next sequential sector available to be read after the
previous one has been decoded.
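
Here is a small sketch of what an interleave of 2 looks like as a track layout. It assumes 21 sectors per track (as on the 1541's outermost tracks); the function name and the collision rule (advance to the next free slot) are my own choices for illustration.

```python
def interleave_layout(sector_count: int, interleave: int) -> list:
    """Return a list where index = physical slot and value = logical sector."""
    layout = [None] * sector_count
    slot = 0
    for logical in range(sector_count):
        # If the target slot is already taken, advance to the next free one.
        while layout[slot] is not None:
            slot = (slot + 1) % sector_count
        layout[slot] = logical
        slot = (slot + interleave) % sector_count
    return layout


track = interleave_layout(21, 2)
# Logical sector 1 sits two physical slots after logical sector 0, so the
# drive has one sector's worth of rotation to finish its work in between.
```

With this layout the whole track can be read in two revolutions, rather than one revolution per sector.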

Congrats to LFT on a great accomplishment 30 years in the making.

------
LeafStorm
In my assembly and computer architecture class (CSC 236 at NC State
University), the instructor grades us on efficiency - sometimes it's the
number of instructions written, sometimes instructions executed. Additionally,
he posts the all-time efficiency records for each program.

The end result is that I am now addicted to assembly programming. There are so
many clever tricks you can use, like:

* Manipulating the addresses your code assembles at so that the addresses in a jump table can overlap.

* Using `lea` on data values to convert ASCII numbers to their normal equivalent while in motion.

* Making a 128KB lookup table.

* Unrolling all the loops in the program.

* Reusing as many registers as possible (including using bx as a frame pointer) just so you don't have to pop them.

And when you're studying your code, and you suddenly have the flash of
inspiration about how to make it faster, there's this feeling of, "Oh, that's
so brilliant, but so evil..."

(Of course, then you have to debug it all. :-P)

~~~
daurnimator
Although I love this stuff, encountering code like this is a nightmare.

I hope you never have to revisit your programs ;)

------
kragen
Lovely. There are lots of peripherals from around the same time that I wish
were as programmable as the 1541, or even just a little bit. For example, if
you'd been able to send, say, 50 bytes of 8085 code to the VT-100, and use a
30-character buffer, you could have handled backspace locally in the terminal,
both improving responsiveness and dramatically lowering load on your VAX back
in the days when handling an I/O interrupt to echo a character was a
substantial load. If the VT-100 or H19 had had 20 bytes of SRAM mapped into
its font ROM space, you could have had four programmable glyphs — enough to
display a pixel-positioned mouse pointer on top of the text, or really
dramatically spice up a lot of video games of the time. (You'd probably want
to download code into the terminal to make the animation smooth, although 9600
baud could have gone past 30fps.)
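
A back-of-the-envelope check of that frame rate, as a sketch: it assumes 10 bits on the wire per byte (8 data bits plus start and stop bits) and that each update rewrites all 20 bytes of glyph RAM with no escape-sequence overhead. The function name is hypothetical.

```python
def updates_per_second(baud: int, bytes_per_update: int, bits_per_byte: int = 10) -> float:
    """How many fixed-size updates fit through a serial link each second."""
    bytes_per_second = baud / bits_per_byte
    return bytes_per_second / bytes_per_update


rate = updates_per_second(9600, 20)        # 960 bytes/s divided by 20 bytes
budget_at_30fps = 9600 / 10 / 30           # bytes available per frame at 30 fps
```

Under those assumptions the link carries 960 bytes per second, which is 48 full glyph updates per second, or 32 bytes per frame at 30 fps, leaving 12 bytes per frame for framing or cursor positioning.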

And of course today we have junk heaps full of TVs, calculators, and feature
phones that people are throwing out basically because they don't have any way
to reprogram them into something useful.

~~~
vidarh
The keyboards on some of the Amigas had a SoC version of the 6502 (it's a
model with a tiny PROM and some RAM and a few IO lines). Unfortunately not
reprogrammable like the 1541.

I used to freak out PC users by explaining how my Amiga 2000 had a M68000, but
with a M68020 accelerator board, a PC bridge board, with a 286 accelerator
(the A2000 with bridge board could run the PC board "in a window"). On top of
those CPUs, it also had a SCSI controller with a Z80, and the aforementioned
6502-compatible CPU on the keyboard. Now that's multi-processing :D (Of course the
M68000 and x86 were disabled).

~~~
kragen
That's cool! I had no idea about the keyboard!

If you had a reprogrammable 6502 in your keyboard with a few hundred bytes of
RAM, what would you use it for? Keyboard macros are one possibility. Or you
could encrypt the keyboard-computer connection. Maybe some games would benefit
from a "rapid repeat" key that let you send a single key or keyboard macro at
100Hz?

~~~
cturner
I think keyboard macros - completely isolated from the system they're
dispatching to - are an interesting "what if" in our history.

There's plenty of room for innovation, but I don't know of any attempts at it.

Thinking about this, I realise I could play around with these ideas in tmux.

