
Why MacPaint's Original Canvas was 416 Pixels Wide - mdpane
http://www.looksgoodworkswell.com/elegance-of-macpaint-code/
======
teamonkey
I know 68k from the Atari ST, and the MOVEM.L trick was well known. The ST had
a similar M68K to the Mac 128K, but the screen buffer was 32k (640x400)
instead of the Mac's 21.9k (512x342). The ST's CPU was more than capable of
copying 32k from location to location within a single screen redraw, so I
imagine the same was true for the Mac, especially since it had to shift less
data.

In addition, there's no reason why one MOVEM can't contain data from multiple
rows. For a 640x400 buffer like you find in the ST, each row requires
approximately 1.5 MOVEM instructions (20 longwords per row, 13 per MOVEM). No
problem, since your buffer will have access to contiguous memory. Your second
MOVEM will copy the rest of the 1st row and the start of the 2nd. Your buffer
has to have a horizontal size that is a multiple of 32 pixels, but not
necessarily 416.
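The arithmetic behind the "approximately 1.5" figure can be sketched in Python. This assumes 1 bit per pixel and 13 data-carrying registers per MOVEM.L (the count discussed further down the thread); `movems_per_row` is a made-up name for illustration.

```python
from fractions import Fraction

BITS_PER_LONGWORD = 32
REGS_PER_MOVEM = 13  # assumption: 13 registers carry data in each MOVEM.L


def movems_per_row(width_px: int) -> Fraction:
    """Return how many 13-register MOVEM.L transfers one 1-bit-per-pixel row needs."""
    if width_px % BITS_PER_LONGWORD:
        raise ValueError("row width must be a multiple of 32 pixels")
    longwords = width_px // BITS_PER_LONGWORD
    return Fraction(longwords, REGS_PER_MOVEM)


if __name__ == "__main__":
    for width in (640, 416):
        print(width, float(movems_per_row(width)))
```

A 640-pixel row is 20 longwords, so 20/13 of a MOVEM; a 416-pixel row is exactly one.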

I suspect the real reason for this weird resolution is boring memory
constraints. If I'm reading the 68K code correctly, a MacPaint buffer was
416x200 - 10.4k. According to Wikipedia[1] MacPaint used 2 off-screen buffers
= 20.8k. The ROM is 64k, then there's probably at least 1 system-reserved
screen buffer of 10.4k. The OS has to fit in somewhere. That's quite a lot for
a 128K machine.

Find the most comfortable painting layout where the horizontal canvas size is
a multiple of 32 pixels and 2 buffers fit within 21k.

[1]
[http://en.wikipedia.org/wiki/Macpaint](http://en.wikipedia.org/wiki/Macpaint)
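The budget in this comment can be checked with a few lines of Python. This is only a sketch: the 200-row height and the 21k limit (read here as 21,000 bytes) come from the comment itself, 1 bit per pixel is assumed, and the helper names are hypothetical.

```python
BUDGET_BYTES = 21_000   # "21k" from the comment, read as decimal kilobytes (assumption)
ROWS = 200              # buffer height as the comment reads it from the 68K code
BUFFERS = 2             # two off-screen buffers


def buffer_bytes(width_px: int, rows: int = ROWS) -> int:
    """Size of one 1-bit-per-pixel buffer in bytes."""
    return width_px * rows // 8


def widths_that_fit(max_width: int = 512) -> list:
    """Widths that are multiples of 32 pixels and keep both buffers within budget."""
    return [w for w in range(32, max_width + 1, 32)
            if BUFFERS * buffer_bytes(w) <= BUDGET_BYTES]
```

Under those assumptions two buffers cost 50 bytes per pixel of width, so `max(widths_that_fit())` comes out at 416.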

~~~
klodolph
> In addition, there's no reason why one MOVEM can't contain data from
> multiple rows.

The screen buffer is 512x342 (21.9 KB, not 10.4 KB), so you have to copy one
row at a time. The source and destination pointers have to increment by
different amounts after each row (13 and 16 longwords, respectively). See the
following lines in the article's assembly:

    ADD #52, A0
    ADD screenRow, A1

screenRow must be defined as 64 (bytes per screen row: 512 pixels / 8).
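To make the stride mismatch concrete, here is a rough Python model of the pointer arithmetic in the two ADD instructions above. It assumes a packed 52-byte-wide source and a 64-byte screen row; `copy_rows` and its parameters are illustrative names, not anything from MacPaint.

```python
SRC_STRIDE = 52     # bytes per row in the off-screen buffer (13 longwords)
SCREEN_STRIDE = 64  # bytes per row on screen (16 longwords)


def copy_rows(src: bytes, screen: bytearray, rows: int, dst_offset: int = 0) -> None:
    """Copy `rows` rows of 52 bytes each into a screen whose rows are 64 bytes apart."""
    s, d = 0, dst_offset
    for _ in range(rows):
        # One 13-longword transfer per row; it cannot run on into the next row
        # because the destination has 12 extra bytes per row to skip.
        screen[d:d + SRC_STRIDE] = src[s:s + SRC_STRIDE]
        s += SRC_STRIDE       # mirrors ADD #52, A0
        d += SCREEN_STRIDE    # mirrors ADD screenRow, A1
```

The 12 bytes at the end of each screen row are left untouched, which is why a single transfer can't straddle two rows here.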

~~~
teamonkey
A fair point, although I think the M68K was fast enough to copy the remaining
pixels with a few MOVE.L instructions if need be.

[Edit] by which I mean, yes, you're right that the MOVEM can't cross line
boundaries in this case, but the CPU should be fast enough to deal with longer
lines by using a MOVEM and adding a few extra MOVEs to copy individual
longwords. Shorter lines would simply require using fewer registers in the
MOVEM.

------
kabdib
Yup, MOVEM was pretty neat on the 68K. For the Atari ST I wrote a very fast
memory-fill using a tower of MOVEM instructions; you could get to within a
percent or so of the system memory bandwidth, and you might as well stop
trying at that point.

(Also had to deal with some _other_ company's programmers who thought their
way to fill memory -- with an overlapping memory copy -- was more clever. But
theirs was five orders of magnitude slower).
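For readers who haven't seen the overlapping-copy trick: presumably it means writing the fill value once, then running a forward byte-by-byte copy whose destination sits one byte ahead of its source, so every byte copied was written on the previous step. A small Python model follows; the function names are hypothetical, and it says nothing about relative speed on real hardware.

```python
def fill_by_overlapping_copy(buf: bytearray, value: int) -> None:
    """Fill buf by seeding one byte and copying it forward over itself."""
    if not buf:
        return
    buf[0] = value
    # Forward byte-wise copy from buf[i] to buf[i + 1]; source and destination overlap.
    for i in range(len(buf) - 1):
        buf[i + 1] = buf[i]


def fill_direct(buf: bytearray, value: int) -> None:
    """Fill buf by storing the value directly, with no reads from buf."""
    buf[:] = bytes([value]) * len(buf)
```

Both produce the same result; the first reads every byte it has just written, while the second only writes.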

It's interesting that Atkinson's code doesn't unroll the MOVEM loop at all. He
could have gotten another couple percent of loop overhead out of it :-)

------
coldcode
Brings back memories of writing 68K assembly, back in the day when writing raw
assembly was the difference between screaming fast and dog slow. 6502 was even
more fun.

~~~
protomyth
The 6502 was fun and it was the first assembly I programmed[1]. The 6809 also
deserves a look if anyone is reviewing old instruction sets / assembly code. I
am still not sure which of the three I preferred.

[1] on an Apple II, sadly not on my Atari 400

~~~
joezydeco
I spent many years working with the 6809.

One of the biggest advantages over the 6502 was the ability to work with
16-bit registers. X and Y were 16 bit, and a user/local stack pointer was
added, U. Even the A and B accumulators could be combined into one 16-bit
register, D.
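As a quick illustration of the A/B pairing, here is a Python sketch of how the two 8-bit accumulators map onto D, with A as the high byte and B as the low byte. The function names are invented for the example.

```python
def combine_d(a: int, b: int) -> int:
    """Form the 16-bit D value from 8-bit A (high byte) and B (low byte)."""
    return ((a & 0xFF) << 8) | (b & 0xFF)


def split_d(d: int) -> tuple:
    """Split a 16-bit D value back into (A, B)."""
    return (d >> 8) & 0xFF, d & 0xFF
```

For example, `combine_d(0x12, 0x34)` gives `0x1234`, and `split_d` reverses it.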

It's one of my favorite architectures. You could do a hell of a lot with very
little room.

------
userbinator
I wonder if MOVEM was inspiration for ARM's (V)LDM/STM instructions?

x86 has provided an optimised memory copy instruction ever since the 8086: REP
MOVS. The history and evolution of this instruction has quite an interesting
story from a CPU architecture perspective.

~~~
daeken
> The history and evolution of this instruction has quite an interesting story
> from a CPU architecture perspective.

I'd love to read about that if you have any references. I've used the MOVS
instructions a million times without really thinking about them much.

~~~
userbinator
Originally the instruction was the fastest way to do a block copy, and
generally this was the case until MMX appeared, and then it fell into the set
of "microcoded CISC instructions no one really uses" - so Intel didn't bother
to optimise it much (the RISC fad was also really starting to take off in the
PC world at the time) and it started falling behind. But then, in the post-P4
era, when CPU designers realised that high clock speeds weren't everything,
and it was better to make instructions do more per clock instead, it got a lot
more attention and a lot of detailed information about that can be found in
this thread:

[http://software.intel.com/en-us/forums/topic/275765](http://software.intel.com/en-us/forums/topic/275765)

Even more recently (Nehalem and beyond), they _really_ started paying
attention to optimising this instruction, so that even the byte/word variants
will copy entire cache lines at once if possible.

[http://stackoverflow.com/questions/8858778/why-are-complicat...](http://stackoverflow.com/questions/8858778/why-are-complicated-memcpy-memset-superior)

(IMHO the 2nd answer to that question should really have been chosen, since
the 1st answer would be closer to reality a decade or two ago.)

------
cromwellian
If only the Mac 128k had an Agnus like the Amiga. ;-)

------
chaosphere2112
It was a little surreal to be reading an article that prominently featured
assembly, and realizing that it was the only variety that I actually know (my
college's assembly class was taught in 68k because of reasons).

------
al2o3cr
This sort of "bending" of opcodes designed to make saving registers on the
stack easier (MOVEM was a common part of most 68k functions' preamble) seems
like it was a favorite trick:

[http://blog.moertel.com/posts/2013-12-14-great-old-timey-gam...](http://blog.moertel.com/posts/2013-12-14-great-old-timey-game-programming-hack.html)

(above ran on HN a couple weeks back)

------
frozenport
Except it didn't because there are 14. Unsurprisingly it was aesthetics.

~~~
jrlocke
One was reserved for the stack pointer, so 13*32=416.

~~~
lostlogin
Thanks. This is close but off then: The drawing area is evenly divisible by 13
long words (13x32=416). This is exactly the number of registers that were
available in the loop above for the MOVEM.L operation (It appears it could
have used 14 registers, but I am guessing the extra 32 pixels would have made
the drawing area more cramped by reducing the gray whitespace).

~~~
joezydeco
MOVEM.L can address all _16_ registers on the 68K but keeping the source and
destination pointers in the register bank made the routine run much quicker.
You also need SP to get back home, so 13 was all Hertzfeld could use.
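The register bookkeeping is easy to tally. The sketch below assumes A0 and A1 hold the source and destination pointers (as in the ADD lines quoted earlier in the thread) and that A7 is the stack pointer; which 13 registers the real code used is not spelled out in this thread.

```python
DATA_REGS = [f"D{i}" for i in range(8)]     # D0-D7
ADDRESS_REGS = [f"A{i}" for i in range(8)]  # A0-A7
RESERVED = {"A0", "A1", "A7"}               # source, destination, SP (assumed roles)

free_regs = [r for r in DATA_REGS + ADDRESS_REGS if r not in RESERVED]
pixels_per_movem = len(free_regs) * 32      # each register holds one 32-bit longword

print(len(free_regs), pixels_per_movem)     # prints "13 416"
```

Sixteen registers minus three reserved ones leaves 13, and 13 longwords at 1 bit per pixel is 416 pixels.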

~~~
Stratoscope
> Hertzfeld

Atkinson

~~~
joezydeco
Ooops, you're right. My mistake.

------
supercoder
Or maybe the Motorola 68k was designed specifically so MacPaint could have a
416 pixel wide canvas.

