
Improvements to the Xerox Alto Mandelbrot drop runtime from 1 hour to 9 minutes - darwhy
http://www.righto.com/2017/06/improvements-to-xerox-alto-mandelbrot.html
======
gumby
Shutting off redisplay was not an uncommon technique on the Alto (the first
time I saw it I was somewhat disconcerted, and thought the machine had crashed,
as that wasn't all that uncommon in those days).

The key, if you read Thacker's original design paper, is that screen refresh
took 2/3 of the bus bandwidth. This was considered absurd: why devote so much
resource to I/O? (Many machines in those days still had channel controllers,
a design which has implicitly returned.)

These old tidbits are fun because they make you think about different design
tradeoffs. I love the line in K&R (IIRC -- is it still in there?) where they
explain that the standard library is required (and say, including the quotes
"you mean I have to call a function to do I/O?"). And the implicit defense in
the 801 paper (the original RISC paper) that yes, the compiler really could do
a good job.

~~~
wooby
The "801 paper" (I think):
[http://www.ece.ucdavis.edu/~vojin/CLASSES/EEC272/S2005/Paper...](http://www.ece.ucdavis.edu/~vojin/CLASSES/EEC272/S2005/Papers/801-report_nov81.pdf)

------
pmontra
> One way to speed up the Alto is to shut off the display. I tried this and
> improved the time from 24 minutes to 9 minutes, a remarkable improvement
> [...]: to display pixels on the screen, the CPU must move the pixels for
> each scan line from RAM to the display board, 30 times a second.

Ah, same as the Sinclair ZX81 fast mode for the same reasons. The ZX80 was
always running in fast mode and displayed the screen only if no program was
running. Old times.
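
As a rough sanity check on the quoted numbers, here is a small sketch. It assumes the display simply steals a fixed share of machine time, which is a simplification, and the function names are made up for illustration.

```python
def implied_display_share(minutes_display_on, minutes_display_off):
    """Share of run time that went to the display, under the fixed-share assumption."""
    return 1.0 - minutes_display_off / minutes_display_on


def speedup_without_display(display_share):
    """Speedup expected from switching the display off, under the same assumption."""
    return 1.0 / (1.0 - display_share)


# Figures quoted above: 24 minutes with the display on, 9 minutes with it off.
share = implied_display_share(24, 9)
print(f"implied display share: {share:.3f}")
print(f"speedup if the display took 2/3 of the time: {speedup_without_display(2 / 3):.2f}x")
```

Under that assumption the measured times imply the display was costing a bit under two thirds of the run time.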

~~~
adrianmonk
Yes, I had the TS1000, a slightly modified version for the US market (with
NTSC video output). I remember its built-in BASIC actually had "FAST" and
"SLOW" commands. I love that they felt "SLOW" was an appropriate name for a
command.

I guess I had always assumed this was about memory bandwidth, but this Alto
article raises the possibility that the CPU may have been doing double duty
(probably for cost reasons and not flexibility).

And this documentation says it was "four times as fast", which is probably too
much of a difference for just memory bandwidth:
[http://www.worldofspectrum.org/ZX81BasicProgramming/chap13.h...](http://www.worldofspectrum.org/ZX81BasicProgramming/chap13.html)

I wonder if the implementation is as simple as just swapping the priorities of
user code and display code.

~~~
rwmj
The way this actually worked is a giant hack, explained here:
[http://zx81.us/zx81vid.txt](http://zx81.us/zx81vid.txt) But basically yes it
really was about 4 times faster in FAST mode.

------
chromaton
An even larger speedup could be realized by performing "solid guessing" like
Fractint: Render at 1/2 resolution (that's 1/4 the number of pixels), then
take a second pass only in areas where there's a color boundary. This cuts
render times down to less than half.

~~~
duskwuff
IIRC, there's an even more powerful trick available: render the edges of a
rectangle. If there are no color boundaries on the edges, you can _usually_
assume it'll be solid-colored and skip rendering any of the contents. Some
exceptions apply.
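
A minimal sketch of the rectangle trick, in Python rather than BCPL. The tile size, iteration cap and function names are my own choices for illustration, not taken from Fractint or the article.

```python
def escape_count(cx, cy, max_iter=50):
    """Iterations until |z| > 2 for z -> z*z + c, capped at max_iter."""
    x = y = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:
            return i
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_iter


def render_with_border_check(width, height, x0, x1, y0, y1, max_iter=50, tile=8):
    """Return (grid, evaluated); evaluated counts the pixels actually iterated."""
    grid = [[None] * width for _ in range(height)]
    evaluated = 0

    def pixel(px, py):
        # Compute a pixel at most once; border pixels are reused by the tile itself.
        nonlocal evaluated
        if grid[py][px] is None:
            cx = x0 + (x1 - x0) * px / (width - 1)
            cy = y0 + (y1 - y0) * py / (height - 1)
            grid[py][px] = escape_count(cx, cy, max_iter)
            evaluated += 1
        return grid[py][px]

    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            right = min(tx + tile, width) - 1
            bottom = min(ty + tile, height) - 1
            # Iterate the four edges of the tile first.
            border = {pixel(px, py) for px in range(tx, right + 1) for py in (ty, bottom)}
            border |= {pixel(px, py) for py in range(ty, bottom + 1) for px in (tx, right)}
            for py in range(ty + 1, bottom):
                for px in range(tx + 1, right):
                    if len(border) == 1:
                        grid[py][px] = next(iter(border))  # guessed, not iterated
                    else:
                        pixel(px, py)
    return grid, evaluated
```

Calling `render_with_border_check(32, 32, -0.2, 0.1, -0.1, 0.1)` on a window that lies wholly inside the main cardioid only iterates the tile borders. On a sampled grid the guess can still miss detail thinner than a pixel, so treat the result as an approximation.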

~~~
tetraodonpuffer
what are the exceptions? I thought that Mandelbrot was always guaranteed to be
ok for this optimization (unlike, say, Julia)

~~~
duskwuff
The exception I'm aware of is when the rectangle is so large that it surrounds
the entire set.

------
ChuckMcM
Very nice. I think the whole art of dealing with limited resources is lost on
new programmers who have gigabytes of RAM and gigacycles of compute. I love
the old 1K demos for that reason; it really puts the puzzle back into
programming.

In my experience, if you are familiar with RTL level synthesis of hardware
then microprogramming isn't a huge leap (to me they feel very similar).

~~~
pishpash
Programmers should have machines worse than the users. Flame away.

~~~
ChuckMcM
Always a controversial position :-) Back in the day hosting the full
development environment would often level out the differences in performance,
not as much today. With mobile development you don't have much of a choice,
your stuff has to run on the phone.

~~~
pjmlp
It still holds though: when developers only get to test on a Google Pixel and
a Samsung S8 while customers are on a ZTE Blade, the stuff might even install,
but run it won't.

------
Someone
_" However, Alto microcode is pretty crazy, so I'm not going to try a
microcode Mandelbrot."_

I don't understand that logic, coming from somebody who has a restored Xerox
Alto and wrote a Mandelbrot generator _in BCPL_ for it _" to learn how to use
the Alto's bitmapped display, not make the fastest Mandelbrot set"_.

I also expect this guy will get source code for the 'simple' things such as an
OR instruction or a fixed-width multiplication instruction within a few weeks.

Edit: I’ve been browsing
[http://bitsavers.trailing-edge.com/pdf/xerox/alto/Alto_Hardw...](http://bitsavers.trailing-edge.com/pdf/xerox/alto/Alto_Hardware_Manual_Aug76.pdf).

There apparently are only 16 basic micro-instructions ( _”The ALU function
field controls the SN74181 ALU. This device can do a total of 48 arithmetic
and logical operations, most of which are relatively useless. The 4-bit field
is mapped by a PROM into the 16 most useful functions”_ ), and _OR_ is one of
them:

    
    
           ALUF  FIELD FUNCTION  
             0   BUS            (A)
             1   T              (B)
             2   BUS OR T*      (A+B)
             3   BUS AND T      (AB)
             4   BUS XOR T      (A XOR B)
             5   BUS + 1*       (A PLUS 1)
             6   BUS - 1*       (A MINUS 1)
             7   BUS + T        (A PLUS B)
            10B  BUS - T        (A MINUS B)
            11B  BUS - T - 1    (A MINUS B MINUS 1)
            12B  BUS + T + 1*   (A PLUS B PLUS 1) 
            13B  BUS + SKIP*    (A PLUS 1) 
            14B  BUS.T* (AND)   (AB) 
            15B  BUS AND NOT T  (A & NOT B)
        16B-17B  UNDEFINED
    

So, following the best source I have, (that PDF) which states:

 _”For the most part, since the Alto is such a simple machine, writing Alto
microcode is a straightforward exercise in rule-following”_

that shouldn’t be too hard, provided that there is a free spot for writing the
microcode instruction, or that you can find an instruction to give up in
exchange for a simple OR.

Edit 2: that PDF also describes the way you do an OR, so that apparently was
the way to go:

 _”To "or" together the contents of AC0 and AC1; results AC0:

    
    
        COM 1,1
        AND 1,0
        ADC 1,0

”_
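
To see why that three-instruction sequence yields an OR, here is a small Python emulation on 16-bit words. It assumes the usual Nova-style reading of the operands (source accumulator first, destination second) and that ADC adds the complement of the source to the destination; `nova_or` is a name made up for this sketch.

```python
MASK = 0xFFFF  # 16-bit accumulators


def nova_or(ac0, ac1):
    """Emulate COM 1,1 / AND 1,0 / ADC 1,0 and return the final AC0."""
    ac1 = ~ac1 & MASK                    # COM 1,1: AC1 <- NOT AC1
    ac0 = ac0 & ac1                      # AND 1,0: AC0 <- AC0 AND AC1
    ac0 = (ac0 + (~ac1 & MASK)) & MASK   # ADC 1,0: AC0 <- AC0 + NOT AC1
    return ac0


print(hex(nova_or(0x00F0, 0x0F0F)))
```

After the AND, AC0 holds only the bits that were not set in the original AC1. Adding the original AC1 back (the complement of the complemented AC1) can never carry, because the two values share no bits, so the sum equals the bitwise OR. Note that AC1 is left complemented.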

~~~
kens
Even if I were crazy enough to write microcode, there's a problem that we
don't have the microcode assembler MU.RUN. Maybe it will turn up on one of the
PARC disks we're archiving...

~~~
edmccard
There's a mu.run on (at least) one of the disks archived at the Computer
History Museum, listed at

[http://xeroxalto.computerhistory.org/_cd8_/alto/.index.html](http://xeroxalto.computerhistory.org/_cd8_/alto/.index.html)

[http://xeroxalto.computerhistory.org/_cd8_/alto/.mu.run!2.ht...](http://xeroxalto.computerhistory.org/_cd8_/alto/.mu.run!2.html)

but I don't know if there's any way to transfer it from a disk image to an
actual Alto disk?

~~~
kens
Thanks - I searched that archive for MU.RUN but somehow missed it. We got a
gateway from the Living Computer Museum that lets us transfer files to the
real Alto. So I guess I need another excuse for why I'm not writing microcode
:-)

------
e12e
Fascinating article. One question, you write that: "Using what I learned with
the Mandelbrot, I wrote a program to display images; an example is below."
I couldn't find source code for that on github. Is it published anywhere?

It would be interesting to see some code that exercises more parts of the
system, beyond the Mandelbrot generator.

About coding style - I guess this is tabs-vs-spaces territory - but is there a
particular reason why you only indent blocks in the loops, not in the
procedure/functions (like under main)?

I also wonder a bit about some of the magic constants - like 30705 - is there
an overhead to use variables in BCPL (eg: indirect de-reference, no automatic
in-lining by the compiler)?

Finally, how about procedure call overhead? Granted, the Mandelbrot generator
is rather simple (shifts and other tricks notwithstanding) - so I can see why
it makes sense to keep it all in a single procedure - but what does a call/ret
look like on the Alto? (eg: in simplified assembler, in terms of stack-push,
registers etc)?

~~~
kens
I've put the image code on github:
[https://github.com/shirriff/alto-display-image](https://github.com/shirriff/alto-display-image)

Coding style: I tried to match the style of the existing Alto files, which
starts functions at the left margin and then indents by 3 (!) spaces from
there. e.g.
[http://xeroxalto.computerhistory.org/Pixel/IFS/Sources/.IfsM...](http://xeroxalto.computerhistory.org/Pixel/IFS/Sources/.IfsMailUndeliv.bcpl!1.html)

Magic constants: I'm lazy so it's easier to leave them inline when I'm hacking
on code. I don't know if there's runtime overhead.

BCPL procedure call overhead: kind of nasty. There's no stack support in the
instruction set, just jump-and-link. So a called procedure first calls a
subroutine "getframe" that sets up the stack frame (very similar to a C stack
frame). Then a second subroutine "moveargs" moves the call arguments into the
stack frame. At the end of the procedure call, a _third_ subroutine call
cleans up and does the return.

~~~
e12e
> I've put the image code on github (...)

Thank you! It's interesting to see how similar this code is to assembly code
calling OS procedures, after declaring them external. I've recently been
playing (again) with assembly for the win64 arch, and one sample program (a
mish-mash of example code available for nasm, fasm and the "go" assembler) is
rather similar IMNHO. Note this just displays a window, with no loading of
(image) files:

    
    
      ; Assemble and link with:
      ;   nasm -f win64 .\hello-nasm.asm
      ;   golink .\hello-nasm.obj \
      ;     Kernel32.dll User32.dll /entry:WinMain
    
      global WinMain
      extern ExitProcess
      extern MessageBoxA
    
      section '.text'
    
      WinMain:
        sub rsp,8*5 ; reserve stack for API use and make
                    ; stack dqword aligned
    
        xor rcx, rcx         ; hWnd = HWND_DESKTOP (NULL)
        lea rdx, [szCaption] ; LPCSTR lpText (message body)
        lea r8, [szTitle]    ; LPCSTR lpCaption (title bar)
        xor r9d, r9d         ; uType = MB_OK
        call MessageBoxA
    
        mov ecx,eax
        call ExitProcess
    
      section '.rdata'
      szTitle:   db 'Hello, Title!', 0
      szCaption: db 'Hello, World!', 0
    

The more things change...

nasm: [http://www.nasm.us/](http://www.nasm.us/) go linker (and assembler):
[http://godevtool.com/](http://godevtool.com/) (not to be confused with
Google's "golang" Go programming language and tools).

------
tyingq
A pretty in-depth hardware manual exists if your interest was piqued by this
article.

[http://bitsavers.trailing-edge.com/pdf/xerox/alto/Alto_Hardw...](http://bitsavers.trailing-edge.com/pdf/xerox/alto/Alto_Hardware_Manual_Aug76.pdf)

------
RmDen
Just like FAST on a Commodore 128: it would run at 2 MHz instead of 1, and the
screen went black. To go back to 1 MHz, you would use the SLOW command IIRC.

~~~
vidarh
That was a bit different in that the CPU clock frequency was actually changed.
The screen blanking in that instance was because the VIC-II could only run at
1MHz, so the screen would get messed up in 40 column mode if the CPU (and by
extension the bus speed) was switched to 2MHz.

You _do_ get a speedup by just disabling the screen too on both the C64 and
C128, though, as with the display on, the VIC hogs a lot of memory bus cycles.
That's a bit more similar to the situation on the Alto, except instead of
different "hardware threads" they are separate chips and the VIC just has the
ability to take control of the bus whenever it needs memory access.

------
roywiggins
My Computer Org&Architecture course in college had us programming on a PDP-8,
which has an even more ridiculously restricted instruction set. I always
intended to build a Mandelbrot renderer for it, but never got around to it.

The actual physical PDP-8 we had never had a printer hooked up, so actually
drawing the thing would have been impossible, other than using the
blinkenlights to render a line at a time.

------
stuaxo
Great!

It would be good if it was updated every so often, it probably wouldn't kill
the speed too much.

------
philovivero
It is time to coin a new phrase: the Hacker News inverse comment-interest
ratio. The fewer comments on a story, the more likely it is to be extremely
interesting and relevant to a hardcore geek.

Anytime I see a slightly-curious headline with only 0-3 comments, I know it's
going to be good.

~~~
maxerickson
That's probably the wrong explanation.

There's some kind of velocity factor in the position ranking, so a story that
quickly gets a bunch of votes will have a high rank and few comments. But the
comments are likely to come as the link sits there for a while.

~~~
vxNsr
I think this is fairly accurate. It's really easy to upvote something you find
interesting, but takes much more time to come up with good comments. Usually
these types of articles will generate quite a few comments even though they
start out on the top with basically none.

~~~
SAI_Peregrinus
Simple time to read the article also applies. I've been following this series,
so I upvoted it, then read it, then came back to read the comments.

