
Massively Interleaved Sprite Crunch – C64 Demo Effect (2016) - sleazy_b
http://www.linusakesson.net/scene/lunatico/misc.php
======
dzdt
This reminds me of a quote by a master magician, which I unfortunately can't
locate right now. The gist was that one way that magic tricks work is that the
audience would never believe the sheer amount of work that goes into
developing the skill to pull off the trick.

The C64 demoscene is at this point a pure case of magicians developing tricks
for other magicians. Compare to 20 years ago, when the tricks were made to wow
users familiar with the platform and its limitations.

Now someone without intimate knowledge of the C64 would not understand what
the hard part of this demo is. Letters scroll up and sometimes expand or
shrink a bit. We've seen lots of different scroll text parts. Is this hard?

Consider the brute force approach. The text scroll area is 192 pixels by 200
pixels. If this was just a bitmap, that is 4800 bytes to update. Pure unrolled
code to move bytes would do

    
        LDA src,x
        STA dest,x
    

for a minimum of 8 cycles per byte (extra if 256-byte page boundaries are
crossed, and to update the x index, and do the logic to pick out which letter
to draw) or 38,400 cycles to update that bitmap. But there are just 19,656
cycles free per frame! The best a brute force approach would get then is one
update per 3 frames, or 17fps.
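The arithmetic above can be sketched quickly. The 10-cycles-per-byte figure below is an assumption, folding in the extra cost of index updates, page-boundary crossings, and letter-selection logic mentioned in the parenthetical:

```python
import math

# Scroll area: 192 x 200 pixels, 8 pixels per byte
bytes_to_move = (192 // 8) * 200          # 4800 bytes

# The LDA/STA pair is 8 cycles minimum; assume ~10 cycles/byte
# once index updates and page-crossing penalties are included
cycles_per_byte = 10
total_cycles = bytes_to_move * cycles_per_byte

cycles_free_per_frame = 19_656            # free cycles per PAL frame
frames_needed = math.ceil(total_cycles / cycles_free_per_frame)

print(bytes_to_move)              # 4800
print(frames_needed)              # 3
print(round(50 / frames_needed))  # ~17 fps on a 50 Hz PAL display
```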

So all the cleverness is in getting the machine to do at 50fps something it
could naively do at 17fps at best. This is done by racing the display raster
and playing tricks with hardware bugs in the CPU/video chip interface.

~~~
wronskian
I wonder if you're thinking of this interview with Teller:
[http://www.smithsonianmag.com/arts-culture/teller-reveals-his-secrets-100744801/](http://www.smithsonianmag.com/arts-culture/teller-reveals-his-secrets-100744801/)

"Make the trick more trouble than it's worth."

I have read that article many times; it's a wonderfully illuminating piece of
writing.

~~~
astrodust
Magic tricks and con artists employ many of the same techniques. A convincing
con is one so elaborate it looks like it would be too expensive to perform,
and would require far too much rehearsal if it were really a con, so people
are lulled into complacency.

Maybe the difference is that one form is deceptive to entertain and the other
to steal, but they have a lot in common.

------
camtarn
Totally worth watching the entire video of the demo, too - with headphones on,
as the music is great :)

[https://www.youtube.com/watch?v=XcAUlEkU05A](https://www.youtube.com/watch?v=XcAUlEkU05A)

~~~
mrspeaker
Holy cow! I was big into the demoscene as a kid - they consumed most of my
hours spent on the C64... but WOW, that was an amazing demo: not only
implementing some crazy-new tricks, but also aesthetically beautiful! Now I
need to go back and watch all the other gems I've missed over the last 25
years.

~~~
ghusbands
Don't miss Edge of Disgrace, one of the best C64 demos yet:
[https://www.youtube.com/watch?v=nLIUkBa_mA0](https://www.youtube.com/watch?v=nLIUkBa_mA0)

------
Pxtl
Having watched the video of what he accomplished on the C64 I am now
retroactively furious at every drop of framerate I have experienced on every
platform ever.

~~~
derefr
When you work back through the causal chain far enough, you end up having to
be furious about 1. standardized architectures with many compatible hardware
models; and 2. non-hard-real-time operating systems.

Interestingly, in a modern environment (unlike in the 80s-00s), both of these
seem like choices that could be made either way.

Re #1 — all the major OS manufacturers (Microsoft, Apple, Google) now have
their own flagship devices running their OS, and could standardize their app-
store certification processes to involve perf testing on their own hardware if
they wanted. Especially for the mobile hardware: there's no reason the QA
process for e.g. releasing a game on iOS can't involve certifying zero stutter
on a given iPhone, and then restricting the game to only be playable on that
iPhone or newer (the way consoles effectively work right now.)

Re #2 — there's no reason, other than inertia, that _personal computer_
(including mobile) OSes still put every single task on the system into one big
(usually oversubscribed, from a realtime perspective) scheduling bag. We
could—using the hardware hypervisor built into all modern architectures—split
PC OSes into two virtual machines: one for the foreground "app", and one for
everything else; and give the foreground-app VM fixed minimum allocations of
all the computer's resources. Then you could make real guarantees† about an
app or game's performance, as long as it was said foreground app. It'd be very
similar to the way some game consoles (e.g. the Wii) have separate
"application processors" and "OS service processors", to ensure nothing steals
time from the application.

† And those guarantees would also be requirements: if the foreground app says
it needs its VM to have 5GB of RAM, you literally won't be able to run it if
you don't have 5GB of free physical memory to hand it (though the OS would
probably first try to OOM-kill some sleeping apps to give it that memory, like
iOS does.) Much clearer than the current "this game will be _really slow_ if
your computer is more than four years old, but it's not exactly clear which
part of the computer is below-spec" we have today.

~~~
Someone
_" there's no reason the QA process for e.g. releasing a game on iOS can't
involve certifying zero stutter on a given iPhone"_

To guarantee that, testing has to go through _all_ possible game states. For
almost any non-trivial game, that's infeasible.

 _" and then restricting the game to only be playable on that iPhone or
newer"_

There is no guarantee that newer hardware would be faster for every possible
program execution, and even if it were, timing differences could affect game
play.

There also is no guarantee that newer hardware produces the exact same
results. For example, better anti-aliasing or fonts drawn at double resolution
could affect hit detection.

This isn't even guaranteed on the _same_ hardware. For example, there might
be C64s that don't have the bugs that this demo exploits.

~~~
Pxtl
Solution: developers provide a script that generates a demo video from the
application, and that video gets posted in the store listing.

So right there on the store page is the "demo video".

~~~
Someone
If only things were that simple. A demo video can show that there is _an_
interaction that doesn't stutter, not that there is _no_ interaction that
stutters.

For a historical analogue: you can show hours of playing Pac-Man without
finding its kill screen
([http://www.donhodges.com/how_high_can_you_get2.htm](http://www.donhodges.com/how_high_can_you_get2.htm))

~~~
Pxtl
I realize it's imperfect, but it would raise confidence to see a video of
something that looks fairly consistent with the advertised usage that
satisfies the benchmarks.

The video doesn't guarantee bug-free play, but it can give us confidence that
the benchmarks are an honest test since we can see what the test looked like.

If it's a text editor and it just shows that opening a new blank file and
typing a few words is fast, or a game that shows the game runs well in an
empty room level, then we know the demo is dishonest. And having a "boring"
video on their store page won't win them many buyers. You want all the ad
content to look as good as possible, so the demo video would be best if it
shows off the most impressive features of your application, which is where
we'd expect performance problems.

That said, it's probably not feasible because creating a full benchmark script
that also doubles as a real-world demo for an app would be too much burden to
put on developers.

------
jepler
I particularly like Linus A's contributions to the C64 demoscene, because more
than likely he'll do an excellent writeup like this about a newly discovered
or perfected technique that was central to it.

~~~
vidarh
It's a particularly big deal given how hard it used to be to get information
about these things... When I finally learned how to do DYSPs, it was thanks
to finding someone willing to photocopy something like a 3rd generation
photocopy of a cycle-count diagram that someone had drawn by hand.

So much effort was lost because of the communications barriers.

------
mvindahl
I used to program demos for the C64 back in my teens. A lot of the learning
was simply reverse engineering the code from other demos, sometimes verbatim
copying snippets, always trying to understand. At age 16 I was pretty
confident that I had achieved the skill level of a wizard, but in reality my
understanding was pretty sketchy.

An example: Rendering graphics (i.e. sprites) at the far horizontal edges of
the screen would require the CPU to perform some shenanigans to trick the
video circuits; this would need to be done on every scanline and required the
timing of the CPU to be in close sync with the video hardware. I understood
that. I had also experienced how sprites and every eighth scanline (aka
"badline") would mess up the carefully planned timings. Eventually, I kinda
understood that concept. I had also seen, from code that I copied, how
triggering a badline could be used to force the CPU in sync with the raster
beam but it was akin to black magic for me. Wasn't until years later,
programming on the Amiga, that the penny dropped for me.

And of course, grasping the concept and implications of DMA was pretty basic
stuff compared to what's going on in this article. I don't think that I'll
ever devote the time to understand it in detail but I find it fascinating how
people keep discovering new unintended features in the old C64 architecture.

------
Zitrax
The Lunatico demo got 2nd place at X'2016. Here you can find videos of the
winner and the rest: [http://www.indieretronews.com/2016/11/x2016-c64-had-one-hell-of-demoparty.html](http://www.indieretronews.com/2016/11/x2016-c64-had-one-hell-of-demoparty.html)

------
the_cat_kittles
linus if you are reading, i just discovered your website via this post, and
have really enjoyed checking out all the stuff on it. lots of great fun on
there, and also lots of very thought provoking ideas about music and the
compositional process. thank you!

------
sehugg
This looks like a pretty nutty technique, essentially exploiting an
undocumented state (or bug?) in the VIC chip.

The Atari 2600's TIA had lots of sharp corners caused by the reliance on
polynomial counters, which saved silicon but made for lots of seemingly-random
edge cases (after all, polynomial counters are also used to generate pseudo-
random noise!)

~~~
vidarh
> essentially exploiting an undocumented state (or bug?) in the VIC chip.

You just described most of the new effects in C64 demos over the last 30+
years...

But, yes, this is one of the nuttier.

When I was a kid and trying to do demos, the "simple" stuff like tricking the
VIC to keep the borders open was still amongst the more exciting things you
could do (opening the top and bottom border is trivial once you know how;
opening the left/right border was harder - especially if moving sprites in the
Y-direction as it affects the number of bus cycles available to the CPU).

That was quite literally child's play compared to this one.

------
EdSharkey
So, let me see if I can sum this wild gem up.

1. On each raster scan line, at precisely cycle 15, you need to clear the
Y-expand register (the vertical pixel size doubler thingy that sprites can
do). This throws the hardware into confusion: the internal registers keeping
track of where in memory a sprite is being drawn from get scrambulated, and
you wind up with the interesting graph presented on the right as to how the
indexes progress from scanline to scanline. Y-expand is a single byte
register where each bit belongs to one of the eight sprites. Simply clearing
a sprite's Y-expand bit on cycle 15 of every scan line is sufficient to
introduce glitch pandemonium.

2. On some rows you want your sprite's Y-expand to be cleared to trigger the
glitch, and on others not, so that the next row is read in sequence. So
before the scanline ends, we need to set Y-expand back to 1 on a per-sprite
basis. How did the author do this efficiently? ...

3. ... by using sprite-to-playfield character collision detection! He put the
sprites in the background, behind the character graphics and placed a single
pixel vertical bar using redefined characters or bitmap graphics to cover the
right-most pixel of the sprite. In the sprite definition's right-most pixel
for the current row, he would encode either a 0 or 1 to decide if the next
row's Y-expand should be 0 or 1. The natural collision detection of the sprite
hardware would transcribe a 0 or 1 into the sprite-to-playfield collision
register for all 8 sprites when both the sprite and the playfield had a filled
bit. You'd wind up with a byte that is ready-made for the Y-expand setting for
the NEXT scanline for all 8 sprites. (I assume before the end of the scanline,
the idea is to read the collision register and write its value to Y-expand.)
What a clever way to save a memory fetch! Also by reading from the collision
register before the end of the scanline, it resets the VIC chip's readiness to
test collisions and collisions will be tested again on the next scan line, so
the whole process can repeat.
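The per-scanline feedback loop in the steps above can be modeled loosely like this. This is only an illustrative sketch: the function names and data layout are invented, and the real effect is cycle-exact 6502 code poking VIC-II registers ($D017 for sprite Y-expand, $D01F for sprite-to-background collision), not a Python loop.

```python
def collision_byte(sprite_rows):
    """Latch one bit per sprite: 1 if the sprite's rightmost pixel on
    this row overlaps the one-pixel playfield bar (bit 0 here)."""
    bits = 0
    for n, row in enumerate(sprite_rows):
        if row & 0x01:                    # rightmost sprite pixel set?
            bits |= 1 << n
    return bits

def run_scanlines(sprite_rows_per_line):
    y_expand = 0xFF                       # all eight sprites start expanded
    history = []
    for rows in sprite_rows_per_line:
        # "Cycle 15": clear Y-expand to scramble the sprites' row counters
        y_expand = 0x00
        # VIC latches sprite-to-playfield collisions during the line;
        # reading the register also re-arms it for the next line
        latched = collision_byte(rows)
        # Before the line ends, the latched byte IS the next Y-expand value
        y_expand = latched
        history.append(y_expand)
    return history

# Two scanlines, eight sprites each; the rightmost pixel of each sprite row
# encodes whether that sprite's Y-expand bit is set on the next line
lines = [
    [0x01, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x01],  # sprites 0, 2, 7
    [0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00],  # sprite 1
]
print([hex(b) for b in run_scanlines(lines)])  # ['0x85', '0x2']
```

The point the model tries to capture is that no memory fetch or table lookup is needed per sprite: the collision hardware assembles the whole 8-bit Y-expand value as a side effect of drawing.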

I find this hack to be really beautiful. That use of collision to dynamically
build the next scanline's Y-expand values kind of reminds me of how modern 3D
games may encode all kinds of different scene information into various layer
buffers and color channels as a game frame is rendered over many passes.

As a kid I reused the 64 sprite hardware over and over to fill the screen with
sprites. As I recall, I lost a lot of CPU time because the VIC chip was
hogging access to the memory more than it normally would. This trick would
have let me fill the screen with more sprites than ever. One of my dream goals
as a kid was to reuse the sprite hardware faster, to change their colors and
be able to get more colors on the screen.

I recall trying to find a way to get the sprites to be only one scanline high
and spend all my time simply changing colors and memory locations on the
sprites. It never worked; it was like the VIC chip was locked in on the
settings for a sprite until it was done drawing. And that in fact is the trick
with this Y-expand thing - you can trick the sprite hardware to finish a
sprite in fewer scanlines than should be possible (apparently as few as 4,
according to the author!) Once the hardware thinks the sprite is finished, the
hardware is relinquished and can be commanded to reuse that sprite on
subsequent scanlines to paint more pictures.

It seems the demoscene may achieve my dream someday - perhaps my fancy super-
color-1-pixel-high-sprite bitmap display mode hack may one day become a
reality!

