
8088 MPH: We Break All Your Emulators - drv
http://trixter.oldskool.org/2015/04/07/8088-mph-we-break-all-your-emulators/
======
corysama
Quick explanation of compiled sprites:

Most commonly a sprite is represented as a 2D array of pixels that you loop
over in X and Y, using math or branching to blend onto the screen. But that's
a lot of reading and writing, and a lot of the math ends up doing nothing
because many of the pixels are intentionally invisible.

So, you could do some sort of visible/invisible RLE to skip over the invisible
pixels. That's better, but it's still a complicated loop and still a lot of
reading pixels.

So, many crazy democoders have decided to write "sprite compilers" that read
the 2D color array of a sprite and spit out the exact assembly instructions
needed to write each visible pixel, one at a time, as a linear instruction
sequence with no branching. The sprites are then assembled and linked into
the program code as individual functions. I believe they can even exclusively
use immediate values encoded inside the instructions rather than reading the
colors from a separate memory address. So, rather than read instruction, read
data, write data, it becomes read instruction, write data: two straight lines
in memory.
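
The generated code can be sketched in a few lines of Python. This is a toy
illustration with made-up pseudo-assembly mnemonics (not real 8088 syntax),
but it shows the idea: one immediate store per visible pixel, and no
instruction at all for transparent ones.

```python
# Toy "sprite compiler": turn a 2D sprite (None = transparent) into a
# straight-line sequence of store instructions with immediate color values.
# The mnemonics are illustrative pseudo-assembly, not real 8088 syntax.

def compile_sprite(sprite, screen_width):
    """Emit one store per visible pixel, offset relative to a base register."""
    code = []
    for y, row in enumerate(sprite):
        for x, color in enumerate(row):
            if color is None:        # invisible pixel: no instruction at all
                continue
            offset = y * screen_width + x
            code.append(f"mov byte [di+{offset}], {color}")
    return code

sprite = [
    [None, 7, None],
    [7,    7, 7   ],
    [None, 7, None],
]
for line in compile_sprite(sprite, screen_width=320):
    print(line)
```

Note that the destination register (`di` here) stays variable, which is why
the sprite can still be drawn anywhere on screen.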

~~~
ekianjo
> it becomes a read instruction, write data in two straight lines in memory.

So basically this kind of hack could not be used for a game, then, where
interaction is needed?

~~~
corysama
IIRC, because of relative-address store instructions, the destination address
does not have to be hard-coded. So, the sprites can still move around
dynamically.

What's harder is clipping against the sides of the screen. With no branching,
there's no way to prevent the sprite from writing past the end of a
line/screen (wrapping/mem-stomping). So, there does need to be a test per
sprite to detect that case and fall back on a more complicated blitter.

The Allegro game framework had a compiled-sprite JITter as a feature early on.
So that's an existence proof of them being used in games :)
[http://alleg.sourceforge.net/stabledocs/en/alleg016.html](http://alleg.sourceforge.net/stabledocs/en/alleg016.html)
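
The per-sprite test described above amounts to a single bounds check before
dispatch. A rough sketch (the `fast_blit`/`slow_blit` names are placeholders
for illustration, not anything from Allegro):

```python
def draw_sprite(x, y, w, h, screen_w, screen_h, fast_blit, slow_blit):
    """One cheap test per sprite: a fully on-screen sprite takes the
    branch-free compiled-sprite path; anything touching an edge falls
    back to a general blitter that can clip."""
    if 0 <= x and 0 <= y and x + w <= screen_w and y + h <= screen_h:
        return fast_blit(x, y)
    return slow_blit(x, y)

# An on-screen sprite uses the fast path; an edge-straddling one does not.
fast = lambda x, y: "compiled"
slow = lambda x, y: "clipped"
print(draw_sprite(10, 10, 16, 16, 320, 200, fast, slow))   # compiled
print(draw_sprite(-5, 10, 16, 16, 320, 200, fast, slow))   # clipped
```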

~~~
nitrogen
_What's harder is clipping against the sides of the screen. With no
branching, there's no way to prevent the sprite from writing past the end of a
line/screen (wrapping/mem-stomping). So, there does need to be a test per
sprite to detect that case and fall back on a more complicated blitter._

In fullscreen modes, you could also just make your screen buffer bigger than
the actual screen by the width and height of your largest sprite.
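
That guard-band idea can be sketched like this, assuming a hypothetical
320x200 screen and a 16x16 maximum sprite size:

```python
# Guard band: allocate a buffer larger than the screen by the largest sprite
# size, draw anywhere in it without clipping, then present only the inner
# region. Overdraw lands harmlessly in the margin.
SCREEN_W, SCREEN_H = 320, 200
MAX_SPR_W, MAX_SPR_H = 16, 16
BUF_W = SCREEN_W + 2 * MAX_SPR_W
BUF_H = SCREEN_H + 2 * MAX_SPR_H
buf = [[0] * BUF_W for _ in range(BUF_H)]

def present(buf):
    """Copy just the visible window; pixels in the margin are discarded."""
    return [row[MAX_SPR_W:MAX_SPR_W + SCREEN_W]
            for row in buf[MAX_SPR_H:MAX_SPR_H + SCREEN_H]]
```

The cost is a larger buffer and a copy (or an adjusted start address in a
hardware-scrolled mode), in exchange for zero per-pixel clipping work.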

~~~
lscharen
In my game engine I wrote for the Apple IIgs a long time ago, I used compiled
sprites and maintained a 1-scanline-wide mask that I used to clip the compiled
sprite to the screen edge.

This only cost one extra AND instruction and allowed the sprites to be clipped
to any size rectangular playfield while still maintaining almost all of the
speed benefits.
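
A rough Python model of that scanline-mask trick (the playfield width and
mask representation here are invented for illustration; on the IIgs the mask
would be a byte pattern and the clip a single extra AND in the compiled code):

```python
PLAYFIELD_W = 8

def make_edge_mask(sprite_x, sprite_w):
    # 1 where the pixel lands inside the playfield, 0 where it is clipped off
    return [1 if 0 <= sprite_x + i < PLAYFIELD_W else 0
            for i in range(sprite_w)]

def blit_row(dest, sprite_x, row, mask):
    # In the real engine each compiled store is ANDed against the mask;
    # the `keep` test below stands in for that one extra AND instruction.
    for i, (color, keep) in enumerate(zip(row, mask)):
        if keep:
            dest[sprite_x + i] = color

dest = [0] * PLAYFIELD_W
blit_row(dest, 7, [9, 9], make_edge_mask(7, 2))  # second pixel is clipped
print(dest)   # [0, 0, 0, 0, 0, 0, 0, 9]
```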

~~~
vidarh
That's actually cool, as it would _also_ allow clipping against a "foreground"
by varying the address of the scanline mask. E.g. imagine foreground trees in
a jungle scene.

~~~
lscharen
Exactly!

I did extend it to use a full-screen foreground mask that implemented this
sort of clipping. I was able to make the mask scrollable which allowed the
compiled sprites to appear "behind" fences and other complex shapes with per-
pixel accuracy.

It could even be used to mask out individual pixel bits that allowed for fake
"lighting" changes with a carefully chosen palette.

~~~
vidarh
I keep wanting to do a "retro-game" and make use of what I've learned about
these types of effects now. Despite how far machines like the C64 for example
were pushed, I don't think they were pushed nearly as far in terms of games as
with demos and it'd be fascinating to try to push the limits..

(maybe one day..)

~~~
sitkack
I'd love to see a demoscene version of "Hacker's Delight" [0]

[0] [http://www.hackersdelight.org/](http://www.hackersdelight.org/)

------
bane
The party this was released at had some absolutely mind-blowing stuff. The 8k
and 64k competitions were _amazing_. The demoscene continues to be an
astonishing force in computing.

Some of the stuff would be completely at home as installation art in any top
modern art museum. My favorite is
[https://www.youtube.com/watch?v=XF4SEVbxUdE](https://www.youtube.com/watch?v=XF4SEVbxUdE)
done in 64kb!

Here's the results with links to productions (most of them have youtube videos
by now)

[http://www.pouet.net/party.php?which=1550&when=2015](http://www.pouet.net/party.php?which=1550&when=2015)

~~~
13
I think Chaos Theory still tops 64k.

[https://www.youtube.com/watch?v=4DjBq2O0XXk](https://www.youtube.com/watch?v=4DjBq2O0XXk)

------
jpatokal
I found the raw video awesome enough to submit a few days ago, but the super-
detailed explanation in this blog raises this to a whole new level of epic.

I wonder if youngsters who didn't grow up thinking 1 MHz is a perfectly
acceptable CPU speed and that 640 KB is a whole lot of RAM will understand
what the fuss is about here...

~~~
ekianjo
> I wonder if youngsters who didn't grow up thinking 1 MHz is a perfectly
> acceptable CPU speed and that 640 KB is a whole lot of RAM will understand
> what the fuss is about here...

They won't, and thus they don't understand the value of demos in the first
place. But don't blame them, even back in the days I knew many people who
could not appreciate demos either.

~~~
skrebbel
Stop generalizing. Plenty will.

~~~
ekianjo
> Stop generalizing. Plenty will.

Look at modern forum discussions on smartphones: they're full of youngsters
comparing the specs of their respective phones without grasping at all what
they mean. Or maybe you are referring to a highly educated subset of
youngsters, but that's very few of them.

~~~
tripzilch
You should meet some of these youngsters :) A few weeks ago, I taught a 12
year old kid to write a Mandelbrot zoomer in Processing--no, really, I taught
him the required basics of complex numbers; it took about 2.5-3 (very fun)
hours[0].
Afterwards we had a big A3 piece of paper with notes, graphs and formulas all
over it, which he took home with him.

Two weeks later I met him again, of course I wanted to continue teaching him.
I thought maybe the Z^4+C variant would be a nice step further. Turns out he
already had written the Julia version of the zoomer ... on his Android phone,
while waiting at the dentist's ... O_o

Now I used to be all about fractals when I was his age, later grew up to be a
4096 byte democoder (around 2000), I was sooo jealous, what I wouldn't have
given for a pocket computer _that_ powerful! Lucky kid :)

Aaanyway, apart from sharing this cute anecdote, my point is this. There's
some extremely clever young bastards out there. Now there's not many, but
they're also not extremely rare. I know a handful, although this particular
guy is probably the cleverest right now. They come from all sorts of
backgrounds, too. But the important part is not that they're highly educated,
but that they're highly _educatable_ , and given the opportunity to develop
this. Their little hacker brains are hungry enough :)

Having written his own graphics code and run up against CPU limits (although
we hit float precision before it got really slow), I'm sure he'd appreciate
some of the awesomeness of stunts on a limited machine like the 8088. In fact
one of the earlier graphics I programmed with him was something very similar
to the circle interference pattern described in the article (it was mostly his
own idea, playing with interference patterns, I just carefully nudged towards
the classic demo effect, because I knew it'd result in a really cool effect).

[0] He already knew how to draw stuff with Processing. He already tried to
look online for how the Mandelbrot algorithm works, but couldn't quite wrap
his head around it. Missing bit of information turned out to be
(a+b)(c+d)=ac+ad+bc+bd, hadn't learned that in school yet. If you explain i as
rotation by 90 degrees, the rest becomes quite intuitive, visually. We also
took a quick skim through that great WebGL-illustrated article about "How to
fold a Julia fractal" (google it), while coding, way better than the five
stapled pages I had when I was 15 :) :)
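
For reference, the core of such a zoomer is just the escape-time iteration;
the only "hard" part really is the complex multiply, i.e. expanding
(a+bi)(c+di) with i*i = -1. A minimal sketch:

```python
# The missing bit of information from the anecdote above: complex
# multiplication is just (a+bi)(c+di) = (ac-bd) + (ad+bc)i, which Python's
# complex type does for us.
def mandelbrot_iters(c, max_iter=100):
    """Iterate z -> z*z + c; return how many steps before |z| escapes 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return max_iter

# Points inside the set never escape; points far outside escape immediately.
print(mandelbrot_iters(0 + 0j))   # 100 (stays bounded)
print(mandelbrot_iters(2 + 2j))   # 1 (escapes at once)
```

Colouring each pixel by its iteration count gives the classic picture.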

~~~
Yen
What environment did he use on an android? I would think programming on a
small touchscreen device would be really tedious, wondering if there's
something good out there.

~~~
tripzilch
Didn't ask. Probably just the standard Processing for Android thing? I'll get
some more information next time I see him.

Yes it's probably tedious, but when you're really into something, at that age,
you just persevere because you _can_ :) Also young children are incredible on
touchscreens, small fingers and they grew up with them :) [I'm the opposite, I
have some stress/burnout related tremors in my fingers, some mornings (when
it's worst) I can hardly control the device's apps, let alone typing]

------
kyberias
These days I find it slightly weird that they don't share the source code of
the demos or related tools. Demo scene has this wonderful alpha-male thing
going.

~~~
diydsp
I am sorry for whatever pain you're feeling.

I don't feel the ethics of open-source apply here or to any works of art.

Programs like Microsoft Word, which have a near-monopoly on the work that
literally billions of people do every day to be productive and feed their
families, when not distributed in a free manner, are tools of unjust power.

I don't feel this person's expressive work, a lifelong dream with no monetary
gain, that might merely provide a few weeks of bliss, 15 minutes of internet
fame, and a little inspiration for the rest of us, then be forgotten to the
sands of time, is a tool of unjust power.

> alpha-male thing going

I am sorry for whatever you've experienced that leads you to sexist comments
like this. I hope that it's able to work its way through your life until you
reach the point that you can simply share another person's joy without feeling
entitled to have a piece of it yourself.

~~~
kyberias
Lol, I don't feel any pain. :)

I can see that my words can be easily interpreted like you did, but I was
quite literal in stating that the culture is WONDERFUL. I have no hatred
against them, although I'm definitely an outsider.

Also, I was literal when I referred to the closed source of these creations. I
don't demand or expect them to release any source code. I'm merely wondering
whether these demos would be MORE interesting with the source code released as
well. As always. In 2015 it seems slightly weird that they don't.

To be clear, I definitely share the joy the authors feel accomplishing these
feats. Deep respect.

But I don't get your sexism comment. I maintain that the sub-culture involves
some behavior that can be described as "alpha-male". Maybe I was inaccurate
with the wording. Could have said "competitive" as well.

~~~
skrebbel
> But I don't get your sexism comment. I maintain that the sub-culture
> involves some behavior that can be described as "alpha-male". Maybe I was
> inaccurate with the wording. Could have said "competitive" as well.

I don't think you've ever been to a demoparty :-)

If there's any "alpha male" behaviour whatsoever, it's purely in a self-
ridiculing way. It used to be there, back in the nineties when all demosceners
were insecure teenage nerds and some of them felt a need to compensate for
something. That part of the scene has been gone for twenty years now, but
it's still a lot of fun to make references to that part of history.

My favourite example of this is the demo "Regus Ademordna" by Excess [0]. The
title is the reverse of "Andromeda Sucks" in Norwegian. Andromeda is another
Norwegian demogroup who had _just_ made a reappearance in the demoscene at the
time after having been gone since those nineties. They hadn't gotten the memo
that all that alpha male stuff was something of the past, so they took serious
offense, much to the enjoyment of the rest of the scene.

[0]
[https://www.youtube.com/watch?v=NkVbS4CTtfc](https://www.youtube.com/watch?v=NkVbS4CTtfc)

------
faragon
Amazing. An 80×100-resolution, 1024-color mode done with CRT controller
(6845) tricks, on a plain IBM CGA adapter. Also fullscreen bitmap rotation
and 3D rendering on a 4.77 MHz Intel 8088 CPU. Wow.

------
blackhaz
I can't believe my eyes. 256 colors on CGA?! HOW?!

~~~
draugadrotten
On the Atari ST, it was possible to change the palette colors in mid-flight,
and use that trick to display more colors on screen. The technique was used
by Spectrum 512, Quantum Paint and many demos.
 _" QPs 512 mode is pretty straightforward; its only color limitation being
that it can display a maximum of 40 colors on a single scan line. Mode 4K uses
a special technique called "interlacing" in order to display a supposed 4,096
"colors_ \--
[http://www.atarimagazines.com/startv3n2/quantumpaint.html](http://www.atarimagazines.com/startv3n2/quantumpaint.html)
see also [http://www.atari-
wiki.com/index.php/ST_Picture_Formats](http://www.atari-
wiki.com/index.php/ST_Picture_Formats)

~~~
aerique
I'm not aware of the specifics anymore but by using well-placed NOPs when
drawing a scanline you could make the borders disappear on the Atari ST thus
getting a higher resolution. This was one of the many advantages the Amiga had
over the Atari ST: being able to use the whole screen while the ST had a
screen like a letterboxed movie except on all four sides of the screen.

~~~
qrmn
Toggling the register to change between 50/60Hz mode at the right time,
specifically, could reset the counter in the ST's Video Shifter and trick it
into carrying on drawing the screen when it should have been outputting blank
border.
Top and bottom borders were however much easier, because you could open them
with just one carefully-timed interrupt each (I used Timer-B, which was linked
to horizontal blank and counted lines, if I remember!). Described as far back
as the B.I.G. Demo (check the scrolltext).

Opening the left and right borders however required doing it for each line, I
recall, which uses a _lot_ more CPU time. (Unless, of course, there's a trick
I don't know!)

Spectrum 512 uses NOP timings to swap the palette at regular intervals
throughout the screen; the "4096 colour interlaced" mode just flickered
between one colour and another on alternate blanks to give the visual
impression of flickery intermediates (before the STe came out, which used the
high bit of each nybble to _actually_ have 4-bit-per-channel palettes of 16).
That technique, in turn, came from the C64 scene, as did the border trick,
though I think they wrote the screen address?

What's old is new again: plenty of lower-end TN LCD panels pull the same
colour trickery to fake 8-bit colour from 6-/7-bit panels (or, reportedly,
10-bit deep colour from 8-bit ones in some cases).
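
The flicker trick is the same arithmetic in both eras: show two displayable
colours on alternate frames and let the eye average them. A minimal sketch
(the colour tuples and frame count are illustrative, not taken from any
particular hardware):

```python
# Temporal dithering, as in Quantum Paint's "4096 colour" mode and modern
# 6-bit TN panels: alternate two displayable colours on successive frames
# so the eye perceives an intermediate shade.
def temporal_mix(color_a, color_b, frames=2):
    """Perceived channel value is the average of what each frame shows."""
    shown = [color_a if f % 2 == 0 else color_b for f in range(frames)]
    return tuple(sum(ch) / len(shown) for ch in zip(*shown))

# Two 3-bit-per-channel ST colours flickered together:
print(temporal_mix((7, 0, 0), (6, 0, 0)))   # (6.5, 0.0, 0.0)
```

The cost, of course, is visible flicker when the two colours are far apart,
which is exactly what the "flickery intermediates" comment describes.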

This demo is crazy. I don't think CGA even gives them a VBL to hang off!
Wonderful.

~~~
vidarh
You're close with respect to the C64 border trick.

On the C64, you could pull the border in the width / height of one character
in order to support smooth scrolling (coupled with registers to set a 0-7
pixel start offset for the character matrix). This was done so that the
borders wouldn't move in/out while scrolling.

By toggling this option with precise timing, the VIC graphics chip never
found itself at the "correct" location to enable the borders, and so never
did.

Opening the top/bottom borders was done very early because it didn't require
much timing.

Opening the left/right borders with static sprites happened soon afterwards.

Opening the left/right borders with moving sprites was particularly nasty
because the VIC "stole" extra bus cycles from the CPU for each sprite present
on a given scan line, so if you wanted to move sprites in the Y position and
open the borders, you needed to adjust your timing by the correct number of
cycles for each scan line, often done by jumping into a sequence of NOP's.
There were additional complications, but that's the basics.

I think DYSP (Different Y Sprite Positions) on C64 was first achieved in 1988.

------
raverbashing
Now, I wonder what people could be doing 10, 20 years from now on today's
hardware

Probably a lot less tricks up the sleeve are possible (especially with 3D
accelerators and dependency on a lot of proprietary AND very complex software)

Or maybe just drop to the framebuffer and push pixels like it has always been
done

~~~
tripzilch
> Probably a lot less tricks up the sleeve are possible (especially with 3D
> accelerators and dependency on a lot of proprietary AND very complex
> software)

Sorry but doesn't that mean there are a LOT MORE tricks up the sleeve
possible? They might be hard to find if you have to reverse a proprietary
driver, but why not? :)

------
ptype
Wow, amazing. Would be interesting though to know which hacks make the
emulators fail

~~~
Maakuth
I bet their color hacks end up doing something weird or possibly nothing with
emulators. No idea about the implementation details of any PC emulator, but it
sure would be tempting for an emulator to just display the bitmap copied from
the emulated machine's display buffer instead of emulating the actual display
adapter. Even emulating the adapter wouldn't be enough, I suppose, as the
color tricks rely on some unintentional (from the vendor's PoV) bleed behavior,
probably on the analog side of things. So I guess a proper emulator should
also emulate some of the analog details of the display itself.

~~~
ajenner
There are emulators (for other targets) which do emulate NTSC decoding
properly, but until I did the research for this demo nobody understood how the
CGA card generates composite signals well enough to emulate it properly. I
have some code which I hope to be adding to DOSBox (and any other emulators
that want it) soon.

~~~
acqq
Do you know how the video uploaded to YouTube was made, if the emulators
don't work? Was it really recorded with a plain camera?

~~~
Scali
We used a real PC obviously (my machine), and a capture device plugged into
the NTSC composite output of the CGA card.

~~~
Zardoz84
Is there a link to download the music from the ASCII credits at the end? I
really like it.

Also, I would like to propose a little challenge. What do you think you could
achieve on this "virtual" computer:

\- Specs: [https://github.com/trillek-team/trillek-computer](https://github.com/trillek-team/trillek-computer)

\- Implementation/Emulator: [https://github.com/trillek-team/trillek-vcomputer-module](https://github.com/trillek-team/trillek-vcomputer-module)

In short:

\- 32 bit RISC CPU running at 100KHz (to 1MHz)

\- CGA-like text mode, but it can be remapped to any desired RAM address and
the font can be changed on the fly. Fixed palette of 16 colours

\- VSync interrupt + two timers

\- 128KiB of RAM (to 1MiB)

\- Floppy drive (max 1.2 MiB)

Extra : Displaying it against a Commodore 1084S monitor ->
[http://imgur.com/GuTVEdj](http://imgur.com/GuTVEdj)

------
rcthompson
I'm interested to know why it breaks all the emulators. Certainly I wouldn't
expect emulators to reproduce all the graphical glitches that this takes
advantage of, but what is it doing that actually _crashes_ them?

~~~
ajenner
The emulator I was mostly using for development (DOSBox) doesn't actually
crash itself, but here's a nice example of how it goes wrong. In the final
part of 8088MPH there is an instruction that modifies the instruction after
it, but then (on the real hardware) the old version of the instruction is
executed (because it's already in the CPU's prefetch queue when the
modification is made). DOSBox executes the modified instruction because it
doesn't simulate the prefetch queue. I tried moving the to-be-patched
instruction above the patching instruction but that made the code take longer
to execute and it no longer met the precise timing requirements necessary for
the best audio quality.
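
The effect described here can be modelled with a toy machine: once a "byte"
has been prefetched into the queue, a store to that address no longer affects
the instruction that actually executes. This is a deliberately simplified
sketch, not real 8088 semantics:

```python
# Toy model of the prefetch-queue quirk: with lookahead, a self-modifying
# store lands after the target instruction has already been fetched, so the
# OLD instruction executes. With no lookahead, the patched one executes.
from collections import deque

def run(memory, queue_depth):
    queue = deque()
    pc = 0
    executed = []
    while pc + len(queue) < len(memory) or queue:
        # Fill the prefetch queue ahead of execution.
        while len(queue) < queue_depth and pc + len(queue) < len(memory):
            queue.append(memory[pc + len(queue)])
        op = queue.popleft()
        pc += 1
        if op == "patch_next":         # a store into the instruction stream
            memory[pc] = "patched"     # too late if the old byte is queued
        executed.append(op)
    return executed

mem = ["patch_next", "original"]
print(run(mem[:], queue_depth=2))  # ['patch_next', 'original']
print(run(mem[:], queue_depth=1))  # ['patch_next', 'patched']
```

An emulator without a modelled queue behaves like `queue_depth=1`, which is
exactly the DOSBox failure mode described above.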

~~~
userbinator
Modifying code in the prefetch queue was a well-known anti-debugging/emulation
trick in the pre-Pentium days but this is probably the first time I've heard
it being used as an optimisation - seriously amazing work.

------
tdicola
This is the coolest thing I've seen all year, way to go to everyone involved!

------
mark-r
Slight technical inaccuracy at the start: the Z80 also required a minimum of 4
clocks for a memory access, it wasn't better than the 8088 in that regard.

~~~
mikepavone
Nope. An opcode fetch cycle takes 4 clocks, but a normal read or write is only
3. That's why an instruction like LD A,(HL) takes 7 cycles. I believe the
GBZ80 always takes 4, but it's more of a separate 8080 clone that borrows a
bit from the Z80 than an actual Z80.
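
In other words, the documented 7 T-states for LD A,(HL) fall straight out of
those two cycle costs:

```python
# Z80 timing per the correction above: an opcode *fetch* (M1 cycle) takes
# 4 T-states, a plain memory read or write only 3.
M1_FETCH = 4      # opcode fetch machine cycle
MEM_ACCESS = 3    # ordinary memory read/write cycle

# LD A,(HL) = one opcode fetch + one memory read
ld_a_hl = M1_FETCH + MEM_ACCESS
print(ld_a_hl)    # 7
```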

~~~
mark-r
Thanks for the correction. I can't believe I've had it wrong for so many
years.

------
brink2death
Possibly stupid question: does the demo run inside DOS or is it completely
self-contained? (I would assume the latter.)

~~~
pjc50
Not clear. They mention writing a custom loader, reading EXE files; whether
that's calling the INT21h DOS functions or the INT13h BIOS disk loader is
unclear. I don't see why they'd bother to write a custom disk filing system
when you can just use the DOS one.

DOS is more something you run "on" than "in". It's just a filing and utility
layer, no task scheduler, no memory protection.

~~~
ajenner
It runs on top of DOS, using INT 0x21 to spawn the effect executables. The
loader doesn't read the executables directly, it just decides when to start
and stop them, plays music and bounces some text up and down while the effects
are loading/decompressing/precalculating. Writing a demo that is its own OS
and has total control of the machine is definitely something I'd like to have
a go at in the future, though.

------
alxndr
> "[Step] 5. Effect starts, magic occurs"

------
72deluxe
Incredibly impressive.

