
Elite's crazy tokenized string routine - luu
http://xania.org/201406/elites-crazy-string-format
======
spogbiper
Memory used to be precious. Last year I ported some text adventure games from
the TRS-80 machines of the early 1980s to Android. While working on one of the
games (Bedlam), I went through a process very similar to the article. Looked
for strings that I knew had to be there in some form, ended up having to use a
step debugger (the one in MESS) to trace through the code and discover the
print routines and then the magic behind them.

It turned out Bedlam packed strings into contiguous bytes of RAM, but using 5
bits per character. Byte 1 had all of character 1 in the lower 5 bits and the
first 3 bits of character 2 in the upper 3 bits, and so on. Of course with
only 32 values the character set was a bit odd. As I recall you had most
letters but not Z or Q, the numbers 0, 1, and 3, a period, comma, and space.
Something like that. The routine to unpack the characters was quite small,
just used shifts to build an offset into a table containing the ASCII values.
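
A minimal sketch of that kind of 5-bit packing, for illustration only. The table contents and ordering in `CHARSET` are hypothetical (the real Bedlam table is not given here), and the sketch assumes the bit stream is filled low bits first, so the low 3 bits of character 2 land in the upper 3 bits of byte 1.

```python
# Hypothetical 32-entry lookup table: most letters (no Q or Z), the digits
# 0, 1 and 3, period, comma, space, plus two made-up filler symbols.
CHARSET = " ABCDEFGHIJKLMNOPRSTUVWXY013.,?!"
assert len(CHARSET) == 32


def pack(text):
    """Pack characters into bytes, 5 bits each, low bits first."""
    bits = 0      # bit accumulator
    nbits = 0     # number of valid bits in the accumulator
    out = bytearray()
    for ch in text:
        bits |= CHARSET.index(ch) << nbits
        nbits += 5
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        out.append(bits & 0xFF)  # flush the final partial byte
    return bytes(out)


def unpack(data, count):
    """Recover `count` characters by shifting 5 bits at a time out of the stream."""
    bits = 0
    nbits = 0
    chars = []
    for byte in data:
        bits |= byte << nbits
        nbits += 8
        while nbits >= 5 and len(chars) < count:
            chars.append(CHARSET[bits & 0x1F])  # 5-bit value indexes the table
            bits >>= 5
            nbits -= 5
    return "".join(chars)


packed = pack("HELLO, WORLD.")
print(len(packed), unpack(packed, 13))  # 13 characters fit in 9 bytes
```

With this layout 13 characters take 9 bytes instead of 13, a saving of roughly 37% before counting the small lookup table.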

The games also ran in a somewhat ingenious "virtual machine", with all the
game logic being expressed through sets of very tightly encoded high level
rules rather than implemented directly in assembly. In the Android port, I
literally load the original ROM into an array and then just process these same
rules verbatim using a Java implementation of the VM. Kind of amazing to me
how portable yet efficient the design is.

~~~
gergoerdi
Using a VM was common for text adventures of the time; check out the Z-Machine
used by Infocom games:
[http://inform-fiction.org/zmachine/standards/](http://inform-fiction.org/zmachine/standards/)

Like the game you describe here, the Z-Machine also stored characters in less
than one byte, packing three characters (plus an extra control bit) into two
bytes
([http://inform-fiction.org/zmachine/standards/z1point1/sect03...](http://inform-fiction.org/zmachine/standards/z1point1/sect03.html))
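
To make the two-byte layout concrete, here is a simplified sketch of the unpacking step. It assumes a big-endian 16-bit word whose top bit marks the last word of a string, followed by three 5-bit codes. It handles only the lower-case alphabet and space, and silently drops the shift and abbreviation codes 1 to 5, so it is not a full decoder. The function names are made up for this example.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"  # codes 6..31 in the default lower-case alphabet


def zchars(data):
    """Split packed text into 5-bit codes, stopping at the end-marker word."""
    codes = []
    for i in range(0, len(data) - 1, 2):
        word = (data[i] << 8) | data[i + 1]          # big-endian 16-bit word
        codes += [(word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F]
        if word & 0x8000:                            # top bit set: last word
            break
    return codes


def decode_a0(data):
    """Decode using only the lower-case alphabet (a deliberate simplification)."""
    text = []
    for z in zchars(data):
        if z == 0:
            text.append(" ")
        elif z >= 6:
            text.append(A0[z - 6])
        # codes 1..5 (shifts, abbreviations, padding) are ignored in this sketch
    return "".join(text)


# "hello" plus one padding code, hand-packed into two words
print(decode_a0(bytes([0x35, 0x51, 0xC6, 0x85])))  # hello
```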

------
AlyssaRowan
Directly related to the planet name generation routine, if you look closely -
the same tokens are used there (although that's not all of it). Part of this
legacy carries over to the sequels, even the recently-released _Elite:
Dangerous_, although the "old worlds" are patched in by hand along with a
bunch of discovered stars.

A small side-effect, however: Bell & Braben had to try several galaxy seeds in
_Elite_ before they happened upon one that _didn't_ generate the planet Arse!

Lots of things of the era did things like this, of course. 8-bit BASICs would
often tokenise on input to reduce memory consumption and the amount of lexing
needed during the interpreting.
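
A rough sketch of the idea, not any particular BASIC's implementation: keywords outside string literals are replaced by single bytes above 0x7F as the line is entered. The token values in `TOKENS` are illustrative assumptions, and the sketch assumes plain ASCII input.

```python
# Illustrative keyword-to-byte table; the values are assumptions for this sketch.
TOKENS = {"PRINT": 0xF1, "GOTO": 0xE5, "FOR": 0xE3,
          "NEXT": 0xED, "IF": 0xE7, "THEN": 0x8C}


def tokenise(line):
    """Replace keywords outside string literals with one-byte tokens."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_string = not in_string   # quotes toggle literal mode
        if not in_string:
            for word, code in TOKENS.items():
                if line.startswith(word, i):
                    out.append(code)    # whole keyword becomes one byte
                    i += len(word)
                    break
            else:
                out.append(ord(ch))     # no keyword here: copy the character
                i += 1
        else:
            out.append(ord(ch))         # inside a string: copy verbatim
            i += 1
    return bytes(out)


line = "IF A THEN GOTO 10"
print(len(line), len(tokenise(line)))  # 17 characters become 10 bytes
```

The interpreter then dispatches on the token byte directly instead of re-reading the keyword text every time the line runs.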

Even as late as the PlayStation, similar things were still very common
practice: take a look at the English translations of Final Fantasy 7 through
9, or Chrono Cross, for excellent examples. If I recall correctly (it's been a
few years!), they don't use ASCII but map to offsets in the tilesets, use
control characters to handle colours and the like, and sometimes have
digraphs. In the case of Chrono Cross, a lack of disc space (caused by English
text being bigger than Japanese text of an equivalent meaning) led the
localisation team to get highly creative: they made an accent engine for the
44 or so different characters, so that quite a lot of the lines could be
reused and changed on the fly (as the 'developer ending' documents).

------
fidotron
This link was worth it just to find out jsbeeb exists.

It always surprises me when you see such extreme space saving efforts,
especially on machines where you would expect the processing overhead to be
high. I've noticed that as the tech has improved this concept has remained important
because I/O bandwidth is so often the real limiting factor. Things like piping
input/output through gzip can actually speed batch processes up dramatically.

~~~
illumen
Yeah, even in-memory processes can be sped up by compression/decompression,
at least when you're not running entirely in cache, which many old games can do.

Doom 1+2 (including all the data) can be loaded entirely into the cache of a
modern high-end CPU.

------
SigmundA
Weird seeing this here, have been playing a lot of Elite: Dangerous lately, if
you are a fan of the original Elite it is a true successor:
[https://www.elitedangerous.com/](https://www.elitedangerous.com/)

Here is a video with interviews of the original developers, who discuss the
extreme need for byte savings:
[https://www.youtube.com/watch?v=Rapa3VfUWfs](https://www.youtube.com/watch?v=Rapa3VfUWfs)

Of course you guys know David Braben is one of the principal founders of the
Raspberry Pi Foundation.

------
fsk
It isn't that crazy when you realize that, on those old systems, every byte
counted. If he could save 50-100 bytes with that weird method, it was worth
it.

~~~
Rabidgremlin
The entire universe was procedurally generated because you couldn't hold all
the data in the memory of an early microcomputer.

I have a write up on how it worked here:
[http://blog.rabidgremlin.com/2015/01/14/procedural-content-g...](http://blog.rabidgremlin.com/2015/01/14/procedural-content-generation-creating-a-universe/)
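
As a sketch of how a whole galaxy can hang off a few bytes of seed, here is the planet-name step written from my recollection of Ian Bell's published Text Elite C source, not from the original 6502 code. Treat the digraph table, the seed values and the function names as assumptions of this example. Three 16-bit words are "twisted" Fibonacci-style, and five bits of the third word pick each letter pair.

```python
# Digraph table; "." stands for "print nothing", which yields shorter names.
PAIRS = ("..LEXEGEZACEBISO"
         "USESARMAINDIREA."
         "ERATENBERALAVETI"
         "EDORQUANTEISRION")


def tweak(seed):
    """Advance the seed: shift the words along and append their 16-bit sum."""
    w0, w1, w2 = seed
    return (w1, w2, (w0 + w1 + w2) & 0xFFFF)


def planet_name(seed):
    """Return (name, next_seed); always consumes four tweaks of the seed."""
    long_name = seed[0] & 0x40           # bit 6 selects a fourth letter pair
    name = ""
    for n in range(4):
        idx = 2 * ((seed[2] >> 8) & 0x1F)
        seed = tweak(seed)
        if n < 3 or long_name:
            name += PAIRS[idx:idx + 2]
    return name.replace(".", ""), seed


def galaxy_names(seed, count):
    """Generate the first `count` planet names from a starting seed."""
    names = []
    for _ in range(count):
        name, seed = planet_name(seed)
        names.append(name)
    return names


print(galaxy_names((0x5A4A, 0x0248, 0xB753), 2))  # ['TIBEDIED', 'QUBE']
```

Nothing but the six seed bytes and the 64-byte table needs to be stored; every name is recomputed on demand.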

------
junto
My first computer was a BBC B Microcomputer when I was about 7 or 8 years old.
I played Elite with my brother for about one and a half years. Great memories.

Any other Acorners remember Repton? Also a cool game. You had to wait 10
minutes for the game to load off a cassette though!

~~~
murkle
Repton (without the 10 minute wait :)
[http://bbc.godbolt.org/?disc=sth%3ASuperior%2FRepton.zip&aut...](http://bbc.godbolt.org/?disc=sth%3ASuperior%2FRepton.zip&autoboot#)

~~~
junto
That's awesome. Thanks for sharing!

------
RandomCode
The C64 did not use ASCII. It has upper and lower case exchanged.

My operating system uses LZW compression.

[http://www.templeos.org/Wb/Kernel/Compress.html](http://www.templeos.org/Wb/Kernel/Compress.html)

