Hacker News new | past | comments | ask | show | jobs | submit login
Code Golfing in Commodore BASIC (imrannazar.com)
54 points by Two9A 11 months ago | hide | past | favorite | 40 comments



Back in the 80s when type-in program magazines were common, in France we had the wonderful "Hebdogiciel" with a perpetually running BASIC programming contest called "deulignes" -- which means "twolines".

"Deulignes" programs could target any platform, but must only take 2 lines of BASIC (most implementations allow only a limited line length, often 255 characters).

Some programs were really impressive; I remember one complete breakout implementation in MSX-BASIC for instance. People actually made whole (small) games in 2 lines of BASIC!

Here's an example page : https://archive.org/details/hebdogiciel-french-098/page/n15/...


In a roundabout way BASIC on those machines is like modern high-level languages.

Slow and inefficient for sure, but most of the magic is happening directly at the hardware level (sprites, memory-mapped IO, etc.), so there's a surprisingly large amount of stuff you can do with very acceptable performance.


That's actually really insightful somehow.


Do you think you'd be able to find that 2 line breakout? I'm currently doing a breakout implementation on my Commodore 64. In assembly though, so definitely more than two lines ;-) Nevertheless, a two line breakout in Basic would probably give lots of pointers on how to make things more compact.


Not the GP but you can find it on the "HEBDOGICIEL, les listings" Website [1]. Deulignes section, page 15 [2] (Casse-briques by Laurent Auble). Published in issue 104, page 12 (second box) [3]:

    1 SPRITEON:IFK=2THENR=-2:B=2*SGN(X-Z-4):RETURNELSEIFK=0THENCOLOR4,0,0:SCREEN2,1:DEFINTA-Z:J=186:X=140:Y=80:VPOKE14336,128:VPOKE14344,248:LINE(80,40)-(183,61),7,BF:LINE(78,7)-(184,J),2,B:LINE(Y,8)-(183,10),1,BF:R=2:B=2:Z=X:ONSPRITEGOSUB1ELSEIFL>5THENRUN
    2 K=2:S=STICK(0):Z=Z+2*(S=3)*(Z<174)-2*(S=7)*(Z>80):PUTSPRITE1,(Z,161):Y=Y+R:X=X+B:PUTSPRITE0,(X,Y),11:P=POINT(X,Y):IFP=0THEN2ELSEIFP=7THENR=-R:A=(X\4)*4:LINE(A,Y)-(A+3,Y+1),0,B:GOTO2ELSEIFP=1THENR=2:GOTO2ELSEIFY>180THENL=L+1:Y=80:K=1:GOTO1ELSEB=-B:GOTO2
Direct link to the source code [4]

[1] http://www.hebdogiciel.free.fr/

[2] http://www.hebdogiciel.free.fr/2lignes_15.htm

[3] https://archive.org/details/hebdogiciel-french-104/page/n11/...

[4] http://www.hebdogiciel.free.fr/hd-roms/2lignes/2lignes_MSX_n...


Great find, thanks!

It didn't seem like any of your links was to a playable version, so I had a Goog' and a paste and came up with [1]. Pretty impressive game for that amount of code, I'd say.

I've never written a line of code on an actual MSX machine (even though I was around in the 80s), it's kind of amazing the amount of emulation Magic Power we casually throw around, these days. Massive thanks to all the emulation authors and (of course) retro computer archivists.

Edit: typo, more compliments.

[1]: https://msxpen.com/codes/-NflxlFRyvOOYbih_5WK


I remember this one well because I had an MSX, and like everyone else back then I didn't have that many games to play with, so I used this code quite extensively :)


Ok, so no Whitespace. I wonder how the parser works that handles RETURNELSEIFK or ONSPRITEGOSUB1ELSEIFL.


The lines of BASIC code are tokenized according to the shortest words.


Well I don't remember in which issue it was, but the whole collection is here:

https://archive.org/details/hebdogiciel-french

I'm pretty sure it's really difficult to convert MSX-BASIC to 6502 assembly though... But there are lots and lots of great C64 deulignes too :)


+1 for mentionning the best (objectively :-) ) computer magazine of all time (in french, that is).

Et les dessins de Carali...


that's a nice amount of code there.


If you enjoy this, then you should definitely checkout 8-Bit Show And Tell’s YouTube channel. The presenter, Robin, regularly does deep dives into code optimisation and fixes on Commodore 64 and other machines.

https://www.youtube.com/watch?v=jhQgHW2VI0o


https://gkanold.wixsite.com/homeputerium/games-list-2023 Games written in ten lines of vintage BASIC. (Not related to the article but its title.)


^^^ An excellent event, which I hope more HN'ers will check out!

(If you're in Vienna, Austria, you can go to the Retro Gaming Museum and see the winning entries on a real computer, in person..)


BASIC defaults to the tape device if you leave off the device number in LOAD/SAVE commands. So you can save another byte or two by saving to tape instead.


In a C64 BASIC program keywords like SAVE and PRINT can be abbreviated:

https://www.c64-wiki.com/wiki/BASIC_keyword_abbreviation

That would shave off some more precious bytes!


This is how the program is actually stored:

  10SAVE"4",8:PRINT4

  0801  0F 08               link to next line at $080F
  0803  0A 00               line number (16-bit binary): 10
  0805  94                  token SAVE
  0806  22 34 22 2C 38 3A   ascii «"4",8:»
  080C  99                  token PRINT
  080D  34                  ascii «4»
  080E  00                  -EOL-
  080F  00 00               -EOP- (link = null)
As we may see, "SAVE" has been compressed already to a single byte (0x94), as is "PRINT" (0x99). Moreover, the line number is a 16-bit binary integer, meaning, the number of decimal digits in the listing has no effect on the in-memory format.

BTW, abbreviations of BASIC keywords work, because of how upper-case/shifted letters are encoded in the PETSCII character set: they have their sign-bit set. (So normal letters are all smaller than 0x80, and shifted characters are >= 0x80. We may also note that codes > 0x80 are used exclusively for tokens in the stored BASIC text, discriminating them from any other text.) Now, the tokenizing routine uses a table, which also uses a set sign-bit: as a marker on the last character on each of the keywords, which are stored in a table. It will compute the difference of each letter in an input word to the entries in that table, and, if the difference is exactly 0x80 (the sign-bit), this means, (a) we arrived at the end of the word stored in the table, and (b) all the letters up until here did match (otherwise, we would have already exited the loop, in order to test the next keyword). We have a match! The routine then adds 0x80 to the table index of that keyword, and voila, there is your BASIC token.

Notably, if we're dealing with single-byte values, for a difference of 0x80 it doesn't matter, which of the two bytes, this is the difference of, holds the bigger value. It's effectively unsigned and agnostic of which was the larger byte. For our tokenizing routine, this means it will only "know" that one character has the sign-bit set, while the other has not (but is otherwise the same), but it will not "know" which of the two this is. Therefore, adding the sign-bit to an input character will fool the routine into assuming, it already went over the entire keyword and hit the sign-bit set in the last character of the table entry. And we achieve this by shifting the character in the input text. And, voila, there is your abbreviated BASIC keyword.

(We can also see how the length of the input keyword doesn't contribute to the storage format, as it will be compressed to a token, which is 0x80 + the table index of the keyword, anyways. We may also see why "iN" matches "input#" but not "input", because the longer version has to come first in the table, in order to match at all, and it will be also the first to be recognized by the erroneous match.)


Doesn’t BASIC tokenize those abbreviations to the exact same in-memory bytes like the full keywords?


Yes. Mentioned in the link as well: "As a program is typed into the BASIC interpreter, it's tokenised: any keywords in the line get replaced by token values before being stored in memory. We can see in this line that SAVE has been replaced by command token $94".


Yes it does but the amount of bytes the author speaks about regards the amount of characters used for storing the program. That's why he uses the : and removes the spaces.


The stored version also uses token code points (single bytes >127), not literal tokens.


I don't know anything about C64 or C64 BASIC, but would it be possible to intentionally write a shorter binary which will break the interpreter and do what we want instead? For example jump directly to a middle of the kernel ROM routine (akin to ROP in the modern days), or use a bad address in the "next line" offset etc.


In Commodore BASIC there's already SYS, which lets you jump to an arbitrary address anywhere in the 64k address space, including ROM. You can even include raw bytes in a BASIC program and have the CPU execute them as machine code.


however encoding such program in BASIC would take much more amount of commands/bytes than writing it in BASIC itself. You would need DATA statement and POKE FOR LOOP... In case of such a small scenario BASIC wins


There are ways around that too... store bytes in a string or a REM and then execute it directly. No DATA or FOR needed.

There were some workarounds posted on of Robin's recent video.


Using line number 1 instead of 10 seems like an easy 1 byte save.


they're stored as 16 bit little-endian word, so unless it's used for goto/gosub (whose targets are stored in petscii) the line number makes no difference.


Are they stored as 16 bit words before or after parsing the BASIC code?


The BASIC code is only in it's full textual form on screen. The moment you press return on it, it's tokenized, and it's stored tokenized both in memory and when saved. Unlike modern systems, the full textual representation of the code is never stored anywhere.


When I SAVE a program in C64 BASIC and LOAD it again the syntax doesn't change no matter what I do, add spaces or not, use shorthand or not, colons, etcetera. So I get the feeling that my whole program gets saved as a string and then parsed, not tokenized and saved.

Also there is a line limit in C64 BASIC that would overflow if certain shorthand would be expanded and for beginners to see their fully written keywords being transformed to shorthand after loading would be even more confusing.


The keywords are tokenized, the line number is converted to a 16bit integer, leading spaces are stripped (which is why some "formatted" BASIC uses ":" as the first character in a line, like the following), everything else is kept intact.

10 for i = 1 to 10

20 : (arbitrary number of spaces) print "hello"

30 next

The short hand issue is real, too:

1?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?:?

expands into six lines of "1 print:print:....:print" that you can't simply edit because the limit is 80 characters (two lines)


It's an accident of history this didn't continue. So many code style wars could have been avoided over the eons.


It wasn't that it didn't continue, but that this was unique to a branch of languages that largely were sidelined.

And the tokenization didn't prevent you from style differences anyway - as the article points out it e.g. keeps spaces etc. It only tokenized a few things, like keywords and line numbers.

(EDIT: in the late 90's I worked on a project written in Word BASIC.... It was also tokenized and that was used as an opportunity to translate the keywords in the localised versions of Word. But someone had managed to write a bunch of code in the Danish version and somehow exported it as text and imported it into the Norwegian version - the languages are similar enough that it was really hard to tell (no syntax highlighting, and they'd edited a bunch before realising and I had the fun job of untangling it... Yay...)


I think I read about a tool for the Acorn BBC micros which you could apply to your program to remove unnecessary spaces etc. in the tokenised form and shave off a few bytes.

This had the side-effect that you could still display and (presumably with a bit more mental effort, read) the program listing fine, but re-entering a line as shown in that listing would fail because the computer depended on the spaces to do the parsing, even if they were redundant after the tokenisation happened.


On the ZX Spectrum, numeric values were saved as both text, and in a five-byte floating point format. So making lines shorter often involved using keywords to avoid that: NOT PI, SGN PI, VAL "2" etc.


Atari BASIC tokenized source a bit more thoroughly... eliminated spaces, parsed constants into their floating point representation, etc. It did enforce formatting to a degree.

I was always surprised that it did all that, but wasn't any faster than Commodore BASIC.


remember when people didn't have FDDs but cassete drives instead, because FDDs were too expensive? Pepperidge Farm remembers


PRESS PLAY ON TAPE


Username checks out.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: