Back in the 80s when type-in program magazines were common, in France we had the wonderful "Hebdogiciel" with a perpetually running BASIC programming contest called "deulignes" -- which means "twolines".
"Deulignes" programs could target any platform, but must only take 2 lines of BASIC (most implementations allow only a limited line length, often 255 characters).
Some programs were really impressive; I remember one complete breakout implementation in MSX-BASIC for instance. People actually made whole (small) games in 2 lines of BASIC!
In a roundabout way BASIC on those machines is like modern high-level languages.
Slow and inefficient for sure, but most of the magic is happening directly at the hardware level (sprites, memory-mapped IO, etc.), so there's a surprisingly large amount of stuff you can do with very acceptable performance.
Do you think you'd be able to find that 2 line breakout? I'm currently doing a breakout implementation on my Commodore 64. In assembly though, so definitely more than two lines ;-) Nevertheless, a two line breakout in Basic would probably give lots of pointers on how to make things more compact.
Not the GP but you can find it on the "HEBDOGICIEL, les listings" Website [1]. Deulignes section, page 15 [2] (Casse-briques by Laurent Auble). Published in issue 104, page 12 (second box) [3]:
It didn't seem like any of your links was to a playable version, so I had a Goog' and a paste and came up with [1]. Pretty impressive game for that amount of code, I'd say.
I've never written a line of code on an actual MSX machine (even though I was around in the 80s), it's kind of amazing the amount of emulation Magic Power we casually throw around, these days. Massive thanks to all the emulation authors and (of course) retro computer archivists.
I remember this one well because I had an MSX, and like everyone else back then I didn't have that many games to play with, so I used this code quite extensively :)
If you enjoy this, then you should definitely checkout 8-Bit Show And Tell’s YouTube channel. The presenter, Robin, regularly does deep dives into code optimisation and fixes on Commodore 64 and other machines.
BASIC defaults to the tape device if you leave off the device number in LOAD/SAVE commands. So you can save another byte or two by saving to tape instead.
10SAVE"4",8:PRINT4
0801 0F 08 link to next line at $080F
0803 0A 00 line number (16-bit binary): 10
0805 94 token SAVE
0806 22 34 22 2C 38 3A ascii «"4",8:»
080C 99 token PRINT
080D 34 ascii «4»
080E 00 -EOL-
080F 00 00 -EOP- (link = null)
As we may see, "SAVE" has been compressed already to a single byte (0x94), as is "PRINT" (0x99). Moreover, the line number is a 16-bit binary integer, meaning, the number of decimal digits in the listing has no effect on the in-memory format.
BTW, abbreviations of BASIC keywords work, because of how upper-case/shifted letters are encoded in the PETSCII character set: they have their sign-bit set. (So normal letters are all smaller than 0x80, and shifted characters are >= 0x80. We may also note that codes > 0x80 are used exclusively for tokens in the stored BASIC text, discriminating them from any other text.) Now, the tokenizing routine uses a table, which also uses a set sign-bit: as a marker on the last character on each of the keywords, which are stored in a table. It will compute the difference of each letter in an input word to the entries in that table, and, if the difference is exactly 0x80 (the sign-bit), this means, (a) we arrived at the end of the word stored in the table, and (b) all the letters up until here did match (otherwise, we would have already exited the loop, in order to test the next keyword). We have a match! The routine then adds 0x80 to the table index of that keyword, and voila, there is your BASIC token.
Notably, if we're dealing with single-byte values, for a difference of 0x80 it doesn't matter, which of the two bytes, this is the difference of, holds the bigger value. It's effectively unsigned and agnostic of which was the larger byte. For our tokenizing routine, this means it will only "know" that one character has the sign-bit set, while the other has not (but is otherwise the same), but it will not "know" which of the two this is. Therefore, adding the sign-bit to an input character will fool the routine into assuming, it already went over the entire keyword and hit the sign-bit set in the last character of the table entry. And we achieve this by shifting the character in the input text. And, voila, there is your abbreviated BASIC keyword.
(We can also see how the length of the input keyword doesn't contribute to the storage format, as it will be compressed to a token, which is 0x80 + the table index of the keyword, anyways. We may also see why "iN" matches "input#" but not "input", because the longer version has to come first in the table, in order to match at all, and it will be also the first to be recognized by the erroneous match.)
Yes. Mentioned in the link as well: "As a program is typed into the BASIC interpreter, it's tokenised: any keywords in the line get replaced by token values before being stored in memory. We can see in this line that SAVE has been replaced by command token $94".
Yes it does but the amount of bytes the author speaks about regards the amount of characters used for storing the program. That's why he uses the : and removes the spaces.
I don't know anything about C64 or C64 BASIC, but would it be possible to intentionally write a shorter binary which will break the interpreter and do what we want instead? For example jump directly to a middle of the kernel ROM routine (akin to ROP in the modern days), or use a bad address in the "next line" offset etc.
In Commodore BASIC there's already SYS, which lets you jump to an arbitrary address anywhere in the 64k address space, including ROM. You can even include raw bytes in a BASIC program and have the CPU execute them as machine code.
however encoding such program in BASIC would take much more amount of commands/bytes than writing it in BASIC itself. You would need DATA statement and POKE FOR LOOP... In case of such a small scenario BASIC wins
they're stored as 16 bit little-endian word, so unless it's used for goto/gosub (whose targets are stored in petscii) the line number makes no difference.
The BASIC code is only in it's full textual form on screen. The moment you press return on it, it's tokenized, and it's stored tokenized both in memory and when saved. Unlike modern systems, the full textual representation of the code is never stored anywhere.
When I SAVE a program in C64 BASIC and LOAD it again the syntax doesn't change no matter what I do, add spaces or not, use shorthand or not, colons, etcetera. So I get the feeling that my whole program gets saved as a string and then parsed, not tokenized and saved.
Also there is a line limit in C64 BASIC that would overflow if certain shorthand would be expanded and for beginners to see their fully written keywords being transformed to shorthand after loading would be even more confusing.
The keywords are tokenized, the line number is converted to a 16bit integer, leading spaces are stripped (which is why some "formatted" BASIC uses ":" as the first character in a line, like the following), everything else is kept intact.
It wasn't that it didn't continue, but that this was unique to a branch of languages that largely were sidelined.
And the tokenization didn't prevent you from style differences anyway - as the article points out it e.g. keeps spaces etc. It only tokenized a few things, like keywords and line numbers.
(EDIT: in the late 90's I worked on a project written in Word BASIC.... It was also tokenized and that was used as an opportunity to translate the keywords in the localised versions of Word. But someone had managed to write a bunch of code in the Danish version and somehow exported it as text and imported it into the Norwegian version - the languages are similar enough that it was really hard to tell (no syntax highlighting, and they'd edited a bunch before realising and I had the fun job of untangling it... Yay...)
I think I read about a tool for the Acorn BBC micros which you could apply to your program to remove unnecessary spaces etc. in the tokenised form and shave off a few bytes.
This had the side-effect that you could still display and (presumably with a bit more mental effort, read) the program listing fine, but re-entering a line as shown in that listing would fail because the computer depended on the spaces to do the parsing, even if they were redundant after the tokenisation happened.
On the ZX Spectrum, numeric values were saved as both text, and in a five-byte floating point format. So making lines shorter often involved using keywords to avoid that: NOT PI, SGN PI, VAL "2" etc.
Atari BASIC tokenized source a bit more thoroughly... eliminated spaces, parsed constants into their floating point representation, etc. It did enforce formatting to a degree.
I was always surprised that it did all that, but wasn't any faster than Commodore BASIC.
"Deulignes" programs could target any platform, but must only take 2 lines of BASIC (most implementations allow only a limited line length, often 255 characters).
Some programs were really impressive; I remember one complete breakout implementation in MSX-BASIC for instance. People actually made whole (small) games in 2 lines of BASIC!
Here's an example page : https://archive.org/details/hebdogiciel-french-098/page/n15/...