
>  And having them be all three letters meant assemblers could pack the text into memory in fixed-length records; every byte mattered.

As much as I agree with using the mnemonics, this is a bogus argument. Even C64 BASIC tokenized keywords before storing them, because there's no reason to store the name at all. In fact, the 6502 instruction set is small enough to represent each mnemonic in the assembler's editor as a single-byte index into an array. Or you could just use the opcode itself.
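A minimal sketch of the idea above: with fewer than 256 distinct 6502 mnemonics, an editor could store each one as a single index byte into a table instead of three ASCII characters. The function names and record layout here are hypothetical, just to illustrate the point.

```python
# The official 6502 mnemonics; order here is arbitrary (alphabetical).
MNEMONICS = ["ADC", "AND", "ASL", "BCC", "BCS", "BEQ", "BIT", "BMI",
             "BNE", "BPL", "BRK", "BVC", "BVS", "CLC", "CLD", "CLI",
             "CLV", "CMP", "CPX", "CPY", "DEC", "DEX", "DEY", "EOR",
             "INC", "INX", "INY", "JMP", "JSR", "LDA", "LDX", "LDY",
             "LSR", "NOP", "ORA", "PHA", "PHP", "PLA", "PLP", "ROL",
             "ROR", "RTI", "RTS", "SBC", "SEC", "SED", "SEI", "STA",
             "STX", "STY", "TAX", "TAY", "TSX", "TXA", "TXS", "TYA"]

def tokenize(line):
    """Replace the mnemonic with a one-byte table index; keep the operand."""
    mnemonic, _, operand = line.strip().partition(" ")
    return bytes([MNEMONICS.index(mnemonic.upper())]) + operand.encode()

def detokenize(record):
    """Reverse the process when listing the source back to the user."""
    mnemonic = MNEMONICS[record[0]]
    operand = record[1:].decode()
    return f"{mnemonic} {operand}" if operand else mnemonic
```

Round-tripping `"LDA #$00"` saves two bytes per line over storing the mnemonic as text, which was the whole point on a 64 KB machine.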




BBC BASIC tokenised BASIC keywords before the line was stored in memory, e.g. PRINT was represented by a single byte. Some tokens needed more than one byte, especially in later versions.
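A toy illustration of that tokenising scheme: each keyword becomes one byte, with an extension prefix byte for keywords added in later versions. The token values and the extended keyword here are made up, not BBC BASIC's real ones.

```python
# Hypothetical token values, for illustration only.
TOKENS = {"PRINT": 0xF1, "GOTO": 0xE5, "IF": 0xE7}
EXT_PREFIX = 0xC8            # hypothetical "two-byte token" prefix
EXT_TOKENS = {"SYS": 0x01}   # hypothetical later-version keyword

def tokenize_keyword(word):
    """Return the stored form of one word of BASIC source."""
    if word in TOKENS:
        return bytes([TOKENS[word]])          # one byte instead of 2-5 chars
    if word in EXT_TOKENS:
        return bytes([EXT_PREFIX, EXT_TOKENS[word]])
    return word.encode()                      # not a keyword: plain text
```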

But I'm talking about the assembler here. BBC BASIC has a built-in assembler, for 6502, Z80, or ARM, depending on the CPU it's running on. The assembler source in the BASIC program is not tokenised on input; it's stored as plain text. The embedded assembler lines, wrapped in [...], are still lines of BASIC, and when they get run the machine code is assembled at the address in BASIC's P% integer variable, and P% is moved on. At that point of execution BASIC must hunt for the mnemonic, stored as plain text inside the "tokenised" BASIC line, in its table; that's the table I referenced, in the case of ARM BASIC. The table can be laid out as it is because each mnemonic is three characters long, e.g. mov, ldr, stm, and bic.
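The fixed-length layout described above can be sketched like this: because every mnemonic is exactly three characters, the table is just the names packed back to back, and the index of a match selects the matching opcode template. The mnemonic subset and the template values are illustrative, not the real table.

```python
# Three bytes per entry, no separators needed: mov, ldr, str, stm, ...
TABLE = b"movldrstrstmldmbicandorr"
TEMPLATES = [0xE1A0, 0xE590, 0xE580, 0xE880,   # hypothetical opcode stubs,
             0xE890, 0xE3C0, 0xE200, 0xE380]   # one per table entry

def lookup(mnemonic):
    """Scan the packed table in fixed three-byte steps."""
    needle = mnemonic.lower().encode()
    for i in range(0, len(TABLE), 3):
        if TABLE[i:i+3] == needle:
            return TEMPLATES[i // 3]           # entry index = i / 3
    raise ValueError("unknown mnemonic")
```

The fixed record length is what lets the code recover the entry index by simple division instead of storing it alongside each name.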

You're mixing up tokenising BASIC, which BBC BASIC did, with the embedded ARM assembler, which it didn't tokenise, and then adding in an "assembler's editor", which didn't exist. There were just lines of BASIC program, 10, 20, ..., some of which switched to assembler with a [.


I'm not talking about the BBC specifically at all - the specific system is irrelevant - so I'm not "mixing" anything. Many 6502-based systems did have assembler editors; many more had "monitors" that would assemble line by line on the fly - if not built in, then as common extensions.

(In fact I did most of my 6502 assembly programming in a monitor, with a notepad to keep track of where various functions started; it was only a couple of years after I started doing assembly that I got a proper macro assembler for my C64. Even then, exactly because "every byte mattered", it was not at all uncommon to stick to a monitor on a cartridge rather than have a macro assembler "waste" precious memory on the assembler and source text.)

What I'm talking about is the general idea that longer keywords would somehow prevent an assembler from using fixed-length records to represent lines. Though, reading it in the context of what you wrote above, I see your reference to fixed-length records referred to the table used for assembling, not to the source lines, in which case it makes slightly more sense to me.

Though not fully: it'd be both faster and take less code to use custom search code to match the input against the available mnemonics than to insist on a fixed-length record. I did a quick check, and it should be doable to save at least a dozen or two bytes and reduce the average search time significantly by range-checking the first character and using it to index a lookup table. It might've been convenient to write the code with fixed records, but it's far from optimal in terms of either performance or code size, so it doesn't seem like code size bothered them that much in this case.
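A rough sketch of the search strategy suggested above: rather than scanning every record, range-check the first character and jump straight to the short run of mnemonics starting with that letter. The mnemonic subset and the dict-based index stand in for what would be a precomputed offset table in assembly.

```python
from collections import defaultdict

MNEMONICS = ["ADC", "AND", "ASL", "BCC", "BEQ", "BNE", "CMP", "CPX",
             "LDA", "LDX", "LDY", "STA", "STX", "STY"]

# Built once: first letter -> candidates sharing it, with their indices.
FIRST_CHAR = defaultdict(list)
for i, m in enumerate(MNEMONICS):
    FIRST_CHAR[m[0]].append((m, i))

def find(mnemonic):
    """Return the table index of a mnemonic, or None if invalid."""
    c = mnemonic[0].upper()
    if not "A" <= c <= "Z":            # range check before the table lookup
        return None
    for m, i in FIRST_CHAR[c]:         # only a handful of candidates remain
        if m == mnemonic.upper():
            return i
    return None
```

Since 6502 mnemonics cluster heavily on a few initial letters (B, C, L, S, T), the first-character dispatch cuts the average comparison count to a fraction of a full linear scan.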

The "every byte mattered" argument applies to source too on these systems, and I actually find it really curious that they went to the trouble of supporting inline assembly but then didn't apply that optimisation to the source, given the limited memory and performance of these systems. Especially since the opcode itself makes a very obvious token candidate, potentially reducing the "assembly" step itself to mostly copying data and applying address fixups.
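A sketch of that "opcode as token" idea: if the editor stored the actual opcode byte plus raw operand bytes, "assembling" would mostly be copying records out and patching label references. The record layout and label scheme here are hypothetical; only the opcode values (0xA9 = LDA immediate, 0x4C = JMP absolute) are real 6502 encodings.

```python
def assemble(records, labels, origin):
    """records: list of (opcode_byte, operand) where operand is either
    raw bytes or a label name; labels: name -> offset from origin."""
    out = bytearray()
    fixups = []                        # (offset, label) pairs to patch later
    for opcode, operand in records:
        out.append(opcode)
        if isinstance(operand, str):   # unresolved label: leave a hole
            fixups.append((len(out), operand))
            out += b"\x00\x00"
        else:
            out += operand
    for offset, label in fixups:       # apply 16-bit little-endian fixups
        addr = origin + labels[label]
        out[offset:offset + 2] = addr.to_bytes(2, "little")
    return bytes(out)

# LDA #$00 / JMP loop, assembled at $C000 with "loop" at offset 0:
code = assemble([(0xA9, b"\x00"), (0x4C, "loop")], {"loop": 0}, 0xC000)
```

The per-line work at "assembly" time is then a copy plus an occasional two-byte patch, with no mnemonic lookup at all.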



