Your code reminds me of the good old days of the C64 (MOS 6510, a 6502 derivative), which had a similar architecture.
I have no clue how a Color Computer works at the hardware level. But on the C64 the logic for ROM/RAM visibility was special. If the Color Computer behaves the same way, that can be used for a further speedup.
The ROM/RAM switch was only relevant for read access, not for write access. The consequence was that you could poke/write to a "ROM" address (e.g. 40960) and thereby change the underlying RAM.
So your code might not need to switch between ROM and RAM for every byte; it only needs to make sure the ROM is readable at the start.
I.e. the time required is proportional to a constant C (the time per byte, which is what is being optimized) times the task size n (the number of bytes).
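To make the idea concrete, here is a toy model in Python of the C64-style behaviour described above. It is only a sketch: the class `BankedMemory`, the helper `copy_rom_to_ram` and the test pattern are made up for illustration, and whether the Color Computer hardware behaves like this is exactly the open question.

```python
ROM_START = 0xA000   # 40960, the "ROM" address mentioned above
ROM_SIZE = 0x2000    # 8 KiB window (assumed size for this sketch)


class BankedMemory:
    """Toy model: reads see ROM while it is mapped in, writes always hit RAM."""

    def __init__(self, rom_image):
        assert len(rom_image) == ROM_SIZE
        self.ram = bytearray(0x10000)   # full 64 KiB of RAM, initially zero
        self.rom = bytes(rom_image)
        self.rom_visible = True         # the ROM/RAM switch (affects reads only)

    def _in_rom_window(self, addr):
        return ROM_START <= addr < ROM_START + ROM_SIZE

    def peek(self, addr):
        # The switch matters here: ROM shadows RAM for reads.
        if self.rom_visible and self._in_rom_window(addr):
            return self.rom[addr - ROM_START]
        return self.ram[addr]

    def poke(self, addr, value):
        # The switch is ignored here: writes go to the underlying RAM.
        self.ram[addr] = value & 0xFF


def copy_rom_to_ram(mem):
    """Copy ROM into the RAM underneath it without toggling the switch per byte."""
    mem.rom_visible = True              # ensure ROM readability once, at the start
    for addr in range(ROM_START, ROM_START + ROM_SIZE):
        mem.poke(addr, mem.peek(addr))  # read from ROM, write lands in RAM


# Arbitrary test pattern standing in for real ROM contents.
rom_image = bytes((i * 7 + 3) & 0xFF for i in range(ROM_SIZE))
mem = BankedMemory(rom_image)
copy_rom_to_ram(mem)

mem.rom_visible = False                 # switch to RAM and check the copy
ram_matches_rom = all(
    mem.peek(ROM_START + i) == rom_image[i] for i in range(ROM_SIZE)
)
print(ram_matches_rom)
```

If the real hardware works this way, the inner loop no longer contains the two switch operations per byte, which is a direct reduction of the constant C while n stays the same.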