Writing a good compiler to target the 6502 is difficult, because the CPU is so unlike modern hardware (8-bit with 16-bit addressing, only one register that isn't crippled, zero page, no arbitrary shift amounts, no multiply, weird addressing modes, etc.). I'm always impressed when anyone gets something working :)
Basically all code on it is super-hardcoded for one thing.
Sure, you can use the zero page (the first 256 bytes of memory) to act like 128 16-bit registers, but you know, it feels like a waste of cycles most of the time. Self-modifying code is easier most of the time, but it's not reentrant.
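To make the "128 16-bit registers" idea concrete, here's a rough Python model of it. This is only a sketch: the names `zero_page`, `write_reg`, `read_reg` and `add_reg` are made up for illustration, and the add is split into two 8-bit steps with a carry because that is what the CPU has to do as well, which is where the extra cycles go.

```python
# Model of the zero page: 256 bytes viewed as 128 little-endian 16-bit pseudo-registers.
# All names here are hypothetical, for illustration only.
zero_page = bytearray(256)


def _check(n: int) -> None:
    if not 0 <= n < 128:
        raise IndexError("pseudo-register index must be 0..127")


def write_reg(n: int, value: int) -> None:
    """Store a 16-bit value in pseudo-register n (bytes 2n and 2n+1)."""
    _check(n)
    zero_page[2 * n] = value & 0xFF             # low byte first (little endian)
    zero_page[2 * n + 1] = (value >> 8) & 0xFF  # then the high byte


def read_reg(n: int) -> int:
    """Read pseudo-register n back as a 16-bit value."""
    _check(n)
    return zero_page[2 * n] | (zero_page[2 * n + 1] << 8)


def add_reg(dst: int, src: int) -> None:
    """dst += src, done as two 8-bit adds with a carry in between."""
    _check(dst)
    _check(src)
    lo = zero_page[2 * dst] + zero_page[2 * src]
    carry = lo >> 8                              # carry out of the low byte
    hi = zero_page[2 * dst + 1] + zero_page[2 * src + 1] + carry
    zero_page[2 * dst] = lo & 0xFF
    zero_page[2 * dst + 1] = hi & 0xFF           # overflow past 16 bits is dropped
```

For example, after `write_reg(0, 0x12FF)` and `write_reg(1, 0x0001)`, calling `add_reg(0, 1)` leaves `read_reg(0) == 0x1300`, with the carry rippling from the low byte into the high byte.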
For those who don't have any exposure to 6502 assembler, here's a basic subroutine to copy blocks of memory. It's been ~25 years since I last touched it, so bear with me; corrections are welcome, especially with respect to syntax.
Let's say we have the high byte of the 16-bit address we want to copy from in the A register, and the high byte of the 16-bit address we want to copy to in the X register (it wouldn't be unusual for the calling code to modify the src/dest addresses directly instead to set up the loop...). For simplicity the low byte of both is 0, and the number of 256-byte chunks we want to copy is in Y.
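        STA src+2     ; the 6502 is little endian + 1 byte for opcode
        STX dest+2    ; same for the destination operand
        LDX #$00      ; X becomes the offset within the current block
src:    LDA $0000,x   ; the STA above overwrites the first two digits
dest:   STA $0000,x   ; the STX above overwrites the first two digits
        INX           ; next byte
        BNE src       ; branch back to start unless X has wrapped
        INC src+2     ; update the high byte.
        INC dest+2    ; and the same for the destination
        DEY           ; reduce counter of blocks to copy.
        BNE src       ; keep going while there are blocks left
        RTS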
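If the assembler is hard to follow, here is my reading of that routine as a Python model. It's a sketch under assumptions: I'm assuming the loop also patches the destination operand, starts X at zero and increments it, as the comments imply, and the name `copy_pages` and the flat 64K `mem` array are made up for illustration.

```python
def copy_pages(mem: bytearray, src_hi: int, dst_hi: int, pages: int) -> None:
    """Copy `pages` 256-byte blocks inside a 64K `mem`.

    src_hi, dst_hi and pages play the roles of A, X and Y on entry.
    The two *_operand_hi variables stand in for the operand bytes that the
    self-modifying code patches inside its own LDA/STA instructions.
    """
    src_operand_hi = src_hi & 0xFF   # patch the high byte of the load address
    dst_operand_hi = dst_hi & 0xFF   # patch the high byte of the store address
    x = 0                            # offset within the current block
    y = pages & 0xFF                 # 8-bit block counter
    while True:
        # absolute,X load and store: patched base address plus X
        mem[(dst_operand_hi << 8) + x] = mem[(src_operand_hi << 8) + x]
        x = (x + 1) & 0xFF           # 8-bit increment, wraps to 0 after 255
        if x != 0:
            continue                 # inner loop: block not finished yet
        src_operand_hi = (src_operand_hi + 1) & 0xFF  # move on to the next block
        dst_operand_hi = (dst_operand_hi + 1) & 0xFF
        y = (y - 1) & 0xFF           # one block fewer to copy
        if y == 0:
            break                    # counter hit zero: done
```

Calling `copy_pages(mem, 0x20, 0x40, 2)` on a 65536-byte `bytearray` copies $2000-$21FF to $4000-$41FF. Note that, like the 8-bit counter it models, a count of 0 on entry wraps around and copies 256 blocks.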
I've written a "window server" for the Apple II (7-pixel aligned, window stack you could only add and remove rectangles from the top) with about 1K of code. It was not that difficult to reason about, but I had a much younger brain at the time.
But on top of that, a lot of software on the C64 at least (I'd like to think this didn't apply to any "professional" software, but I suspect you'd find a lot of hairy stuff there too) was not written with a macro assembler but directly into a machine code monitor with no ability to store labels or comments.
In many cases you'd be lucky to have a sheet of handwritten notes about which addresses contained which code (and yes, that meant re-writing bits and pieces of code if you wanted to insert something new).
For my part I used a machine code monitor without labels etc. for several years before I got a proper macro assembler (Turbo Assembler) with a full screen editor.
BASIC implementations of everything in that YouTube video came with it, and it had funky things like pre-emptive multitasking for BASIC programs (fairly simple - just use an interrupt or hook into one of the BASIC entrypoints, count time slices and shift some pointers around) + separate programmable sprite animations (so you could e.g. start sliding a sprite over the screen and let your BASIC program keep doing other stuff).
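The "count time slices and shift some pointers around" part is roughly this, as a toy Python model. To be clear, this is not how that particular extension did it; the `Scheduler` class, `on_interrupt` and `slice_ticks` are hypothetical names, and the "pointer" here is just an index into a task list.

```python
class Scheduler:
    """Toy round-robin scheduler driven by a periodic timer tick."""

    def __init__(self, tasks, slice_ticks=3):
        if not tasks:
            raise ValueError("need at least one task")
        if slice_ticks < 1:
            raise ValueError("slice_ticks must be at least 1")
        self.tasks = list(tasks)
        self.slice_ticks = slice_ticks   # ticks each task gets before a switch
        self.current = 0                 # the "pointer" to the running task
        self.ticks_left = slice_ticks

    def on_interrupt(self) -> int:
        """Call once per timer tick; returns the index of the task to run next."""
        self.ticks_left -= 1
        if self.ticks_left == 0:
            # Time slice used up: move the pointer to the next task.
            self.current = (self.current + 1) % len(self.tasks)
            self.ticks_left = self.slice_ticks
        return self.current
```

With two tasks and `slice_ticks=2`, six ticks give the task indices 0, 1, 1, 0, 0, 1. A real implementation would also have to save and restore each program's state at the switch, which this sketch leaves out.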