In this case, I guess this demo works (if it does, I haven't tested it on my flash cart) because the amount of data transferred to VRAM is small and the timing of things happens to line up so that it all gets sent at the right time, but for anything more complicated you can't rely on that.
There are two ways around this: either turn the screen off before copying your tile and map data over (and you can only safely turn the screen off when it's not rendering, too) or set up a system to blast as much data as possible into VRAM as quickly as possible when a frame is finished drawing.
To test whether the screen is currently not rendering (i.e. it's in vblank), either spin until bits 0 and 1 of $FF41 read 01 (mode 1) or set up an interrupt handler that'll get fired on vblank.
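For anyone who wants to see the polling option spelled out, here is a rough sketch in C rather than assembly, so it can be run on a desktop. The names `stat_mode`, `in_vblank` and `wait_for_vblank` are made up for illustration; on real hardware the pointer would be the STAT register at $FF41, and the loop would be a handful of instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The low two bits of STAT ($FF41) hold the current LCD mode. */
enum lcd_mode {
    MODE_HBLANK   = 0,
    MODE_VBLANK   = 1,
    MODE_OAM_SCAN = 2,
    MODE_DRAWING  = 3
};

/* Mask off everything except the mode bits. */
static enum lcd_mode stat_mode(uint8_t stat)
{
    return (enum lcd_mode)(stat & 0x03u);
}

static bool in_vblank(uint8_t stat)
{
    return stat_mode(stat) == MODE_VBLANK;
}

/* Hypothetical helper: spin until the register reports mode 1.
 * `stat` stands in for the memory-mapped register; it is volatile
 * so the compiler re-reads it on every pass through the loop. */
static void wait_for_vblank(volatile const uint8_t *stat)
{
    assert(stat != NULL);
    while (!in_vblank(*stat)) {
        /* busy-wait */
    }
}
```

The masking matters because the upper bits of the register carry unrelated flags, so comparing the whole byte against 01 would not work.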
OP, if you did write this code (or if the author reads this, or if anyone else is interested), feel free to hit me up with any questions. My primary side project is a GB game, and I'd love to talk shop.
this is precisely the sort of quirk i wanted to cover in this template so i appreciate hearing about it.
I recommend using the BGB emulator - you can configure it to crash into a debugger whenever it detects an invalid VRAM access. Plus the debugger is insanely useful.
For your demo, checking $FF41 should be sufficient; unless you're writing a lot of data, you'll usually have enough time in one vblank period. You could wait until the low two bits of $FF41 == 01 and then copy all the data, or check it on each iteration of the loop (less efficient but will never fail). I've seen actual games do it both ways (I have a decent amount of disassembly/ROM hacking experience).
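A sketch of the check-every-iteration variant, again in C so it can be tried off-device. `vram_safe` and `copy_checked` are hypothetical names, and testing bit 1 of STAT (clear only in modes 0 and 1) is one conservative way to do the check, not necessarily what any particular game does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conservative check: bit 1 of STAT is clear only in
 * mode 0 (hblank) and mode 1 (vblank). */
static bool vram_safe(uint8_t stat)
{
    return (stat & 0x02u) == 0;
}

/* Hypothetical helper: wait for a safe mode before every single byte.
 * `stat` stands in for the register at $FF41 and `dst` for VRAM. */
static void copy_checked(volatile uint8_t *dst, const uint8_t *src,
                         size_t n, volatile const uint8_t *stat)
{
    assert(dst != NULL && src != NULL && stat != NULL);
    for (size_t i = 0; i < n; i++) {
        while (!vram_safe(*stat)) {
            /* busy-wait until the LCD leaves modes 2 and 3 */
        }
        dst[i] = src[i];
    }
}
```

The wait-once version is the same loop with the check hoisted out in front of it, which is faster but assumes the whole copy fits in the time that is left.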
Also, the "APIs" you have to learn are incredibly simple compared to learning APIs for every framework and protocol in high-level programming. If devices are memory-mapped, then all I'll ever need to know is load and store. I just need to look at the registers that are available for each I/O device, what their addresses are, and what the bits represent in them. From there I access them all the same way: load and store. Couldn't possibly get any easier than just moving things around between registers and memory.
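To make the load/store point concrete, here is a toy sketch in C that fakes a 64 KiB address space with an array. The helper names (`load8`, `store8`, `lcd_off`) are invented for this example; the only hardware facts assumed are that the Game Boy's LCD control register sits at $FF40 and that bit 7 is the LCD enable bit.

```c
#include <stdint.h>

#define LCDC_ADDR   0xFF40u  /* LCD control register */
#define LCDC_ENABLE 0x80u    /* bit 7: LCD on/off */

/* Stand-in for the address space; on hardware these would be real bus accesses. */
static uint8_t mem[0x10000];

static uint8_t load8(uint16_t addr)
{
    return mem[addr];
}

static void store8(uint16_t addr, uint8_t value)
{
    mem[addr] = value;
}

/* The whole "API" for switching the screen off: load, clear one bit, store. */
static void lcd_off(void)
{
    store8(LCDC_ADDR, (uint8_t)(load8(LCDC_ADDR) & (uint8_t)~LCDC_ENABLE));
}
```

Every other device register follows the same pattern; only the address and the meaning of the bits change.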
Compare that to high level development like web development, where you need an understanding of HTTP on top of one of infinite different web backends, written in different languages, all using different libraries with different naming conventions and different programming paradigms, plus knowledge of HTML, CSS, possibly another smorgasbord of frameworks, plus knowledge of decent security practices, databases, operations, and the list goes on.
It's obviously apples to oranges though, and I can produce a lot more real-world value doing web development than I can doing assembly programming, at least just as one person. My point is that I really like the simplicity and straightforwardness of writing assembly, at least in the few very short times I've done it. Maybe large assembly projects have all the same challenges as high-level development or worse.
All but the simplest embedded systems are complex enough to require some pre-developed library or abstraction layer to stand between a single tinkerer and the hardware. Otherwise it would take you an eternity just to stand up a "Hello World!" The cost of a Linux-capable SoC is low enough that you can buy one for $5 in the form of the Pi Zero. If a system complex enough to run Linux only costs $5 to sell, and probably even less to make, what incentive is there anymore to produce hardware that's simple enough to work with directly at the assembly layer?
The last holdouts are extremely power-efficient embedded systems. And even then you can do those with C.
I'm an old fart who can remember the "good ol' days" just as well, and I still program the SNES as a hobby. But really, that's all it'll ever be anymore; a hobby for legacy systems. There's no reason anymore to even skip C, when architectures have been so fine-tuned for code written with C.
For instance, in our case (the case I know best), we could use a full 32-bit ARM architecture in our hardware, but we do not; we target a very simple microcontroller with a few kilobytes of memory because it is far cheaper and far less power hungry. The result is that we have to use asm, or C with quite a high percentage of asm. We decided to use only asm because it makes the audits easier too. Through this work I got to meet many manufacturers; these components are in a lot of appliances and are often coded in asm.
It's definitely out of fashion, but when something complicated breaks, where is the "adult" you call to figure it out? :)
I do 100% agree that commercial systems / enterprise / game dev is past the low level era, but the knowledge of the area is anything but obsolete for the reason I mentioned before.
my hope was to try to stand these up on systems that i actually use on a day-to-day basis, so it's kinda disheartening that that may be overly ambitious, but i'll write about it if i have any success :)
So one of those platonic store instructions might actually be a no-op; you can't tell by looking at it in isolation.
(The emulator itself started as a fork of Gameyob, but he ended up rewriting it entirely, piece by piece.)
The Aladdin source code article was really nice. I wonder if anyone here was involved in Game Boy game development back in its day and could share a similar article.
 - https://news.ycombinator.com/item?id=15458200