It would be nice to have a Z80 superoptimizer in the form of a web tool. You give it a C expression and it returns the shortest code sequence. Demosceners would love it.
Nice, but I wouldn't call it a visualization. More like a "live preview" or "instant compiler" or something.
Also, why is it so laggy? It feels like it's being sent to a server for compilation, but it looks like it's all in JavaScript. (And V8 is honking fast, should be nearly instant.)
It's all done in a worker running asm.js modules. I've found it to be fastest on Firefox. Would love to speed it up; haven't found the magic emscripten flags yet.
This brings back a lot of memories. I remember in high school printing out a sheet of Z80 opcodes (at size 6 font), writing a program on a sheet of paper, and then manually compiling that program and typing in the hex codes (using the AsmPrgm instruction to prefix the program). It was great for boring classes as long as you sat far enough in the back. As a prank, we started grabbing people's calculators and turning off the LCD display using some sequence of assembly. Restarting the calculator wouldn't fix it, so you either had to memorize the button sequences to re-enable the display or remove the batteries and let it sit for a while.
I did the same thing with my ZX Spectrum - writing programs in assembly language on paper, then manually looking up each opcode in the table at the back of the manual.
After writing out all the opcodes I'd then have to work out the values to use for any relative jumps, once I knew how long each instruction had been.
Finally the whole thing would be POKEd into memory, and tested.
Writing code like that, without a compiler/assembler was simultaneously a lot of fun and utterly exhausting. I know I got into the habit of putting in random "nops" just in case I had to add new functionality in later revisions - that would allow some unconditional jumps to be patched into place to avoid having to change offsets, etc.
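The relative-jump arithmetic described above can be sketched in C. This is only an illustration: the function name and signature are made up for the example, and it assumes the usual Z80 rule that a two-byte JR/DJNZ stores a signed 8-bit displacement measured from the address of the following instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the displacement byte for a Z80 JR placed
 * at jr_addr that should land on target. The displacement is relative to
 * the instruction after the JR, i.e. jr_addr + 2. Returns false when the
 * target is out of reach (outside -128..+127). Address wrap-around at
 * 0xFFFF is ignored in this sketch. */
bool jr_displacement(uint16_t jr_addr, uint16_t target, uint8_t *out)
{
    assert(out != NULL);
    int32_t delta = (int32_t)target - ((int32_t)jr_addr + 2);
    if (delta < -128 || delta > 127)
        return false;
    *out = (uint8_t)delta;   /* two's-complement byte to write after the opcode */
    return true;
}
```

For instance, a JR that jumps to itself needs a displacement of -2, which is the byte 0xFE. This is also why inserting an instruction between a jump and its target meant recomputing the byte by hand.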
The TI graphing calculators have a mode for writing your own programs. Typically this is done using TI-BASIC, but if you prefix the program with the AsmPrgm symbol, it will allow you to directly write Z80 opcodes as hex values. Combined with a TI-provided guide for making bcalls (calls into common routines that have already been written, like changing LCD settings or multiplying floats), it's not too difficult to write a short ASM program by hand.
Depends on how it expects the parameter and the result.
If the parameter is the second item down on the stack and the compiler wants the result in hl with the stack unchanged, then
pop bc
pop hl
push hl
push bc
isn't redundant stack manipulation. It's moving stack values to bc and hl while keeping the stack unchanged.
Of course at another level a compiler should be able to inline x=>x*1 but that depends on the build architecture. If it's a linkable function then the contents may be opaque to the code that calls it.
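To make the pop/push sequence above concrete, here is a small C simulation. It is a sketch under stated assumptions: the `Machine` struct and helper names are invented for the example, the top-of-stack word is taken to be the return address, and the word below it is taken to be the parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: an 8-word stack that grows downward,
 * plus the two register pairs used by the sequence. */
typedef struct {
    uint16_t slots[8];
    int sp;            /* index of the current top of stack */
    uint16_t bc, hl;
} Machine;

static uint16_t pop16(Machine *m)
{
    assert(m->sp < 8);
    return m->slots[m->sp++];
}

static void push16(Machine *m, uint16_t v)
{
    assert(m->sp > 0);
    m->slots[--m->sp] = v;
}

/* Runs pop bc / pop hl / push hl / push bc and returns what ends up in HL. */
uint16_t run_sequence(uint16_t return_addr, uint16_t param)
{
    Machine m = { .sp = 8 };
    push16(&m, param);         /* caller pushes the argument */
    push16(&m, return_addr);   /* CALL pushes the return address */
    int sp_at_entry = m.sp;

    m.bc = pop16(&m);          /* pop bc  : return address into BC */
    m.hl = pop16(&m);          /* pop hl  : parameter into HL */
    push16(&m, m.hl);          /* push hl : put the parameter back */
    push16(&m, m.bc);          /* push bc : put the return address back */

    /* The stack looks exactly as it did on entry. */
    assert(m.sp == sp_at_entry);
    assert(m.slots[m.sp] == return_addr);
    assert(m.slots[m.sp + 1] == param);
    return m.hl;
}
```

The four instructions leave the stack untouched and HL holding the parameter, which is all an `x*1` function needs before returning.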
In SDCC, function params are always passed on the stack, but the return value goes in HL. So pop hl/push hl moves the function parameter into the return value register HL. (pop/push bc is of course just used to get past the return address on the stack.)
It seems odd that the compiler doesn't pass parameters in registers on a relatively register-rich architecture like the Z80.
There's also something else: try shifting an unsigned char x to the right. >> 1 through >> 3 produce correct results, >> 4 switches the shift to some optimization too early, and >> 5 gives the correctly optimized version. It's definitely not a very good compiler.
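For reference, one well-known way to optimize large right shifts on an 8-bit CPU is to rotate left by the complementary amount and mask off the high bits. The sketch below only shows that equivalence in C. Whether this is the optimization the compiler in question applies is an assumption, and all function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Plain logical shift right, as the C source would express it. */
uint8_t shr_plain(uint8_t x, unsigned n)
{
    assert(n < 8);
    return (uint8_t)(x >> n);
}

/* Rotate left within 8 bits (what a rotate instruction does to one byte). */
uint8_t rotl8(uint8_t x, unsigned n)
{
    n &= 7;
    return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
}

/* Shift right by n, done as a left rotate by 8 - n followed by a mask.
 * For n = 5 that is three left rotates and an AND with 0x07,
 * instead of five separate shifts. */
uint8_t shr_via_rotate(uint8_t x, unsigned n)
{
    assert(n < 8);
    return (uint8_t)(rotl8(x, 8 - n) & (0xFFu >> n));
}

/* Exhaustively checks that both versions agree for every input. */
int shifts_agree(void)
{
    for (unsigned n = 0; n < 8; n++)
        for (unsigned x = 0; x < 256; x++)
            if (shr_plain((uint8_t)x, n) != shr_via_rotate((uint8_t)x, n))
                return 0;
    return 1;
}
```

The rotate form only pays off once 8 - n rotates plus a mask are cheaper than n shifts, which is why a compiler that switches over at the wrong shift count produces worse code.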
at the time Z80 was popular ('87 ish) there were few/no tools outside of Hisoft Devpac (http://www.secarica.ro/images/zx/genp351.gif) neither was there any internet / forums etc. to learn how others were solving things - like that bit rotation example
when I found out people were moving the stack pointer register to the screen memory base address and using stack pushes (instead of mov) to move data - to save a clock cycle or two each byte move - it blew my mind - hacks like that would be common knowledge in minutes now
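A rough C model of that stack trick, as a sketch only: the function name and the flat `mem` array are invented for the example. It emulates pointing SP just past a memory region and repeatedly doing `push hl`, which writes two bytes per instruction from the top of the region downward. From memory of the standard timing tables, PUSH takes 11 T-states for two bytes versus 21 per byte for LDIR, so treat those figures as a recollection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emulation of: ld sp, base+len ; then len/2 times push hl.
 * Each PUSH decrements SP, stores the high byte, decrements SP again and
 * stores the low byte, so the region fills from its end toward its start. */
void push_fill(uint8_t *mem, size_t base, size_t len, uint16_t hl)
{
    assert(mem != NULL);
    assert(len % 2 == 0);              /* PUSH always moves two bytes */
    size_t sp = base + len;            /* SP starts just past the region */
    while (sp > base) {
        mem[--sp] = (uint8_t)(hl >> 8);    /* high byte at the higher address */
        mem[--sp] = (uint8_t)(hl & 0xFF);  /* low byte at the lower address */
    }
}
```

Because SP is busy pointing at screen memory, an interrupt arriving mid-fill would push its return address into that same region, so such routines normally ran with interrupts disabled.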
> when I found out people were moving the stack pointer register to the screen memory base address and using stack pushes (instead of mov) to move data - to save a clock cycle or two each byte move ...
So the user might occasionally see an interrupt stack frame on screen?
memory is hazy, but using SP to move the memory was fast enough you could blit most (not all) of a screen in one refresh cycle - which is, I think, why games like Uridium (https://www.youtube.com/watch?v=TvULd4zHz8Y) were running the main gameplay in a smaller area
also, on that Uridium vid - notice how they are using the standard font but spacing it out vertically (so they could save memory)
Ahh. I remember doing lots of TI82/83 calculator programming in Z80 and never considering using a C to Z80 compiler since it was so suboptimal. C was considered slow and a memory hog.
Most of these (other than the sprite routine) aren't that bad. Every byte counted when you only had 26KB free.
I really like stuff like this. This tool is amazing for C++: https://godbolt.org/ It's really handy for popular architectures and it's nice that it has "real" compilers (GCC vs. clang vs. ICC.) I find it fun to compare small pieces of code between the different compilers.
----
Shameless plug for an abandoned project of mine: (note: these demos appear to work in the latest Chrome on a wide monitor - YMMV; I didn't know what I was doing.)
Back in 2012 I wanted to learn JS and had some ideas about a UI for teaching C programming with good visualizations. I was interested in building tools that would allow you to visualize in-memory data structures and step through the code (so you could watch a sorting algorithm, or tree manipulation, that kinda stuff.)
Anyway the full thing never panned out but I've got some lower-level demos:
(Hit "run" and click "faster" a few times) It started off as straight-MIPS but I realized emulating a real machine wasn't necessary for the overall goal (teaching C and the C abstract machine) so I started adding convenient op-codes. All words (I think) of memory had tags to specify if the memory was initialized (maintained by the VM) stack/heap (maintained theoretically by the C runtime) or code/data. The goal was to make a C machine that was maximally friendly for education.
Here's a C REPL that compiles to that assembly and runs it on the VM:
http://csclub.uwaterloo.ca/~j3parker/things/evalc/evalc/ it compiles on keyup (I had vague goals of very quick feedback) and there is a line-edit below the source code that you can put C expressions into (e.g. type "fib(12)" without the quotes and hit enter - it should print 233 if you haven't modified the code.)
The parser is mostly-complete C99 (typedefs were an issue - the first parser used a YACC-like generator which is known to have troubles here - the next version used a hand-rolled recursive descent parser. That one didn't get finished but also had way better error messages, as expected.) The compiler (semantic analysis + codegen) didn't support much of the language though, e.g. it doesn't know how to code-gen for > but it knows <. So, the compiler is not in a very usable state.
I also had a V2 of the VM with an insane fixed 8-bit instruction set (with 200+ instructions) for a stack-based (not register) machine with a 32 bit address space. It would generate (with a poorly written bash script) the VM from the LaTeX documentation for the CPU which was weird. Here's the PDF: http://csclub.uwaterloo.ca/~j3parker/things/evalc/evalc2/vm/... the .tex file (in the same directory) has the impl of the instructions (not rendered to the PDF - that was the intent though) and it generates this: http://csclub.uwaterloo.ca/~j3parker/things/evalc/evalc2/vm/... It never did get finished but it had some promise.
---
It was a fun short project that got abandoned. I still think there would be value in an educational implementation of C, something that conforms to the spec but is in no way performant--instead, UB sanitizers everywhere and special hooks inside the runtime to aid in visualization and debugging (e.g. I want to look at memory and see all the allocations with links back to what line of code allocated this word of memory, etc.)
Thank you very much, it is very interesting! I started with a weird VM with 16-bit opcodes for variable-length instructions (VLIW) and now I'm trying to write a compiler for it. At the beginning I considered a stack-based CPU so I could use fewer bits per opcode and implement Forth-like VLIW stuff, but I discarded it in favor of a register-based VLIW VM, so I could write a C compiler more easily.
Given that Z80 is a super easy assembly language to learn, this is a great introduction to how C code works internally. I would suggest offering an 8085 option too, as it's so similar.
As luck would have it, the very same site under discussion also happens to host a neat online "IDE" for learning all about the 6502-based Atari 2600¹.
It is a companion to the book, "Making Games for the Atari 2600". A wonderfully clear and concise primer that I thoroughly enjoyed, even though I had no interest in the Atari 2600 per se. The NES was my first console -- and naturally the one I'd always wanted to program. Since it shares the same 6502 processor, I used this book as an introductory text before diving into the murkier waters of NES development wikis and forums.
Other than the obvious 8 vs 16 bit issue: I don't know any platform where bit shift left is faster than addition.
Usually it's just as fast. But on some platforms, you need to clear the carry bit first, because they only have rotate and rotate-through-carry instructions available.
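The carry issue can be shown with a small C model. This is a sketch with invented names, not any particular CPU's exact flag semantics: doubling by addition and rotating left through carry give the same byte and the same carry-out only when the incoming carry is clear.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t value;
    bool carry;
} AluResult;

/* a + a on an 8-bit ALU: the ninth bit of the sum becomes the carry. */
AluResult add_to_self(uint8_t a)
{
    unsigned sum = (unsigned)a + a;
    AluResult r = { (uint8_t)sum, sum > 0xFF };
    return r;
}

/* Rotate left through carry: the old carry enters bit 0,
 * and the old bit 7 becomes the new carry. */
AluResult rotate_left_through_carry(uint8_t a, bool carry_in)
{
    AluResult r = { (uint8_t)((a << 1) | (carry_in ? 1 : 0)), (a & 0x80) != 0 };
    return r;
}

/* True when both operations agree for every possible byte. */
bool doubling_matches(bool carry_in)
{
    for (unsigned a = 0; a < 256; a++) {
        AluResult add = add_to_self((uint8_t)a);
        AluResult rot = rotate_left_through_carry((uint8_t)a, carry_in);
        if (add.value != rot.value || add.carry != rot.carry)
            return false;
    }
    return true;
}
```

With the carry cleared the two are interchangeable. With a stale carry set, the rotate leaks a 1 into bit 0, which is the extra instruction the comment above is talking about.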
Superoptimization: https://en.wikipedia.org/wiki/Superoptimization
I found this long-abandoned project, which is not described in English; see the examples at the bottom of the page: http://www.ricbit.com/mundobizarro/superopt.php
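To give a feel for the idea, here is a toy brute-force superoptimizer in C. It is a minimal sketch and not how the linked project or any real tool works: it assumes a single 8-bit accumulator and a made-up subset of five operations loosely named after Z80 instructions, and it tries every sequence in order of length until one matches the target function on all 256 inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set acting on one 8-bit accumulator. */
enum { OP_ADD_A_A, OP_INC, OP_DEC, OP_CPL, OP_NEG, OP_COUNT };
enum { MAX_LEN = 8 };

typedef uint8_t (*target_fn)(uint8_t);

static uint8_t apply_op(int op, uint8_t a)
{
    switch (op) {
    case OP_ADD_A_A: return (uint8_t)(a + a);
    case OP_INC:     return (uint8_t)(a + 1);
    case OP_DEC:     return (uint8_t)(a - 1);
    case OP_CPL:     return (uint8_t)~a;
    default:         return (uint8_t)(0 - a);   /* OP_NEG */
    }
}

/* Returns the length of the shortest matching sequence (written to out),
 * or -1 if nothing up to max_len instructions matches. */
int superoptimize(target_fn target, int max_len, int *out)
{
    assert(out != NULL && max_len >= 0 && max_len <= MAX_LEN);
    for (int len = 0; len <= max_len; len++) {
        long total = 1;
        for (int i = 0; i < len; i++)
            total *= OP_COUNT;
        /* Each candidate is a number whose base-OP_COUNT digits are the ops. */
        for (long code = 0; code < total; code++) {
            int seq[MAX_LEN];
            long c = code;
            for (int i = 0; i < len; i++) {
                seq[i] = (int)(c % OP_COUNT);
                c /= OP_COUNT;
            }
            int ok = 1;
            for (int x = 0; x < 256 && ok; x++) {
                uint8_t a = (uint8_t)x;
                for (int i = 0; i < len; i++)
                    a = apply_op(seq[i], a);
                if (a != target((uint8_t)x))
                    ok = 0;
            }
            if (ok) {
                for (int i = 0; i < len; i++)
                    out[i] = seq[i];
                return len;
            }
        }
    }
    return -1;
}

/* Example target: the C expression 4 * x + 1, truncated to 8 bits. */
uint8_t times4_plus1(uint8_t x)
{
    return (uint8_t)(4 * x + 1);
}
```

For `times4_plus1` the search settles on doubling twice and then incrementing. A real superoptimizer has to cope with many registers, flags, memory and a far larger search space, so exhaustive enumeration like this only works for tiny cases.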