Has this been the case for you?
Also, 8088mph is super cool; great job on it!
My "cooperative threading" consists of just a "wait(n)" function, which the CPU Execution Unit emulator calls to emulate everything else for n cycles (i.e. the CPU Bus Interface Unit, the bus itself and everything connected to it, including the DMA controller and interrupt controller). Profiling shows that wait() is the major bottleneck, so modifying it to do as little as possible (most of the time, little more than incrementing a counter and comparing it to the top item in a priority queue) would probably help a lot.
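
To make the counter-plus-priority-queue idea concrete, here is a rough C++ sketch. It is not the emulator's actual code: `Scheduler`, `Event`, `scheduleAt` and the callback signature are hypothetical names chosen for illustration, and it assumes every device (DMA, interrupt controller and so on) can say in advance at which cycle it next needs attention.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical event: device work that becomes due at an absolute cycle count.
// The callback receives its own due cycle so it can reschedule without drift.
struct Event {
    std::uint64_t due;
    std::function<void(std::uint64_t)> fire;
};

// Comparator that keeps the earliest-due event on top of the priority queue.
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        return a.due > b.due;
    }
};

class Scheduler {
public:
    // Register device work to run once the cycle counter reaches `due`.
    void scheduleAt(std::uint64_t due, std::function<void(std::uint64_t)> fire) {
        assert(fire);  // an empty callback would throw when invoked
        events_.push(Event{due, std::move(fire)});
    }

    // Fast path: add n to the counter and compare against the top item.
    // Device code only runs when an event has actually come due.
    void wait(std::uint64_t n) {
        now_ += n;
        while (!events_.empty() && events_.top().due <= now_) {
            Event e = events_.top();  // copy first: fire() may push new events
            events_.pop();
            e.fire(e.due);
        }
    }

    std::uint64_t now() const { return now_; }

private:
    std::uint64_t now_ = 0;
    std::priority_queue<Event, std::vector<Event>, Later> events_;
};
```

One caveat with this sketch: events fire after the counter has already advanced by the full n, so anything that must be observed at an exact cycle inside a single wait(n) call would need a smaller n or extra handling in the callback. A periodic device can reschedule itself from inside its callback by calling scheduleAt with its due cycle plus its period.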