I found the article a little confusing to be honest. I wonder if the author has written a traditional widget toolkit that isn't Firefox oriented.
In old widget toolkits, going back to the 90s here, there was a single UI thread per app that did all drawing and sending of commands to the graphics hardware. Keeping the UI responsive in such toolkits simply meant doing as much work as possible in the background. Touching the UI data structures from other threads was forbidden.
This architecture was adopted due to painful experiences with attempts to build thread-safe toolkits in the 80s and early 90s, such as Motif and the original Win32 widget library. None of it worked very well. Motif apps tended to be deadlock prone, and Win32 was just a total API nightmare because it tried to hide the thread affinity of the underlying widgets but didn't do a good job of it.
Some systems in the 90s like NeXT and BeOS started experimenting with moving the rendering into a separate process, the window server. Note that X Windows, despite having a window server, did not use "retained mode" rendering and still required the app to respond and do every repaint itself, such as when an occluded window was moved to the top. Systems with this sort of retained mode rendering pushed "draw lists" into the window server so the OS could draw the window from memory without having to wait for the app to respond. This used more memory but meant that the overall window UI stayed responsive and fluid even if apps were under heavy load. However, anything that could change the UI, like responding to user input, of course stayed in the app and on the UI thread.
MacOS X introduced a variant of the design, which I know less about, but I believe it basically just stored fully rendered copies of the image. Very RAM intensive, and one reason MacOS X was considered very slow and heavy in the early days, but it made it possible to do things like the genie effect and exposé later on, where the window server could animate the contents of windows without the app needing to respond.
All that is OS level compositing. The app itself did not do any asynchronous compositing. So dragging windows around was fast, but animations inside the app didn't benefit.
So the next level of asynchronicity is toolkits that push app level rendering into a separate thread too. iOS, JavaFX, modern versions of Qt and modern versions of Android work this way. In these toolkits, the app's GUI is still constructed and manipulated on the primary/UI thread, but when the main thread "renders" the UI, it doesn't directly draw it; it constructs a set of draw lists for the app's own use. Again, these draw lists look a bit like this:
1. Clear this area of the window to this colour.
2. Draw a gradient fill from here to there.
3. Draw texture id 1234 with that shader at these coordinates, at 50% opacity.
4. Invoke remembered draw list 111.
5. Remember this set of instructions as draw list 222.
Once these lists are created they're handed off to a dedicated render thread which starts processing them and turning them into commands to the GPU via an API like OpenGL or Direct3D. Note that these APIs are, in turn, simply creating buffers of commands, which eventually get dispatched to the GPU hardware for actual rendering. Because the render thread doesn't run any callbacks into app code, and because it's cooperating with the GPU hardware to remember and cache things, it doesn't have that much actual work to do and can process simple animations very fast and reliably.
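As a very rough sketch of that split (DrawCmd, DrawList and submit_list are names I've made up here, not any particular toolkit's API), it looks something like this in C:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { CMD_CLEAR, CMD_GRADIENT, CMD_TEXTURE } CmdKind;

    typedef struct {
        CmdKind kind;
        float   x, y, w, h;     /* target rectangle            */
        int     texture_id;     /* used by CMD_TEXTURE only    */
        float   opacity;        /* used by CMD_TEXTURE only    */
    } DrawCmd;

    typedef struct { DrawCmd cmds[64]; int count; } DrawList;

    /* One-slot hand-off point between the UI thread and the render thread. */
    static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
    static DrawList *pending;

    static void submit_list(DrawList *list)       /* called on the UI thread */
    {
        pthread_mutex_lock(&q_lock);
        pending = list;
        pthread_cond_signal(&q_cond);
        pthread_mutex_unlock(&q_lock);
    }

    static void *render_thread(void *arg)         /* never calls back into app code */
    {
        (void)arg;
        pthread_mutex_lock(&q_lock);
        while (!pending)
            pthread_cond_wait(&q_cond, &q_lock);
        DrawList *list = pending;
        pending = NULL;
        pthread_mutex_unlock(&q_lock);

        /* A real render thread would turn each command into GL/D3D calls here. */
        for (int i = 0; i < list->count; i++)
            printf("executing draw command %d (kind %d)\n", i, (int)list->cmds[i].kind);
        free(list);
        return NULL;
    }

    int main(void)
    {
        pthread_t rt;
        pthread_create(&rt, NULL, render_thread, NULL);

        /* "Rendering" on the UI thread only records commands; it never draws. */
        DrawList *list = calloc(1, sizeof *list);
        list->cmds[list->count++] = (DrawCmd){ CMD_CLEAR,    0,  0, 800, 600,    0, 1.0f };
        list->cmds[list->count++] = (DrawCmd){ CMD_GRADIENT, 0,  0, 800, 100,    0, 1.0f };
        list->cmds[list->count++] = (DrawCmd){ CMD_TEXTURE, 10, 10,  64,  64, 1234, 0.5f };
        submit_list(list);

        pthread_join(rt, NULL);
        return 0;
    }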
However, responding to user input is still done on the main thread. If you block the main thread, your UI will continue to repaint and may exhibit simple behaviours like hover animations, but actually clicking buttons won't work. That's because the most common thing to do in response to user input is change the UI itself in some way, and that must still be done on the main thread.
Thanks for this very informative post! However, I think you disparage good old Win32 just a tiny wee bit. It was quite well designed for its time - perhaps a little too far ahead of its time. Everything was async, to communicate between windows you needed to post events to queues, you could customize the window classes any way you wished - it was an extraordinarily sophisticated and flexible framework, and lots of talented folks made it dance. It had a really good and long 20+ year run. (And it's still running strong in some desktop software.)
To me, Win32 was already falling away from the more intriguing model, which was Win16.
Effectively, a Win16 system is/was basically exactly equivalent to an Erlang node, but one where your "processes" just happened to be paired, component-wise†, with handles to structs of GUI properties held in window-manager memory.
Like an Erlang node, Win16 is/was:
• Green-threaded — i.e. they both have very low-overhead in-memory structures containing a tiny heap/arena and a reference to an in-memory delegate code module, through which execution would pass in turn. In Erlang, these are "processes"; in Win16, these are windows—i.e. actual windows, but also "controls" in the controls library.
• Cooperatively-scheduled (yes, Erlang is cooperatively scheduled—when you're writing C NIFs. When you're executing HLL bytecode in the Erlang VM, this fact is papered over by the call/ret instructions implicitly checking reduction-count and yielding; but if you're writing native code—like in Win16—you do that yourself.)
• Message-passing, with every process having a message inbox holding dynamically-typed message-structs that must be matched on and decoded, or discarded.
• Offering facilities to register and hold system-wide handles to large static data-blobs (Erlang large-binaries, Win16 rsrc handles);
• Capable of doing IPC only by having processes post messages to another process's queue, and then putting themselves into a mode that waits for a response;
• Based on a supervision hierarchy: in Erlang, processes spawn children (themselves processes) and then manage them using IPC; in Win16, root-level windows spawn controls (themselves windows) and then manage them using IPC.
And, crucially, in both of these systems, real machine-threads are irrelevant. Both Win16 and Erlang were written in an era when "concurrency" was a desired goal but multicore didn't yet exist—and so they don't really have any concept of thread affinity for processes/windows. Both systems are designed as if there is only one thread, belonging to a single, global scheduler—and then their multicore variants (SMP Erlang and Win32) attempt to transparently replicate the semantics of this older system (though in different ways: Erlang allows processes to be re-scheduled between scheduler threads, while Win32 pins windows to whatever scheduler-thread they're spawned on.)
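To make the analogy concrete, the "process body" plus "scheduler" shape looks roughly like this in Win32-flavoured C (a Win16 version differs only in calling conventions and pointer sizes; WM_APP_POKE and the "GreenThread" class name are invented for the sketch):

    #include <windows.h>

    #define WM_APP_POKE (WM_APP + 1)   /* invented app-defined message */

    /* The "process body": a window procedure that receives messages from its
       inbox (the message queue), does a little work, and yields by returning. */
    static LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg) {
        case WM_APP_POKE:
            /* Decode the dynamically-typed message, do the work... */
            DestroyWindow(hwnd);        /* ...then this "process" exits. */
            return 0;
        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        }
        return DefWindowProcA(hwnd, msg, wParam, lParam);
    }

    int WINAPI WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR cmd, int nShow)
    {
        (void)hPrev; (void)cmd; (void)nShow;

        WNDCLASSA wc = {0};
        wc.lpfnWndProc   = WndProc;
        wc.hInstance     = hInst;
        wc.lpszClassName = "GreenThread";
        RegisterClassA(&wc);

        /* Spawn a "process": an invisible window with no GUI of its own. */
        HWND worker = CreateWindowExA(0, "GreenThread", NULL, 0, 0, 0, 0, 0,
                                      NULL, NULL, hInst, NULL);

        /* IPC: drop a message in its inbox and let the scheduler deliver it. */
        PostMessageA(worker, WM_APP_POKE, 0, 0);

        /* The cooperative scheduler: pull the next message off the queue and
           hand control to whichever window it is addressed to. */
        MSG m;
        while (GetMessageA(&m, NULL, 0, 0) > 0) {
            TranslateMessage(&m);
            DispatchMessageA(&m);
        }
        return (int)m.wParam;
    }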
Win32 later introduced an alternative model to take better advantage of multithreading: COM "multi-threaded apartments", allowing windows (or, as a generalization, COM servers, which could now execute and participate in IPC on a thread without spawning any window-instances) to interoperate across thread boundaries without requiring a message be serialized and passed through the scheduler/window-manager process.
Well, I hope we're not getting sentimental about Win16. I've written more than my fair share of Windows API code, and even though some of the concepts are now coming back into fashion again, largely due to the limits of browser engines, it really isn't an era I'd return to at all. Those models were all abandoned for solid reasons, and Apple's failure to abandon them in time nearly killed the company.
I'm not sure your description of COM is quite right. The way I remember it, windows (HWNDs) were and still are objects with thread affinity. COM had the notion of a "single threaded apartment", which basically meant the COM server received RPCs using regular Window messages, and the MTA that you mention simply meant no inter-thread marshalling was done at all, i.e. the object was inherently thread safe using locks or whatever. But Windows never changed to a model where the controls library was thread safe: changing the contents of an edit box from another thread, for instance, always required a context switch.
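For instance, the usual pattern looked roughly like this (a from-memory sketch, not production code): the worker hands the request to the owning thread via SendMessage and blocks until that thread services it.

    #include <windows.h>

    static DWORD WINAPI worker(LPVOID param)
    {
        HWND hEdit = (HWND)param;
        Sleep(1000);
        /* Cross-thread SendMessage: the request is handed to the UI thread that
           owns the HWND, and this worker blocks until that thread has pumped its
           queue and handled WM_SETTEXT - the context switch described above. */
        SendMessageA(hEdit, WM_SETTEXT, 0, (LPARAM)"updated from the worker thread");
        return 0;
    }

    int WINAPI WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR cmd, int nShow)
    {
        (void)hPrev; (void)cmd; (void)nShow;

        /* A bare edit control in a popup window; every UI object is created,
           and therefore owned, by this thread. */
        HWND hEdit = CreateWindowExA(0, "EDIT", "initial text",
                                     WS_POPUP | WS_VISIBLE | WS_BORDER,
                                     100, 100, 300, 30, NULL, NULL, hInst, NULL);

        CreateThread(NULL, 0, worker, hEdit, 0, NULL);

        /* The UI thread keeps pumping; if it blocked here instead, the worker's
           SendMessage would simply hang until it resumed. */
        MSG m;
        while (GetMessageA(&m, NULL, 0, 0) > 0) {
            TranslateMessage(&m);
            DispatchMessageA(&m);
        }
        return 0;
    }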
COM's usage and abusage of the window message system for fast inter-thread switching was only ever an ugly hack, which caused all kinds of weird problems and glitches. Most obviously, it deepened Windows' reliance on actually having a GUI layer considerably, because now inter-thread/inter-process RPC - which on Linux and MacOS was well modularised into things like Mach IPC, SunRPC, DBUS etc - was totally tied to the windowing system.
IIRC the entire apartment concept was also stupidly designed, so there were constant problems with Microsoft using COM internally to implement some APIs, which would by default enter an STA and require the _caller_ of the API to pump the message queue, otherwise the API they'd just used would silently fail to work. In the era I was working with it, that fact wasn't always properly documented, I think.
> COM's usage and abusage of the window message system for fast inter-thread switching was only ever an ugly hack
This is effectively the entire point I wanted to dispute in my original post above; I guess I didn't get it across clearly enough.
As can be seen from how pre-COM (e.g. DDE, OLE) IPC was achieved on Windows, Microsoft truly believed that sending messages through the window-manager was a good way to do IPC. Their designs just kept doing it, over and over. STA COM messaging wasn't a hack; it was more of the same, a doubling-down on a long-standing design paradigm. MTA COM messaging was the hack—a way to make everything continue to look like HWND messaging (with an abstraction layer added), but have it transparently optimize to SHM IPC in cases where that was beneficial [and where the developer had ensured their ADTs were compatible with it.]
> Most obviously it caused Windows' reliance on actually having a GUI layer to deepen considerably because now inter-thread/inter-process RPC - that on Linux and MacOS were well modularised into things like Mach IPC, SunRPC, DBUS etc - were totally tied to the windowing system.
And, driven by the "evidence" of this repeated doubling-down above, I would conclude that this was the point: Microsoft considered Windows to be about, well, windows.
As I was saying above, a "window" in the Win16 sense was effectively the same thing as an Erlang process, but with some extra (optional-to-use!) GUI data stuck to it. The "correct" way to achieve async parallelism in Win16 was literally to create a "background window" that would register a kernel timer to send it tick events, and then do work when it received one. Which is the same thing you do if you want to write an Erlang process to wake up and poll some data source every so often.
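In Win32-flavoured C, that background-window pattern is roughly the following (the "BackgroundWorker" class name is invented, and a real Win16 version would differ only in calling conventions):

    #include <windows.h>
    #include <stdio.h>

    static LRESULT CALLBACK BgProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        if (msg == WM_TIMER) {
            /* Wake up, poll the data source, go back to waiting - the same
               shape as an Erlang process looping on receive with a timeout. */
            printf("tick\n");
            return 0;
        }
        return DefWindowProcA(hwnd, msg, wParam, lParam);
    }

    int WINAPI WinMain(HINSTANCE hInst, HINSTANCE hPrev, LPSTR cmd, int nShow)
    {
        (void)hPrev; (void)cmd; (void)nShow;

        WNDCLASSA wc = {0};
        wc.lpfnWndProc   = BgProc;
        wc.hInstance     = hInst;
        wc.lpszClassName = "BackgroundWorker";
        RegisterClassA(&wc);

        /* Never shown: it exists purely to own a message-handling routine. */
        HWND hwnd = CreateWindowExA(0, "BackgroundWorker", NULL, 0,
                                    0, 0, 0, 0, NULL, NULL, hInst, NULL);
        SetTimer(hwnd, 1, 1000, NULL);      /* ask for a tick roughly every second */

        MSG m;                               /* the same cooperative loop as ever */
        while (GetMessageA(&m, NULL, 0, 0) > 0) {
            TranslateMessage(&m);
            DispatchMessageA(&m);
        }
        return 0;
    }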
My point isn't just that there are parallels here; my point is that Microsoft expected you to use the "window" primitive in exactly the ways that Erlang expects you to use the "process" primitive. Windows are the "process" primitive of Win16—they're tiny, green-threaded processes, and the window-manager is their scheduler.
That statement should make Microsoft's views on IPC clearer. Of course Windows IPC is achieved by putting messages through the window-manager. The window-manager is the scheduler†; knowing about other window-processes and routing messages to them is its job. It is DBUS—and it is also, given DDE, the equivalent of macOS's LaunchServices daemon.
---
† ...or rather, the window-manager is the scheduler for anything that's not a DOS VM. Windows, from 2.x through to 9x, was effectively a two-layer system: a bare-metal hypervisor "kernel" (KERNEL.EXE/KRNL386.EXE) with one Windows dom0 and N DOS domUs; and then an OS "kernel" running in that Windows dom0. That dom0 OS kernel is GDI.EXE, and cooperative message-passing is its scheduling algorithm. It also happens to do graphics. (It's a paravirtualized kernel that relies heavily on the hypervisor kernel above it, yes, but it's still the kernel of the Windows domain.)
The part about what iOS does isn't quite correct—apps pass a high-level layer tree to the render server with properties like corner radius and shadow offset, not a list of low-level drawing commands involving textures and shaders. That lets you animate these high-level properties without sending over new commands. I think the basic history is right, though!
> Note that X Windows, despite having a window server, did not use "retained mode" rendering and still required the app to respond to do every repaint such as if an occluded window was moved to the top.
My memory is quite fuzzy, but I think this is incorrect: what you're describing was only the default behaviour. You could program your application to tell the X server to use a 'backing store' for your window, and it would remember the window's contents and re-render it by itself.
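If I remember the Xlib calls right, it looked something like this - and note the backing store is only a hint the server is free to ignore:

    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) return 1;

        XSetWindowAttributes attrs;
        attrs.backing_store = WhenMapped;   /* or Always */

        Window win = XCreateWindow(dpy, DefaultRootWindow(dpy),
                                   0, 0, 400, 300, 0,
                                   CopyFromParent, InputOutput, CopyFromParent,
                                   CWBackingStore, &attrs);
        XMapWindow(dpy, win);
        XFlush(dpy);

        /* ...event loop would go here.  If the server honours the hint, it
           repaints the window from its own copy when the window is re-exposed,
           instead of sending Expose events for the app to handle. */
        XCloseDisplay(dpy);
        return 0;
    }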
There is/was an extension that did that at some point, but it was never used (on Linux) due to poor implementation and a desire to target low-RAM machines. The Mac's backing store implementation was quite heavily optimised - macOS can do on-the-fly memory compression, and the window server uses shared memory to obtain the window bitmaps.
I hope that helps.