Photon, the old GUI for QNX, supports some degree of multithreading: you can update display elements such as meters, progress bars, and text displays from other threads. Photon is often used for control panels, where real-time data comes in and needs to stay current on the display, which is why they made that work.
It's going to be interesting to see what happens when someone implements a new GUI in Rust. The classic problem with GUIs has been that ownership management for both allocation and locking was a big problem. Rust's borrow checker can help a lot with the bookkeeping needed to get that right.
I don't remember discussing either of those while writing Qt.
Rather, single-threading followed from two big points. First, the user calls the program rather than the other way around, and the user isn't multithreaded. Second, there aren't performance problems with the UI, and certainly none that require fighting #1.
Some programs need more than one thread. But that need does not originate within the UI, and complicating the UI for it would be exactly the kind of agglutination RFC 1925 point 5 warns against.
Locks turn out to be unnecessary for multithreaded code. A ringbuffer of messages is only slightly more complex to set up, yet the gains, in both performance and simplicity, are massive.
It's possible to manage a ringbuffer without any locks. The trick is to have a counter for the producer thread, and a counter for each consumer thread. Whenever the producer wants to know "Is it safe to add a message?" it takes the minimum of all consumer counters, modulo the size of the ringbuffer. The result is the smallest index that the producer must not write beyond.
In other words, you always know when you're producing messages too quickly and need to wait on the consumers. And the consumers know when there's a message waiting -- they just look at the producer's counter. Blazingly fast, and no locks. Cool trick!
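For the curious, here is a minimal sketch of that scheme in Java, assuming a single producer, broadcast consumers (every consumer sees every message), absolute sequence counters that are taken modulo the capacity only when indexing, and spinning rather than blocking. The class and method names are mine, not from any particular library:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Minimal single-producer, multi-consumer broadcast ring buffer, in the spirit
 *  of the per-consumer-counter trick described above. Nobody takes a lock; the
 *  producer simply never laps the slowest consumer. */
final class BroadcastRing<T> {
    private final Object[] slots;
    private final int capacity;
    private final AtomicLong produced = new AtomicLong();  // next slot the producer will write
    private final AtomicLong[] consumed;                   // next slot each consumer will read

    BroadcastRing(int capacity, int consumers) {
        this.capacity = capacity;
        this.slots = new Object[capacity];
        this.consumed = new AtomicLong[consumers];
        for (int i = 0; i < consumers; i++) consumed[i] = new AtomicLong();
    }

    /** Producer: spin until the slowest consumer is within one lap, then publish. */
    void publish(T message) {
        long seq = produced.get();
        while (seq - slowestConsumer() >= capacity) {
            Thread.onSpinWait();                 // buffer full: wait for consumers
        }
        slots[(int) (seq % capacity)] = message;
        produced.set(seq + 1);                   // single writer, so a plain volatile set is enough
    }

    /** Consumer `id`: spin until the producer is ahead of us, then take the next message. */
    @SuppressWarnings("unchecked")
    T take(int id) {
        long seq = consumed[id].get();
        while (seq >= produced.get()) {
            Thread.onSpinWait();                 // buffer empty: wait for the producer
        }
        T message = (T) slots[(int) (seq % capacity)];
        consumed[id].set(seq + 1);
        return message;
    }

    private long slowestConsumer() {
        long min = Long.MAX_VALUE;
        for (AtomicLong c : consumed) min = Math.min(min, c.get());
        return min;
    }
}
```

Because each counter has exactly one writer, plain volatile reads and writes are enough; there are no locks and no compare-and-swap anywhere on the hot path.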
Ringbuffers (or any queue, actually) don't solve the problems associated with locks. When people talk about locks in discussions of concurrent algorithms, they don't necessarily mean the particular construct called a lock, but any synchronization mechanism that can end up suspending one computation while it waits for another. In that sense, a queue may well be a lock in this kind of discussion.
To see the duality between locks and queues, note that any queue can be implemented with a list/array and a lock, while a lock itself is nothing more than an atomic operation, plus a queue, plus a mechanism to suspend computation. Whether that suspension involves actually parking the kernel thread or spinning is an implementation detail from the perspective of the algorithm.
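To make the first half of that duality concrete: a blocking queue really is just a list, a lock, and a suspension mechanism. A tiny Java sketch (names are mine):

```java
import java.util.ArrayDeque;

/** One direction of the duality: a blocking queue is just a list plus a lock
 *  (here the intrinsic monitor) plus a way to suspend (wait/notify). */
final class TinyQueue<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();

    public synchronized void put(T item) {
        items.addLast(item);
        notifyAll();                      // wake consumers parked in take()
    }

    public synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();                       // suspend until a producer adds something
        }
        return items.removeFirst();
    }
}
```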
You can use queues without deadlocks, but then you don't get the same advantages locks can give you (transactions); or you can have those advantages, but then you get the same problems.
There are multiple ways to make a lock-free ringbuffer; most use some sort of similar trick with per-consumer atomic counters, though I hadn't heard of something as simple as the minimum-modulo trick described in the gp! Some implementations:
I highly recommend Nitsan Wakart's blog. He covers a lot of interesting ground, all in Java. Probably best to start at the early blog posts and work your way forward.
One can probably argue it's not even more complex at all.
Locks just look simple because it's "just chuck a mutex around it"; in reality that simplicity is a mirage. Locks are comfortable, not simple, is how I like to put it at least.
Well, it's also a poorly motivated dream unless you have a fetish for the Java style of abstraction where everything is completely independent. Who the heck wants multi-threaded GUIs?
User interaction proceeds sequentially, so most objects don't require locks. The rare exceptions in my software are rendering or IO on a separate thread, and those don't fit abstraction models nicely; as mentioned in other posts, they involve C-style state machines like OpenGL.
A multi-threaded GUI seems like a great way to kill performance, with little advantage.
Anyone who has seen the responsiveness of e.g. AmigaOS under heavy load next to many modern systems might be inclined to want (more) multi-threaded GUIs.
Heavy use of multi-threading to disconnect GUI updates from the actual work was essential to making that happen.
AmigaOS sacrificed throughput for responsiveness all over the place (e.g. something as trivial as cut and paste from a terminal would easily involve half a dozen threads with message passing).
You don't need separate threads for every little component, though.
Was the UI really heavily multithreaded compared to today's systems? I thought there were a lot of events and message-passing going on in AmigaDOS, much like in current GUI systems.
Depends on what you mean by "heavily multithreaded". And yes, you're right, there were lots of events and message passing, but that message passing went between different threads.
I mean to write this up for a blog post and do some proper diagrams, but here's a rough overview of the state transitions when handling terminal IO for AmigaOS and reimplementations of the API, like AROS (this is where I got hands-on experience with it - I extended the AROS terminal handling):
Low level interrupt sources will be handled by "devices" such as "keyboard.device" and "gameport.device" (the latter handles the mouse/joystick ports). These will feed input events into "input.device".
The input.device is opened by any component that wants to handle input events. This includes the "console.device", which is responsible for providing a "raw" terminal in a specified rectangle in a window. It handles low level input processing, turns keyboard and mouse input relevant to the console/terminal into higher level events which it passes on to clients, and takes commands (such as "move cursor to position (x,y)" or "write text xyz") and renders the terminal.
Above the console.device sits the console-handler (applications can, and often do, open console.device directly if they want a low level interface). This is responsible for opening a window, creating a console.device that covers the window, and "cooking" low level input into higher level input, and vice versa for output.
The "gadgets" (widgets; buttons etc. in the windows) will be handled directly by intuition (the GUI system) in a separate high priority thread.
If you then do cut-and-paste, there are additional complications: "conclip" needs to be running. This receives requests to cut or paste via messages, and mediates access to the clipboard.device. The clipboard.device again manages reading/writing files in the relevant clipboard volume. That will involve talking to the appropriate filesystem handler, which again may write to a device (such as trackdisk.device for the floppy drives).
Pretty much all of these components will run as their own separate threads. And most of their interaction is via messages put on a queue.
So if you choose to "cut" a section by pressing a key combination, an interrupt will be fired to keyboard.device, which will add an event via the input.device which the input handler thread ("task" in AmigaOS) will pass to the console.device thread via a message, which will pass it on to the console-handler, which will pass a message to conclip, which will pass the data on to the clipboard.device which will send a message to the relevant filesystem, which may send a message to a low level device. After sending a message to conclip, the console-handler will send a message back to the console.device if there's any rendering required.
The reason for all of this is that coupled with careful priorities (UI rendering and input is running in high priority threads), the system appears very responsive, while a lot of this happens behind the scenes.
E.g. the clipboard system on the Amiga has to deal with a system where the clipboard could have been reassigned from the ramdisk where it'd usually be, to floppy, so it really couldn't reasonably be "inline" without making the system unresponsive.
In that respect AmigaOS was more multithreaded: There's all kinds of things we consider fast enough to do "inline" now that was put behind a thread-boundary because it was either unpredictable or too slow to be done inline back then.
But I think the thing missing is that each of these threads ran on the same CPU (right?), and thus things like torn reads and writes weren't an issue, so they didn't need to use expensive std::mutex or std::atomic everywhere.
Today, we have single concurrent execution by enforcing a single GUI thread; at the time of the Amiga they had single concurrent execution because that was the only execution they had.
They ran on the same CPU, but you still had to use mutexes and atomic operations because it had fully pre-emptive multi-tasking, and so your application had to be ready to lose execution from one instruction to the next. Torn reads/writes definitely were an issue for higher level code (unless you could guarantee that your construct would translate to a single m68k instruction), and needed to be kept in mind even in assembler in some situations (see below).
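The single-CPU point is easy to demonstrate in any preemptively scheduled environment. Here is a small, purely illustrative Java version of the hazard (nothing Amiga-specific about it): a plain read-modify-write loses updates when the scheduler preempts between the load and the store, even if both threads share one core.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Even with a single core, preemption between the load and the store of a
 *  read-modify-write loses updates; pinning everything to one CPU does not
 *  make a plain `count++` safe. Illustrative only. */
public class LostUpdate {
    static int plain = 0;
    static final AtomicInteger atomic = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                plain++;                    // load, add, store: can be preempted in the middle
                atomic.incrementAndGet();   // indivisible read-modify-write
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println("plain  = " + plain);        // usually < 2,000,000
        System.out.println("atomic = " + atomic.get()); // always  2,000,000
    }
}
```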
In fact, you'll find lots of Amiga-software being more brutal and enforcing serial-execution for critical section by using Forbid()/Permit() pairs, which will outright disable the scheduler, or even using Disable()/Enable() (disables interrupts too). Of course this is/was very much frowned upon for all but implementing atomic operations, though even this is not guaranteed to be totally atomic in an Amiga system without taking care.
The need to protect against other threads/tasks is/was one of the first things hammered into the heads of Amiga-developers exactly because it was so new to most, who would usually come at it from 8-bit home computers where the standard procedure was that you fully controlled the computer except perhaps for some very trivial interrupt handlers (which most software would take over control of anyway).
And while each of the normal threads would be running on a single CPU in a basic Amiga, any number of devices could DMA (the Amiga depended heavily on this), and additionally both the Copper (a very basic "GPU" of sorts, used to set up "display lists" that manipulate various registers, and not really limited entirely to graphics) and the Blitter could access memory at any time too. So you very much had to be prepared, at least in theory, to deal with memory changing during execution of an individual instruction when working in "chip memory" (the Amiga roughly has two types of memory: "chip memory", where auxiliary hardware can steal bus cycles from the CPU, and "fast memory", which only the CPU can access).
Also note that while unusual, there were true multi-processor Amiga-setups: There were "bridge boards" for the A2000 which effectively were an x86 PC on a card, where the "graphics card" was a buffer in chip memory that would get displayed in a window, and which would receive input from the Amiga keyboard and mouse. There were also PPC accelerator boards (a release of AmigaOS4 for "classic" Amiga hardware with PPC accelerator boards exists; it basically runs everything it can on the PPC, just like for "new" Amiga hardware), though usually these would disable the M68k while the PPC was executing stuff (but I'm not sure if this was enforced by hardware or if it was done by the OS patches for simplicity).
I used to love to tell people of all the different CPUs in my A2000: A 68020 with the 68000 as fallback (if you soft-disabled the 68020 for compatibility) on the motherboard. A 6502-compatible core on the keyboard (the A500 and A2000 keyboards had an embedded SOC chip with a 6502 core + RAM + PROM as the keyboard controller). A Z-80 on my harddisk controller. An 80286 accelerator board + 8086 fallback on my bridge-board... Of course of the 68020/68000 and 80286/8086 pairs only one of each architecture could ever be running at once.
It was a fantastic machine. Unfortunately Commodore all the way through (from long before the Amiga) was an absolute dysfunctional disaster of a company, and it was a miracle they lasted as long as they did (and a testament to the calibre of people that kept saving the company from self-inflicted wounds).
The biggest problem was perpetual under-investment in R&D, and management meddling that systematically whittled away at the lead they once had. The archetypal example is the Amiga 4000. On one hand it is the "flagship": the biggest, fastest classic m68k Amiga produced.
On the other hand, it arrived late, was ridiculously expensive, and was slow for what was there. The problem? New management wanted to start all projects over from scratch and put their stamp on them.
IDE, for example, was suddenly pushed onto engineering without understanding that the Amiga used SCSI for a reason: IDE of the time loaded the CPU too much. That was fine on a single-tasking OS, or on machines with more CPU to spare, but the Amiga was built around offloading everything. Offloading was the only thing that kept it competitive in the face of Motorola's mounting problems with upping the speed of the m68k range (work was underway to evaluate alternative CPUs; PA-RISC was the lead contender at the time; in the end Commodore went bankrupt before making a decision, and third parties chose PPC).
The A4000 was the result: IDE dragging down IO performance; a broken memory sub-system due to rushed redesigns; a butt-ugly case compared to the sleek A3000; and an attempt to compensate for the other problems by going for a 68040, but going "cheap" and picking one of the slower versions, yet still ending up too expensive.
The truly crazy thing, though, is that as they were doing this, the "A3000+" was pretty much done. It didn't have quite as fast a CPU, but it was a step up from the A3000. It had AGA (the last custom chipset, which the A4000 also got) and a range of other improvements, such as a DSP providing high-end sound (8 CD-quality channels) that could also double as a built-in modem. And it kept SCSI...
The best part? It was far cheaper than the A4000, and would've been ready much faster. Of course Commodore had to axe it...
Being a fan of the Amiga at the time was painful...
It still introduces context switches and insertion into a queue, and I can tell from first-hand experience tuning the terminal code for AROS that even running in a single address space, using a single CPU and no MMU on modern hardware, being careless about how you do the message passing will still kill your performance.
Widgets aren't static; presumably you want them to keep updating (spinners, size changes, status updates) when the user is interacting with other elements.
Modern single-threaded UI frameworks don't go into a loop while you're interacting with something. Say you click on a button and drag. This enters the UI framework as a "mouse down" event followed by several "mouse dragged" events. After handling each event (or between events), the framework can decide to do other work, like updating a spinner.
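As a concrete illustration of that interleaving, here's a hedged Swing sketch (class and widget names are mine): a javax.swing.Timer fires on the same event dispatch thread, in the gaps between input events, so a "spinner" keeps animating while you drag a button around, without any second thread touching the UI.

```java
import javax.swing.*;

/** A single-threaded toolkit can keep a "spinner" animating between input
 *  events: the timer callback runs on the event dispatch thread, interleaved
 *  with mouse-dragged events. */
public class SpinnerBetweenEvents {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("single-threaded, still animating");
            JLabel spinner = new JLabel("|");
            JButton button = new JButton("drag me around");
            frame.add(spinner, java.awt.BorderLayout.NORTH);
            frame.add(button, java.awt.BorderLayout.CENTER);

            String[] phases = {"|", "/", "-", "\\"};
            int[] phase = {0};
            // Fires on the event dispatch thread, in between input events.
            new Timer(100, e -> spinner.setText(phases[phase[0]++ % 4])).start();

            frame.setSize(300, 120);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}
```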
Yes, event loops are quite an old idea. I just put "modern" there to exclude old systems that encouraged polling. For example, I'm pretty sure classic Mac OS entered a loop while tracking the mouse in menus.
Multithreading the GUI adds a lot of responsiveness. Sure, even your phone's CPU has 2+ cores at 1+ GHz, but if the UI still shudders then your OS programming model is wrong.
PS: I have yet to see a single-threaded GUI I can't make shudder while playing video.
You want a multi-threaded GUI when it's running a long task. Otherwise it won't update until it's finished running, there's no way to cancel the task, etc.
Typically GUI work is initiated with the GUI toolkit on the call stack: it calls foo.onClick(), etc. Now, if one particular onClick starts a long-running task, there are three possible designs:
Either that particular onClick() starts a worker thread and returns before the worker is done.
Or the GUI toolkit delivers the onClick() in a thread of its own, e.g. from a pool of workers.
Or everything is done in one thread, and the UI blocks.
The last one seems sucky, but the insidiously sucky one is the one in the middle. That's where every user's implementation of onFocusOut() must take care to lock because all of bar.onFocusOut(), foo.onFocusIn(), foo.onMouseUp() and foo.onClick() are called concurrently in four different worker threads. The tail wags the dog.
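For what it's worth, the first design is roughly what Swing later standardized with SwingWorker: the handler kicks off the background work and returns immediately, and only done() touches the widgets again, back on the event dispatch thread. A rough sketch, with made-up handler and widget names:

```java
import javax.swing.*;

// Sketch of the first design: onClick() starts a worker and returns;
// only done() touches the widgets again, back on the event dispatch thread.
class SearchPanel {
    private final JButton searchButton = new JButton("Search");
    private final JLabel status = new JLabel("idle");

    SearchPanel() {
        searchButton.addActionListener(event -> {
            searchButton.setEnabled(false);
            status.setText("working...");
            new SwingWorker<String, Void>() {
                @Override protected String doInBackground() throws Exception {
                    return runLongQuery();          // off the UI thread
                }
                @Override protected void done() {   // back on the UI thread
                    try {
                        status.setText(get());
                    } catch (Exception ex) {
                        status.setText("failed: " + ex.getMessage());
                    }
                    searchButton.setEnabled(true);
                }
            }.execute();
        });
    }

    private String runLongQuery() throws InterruptedException {
        Thread.sleep(2000);                         // stand-in for the long-running task
        return "42 results";
    }
}
```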
Sure. You put a message queue in there and create command objects for all the updates that tasks might want to make to the UI. It's not hard; in fact it's thoroughly mechanical, so it's exceedingly tedious to program.
So why isn't the computer doing it for me? I'd happily sacrifice some performance if I could just write the change I wanted to make in the thread where I wanted to make it, and have the computer take care of the bookkeeping.
You can just send a closure to the UI thread and have it executed there, provided you are using a good enough programming language.
Or start the long-running computation from the UI code in a way that returns a promise, and chain the UI update on it, in a way that causes that continuation to be scheduled on the UI thread.
Or have a UI that can be updated from any thread (but not simultaneously) and take the big UI lock.
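The first two of those are only a few lines in plain Java + Swing, since the event dispatch thread can be treated as an executor. A hedged sketch (the label and the computation are placeholders):

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;
import java.util.concurrent.CompletableFuture;

class ClosureToUiThread {
    static final JLabel resultLabel = new JLabel();

    // Option 1: just send a closure to the UI thread.
    static void fromWorkerThread(String result) {
        SwingUtilities.invokeLater(() -> resultLabel.setText(result));
    }

    // Option 2: start the work as a promise and chain the UI update onto it,
    // scheduling the continuation on the event dispatch thread.
    static void startFromUiCode() {
        CompletableFuture
            .supplyAsync(ClosureToUiThread::expensiveComputation)
            .thenAcceptAsync(resultLabel::setText, SwingUtilities::invokeLater);
    }

    static String expensiveComputation() {
        return "done";   // stand-in for the long-running task
    }
}
```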
BeOS did this fairly well, although I would say the biggest problem is when you activate the UI and the underlying app has become unresponsive... When your app crashes, you want your UI to reflect the crashed state of the app and similarly become unresponsive.
The developers of BeOS went on to do Android and went with the single-thread model, partly because BeOS had big problems with buggy apps full of race conditions and deadlocks.
Though IMO the Android API is incredibly confusing for a lot of developers. I've found very often that some devs (the more junior ones) don't understand that services and activities are just objects all hanging off a single process and event loop. Bizarre hacks to let services "communicate" with activities tend to abound when a simple static global would have worked fine.
Those bizarre hacks keep the service from crashing when it needs to talk back to an activity that was replaced between the request and the response, because the user had the strange idea of rotating the phone.
If you had just used a static, there'd be no need to talk to the activity directly. The bizarre hacks tend to arise because Java devs seem to consider static globals icky.
I haven't heard of this before, so I went searching. The only pieces I could find are:
Several ex-Be employees went to work for Danger after the company was sold to Palm. Some of them moved on to Android, which was co-founded by Danger co-founder Andy Rubin and acquired by Google. Others stayed on at Palm, but ended up joining Google after PalmSource (which was spun out of Palm) was acquired by Access. (http://readwrite.com/2011/06/29/a-look-back-at-the-beos-file...)
Today (June 2004), Baron [Arnold] and a bunch of other ex-Be engineers are working at Danger. (http://www.osnews.com/story/7265)
So, "Baron Arnold and a bunch of others" went from Be to Danger Inc., which was acquired by Microsoft in 2008. Andy Rubin went on from there to become one of the four founders of Android, Inc., but I can't find any indication of Andy Rubin having worked for Be, and none that other people followed him from Danger.
Is there more to it?
Edit: a few more citations (still rather vague):
Many of Be's engineers moved on to a company called Danger, the company behind the Sidekick. When Danger founder Andy Rubin left the company in 2003 to found Android,[...] many of those developers went with them. (http://www.slideshare.net/newsworthy2457/this-os-almost-made...)
As for Android, Andy Rubin, the founder of Android inc and current head of Android at Google, also worked at Danger, a company where several Be employees such as Baron Arnold ended up. The Kin project, which largely inspired the UI of Windows Phone 7, was created by Danger after it became part of Microsoft. Unfortunately after Microsoft axed the OS Danger was working on, Project Pink, and the Kin was a dismal failure, nearly all of the Danger employees left Microsoft (including of course Andy Rubin), leaving little if any shipped code behind. (http://forums.sonicretro.org/index.php?showtopic=25221) (Not sure if calling Rubin "the founder" is misleading since they were 4 co-founders.)
Edit 2:
Danger were made up with more than just ex-Apple employees. A number of the engineers from Be Inc ended up there (Ficus Kirkpatrick, for example, and Baron Arnold also) and Danger was the genesis from which Android was born. (http://www.osnews.com/comments/27498?view=flat&threshold=0&s...) (Not providing details on what influence Danger had on Android other than Rubin, either.)
The Racket GUI toolkit (http://docs.racket-lang.org/gui/) is successfully multithreaded, and manages this despite being implemented on top of existing non-multithreaded and non-thread-safe toolkits. It's hard, but it's certainly not impossible.
One of the things the article doesn't mention, which is odd, is that frames are precious objects that need to be produced in whole. If you have multiple threads all producing parts of the scene, when do you actually kick off the frame so that the user never sees a part of a frame? Even if your worker thread is just updating the text of a few fields when it's done, you need to batch that up into a single atomic update on the screen.
Which means you need transactions, at which point you basically have a single thread and a message queue so why complicate things internally and add all the lock/unlock overhead?
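That batching amounts to a queue drained once per frame on the UI thread: workers enqueue small command objects whenever they like, and the UI thread applies everything it finds and repaints once. A minimal sketch of the idea (names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Workers enqueue command objects at any time; the UI thread drains the queue
 *  once per frame, applies every pending change, and repaints once, so the user
 *  never sees a half-updated frame. */
final class FrameBatcher {
    interface UiCommand { void apply(); }

    private final ConcurrentLinkedQueue<UiCommand> pending = new ConcurrentLinkedQueue<>();

    /** Called from any worker thread. */
    void post(UiCommand command) {
        pending.add(command);
    }

    /** Called by the UI thread at the start of each frame. */
    void runFrame(Runnable repaint) {
        List<UiCommand> batch = new ArrayList<>();
        for (UiCommand c; (c = pending.poll()) != null; ) {
            batch.add(c);                            // take everything queued right now
        }
        if (!batch.isEmpty()) {
            for (UiCommand c : batch) c.apply();     // all changes land in the same frame
            repaint.run();                           // one repaint for the whole batch
        }
    }
}
```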
What about functional reactive programming? And what about the Flux pattern? Didn't these resolve this "failed dream" of multi-threaded GUI updates? (I am asking real, not rhetorical, questions.)
Don't see how this has anything to do with anything.
For example, the "flux" pattern is similar to model/view, which can be done by a single thread (e.g., Qt). In this scheme the main thread services events sequentially, calling the necessary object's methods on the same thread.
The discussion is about the 'dream' of pushing a button from a worker thread.
I thought that in Flux, the worker thread could send a "push button" message to the dispatcher and it would deal with it asynchronously. Likewise, the updates to the views are asynchronous (and potentially multi-threaded) relative to the actions coming into the dispatcher queue.
Flux is like an event model: every change goes through a single dispatcher, and in the store(s) the code is single-threaded. If you use multiple stores and have multiple components subscribing to them (to paint the GUI), you aren't really guaranteed any order of the events; last write wins, and the components basically redraw everything inside them for every event.
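In other words, a Flux-style dispatcher is itself just a single-threaded event loop. A generic sketch (not modeled on any particular Flux library):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Generic sketch of a Flux-style dispatcher: every action goes through one
 *  queue and is handled on one thread; stores/components just subscribe. */
final class Dispatcher<A> {
    private final LinkedBlockingQueue<A> actions = new LinkedBlockingQueue<>();
    private final List<Consumer<A>> stores = new CopyOnWriteArrayList<>();

    void register(Consumer<A> store)  { stores.add(store); }
    void dispatch(A action)           { actions.add(action); }   // callable from any thread

    /** The single store/update thread. */
    void runLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            A action = actions.take();             // one action at a time, in order
            for (Consumer<A> store : stores) {
                store.accept(action);              // stores update, then notify their views
            }
        }
    }
}
```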
If the CPU bottleneck is on the store, maybe you need some sort of transactional store, where the change is signalled to the views only after a transaction completes? Just thinking out loud.
I am not sure what exactly the original author is trying to solve here. But it seems to me that if you want to expand the single GUI thread to multiple threads, for whatever reason, Flux and/or FRP may be a good start.
A big reason why multithreaded UI doesn't work is that the platform code (win32, cocoa) generally has poor support for interacting with OS objects from multiple threads. For instance, NSView instances can only be operated on from the main thread (https://developer.apple.com/library/mac/documentation/Cocoa/...).
Handling OpenGL context access from multiple threads is terrible, too. I'm quite excited about the additional threading niceties that we're getting in Vulkan, which should allow separate threads to re-render parts of the UI and then send the command buffers back to the main thread.
The author says that with AWT there was a design decision against a thread-safe GUI. One problem here: there is no way they could have made AWT thread-safe. AWT uses the native windowing toolkit as its basis; on Windows this implies that all clients of the Win32 SDK operate from the same thread as the event loop. There is no other way to work with USER32.dll - you have to send window messages to the window handle, and that's only possible from the thread of the event loop.
It would have been possible without locks if the client had sent IPC messages to the other thread's event queue and then waited until the result came back. This way all access to the GUI widgets is serialized via the event loop.
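That is essentially what Swing's invokeAndWait gives you: a worker posts a message to the event queue and blocks until the UI thread has run it, so all widget access stays serialized on one thread. A small sketch (the label is a placeholder):

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;
import java.lang.reflect.InvocationTargetException;

class SerializedViaEventLoop {
    static final JLabel status = new JLabel();

    /** Called from a worker thread: posts to the event queue and waits for the result. */
    static String readStatusFromWorker() throws InterruptedException, InvocationTargetException {
        final String[] result = new String[1];
        SwingUtilities.invokeAndWait(() -> result[0] = status.getText()); // runs on the UI thread
        return result[0];   // by now the UI thread has produced the answer
    }
}
```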
In Windows they had apartment-model COM objects that worked by this principle (ouch, I feel so old now...).
Maybe creating complex, multithreaded systems should be left to AI. It might be able to track all the factors required for correctness in a way we humans mostly can't.