
Multithreaded toolkits: A failed dream? (2004) - antigizmo
https://weblogs.java.net/blog/kgh/archive/2004/10/multithreaded_t.html
======
Animats
Photon, the old GUI for QNX, supports some degree of multithreading. You can
update the values of various display elements such as meters, progress bars,
and text displays from other threads. That GUI is often used for control
panels, with real-time data coming in that needs to be current on the display,
which is why they made that work.

It's going to be interesting to see what happens when someone implements a new
GUI in Rust. The classic problem with GUIs has been that ownership management
for both allocation and locking was a big problem. Rust's borrow checker can
help a lot with the bookkeeping needed to get that right.

~~~
damienkatz
Rust prevents races but not deadlocks, which is one of the bigger problems in
multithreaded GUI code.

~~~
sillysaurus3
Locks turn out to be unnecessary for multithreaded code. A ringbuffer of
messages is only slightly more complex, yet the gains are massive. Both in
performance and simplicity.

It's possible to manage a ringbuffer without any locks. The trick is to have a
counter for the producer thread, and a counter for each consumer thread.
Whenever the producer wants to know "Is it safe to add a message?" it takes
the minimum of all consumer counters, modulo the size of the ringbuffer. The
result is the smallest index that the producer must not write beyond.

In other words, you always know when you're producing messages too quickly and
need to wait on the consumers. And the consumers know when there's a message
waiting -- they just look at the producer's counter. Blazingly fast, and no
locks. Cool trick!
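The scheme above can be sketched roughly like this in Java. This is a minimal single-producer version; the class and method names are invented for illustration, and real implementations (e.g. the LMAX Disruptor) add cache-line padding and smarter wait strategies. Instead of taking a minimum modulo the buffer size, this sketch keeps the counters monotonic and checks `head - minTail >= size`, which expresses the same wrap-around constraint:

```java
import java.util.concurrent.atomic.AtomicLong;

// One producer, several consumers, each with its own monotonically
// increasing counter. The producer may write slot (head % size) only
// while head - min(consumer counters) < size.
final class RingBuffer {
    private final long[] slots;
    private final int size;
    private final AtomicLong head = new AtomicLong(0);  // producer counter
    private final AtomicLong[] tails;                   // one per consumer

    RingBuffer(int size, int consumers) {
        this.size = size;
        this.slots = new long[size];
        this.tails = new AtomicLong[consumers];
        for (int i = 0; i < consumers; i++) tails[i] = new AtomicLong(0);
    }

    /** Producer: returns false instead of overwriting unread slots. */
    boolean offer(long msg) {
        long h = head.get();
        long minTail = Long.MAX_VALUE;
        for (AtomicLong t : tails) minTail = Math.min(minTail, t.get());
        if (h - minTail >= size) return false;  // slowest consumer too far behind
        slots[(int) (h % size)] = msg;
        head.set(h + 1);                        // publish: consumers see it now
        return true;
    }

    /** Consumer c: returns null when no message is waiting. */
    Long poll(int c) {
        long t = tails[c].get();
        if (t >= head.get()) return null;       // caught up with the producer
        long msg = slots[(int) (t % size)];
        tails[c].set(t + 1);
        return msg;
    }
}
```

The producer's check is the only place any thread looks at another thread's counter, and each counter has a single writer, which is what makes the whole thing work without locks.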

~~~
885895
I would like to read more about what you described here. Know of any good
articles or open source projects where this is done?

~~~
btown
There are multiple ways to make a lock-free ringbuffer; most use some sort of
similar trick with per-consumer atomic counters, though I hadn't heard of
something as simple as the minimum-modulo trick described in the gp! Some
implementations:

[http://mechanitis.blogspot.com/2011/07/dissecting-disruptor-writing-to-ring.html](http://mechanitis.blogspot.com/2011/07/dissecting-disruptor-writing-to-ring.html)

[http://www.boost.org/doc/libs/1_59_0/doc/html/boost/lockfree/queue.html](http://www.boost.org/doc/libs/1_59_0/doc/html/boost/lockfree/queue.html)
\- hard to find implementation details though

[http://moodycamel.com/blog/2014/a-fast-general-purpose-lock-free-queue-for-c++](http://moodycamel.com/blog/2014/a-fast-general-purpose-lock-free-queue-for-c++)
(uses per-producer counters instead, and relaxes some ordering guarantees; see
comments)

------
frozenport
Well, it's also a poorly motivated dream unless you have a sexual fetish for
the Java style of abstraction where everything is completely independent. Who
the heck wants multi-threaded GUIs?

User interaction proceeds sequentially, so most objects don't require locks.
The rare exceptions in my software are rendering and IO on separate threads,
and these don't fit abstraction models nicely; as mentioned in other posts,
they involve C-style state machines like OpenGL.

A multi-threaded GUI seems like a great way to kill performance, with little
advantage.

~~~
vidarh
Anyone who has seen the responsiveness of e.g. AmigaOS under heavy load next
to many modern systems might be inclined to want (more) multi-threaded GUIs.

Heavy use of multi-threading to disconnect GUI updates from the actual work
was essential to making that happen.

AmigaOS sacrificed throughput for responsiveness all over the place (e.g.
something as trivial as cut and paste from a terminal could easily involve
half a dozen threads with message passing).

You don't need separate threads for every little component, though.

~~~
fulafel
Was the UI really heavily multithreaded compared to today's systems? I thought
there was a lot of events and message-passing going on in AmigaDOS much like
in current GUI systems.

~~~
vidarh
Depends on what you mean by "heavily multithreaded". And yes, you're right,
there were lots of events and message passing, but that message passing went
between different threads.

I mean to write this up for a blog post and do some proper diagrams, but
here's a rough overview of the state transitions when handling terminal IO for
AmigaOS and reimplementations of the API, like AROS (this is where I got
hands-on experience with it - I extended the AROS terminal handling):

Low level interrupt sources will be handled by "devices" such as
"keyboard.device" and "gameport.device" (the latter handles the mouse/joystick
ports). These will feed input events into "input.device".

The input.device is opened by any component that wants to handle input events.
This includes the "console.device", which is responsible for providing a "raw"
terminal in a specified rectangle in a window. It handles low level input
processing, turns keyboard and mouse input that is relevant to the
console/terminal into higher level events which it passes on to clients, and
takes commands (such as "move cursor to position (x,y)" or "write text xyz")
and renders the terminal.

Above the console.device sits the console-handler (applications can, and often
do, open console.device directly if they want a low level interface). This is
responsible for opening a window, creating a console.device that covers the
window, and "cooking" low level input into higher level input, and vice versa
for output.

The "gadgets" (widgets; buttons etc. in the windows) will be handled directly
by intuition (the GUI system) in a separate high priority thread.

If you then do cut-and-paste, there are additional complications: "conclip"
needs to be running. This receives requests to cut or paste via messages, and
mediates access to the clipboard.device. The clipboard.device again manages
reading/writing files in the relevant clipboard volume. That will involve
talking to the appropriate filesystem handler, which again may write to a
device (such as trackdisk.device for the floppy drives).

Pretty much all of these components will run as their own separate threads.
And most of their interaction is via messages put on a queue.

So if you choose to "cut" a section by pressing a key combination, an
interrupt will be fired to keyboard.device, which will add an event via the
input.device which the input handler thread ("task" in AmigaOS) will pass to
the console.device thread via a message, which will pass it on to the console-
handler, which will pass a message to conclip, which will pass the data on to
the clipboard.device which will send a message to the relevant filesystem,
which may send a message to a low level device. After sending a message to
conclip, the console-handler will send a message back to the console.device if
there's any rendering required.

The reason for all of this is that coupled with careful priorities (UI
rendering and input is running in high priority threads), the system appears
very responsive, while a lot of this happens behind the scenes.

E.g. the clipboard system on the Amiga has to deal with a system where the
clipboard _could_ have been reassigned from the ramdisk where it'd usually be,
to floppy, so it really couldn't reasonably be "inline" without making the
system unresponsive.

In that respect AmigaOS was more multithreaded: There's all kinds of things we
consider fast enough to do "inline" now that were put behind a thread-boundary
because they were either unpredictable or too slow to be done inline back then.

~~~
frozenport
But I think the thing missing is that each of these threads ran on the same
CPU (right?), and thus things like torn reads and writes weren't an issue, so
they didn't need to use expensive std::mutex or std::atomic everywhere.

Today, we have single concurrent execution by enforcing a single GUI thread,
at the time of the Amiga they had single concurrent execution because that was
the only execution they had.

~~~
vidarh
They ran on the same CPU, but you still had to use mutexes and atomic
operations because it had fully pre-emptive multi-tasking, and so your
application had to be ready to lose execution from one instruction to the
next. Torn reads/writes definitely _were_ an issue for higher level code
(unless you could be guaranteed that your construct would translate to a
single m68k instruction), and needed to be kept in mind even in assembler in
some situations (see below).

In fact, you'll find lots of Amiga-software being more brutal and enforcing
serial-execution for critical sections by using Forbid()/Permit() pairs, which
will outright disable the scheduler, or even using Disable()/Enable()
(disables interrupts too). Of course this is/was very much frowned upon for
all but implementing atomic operations, though even this is not _guaranteed_
to be totally atomic in an Amiga system without taking care.

The need to protect against other threads/tasks is/was one of the first things
hammered into the heads of Amiga-developers exactly because it was so new to
most, who would usually come at it from 8-bit home computers where the
standard procedure was that you fully controlled the computer except perhaps
for some very trivial interrupt handlers (which most software would take over
control of anyway).

And while each of the normal threads would be running on a single CPU in a
basic Amiga, any number of devices could DMA (the Amiga depended heavily on
this), and additionally both the Copper (a very basic "GPU" of sorts, used to
set up "display lists" that manipulate various registers, though not entirely
limited to graphics) and the Blitter could access memory at any time too. So
you very much had to be prepared, at least in theory, for memory changing
_during_ execution of an individual instruction if working in "chip-memory"
(the Amiga roughly works with two types of memory: "chip-memory" is memory
where auxiliary hardware can steal bus-cycles from the CPU; "fast-memory" is
memory that only the CPU can access).

Also note that while unusual, there _were_ true multi-processor Amiga-setups:
There were "bridge boards" for the A2000 which effectively were an x86 PC on a
card, where the "graphics card" was a buffer in chip memory that would get
displayed in a window, and which would receive input from the Amiga keyboard
and mouse. There were also PPC accelerator boards (a release of AmigaOS4 for
"classic" Amiga hardware with PPC accelerator boards exists; it basically runs
everything it can on the PPC, just like for "new" Amiga hardware), though
usually these would disable the M68k while the PPC was executing stuff (but
I'm not sure if this was enforced by hardware or if it was done by the OS
patches for simplicity).

I used to love to tell people of all the different CPUs in my A2000: A 68020
with the 68000 as fallback (if you soft-disabled the 68020 for compatibility)
on the motherboard. A 6502-compatible core on the keyboard (the A500 and A2000
keyboards had an embedded SOC chip with a 6502 core + RAM + PROM as the
keyboard controller). A Z-80 on my harddisk controller. An 80286 accelerator
board + 8086 fallback on my bridge-board... Of course, of the 68020/68000 and
80286/8086 pairs, only one of each architecture could ever be running at once.

~~~
frozenport
I really enjoyed reading this post, and it looks like the Amiga was ahead of
its time.

~~~
vidarh
It was a fantastic machine. Unfortunately Commodore all the way through (from
long before the Amiga) was an absolute dysfunctional disaster of a company,
and it was a miracle they lasted as long as they did (and a testament to the
calibre of people that kept saving the company from self-inflicted wounds).

The biggest problem was perpetual under-investment in R&D, and management
meddling that systematically whittled away at the lead they once had. The
archetypal example is the Amiga 4000. On one hand it was the "flagship": the
biggest, fastest classic m68k Amiga produced.

On the other hand, it arrived late, was ridiculously expensive, and was slow
for what was there. The problem? New management wanted to start all projects
over from scratch and put their stamp on them.

IDE, for example, was suddenly pushed onto engineering, without understanding
that the Amiga used SCSI for a reason: the IDE of the time loaded the CPU too
much. That was fine on a single-tasking OS, or on machines with more CPU, but the Amiga
was built around offloading everything. It was the only thing that kept it
competitive in the face of mounting problems for Motorola with upping the
speed of the M68k range (work was underway to evaluate alternative CPUs; PA-
RISC was the lead contender at the time; in the end Commodore went bankrupt
before making a decision, and third parties chose PPC).

The A4000 was the result: IDE dragging down IO performance; a broken memory
sub-system due to rushed redesigns; a butt-ugly case compared to the sleek
A3000, and trying to compensate for the other problems by going for a 68040,
but going "cheap" and picking one of the slower versions, and yet still ending
up too expensive.

The truly crazy thing, though, is that as they were doing this, the "A3000+"
was pretty much done. It didn't have quite as fast a CPU, but was a step up
from the A3000. It had AGA (the last custom chips that the A4000 also got),
and a range of other improvements, such as a DSP providing high-end sound (8x
CD quality channels), and that could also double as a built in modem. And it
kept SCSI...

The best part? It was _far_ cheaper than the A4000, and would've been ready
much faster. Of _course_ Commodore had to axe it...

Being a fan of the Amiga at the time was painful...

------
dnautics
BeOS did this fairly well, although I would say the biggest problem is when
you activate the UI and the underlying app has become unresponsive... When
your app crashes, you want your UI to reflect the crashed state of the app and
similarly become unresponsive.

~~~
mike_hearn
The developers of BeOS went on to do Android and went with the single thread
model, partly because BeOS had big problems with buggy apps full of race
conditions and deadlocks.

Though IMO the Android API is incredibly confusing for a lot of developers.
I've found very often that some devs (the more junior ones) don't understand
that services and activities are just objects all hanging off a single process
and event loop. Bizarre hacks to let services "communicate" with activities
when a simple static global would have worked fine tend to abound.
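A minimal sketch of the "simple static global" approach, in plain Java. The classes standing in for an Android Service and Activity, and all names here, are invented for illustration; the point is just that two objects in one process can share state through a static holder instead of IPC-style plumbing:

```java
// Shared holder: because service and activity live in the same process,
// a plain static field is reachable from both.
final class DownloadState {
    // volatile so an update from a background writer is visible to readers
    static volatile String lastResult = null;
}

final class FakeService {            // stands in for an Android Service
    void onWorkFinished(String result) {
        DownloadState.lastResult = result;   // no binder/broadcast plumbing needed
    }
}

final class FakeActivity {           // stands in for an Android Activity
    String render() {
        String r = DownloadState.lastResult;
        return r == null ? "loading..." : r;
    }
}
```

The activity simply re-reads the static on its next redraw, which also sidesteps the rotated-phone problem discussed below: a freshly recreated activity reads the same static.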

~~~
pjmlp
Those bizarre hacks keep the service from crashing when it needs to talk back
to an activity that just got replaced between the request and the response,
because the user had the strange idea to rotate the phone.

~~~
kllrnohj
If you had just used a static there'd be no need to talk to the activity
directly. The bizarre hacks tend to arise because Java devs seem to consider
static globals as icky.

~~~
pjmlp
Global variables are icky, regardless of the language.

------
samth
The [Racket gui toolkit](http://docs.racket-lang.org/gui/) is successfully
multithreaded, and manages this despite being implemented on top of existing
non-multithreaded and non-thread-safe toolkits. It's hard, but it's certainly
not impossible.

[Here's a paper](http://www.ccs.neu.edu/racket/pubs/icfp99-ffkf.pdf)
about the system, and how it enables cool new stuff.

------
kllrnohj
One of the things the article doesn't mention, which is odd, is that frames
are precious objects that need to be produced in whole. If you have multiple
threads all producing parts of the scene, when do you actually kick off the
frame so that the user never sees a part of a frame? Even if your worker
thread is just updating the text of a few fields when it's done, you need to
batch that up into a single atomic update on the screen.

Which means you need transactions, at which point you basically have a single
thread and a message queue so why complicate things internally and add all the
lock/unlock overhead?
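One way to read "transactions are just a message queue": workers enqueue whole updates, and the UI thread applies them atomically between paints, so a half-applied update can never be painted. A minimal sketch, with invented names:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Worker threads never touch widgets directly; they enqueue a complete
// update, and the UI thread drains the queue once per frame before
// painting. The batch boundary is the Runnable itself.
final class UiThread {
    private final ConcurrentLinkedQueue<Runnable> updates =
            new ConcurrentLinkedQueue<>();

    /** Called from any worker thread: the whole batch is one message. */
    void post(Runnable wholeUpdate) {
        updates.add(wholeUpdate);
    }

    /** Called by the UI thread at the start of each frame, before painting. */
    void drainBeforePaint() {
        Runnable u;
        while ((u = updates.poll()) != null) u.run();
    }
}
```

A worker would post e.g. `() -> { btn.setText("hello"); btn.setColor(BLACK); }` so both changes land in the same frame.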

~~~
kragen
Most GUI toolkits don't bother. I'm guessing you work in video games?

~~~
kllrnohj
Every GUI toolkit already "bothers" by virtue of being single threaded. You
never see a UI toolkit where this:

    
    
    btn.setText("hello")
    btn.setColor(BLACK)

might show a white-text "hello" for a single frame. If the UI toolkit was
thread-safe, though, that would become a possibility.

------
asgard1024
What about functional reactive programming? And what about flux pattern?
Didn't these resolve this "failed dream" of multi-threaded GUI update? (I am
asking real, not rhetorical, questions.)

~~~
frozenport
I don't see how this has anything to do with anything. For example, the "flux"
pattern is similar to model/view, which can be done by a single thread (e.g.,
Qt). In this scheme the main thread services events sequentially, calling the
necessary objects' methods on the same thread.

The discussion is about the 'dream' of pushing a button from a worker thread.

~~~
Arnt
The thing is that the entity that pushes a button is a human, not a worker
thread.

~~~
jbergens
It is very common to have async HTTP requests that "push buttons", so to say,
when they resolve.

------
wrl
A big reason why multithreaded UI doesn't work is that the platform code
(win32, cocoa) generally has poor support for interacting with OS objects from
multiple threads. For instance, NSView instances can only be operated on from
the main thread
([https://developer.apple.com/library/mac/documentation/Cocoa/...](https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html)).

Handling OpenGL context access from multiple threads is terrible, too. I'm
quite excited about the additional threading niceties that we're getting in
Vulkan, which should allow separate threads to re-render parts of the UI and
then send the command buffers back to the main thread.

~~~
mpweiher
> For instance, NSView instances can only be operated on from the main thread

Cocoa is the toolkit, not the OS, so this is just Cocoa taking exactly the
approach described in the fine article.

~~~
wrl
It's also an unavoidable part of writing graphical code on OS X, so, for all
intents and purposes, it _is_ the OS.

------
MichaelMoser123
The author says that with AWT it was a design decision against a thread-safe
GUI. One problem here: there is no way they could have made AWT thread-safe.
AWT uses the native windowing toolkit as its basis; on Windows this implies
that all clients of the Win32 SDK are on the same thread as the event loop.
There is no other way to work with USER32.dll - you have to send window
messages to the window handle, and that's only possible from the thread of the
event loop.

~~~
MichaelMoser123
It would have been possible without locks if the client had sent IPC messages
to the other thread's event queue and then waited until the result came back.
This way all access to the GUI widgets is serialized via the event loop.
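A sketch of that serialize-through-the-event-loop pattern, using a single-threaded executor to play the role of the GUI event loop (the class and field names are invented; in real AWT/Swing code this role is played by `EventQueue.invokeAndWait`):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// "Widget state" is only ever touched on the loop thread; other threads
// send a message and block until the result comes back, so no locks are
// needed on the state itself.
final class GuiLoop {
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    private String labelText = "";   // touched only on the loop thread

    void setText(String s) {
        try {
            loop.submit(() -> { labelText = s; }).get();  // wait until applied
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    String getText() {
        try {
            return loop.submit(() -> labelText).get();    // read on the loop thread
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    void shutdown() { loop.shutdown(); }
}
```

The blocking `get()` is exactly the "wait until the result comes back" step; a deadlock is still possible if the loop thread itself calls `setText`, which is why real toolkits assert they are not on the event thread first.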

In Windows, apartment-model COM objects used to work on this principle (ouch,
I feel so old now ...)

------
khgvljhkb
Isn't this problem directly related to mutating things in memory, rather than
having the UI components each be pure functions?

------
danielfaust
Maybe creating complex, multithreaded systems would be better left to AI. It
could track all the required factors for correctness in a way we humans mostly
can't.

------
awtthrowa
Just throw the GUI operations under a single global lock. No deadlocks, and any
thread can update the GUI.

~~~
astrange
No deadlocks, you say?
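To make the retort concrete: a single global GUI lock composes badly with the application's own locks. In this minimal sketch (all names invented), a worker holding a model lock calls into the GUI while an event callback holding the GUI lock calls back into the model; latches force the lock-order inversion deterministically, and `tryLock` with a timeout is used so the demo reports the deadlock instead of hanging forever:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class GlobalLockDeadlock {
    /** Returns true if both threads failed to take their second lock. */
    static boolean demo() {
        ReentrantLock guiLock = new ReentrantLock();    // the "global GUI lock"
        ReentrantLock modelLock = new ReentrantLock();  // the app's own lock
        CountDownLatch bothHeld = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        Thread worker = new Thread(() -> {
            modelLock.lock();            // worker updates the model...
            try {
                bothHeld.countDown(); await(bothHeld);
                // ...then tries to update the GUI under the global lock
                acquired[0] = tryFor(guiLock);
            } finally { modelLock.unlock(); }
        });
        Thread eventThread = new Thread(() -> {
            guiLock.lock();              // event dispatch under the global lock...
            try {
                bothHeld.countDown(); await(bothHeld);
                // ...and a listener callback reads the model
                acquired[1] = tryFor(modelLock);
            } finally { guiLock.unlock(); }
        });
        worker.start(); eventThread.start();
        try { worker.join(); eventThread.join(); }
        catch (InterruptedException e) { }
        // with plain lock() both threads would have hung here forever
        return !acquired[0] && !acquired[1];
    }

    private static boolean tryFor(ReentrantLock l) {
        try { return l.tryLock(200, TimeUnit.MILLISECONDS); }
        catch (InterruptedException e) { return false; }
    }
    private static void await(CountDownLatch l) {
        try { l.await(); } catch (InterruptedException e) { }
    }
}
```

The global lock removes races on the widgets themselves but does nothing about cycles between it and every other lock the program takes while calling into, or being called from, the GUI.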

