Hacker News
Towards Principled Reactive UI (raphlinus.github.io)
234 points by raphlinus on Sept 26, 2020 | 97 comments

Lovely post, and an interesting problem area. Three additional perspectives that have been on my mind in this area:

1. Differential dataflow. Like Adapton and Incremental (which Raph cites), I have found DD to be hugely helpful for understanding which computations can be efficiently incrementalized, and how (including in Rust).

2. Operational transforms. Oftentimes I am working on UI as part of a larger distributed/networked system. In these cases (especially when I already want the UI to display other people and their actions), the division of responsibility between the UI, the network code, and the various internal models has always felt like a missed opportunity for better and more principled integration / compositionality.

3. Build systems. The problem of figuring out how to bring a UI efficiently up to date in response to some parts of the underlying state changing can also be seen as a build system, or cache coherency problem. From that standpoint, where do the paradigms of the Build Systems a la Carte paper fit into Raph’s pictured theory of reactive UI?




Thank you for your kind words. I agree that all these resources are relevant. I'm familiar with some but will study others more deeply.

Certainly I've spent a lot of time thinking about collaborative and distributed systems, and there is no doubt scope for cross-fertilization of ideas. A huge inspiration (cited in the post) is Martin Kleppmann's automerge, which uses a proxy object to capture an explicit diff from a computation which is expressed using ordinary mutable logic.

Regarding the build system work, I will say this: the build task is inherently graph-like, and a lot of the interesting problems involve choices of sort order to utilize parallelism best; I'm fortunate to have a 16 core machine, and it makes me sad to wait on a single-threaded link step. I think by comparison the reactive UI task is mostly dealing with tree structure, and that's a simplification that could be useful to exploit. It's easier to describe mutation of a tree than general mutation of a graph.

But certainly there are common themes in all these incremental systems, and I continue to be drawn to figuring out the fundamental principles rather than trying to build systems ad-hoc.

Martin’s writing and thinking has been hugely inspirational for me too — thanks for highlighting the automerge work here!

Next, regarding your other thoughts, two notes, one on performance and one on “common themes”.

Re performance: in my initial comment I linked directly to McSherry & friends’ work on differential dataflow mostly because I value it as a principled and effective framework for incrementalizing many kinds of computation I care about. But I would be remiss if I failed to also highlight here their complementary “COST” project of using this framework to demonstrate how carefully engineered (single-threaded!) implementations of graph computations in Rust can often considerably outperform more complex but more wasteful pre-existing approaches (at least in the database world).

(Whether the same flavor of result will also appear over time in the build systems or UI worlds seems like an interesting question for future work!)

Finally, re “common themes in incremental systems”: yes! - this and your point about “fundamental principles” are exactly what I was hoping to suggest by linking to the build systems + OT materials!




> Down the line, those needs include accessibility, a serious Achilles heel for imgui-flavored designs in particular.

Thank you for drawing attention to this. I spent a good chunk of last weekend pondering how accessibility might be added to IMGUI toolkits, and I do think it would be difficult. In particular, as you said:

> I believe a proper approach to this problem involves stable identity of widgets, about which much more below.

Agreed. In platform accessibility APIs such as UI Automation, each node has an identity ("runtime ID" in UIA) which is expected to be stable. One of the IMGUI toolkits I looked at was Nuklear, and I didn't come up with a way to derive stable identities for its widgets. On the other hand, Gio [1] (an immediate-mode toolkit for Go) looks more tractable, because the application holds a struct for each widget's state. Still, in an accessibility API like UIA, even simple static text nodes are supposed to have stable identity; I don't know how that would be solved with something like Gio.

[1]: https://gioui.org/

I don't know much about other IMGUI systems, but Dear ImGui at least does have "stable identity of widgets" across frames. Unlike in traditional UI systems, though, those identifiers are not created by the UI system and handed back for the client code to store; instead, the client code pushes those identifiers into the UI system as strings and/or numeric IDs.

I would go as far as putting immediate-mode UIs and reactive UIs into the same "mental model bucket". Both only describe the desired UI state (i.e. there are no separate "UI creation" and "UI updating" phases), and let an intermediate layer figure out the minimal required state changes to "realize" this desired UI state (whether this is exactly how the UI system works under the surface is mostly irrelevant to the API user).

The only (visible) difference between reactive and immediate-mode UIs seems to be that reactive UIs prefer nested data structures, while immediate-mode UIs prefer nested code blocks to describe the desired UI state.
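That difference can be made concrete with a tiny sketch (hypothetical mini-APIs, not any real library): the same desired UI, described once as a nested data structure and once as nested code blocks re-run every frame.

```javascript
// Hypothetical mini-APIs, not any real library: the same desired UI
// described as a nested data structure (reactive style) and as nested
// code blocks re-run every frame (immediate-mode style).
const reactiveUi = (count) => ({
  type: "panel",
  children: [
    { type: "label", text: "Count: " + count },
    { type: "button", label: "Increment" },
  ],
});

function immediateUi(ui, count) {
  ui.beginPanel();
  ui.label("Count: " + count);
  ui.button("Increment"); // would return true on the frame it was clicked
  ui.endPanel();
}

// A recording stub makes it visible that both describe the same tree:
function recorder() {
  const calls = [];
  return {
    calls,
    beginPanel: () => calls.push("panel"),
    label: (t) => calls.push("label:" + t),
    button: (l) => { calls.push("button:" + l); return false; },
    endPanel: () => calls.push("end"),
  };
}
```

In both cases an intermediate layer is free to diff the described state against what's on screen; only the surface syntax differs.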

> I don't know much about other IMGUI systems, but Dear ImGui at least does have "stable identity of widgets" across frames. Unlike in traditional UI systems, though, those identifiers are not created by the UI system and handed back for the client code to store; instead, the client code pushes those identifiers into the UI system as strings and/or numeric IDs.

Thanks for that info. I hadn't looked closely at Dear ImGui yet.

I'm probably missing something here, but is there a reason these toolkits don't require the dev to specify a stable, unique ID for each node? Feels like there's strong precedent with HTML's id attribute and CSS.

Because IMGUIs are designed to be procedural, making it easy to couple changes in UI with changes in control flow. Requiring an ID for every widget would get really annoying really fast, considering IDs would become invalidated pretty often. The default is no IDs; ID assignment is left up to the developer, who knows if and when an ID would be stable.

You can wrap tools like Dear ImGui and Nuklear into a reactive framework that handles state management and IDs and such and presents an API with a declarative interface a la React, but at that point you're pretty much building your own UI engine and just passing rendering on to those tools, which is not simple to architect.

Dear ImGui does have stable widget IDs; it's just not very obvious when looking at the API, and often also irrelevant to the API user. Usually the ID is the label string, but you can make labels unique with a special "##" suffix, which is not displayed but only used to create a unique hash. There's also a push/pop API for identifiers (often useful when creating lists of items where labels are not guaranteed to be unique).
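Roughly, the scheme works like this (a simplified sketch with hypothetical names, not Dear ImGui's actual code): the effective ID of a widget is a hash of its label, seeded by whatever is on top of the ID stack, so identical labels under different pushed IDs still get distinct, frame-stable identities.

```javascript
// Simplified sketch, hypothetical names (not Dear ImGui's actual code).
// FNV-1a-style hash, seeded with the parent ID from the stack.
function hashId(str, seed) {
  let h = (seed ^ 0x811c9dc5) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h = Math.imul(h ^ str.charCodeAt(i), 0x01000193) >>> 0;
  }
  return h;
}

class IdStack {
  constructor() { this.stack = [0]; }
  top() { return this.stack[this.stack.length - 1]; }
  push(id) { this.stack.push(hashId(String(id), this.top())); }
  pop() { this.stack.pop(); }
  // A label like "Save##row7" would display as "Save", but the full
  // string feeds the hash, keeping duplicate visible labels distinct.
  widgetId(label) { return hashId(label, this.top()); }
}
```

Because the same code path produces the same push/label sequence each frame, the IDs come out identical frame after frame without anyone explicitly storing them.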

Further, you're in no way required (nor expected) to provide IDs for most HTML elements. In modern frontend development you often avoid styling elements by ID entirely, favoring classes instead.

I'm not sure I agree about the priority of accessibility.

First, most immediate-mode UIs tend to be more "vector based", so the UI actually has the possibility of scaling properly unlike--well--basically everything else (giving me 1.0, 1.5 and 2x (effectively reducing my 4K monitor back to 1920x1080) does not count as "scaling"). That's actually a really important "accessibility" feature that STILL doesn't work properly for almost everything.

Second, for accessibility features like tabbing and screen reading, every user and every programmer bears that overhead (which the author points out may have significant architectural implications) for what is a user base of zero for the vast majority of programs.

Finally, who died and made "tabbing navigation" a pronouncement from God? Why isn't "spatial navigation" a better choice, for example? Why isn't there something better than that?

In addition, why is it the job of the GUI toolkit to do accessibility? Yes, I understand why "accessibility" wants to attach to the GUI toolkits as the programmers will do the math and never implement it otherwise. However, that doesn't mean the GUI toolkits should necessarily accept that task without at least thinking about the implications of doing so.

I hate it when y’all give made up stats, like “99% of all websites should not be an SPA” or in this case, “Vast majority of websites do not need accessibility features”

This strikes me as wrong and exaggerated.

What this article pointed out is that accessibility imposes a cost on programmers. At that point, a programmer is entitled to ask whether that cost is justified and whether he wishes to pay it.

This plays out in other areas. UC Berkeley was told that it needed to make public videos "accessible". That was going to be expensive, so they decided to simply remove the videos. This, of course, benefited no one because now nobody else could make the videos accessible either.


Accessibility isn't free and I wish people would quit acting like it is. We, as a society, choose to impose that burden because, on the balance, everyone needs accessibility to some degree as they age or if they get injured.

However, if nobody is willing to pay for it, don't be surprised if some people decide, like Berkeley, that exiting the arena is the better choice.

At which point, everybody loses.

> for what is a user base of zero for the vast majority of programs.

Try saying that to a blind person who lost their job because an essential application was inaccessible. It matters to them.

I disagree with you on pretty much every point and find your positions generally poorly thought out.

All UIs are based around positioning and drawing things. There is no difference whatsoever between paradigms in that regard: any UI can be scaled properly to whatever level you desire; it just needs to actually do it.

• Navigating through controls is far more common than you seem to imagine. On desktop platforms, a significant fraction of users (though still certainly a minority) will become extremely frustrated if your app doesn’t conform to platform norms; and it’s not just power users: in line-of-business apps, all the best interfaces use Tab for rapid navigation between fields. Mice are really slow as field navigation devices. On mobile platforms, this is more tightly integrated with the keyboard, so that in forms, the “Enter” key gets changed to “Next”. Again, you’ll frustrate a lot of people if you flout platform norms.

• Exposing an accessibility tree for things like screen readers, now that part is more commonly associated with overhead, especially on the web; I’ve seen one web app not do ARIA stuff by default, but have a checkbox in its settings to enable the accessibility tech, with the warning that it’ll make it slower. Can’t remember what app it was. The ideal might be to not build that until something requests it, but the web platform at least doesn’t provide the means of doing that at this time. I’m not familiar with the underlying protocols and whether native apps can work this way.

• Linear field navigation matches how people think most readily and how almost all apps work, and is a long-standing convention across all platforms, whether done with a Tab key or by other means. It’s what everyone is used to, which means you’d better have a really good reason if you decide to disregard it, and provide an obvious alternative. The most common alternative to linear field navigation is 2D spatial navigation with arrow keys, used in things like spreadsheets (augmenting Tab) or games (typically supplanting Tab). Gaming platforms tend to replace Tab navigation with this 2D navigation with arrow keys or a gamepad. But for most apps, 2D field navigation doesn’t tend to be as convenient as 1D.

• I’m baffled about who you think should provide for accessibility if not the GUI library. If you are saying “let the end developer do it”, I respond: are you serious? That would have the end developer duplicate basically everything from the GUI library, so people would abstract that into an accessibility library that wraps the GUI library clumsily, then hey, let’s merge the two so it’s not a pain, and now we’re back where we started. If that’s not what you’re saying, then I’m baffled, because those are the only two options I can see—unless you would have the screen reader do OCR on what’s on-screen and throw AI at it to guess how the app will work!

> The ideal might be to not build that until something requests it, but the web platform at least doesn’t provide the means of doing that at this time. I’m not familiar with the underlying protocols and whether native apps can work this way.

Native accessibility APIs do allow this. Chrome, for example, doesn't build accessibility trees unless it detects that an AT is actually consuming them. But you're right that the web platform itself doesn't allow this. The concern is that websites could then discriminate against people with disabilities, or offer misguided "alternative" versions. Some would probably also argue that if you're working with the grain of the web platform and not against it (i.e. semantic HTML and not too much JS), you get accessibility at no additional cost. I'm more pragmatic than that.

> unless you would have the screen reader do OCR on what’s on-screen and throw AI at it to guess how the app will work!

This may be the only way that the long tail of applications using custom toolkits will ever be accessible, especially considering the responses I get on threads like this one. And the VoiceOver screen reader in iOS 14 is actually doing this. It kind of sucks though that we have to burn battery power to reconstruct the UI semantics that are already there inside the application.

Basically your examples are about how to navigate a text based application--mostly web and CRUD.

What about a PCB layout program? What about Blender and animation? What about video editing? What about a 3D modeler?

If you are using an immediate-mode UI, you probably aren't making a text-based CRUD system--especially as immediate-mode UI's tend to be remarkably bad at text rendering.

A UI toolkit doesn't magically make every application accessible. If the aforementioned applications are to be accessible, that application has to change in major ways.

Not everything is text.

Forcing everything through that narrow lens is an impediment.

Neither Raph nor I were ever saying that the UI toolkit can magically make everything accessible; rather, we were saying that it’s roughly impossible to make an application accessible without support from the UI toolkit. These are very different things. (I’m afraid the terminologies we were all using contained some ambiguities that led to us talking at cross-purposes. I was always talking about framework concerns, since this whole discussion is all about the framework.)

Any fancy components will still need to be able to notify any accessibility tech what they are.

I contemplated saying more about other types of programs like those you mention, but decided that they’re really no different, beyond Tab being far less useful and often justifiably repurposed. The “canvas” parts may easily be just a black box that isn’t exposed to accessibility tech, but all of the rest of the UI around it (menus, configuration panels, property sheets, &c.) will still need to be exposed properly.

Also, a good GUI framework should help the application developer decompose even a highly custom UI for a niche, non-CRUD application into widgets. As part of that, it should help the developer implement accessibility for those widgets. It's not hard to do much better than Win32 here, and probably not hard to do better than Cocoa as well.

I’m curious, given salsa’s[1] so-far quite-effective use of the incremental computation approach (Adapton/Incremental/Bonsai), if you could elaborate on some of the pitfalls there for UI specifically. The primary salsa use case so far (in rust-analyzer[2]) is obviously fairly different from the pure UI use case, as it leaves the UI work to other things and uses salsa purely for the incremental computation of compilation results, which have a very different set of constraints from UI painting. But at that point, my own knowledge runs out: thus the question.

[1]: https://salsa-rs.github.io/salsa/ [2]: https://rust-analyzer.github.io/

This is particularly of interest to me because I work day in and day out with what is, to my knowledge, the only application of the incremental computation approach to browser UI,[3] and colleagues and I have talked about what it would take to reimplement it in Rust: what the tradeoffs would be, where the problems would emerge, etc. I suspect part of the issue is that the constraints of using the DOM inform a lot of the design considerations in the space, whereas implementing from scratch gives very different tradeoffs?

[3]: https://v5.chriskrycho.com/journal/autotracking-elegant-dx-v...

It's a very good question. I haven't really tried using salsa directly, but I know Adam Perry spent some time with it. I believe an earlier prototype of Moxie actually used salsa, but he ended up moving away from that for reasons I don't fully understand myself. In any case, Crochet is very deeply inspired by Moxie, so there's certainly an influence there, even if we don't end up using the code directly.

Regarding DOM, yes. I think it's held up pretty well considering how long the ideas have been around. It's slow, it's clunky, but it gets the job done. But I also think now is a good time to consider better ways to implement its core functionality, which is tree mutation.

IIRC the issue I had with salsa was that there’s no way to compose multiple databases from separate libraries and have them use the same incremental storage without deep integration.

I'm unsure how directly relevant this is, but when I dug into the implementation of Elixir's LiveView stuff I was frankly amazed at how little JavaScript was needed client-side to apply the diffs being sent down.

I'm aware this is significantly because they "cheat", but I feel like you're already looking at borrowing from other peoples' trickery so maybe it'll still be of interest.

> I think the debate is now essentially over, the reactive approach is winning.

Is this really true? My intuition is that reactive UIs are fantastic for widgets and "wraps a database" type apps, but fall flat when it comes to deeply interactive software where performance and responsiveness are paramount, i.e. where there's a tight feedback loop between user input and rendering, and where the model state tends to be quite large and granular. (DAWs, drawing/illustration software, movie editors, mapping applications, text editors, etc.) I also get the sense that most revolutionary or industry-changing software falls into this category. So how can the reactive approach be anything other than a small tool in the app developer's toolbox?

Could somebody dissuade me from this notion? In my admittedly limited experience with the reactive paradigm, I feel like I run into a wall as soon as I've implemented all the basic widgets and need to actually work on the main content view of my app. It doesn't feel like it could ever be a good fit for software that pushes boundaries.

(Perhaps this is an entirely separate matter from "object-oriented UI" vs. "reactive UI", though. Or maybe I'm "using it wrong", in the sense that reactive UIs are meant to be mostly about widgets and not complex interactive views. But then I often hear people talk about reactive UI as the future of app development, so I'm not sure.)

"Because DOM is slow"

DOM is not slow, in principle.

What makes updates slow is the fact that each mutating DOM function must leave the DOM in a consistent state: rendering trees invalidated, etc. This creates significant overhead on massive updates. But:

    element.innerHTML = "new content";
is significantly faster than a sequence of appendChild & Co., because the element.innerHTML mutation is transactional. Browsers do not need to notify observers on each element parsed, only on the whole subtree at the end.

I have tests of different methods of DOM updates in Sciter (https://sciter.com) see: https://github.com/c-smile/sciter-sdk/blob/master/samples/te...

The test shows that these two methods:

    element.html = "new content";
    element.content( vector-of-vDOM-nodes );
are significantly faster than anything else, as both methods are transactional.

What you're saying seems to be consistent with what I'm saying. Web-browser standard DOM is slow, for exactly the reasons you state. Adding an additional tree mutation API, not supported by any web browser, can obviously be a lot faster, and as you say, a good way to make that faster is to group the mutation into a transaction, rather than effectively have one transaction for each appendChild and setAttribute call.

What you've done is a pretty small change to the DOM and generally compatible with it. What I've done in Crochet is a more radical rework, but certainly with some of the same general principles (my `Mutation` datatype is also a transaction). I should also point out that my tree mutation is a bit more limited, as it can't do general movement of subtrees, only sibling movement within the same parent (and, in the current prototype, no swapping, only order-preserving, which includes insert, delete, and update).
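As a rough illustration (a hypothetical toy, not Crochet's actual `Mutation` datatype), such an order-preserving, sibling-only delta can be expressed as a sequence of skip/delete/insert/update operations applied against one parent's child list:

```javascript
// Hypothetical toy, not Crochet's actual `Mutation` type: an
// order-preserving delta over one parent's children, limited to
// skip / delete / insert / update (no subtree moves, no swaps).
function applyDelta(children, delta) {
  const out = [];
  let i = 0;
  for (const op of delta) {
    switch (op.type) {
      case "skip":   // keep the next n children unchanged
        out.push(...children.slice(i, i + op.n)); i += op.n; break;
      case "delete": // drop the next n children
        i += op.n; break;
      case "insert": // splice in new nodes at the current position
        out.push(...op.nodes); break;
      case "update": // shallow-merge new fields into the next child
        out.push({ ...children[i], ...op.fields }); i += 1; break;
    }
  }
  out.push(...children.slice(i)); // implicit trailing skip
  return out;
}
```

Restricting the vocabulary this way is what makes the delta cheap to apply and easy to reason about, compared with general graph mutation.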

If some compatibility with browser DOM is a goal, then your approach is appealing. I'm not pursuing it in my prototype because I'm also holding overall system simplicity as a whole. But in any case, I agree that people should be aware of what Sciter does; it's an impressive system.

We just need to add one method to Web API:

    document.mutate(function(mutator) {
        // batched DOM changes go here, applied via the mutator
    });
The mutate call would apply the whole transactional update as fast as possible.
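A toy model shows the payoff (hypothetical names, not a real DOM API): with per-call notification, n appends mean n observer notifications, while the same n appends inside a mutate-style transaction produce exactly one.

```javascript
// Toy model, hypothetical names (not a real DOM API): why batching
// mutations into one transaction cuts observer notifications from
// O(n) to O(1).
class Tree {
  constructor() {
    this.children = [];
    this.notifications = 0;
    this.depth = 0; // transaction nesting depth
  }
  notify() {
    // Outside a transaction, every mutation notifies immediately.
    if (this.depth === 0) this.notifications++;
  }
  append(node) {
    this.children.push(node);
    this.notify();
  }
  mutate(fn) {
    this.depth++;
    try { fn(this); } finally { this.depth--; }
    this.notify(); // one notification for the whole batch
  }
}
```

The real cost in a browser is of course layout/render-tree invalidation rather than a counter, but the shape of the saving is the same.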

> The problems with observable boil down to the fact that the callback requires a lot of context to know what change should happen in response to the event. In this way, I think it is fundamentally in tension with the goals of reactive UI, though some systems have managed to reconcile the two, often with compiler help to translate fairly straightforward declarative logic into an observable-based implementation: SwiftUI and Svelte come to mind.

I don't understand the main issue brought up with observables. What exactly is that "context" you're talking about, and how is it bad?

Observables, at least the way used in my https://github.com/raquo/Laminar/ UI library, define the dataflow graph of your application, letting you channel data into other data and/or into incremental DOM updates without using virtual DOM.

You don't need any additional "context" to know what change happens in response to what; defining the flow of data with observables, and then binding some of them to the DOM parts that you want them to affect or source data from, is literally how you write your application logic. You will need to express that application logic somehow, and doing so with observables is quite straightforward.

I'm not familiar with SwiftUI, but Svelte is not really any more about observables than React is. Disregarding how it's implemented under the hood, Svelte's API is made to be superficially similar to React but without virtual DOM, and that comes at a very high cost of designing their own language and requiring their own compiler. So you end up with a callback-driven API that to the user feels a lot like React's state + props + render method, and it's nothing like working with plain observables.

I think there might be an issue with definitions. I thought maybe Raph was talking about the Observer pattern from the Gang of Four, while you are talking about Observables as incrementally computed expressions.

I can remember a long time ago reading about an Observer pattern for a UI. Views would subscribe to a model. When the model changed, the views were notified. The views then queried the model and determined how to update. It was suggested that for more efficient updates the change notification could carry some information (context) to indicate what had changed. The article didn’t discuss this optimization further.

I see, you're most probably right, and this makes sense, thanks!

But what does the author think of using observables then I wonder? I mean the streaming kind, like ReactiveX (but not ReactiveX specifically). They're first class representations of event streams and state – exactly the domain of a UI library – and working with observables has been very pleasant in my experience.

Yes, I think there was terminology confusion (terminology in this space can be a real dumpster fire - you don't want to know how many different things are called "widget").

I don't have very strong feelings either way about stream-based computation. I am skeptical that using them as a primary primitive for building UI is going to work out well. Things are nice in the static case, but it seems to break down a bit when there's dynamic reconfiguration. Evan Czaplicki has given good talks on his evolution away from purist FRP. It's also interesting to compare the original Elm thesis[1] with the farewell[2].

That said, I think it's a super-interesting experiment to try to integrate these ideas with the Crochet architecture and see how it turns out.

[1]: https://people.seas.harvard.edu/~chong/pubs/pldi13-elm.pdf

[2]: https://elm-lang.org/news/farewell-to-frp

Elm has very different design goals than what I care for. I guess it's not purist FRP anymore but the elm architecture is definitely purist something. The problem in both cases is the purist part. It's way too much ideology and way too much machinery to achieve the simplest things.

Look at this pinnacle of elm architecture for example: https://elm-lang.org/examples/time

All this boilerplate just to display the current time in the current timezone. The same can be done with three very obvious lines of code in Laminar, for example, without any regard for abstract purities, but with the same concrete outcome – a well behaved h1 element that displays time. Scales to larger components and applications very well too.

Cycle.js has the same problem as old signal-based Elm by the way, and for the same reasons: obsession with purity, and ignoring that observables are a poor match for virtual DOM. You can have one without the other, and have a much simpler system. If you already use observables, virtual DOM adds nothing of value, only a way to satisfy the irrational demand for purity.

It is a long post with a lot of good points. One minor thing I would argue: the "setter support" in languages is not really a plus for observer patterns.

In many languages that support setters and use them to naively implement observer patterns, you can see a lot of repeated updates for the same object through the call chains. More often than not, you are not updating one property on one state object; you need to update a set of properties across some number of state objects, and simplistic setter-based observation cannot differentiate these and coalesce them properly. People can work around this and implement "backpressure" etc., but that just adds more complexity on top of a broken promise.

What you really need is an observer implementation with transactional guarantees: state changes cannot be observed until an update transaction is done. This naturally reduces the number of unnecessary updates, and doesn't really need "setter support" from the language.
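A minimal sketch of what such transactional guarantees could look like (hypothetical names, not any particular library): writes mark the store dirty, and observers run only once per transaction no matter how many properties changed.

```javascript
// Minimal sketch, hypothetical names (not any particular library):
// an observable store where observers fire once per transaction.
class Store {
  constructor(state) {
    this.state = state;
    this.observers = [];
    this.depth = 0;    // transaction nesting depth
    this.dirty = false;
  }
  subscribe(fn) { this.observers.push(fn); }
  set(key, value) {
    this.state[key] = value;
    this.dirty = true;
    if (this.depth === 0) this.flush(); // plain setter: notify per write
  }
  transaction(fn) {
    this.depth++;
    try { fn(); } finally { this.depth--; }
    if (this.depth === 0 && this.dirty) this.flush(); // coalesced notify
  }
  flush() {
    this.dirty = false;
    for (const fn of this.observers) fn(this.state);
  }
}
```

Note that no language-level setter support is involved; the transaction boundary, not the assignment syntax, is what enables coalescing.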

I'm pumped to check this out, Raph has been doing really good work on the Rust UI front.

You should join the xi channel on Zulip https://xi.zulipchat.com/; there's a dedicated Crochet subchannel.

Do UIs need to be trees?

How do things change if we were to use a list instead? What becomes easier/harder?

Layout calculations might become slightly more expensive, though that may be acceptable since most time is spent rendering; and perhaps there are other ways to accelerate these calculations if needed.

The fact that most UIs are happy to provide a single linear tab order (and not something more involved that allows users to navigate the tree structure) makes me think that there might be something there.

How would you represent a ui as a list?

A list of widgets, with each widget having an x, y, width and height.

It’s normal for UIs to need some sort of nesting, e.g. menus, tabs (including things like tab panels and ribbon UIs), flexible split panes. It’s possible to do all of these without a tree, but you’ll be doing what nesting does, just at a far more painful level. Switch from tab A to tab B → hide all the A widgets, show all the B widgets. Adjust the split position → recalculate the position, width and height of every element that’s “within” it. Scroll a scrollable panel → adjust the y position of every element that’s “within” it, and also probably refresh the clip masks that every widget must now have since scrolling makes partial occlusion possible.
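The cost difference is easy to see in a sketch (hypothetical minimal structures, not a real toolkit): with a flat list of absolutely positioned widgets, scrolling touches every widget in the panel; with parent-relative positions in a tree, it's a single write to the panel.

```javascript
// Hypothetical minimal structures, not a real toolkit.
// Flat list: every widget stores absolute coordinates, so scrolling a
// "panel" means rewriting the y of each widget belonging to it.
function scrollFlat(widgets, panelId, dy) {
  for (const w of widgets) {
    if (w.panel === panelId) w.y += dy; // O(children) writes per scroll tick
  }
}

// Tree: children are positioned relative to their parent, so scrolling
// is a single write; absolute positions are resolved during layout.
function scrollTree(panel, dy) {
  panel.scrollY += dy; // O(1)
}

function absoluteY(node) {
  let y = node.y;
  for (let p = node.parent; p; p = p.parent) {
    y += p.y + (p.scrollY || 0);
  }
  return y;
}
```

Clipping, tab switching, and split panes all follow the same pattern: the tree lets one write stand in for a whole family of coordinate updates.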

What is a widget?

Is a text box with a reply button a widget? What about just a text box? What about just a button?

Assuming all three, it sounds like you have one widget composed of two other widgets, which naturally ends up creating tree structures.

Do you think about code as a tree?

It is, in some ways, but this is not the best model to reason about code, IMHO.

I do when I'm trying to operate on it with other code (e.g. compile it - equivalently render the gui). Typically you parse it into a syntax tree.

If you had to reason about your code as a tree when writing it, I think it would quickly be cumbersome. Remember that we're talking about the API here, not the internals of the implementation.

If I had to explicitly say "tree", I agree. But if what I was writing was in fact not a tree, I think it would be substantially more difficult.

Sound theoretical underpinnings are usually necessary for a coherent system that allows for good abstractions.

Note that while I don't say the word "tree" while writing code, I create a tree with my braces (C) or indentation levels (Python) and file structure. The language designers created a syntax using a tree. And so on.

I guess what I'm saying is that while the tutorial should not say tree, and I should not be thinking about tree nodes and edges while writing it, I should be writing a tree (and in code I am).

I almost never think about anything as a tree, especially when I am trying to solve a problem with code.

I am familiar with the structure, but I've never found it practical as an abstract tool.

I dare you to make a website using only css position: absolute; Make it work on different screen sizes. Do independent scrolling. It'll be hell very quickly. Nesting is used in all somewhat complicated layouts.

Win32/MFC dialog layouts are based on absolute positioning of all widgets, with a few concessions to resizability (like the ability to select whether a widget moves down or stays still when a window gets taller). This is the reason why many dialogs, like Run, or Windows Explorer file properties, cannot be resized.

Any hierarchy (group boxes holding widgets) is purely visual, and the rectangle doesn't "own" its child widgets in a programmatic sense.

Text widgets have a fixed size. If your font gets wider (from DPI scaling or translations), text can become cut off or overflow onto the next line and disappear.

This demonstrates the main advantage of a tree structure, which is relative positioning.

You can say that all nodes in this subtree are contained within that node... It's an elegant way to describe a UI, in my opinion.

How about with css grid?

Sure, but you're going to need to have dependencies among those widgets in one way or another.

Eg: You want to move a button and you need to move its text and background shape. How do you keep track that these two elements are grouped together?

Awesome post, thanks for writing and sharing it.

"The idea of the app logic producing an explicit tree mutation while holding an immutable reference to the old state of the tree feels very functional, though the app state itself is mutable. Even so, I haven’t seen this particular pattern in the functional programming UI literature (hopefully readers can fill me in). "

IIUC, this sounds a lot like what MWeststrate does with Mobx-State-Tree^1

1. https://github.com/mobxjs/mobx-state-tree
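
A rough sketch of the pattern quoted above, in Rust: the app logic holds an immutable reference to the old tree and emits an explicit mutation description, instead of mutating in place. All names here (Node, Delta, rename) are invented for illustration, not taken from any particular library.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Delta {
    SetLabel { id: usize, label: String },
}

#[derive(Debug)]
struct Node {
    id: usize,
    label: String,
    children: Vec<Node>,
}

fn find(n: &Node, id: usize) -> Option<&Node> {
    if n.id == id {
        return Some(n);
    }
    n.children.iter().find_map(|c| find(c, id))
}

// Pure function over the old state: returns a delta, touches nothing.
fn rename(old: &Node, id: usize, new_label: &str) -> Vec<Delta> {
    match find(old, id) {
        Some(n) if n.label != new_label => vec![Delta::SetLabel {
            id,
            label: new_label.to_string(),
        }],
        _ => vec![], // node missing or already up to date: empty delta
    }
}
```

The automerge proxy-object trick mentioned elsewhere in the thread captures the same kind of delta implicitly, from code written with ordinary mutable syntax.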

There's an interesting link in these UI patterns: Code clarity and performance.

For example, comparing a virtual DOM model with manual UI changes: deliberate UI changes, if done well (the article notes they often aren't), are faster... at the expense of losing the elegant model of declarative UI. VDOMs conduct many more operations than are required, but let you describe how the UI should behave elegantly.

As the article mentions, Svelte is a cool approach that attempts to reconcile this. Personally, I wrote a React/Elm-like WASM framework in Rust. After stepping back, I'd only use something like that for complex, non-performance-critical UIs. In embedded or simple websites, that sort of overhead isn't appropriate, even though it leads to nice code.

I think we'll be able to dodge this compromise soon. Maybe with something like Svelte, or maybe with a different approach.

Superficially Terraform seems to be solving a similar problem: a declarative interface for updating a stateful process.


- functional core is responsible for computing the new desired state in response to events

- driver/plugin is responsible for computing the diff between the current and the desired state

- imperative shell is responsible for applying the diff to the process... which may be partially or wholly implemented as another component
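
The driver/plugin step above can be sketched as a plain state diff. This is a toy version with resources as a map from name to a single integer attribute; the Op type and function names are invented for illustration, not Terraform's actual model.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Op {
    Create(String, i32),
    Update(String, i32),
    Delete(String),
}

// The functional core would produce `desired`; this diff plays the
// driver role; the imperative shell would then apply the ops.
fn diff(current: &BTreeMap<String, i32>, desired: &BTreeMap<String, i32>) -> Vec<Op> {
    let mut ops = Vec::new();
    for (k, v) in desired {
        match current.get(k) {
            None => ops.push(Op::Create(k.clone(), *v)),
            Some(cur) if cur != v => ops.push(Op::Update(k.clone(), *v)),
            _ => {} // unchanged: nothing to apply
        }
    }
    for k in current.keys() {
        if !desired.contains_key(k) {
            ops.push(Op::Delete(k.clone()));
        }
    }
    ops
}
```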

Excellent article! While reading it I got the impression that some of the decisions made by the author were influenced by the limitations of Rust, so I wondered: how would you design a language to be ideally suited to expressing reactive UI? What would such a language look like, and what would the UI library look like?

In practice, an immediate mode API is clearly the way to go.

We've all seen "smarter" approaches failing more than a few times, obviously on the simplicity side but also, sadly, on the performance side, even with all the caching and retained structures.

This is the strange beauty of brute force algorithms, they may look dumb and ugly for our refined human minds, but they tend to map very well to the capacities of computers.

It is often faster to not bother and blindly render something instead of running complex logic to decide if you really need to render or not.

And if you're not satisfied with the performances of a "dumb" imgui, you can always add some smarter caching without touching the API.

Although I bet this approach would only get you marginal benefits.

How do you plan to address accessibility using imgui? As I argue in the post, the model of an imgui-like API for clear expression of app logic, backed by a more traditional retained render object tree, has the potential to give you the best of both worlds.

And the current crop of imgui libraries already maintains state across updates (after all, that is how you can implement things as simple as drag & drop). However, the reason you need a retained render object tree somewhere for accessibility is that its interaction mode (mostly request & response) differs from the GUI's (which renders at 60~120fps). The mismatch in interactivity is thus compensated for by the retained render mode.

It feels almost like the old days: a UI derived strictly from business logic with a fixed set of widgets is better for accessibility than our new crop of apps with more gestures and shortcuts. It is no wonder that imgui shines if you need more gestures and eye-candy animations.

The mismatch between the periodic from-scratch rendering (push) approach of IMGUI and the request/response (pull) model of current accessibility APIs is a tractable problem. Imagine a streaming protocol that allows the application or GUI toolkit to emit the contents of an accessibility tree, either in whole or as incremental updates, whenever it renders. An IMGUI could produce this data each time it renders, even if it doesn't have a retained tree in memory. It would be up to the consumer of the protocol to create and update a retained tree from this stream. That consumer could be a library running inside the same application, which could then implement current accessibility APIs such as UI Automation. Or, if the protocol caught on and was supported by assistive technologies (e.g. screen readers), those ATs themselves could be the consumers. Indeed, I believe that applications pushing their accessibility information to ATs, which can then create and update their own retained trees, would be a better alternative to the pull-based request/response model that you mention.
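
As a toy sketch of that streaming idea: the toolkit pushes tree updates each frame, and the consumer (an in-process library, or eventually an assistive technology) folds them into its own retained tree, so the IMGUI itself keeps nothing. The protocol and type names here are invented for illustration.

```rust
use std::collections::HashMap;

enum A11yUpdate {
    NodeUpserted { id: u64, label: String },
    NodeRemoved { id: u64 },
}

// The consumer owns the retained tree (modeled here as a flat id -> label
// map for brevity) and applies the stream of updates to it.
fn apply(tree: &mut HashMap<u64, String>, update: A11yUpdate) {
    match update {
        A11yUpdate::NodeUpserted { id, label } => {
            tree.insert(id, label);
        }
        A11yUpdate::NodeRemoved { id } => {
            tree.remove(&id);
        }
    }
}
```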

IMO, the bigger problem is the one that Raph pointed out: widgets (including simple ones like static text) need to have stable identities throughout their lifetimes, even as their state, contents, and locations change. It seems to me that IMGUIs, or at least some of them, make this particularly difficult.

Yes, that would work. But that would be a departure from existing imgui paradigm where the output of imgui library is a 2d bitmap. At that point, it gets really close to the vDOM territory ...

As for stable identities, yes, they're important. But I also don't see much of a problem with an explicit Key: the UI components ultimately have to map back to a data object, which should have enough contextual information to provide said Key.

You're right; for accessibility, at least for screen readers, you need something a lot like a DOM (whether real or virtual).

The output of most imgui libraries is not a 2D bitmap but a structure containing vertices, indices and texture handles, that can be used to render.

>How do you plan to address accessibility using imgui?

I am not sure I understand this question.

I assume Raph meant in the screen reader sense of the term.

I don't see any limitations on this side, intuitively, but I might not know all the intricacies tied to this kind of implementation.

If the ability to add metadata to any widget is not sufficient then I don't know.

Yeah, I’m not an a11y person (nor a UI or even 2d graphics person).

What about the “zoom under my cursor” type things? That seems much harder (though not impossible) if your interface to clients is “just draw directly”. Though thinking about it, it’s actually probably what macOS does: as a post process, take your frame buffer and then magnify the stuff in a region of interest and re-render that on top.

You’re winning me over :).

Edit: I assume your metadata solution is “if someone asks, the text under this region of interest says < string >”.

There is a common misconception about immediate-mode APIs: in practice, there are very few implementations where rendering is done directly.

A structure is built every frame with all the data to render later, as a batch.

And of course if you need a magnify lense the obvious way would be to render it as a post process.

> I am not sure I understand this question.

I rest my case.

I think it’s fair to say that accessibility is an ambiguous term. Which one did you mean? (My charitable guess was a11y in the screen reader, color blindness, font size, etc. sense)

You are right, but I couldn't resist the opportunity. Yes, just "accessibility" is a fairly broad term, but I do mean it in the context of assistive technology such as screen readers and navigation by keyboard and devices like directional controllers. I should note, we haven't actually done any of that work in Druid yet (we're struggling with scope as it is), but trying to think ahead. I believe that having a retained tree will help enormously with building this, and that, in the meantime, tab focus navigation is a good proxy for at least some of the problems we expect to face.

It is a bit of work but navigation with keyboard/gamepad is possible with an immediate-mode GUI, as demonstrated by the excellent work of Omar Cornut (Dear Imgui)

You have to set a compilation flag to enable keyboard/gamepad navigation, in case you missed it in the source code.

Basic navigation with keyboard/gamepad is a trivial case compared to screen reader accessibility and navigation. Take a look at the Windows Narrator keyboard shortcuts [1] like "Next Heading" or "Go to next landmark/region" or "Read row header" - these require integration with the system UI framework or the accessibility APIs directly to provide extra semantic information about what's drawn on screen.

Even with integration into system accessibility APIs, Narrator (and other screen readers) performs a lot of heuristics to improve the user experience. These heuristics depend on a separate tree data structure [2] and how it changes over time, so in order to implement even the most basic accessibility, IMGUI would have to retain that information instead of throwing it away every frame. Component identity is crucial to how most (all?) screen readers work today and it's a happy accident that it also allows an efficient hybrid retained-immediate mode architecture.

I don't know if it has been fixed or not, but a year or two ago you could completely nuke the Windows screen reader just by assigning React components randomly generated keys so that the DOM nodes would be destroyed and recreated every redraw.

[1] https://dequeuniversity.com/screenreaders/narrator-keyboard-...

[2] https://docs.microsoft.com/en-us/windows/win32/winauto/inspe...


I’m a big fan of immediate mode myself, because it’s kind of amazing how much you can actually push through via brute force.

However, wouldn’t you agree that Unity, Open Scene Graph, etc. are successful counter arguments?

I would say it depends on your audience. The bottom layer of the stack might be a very simple retained mode, or fully brute force immediate, but it’s not clear to me that a “UI framework” for application developers should be immediate mode.

In my opinion, the reason immediate-mode APIs are so efficient and practical is that they help avoid structure duplication.

A UI, from an abstract point of view, is nothing but a big structure.

When you're writing an application or a game, you very often already have an existing structure containing everything you need.

Take a simple example : a leaderboard.

Using anything but an immediate-mode API will force you to create a second, complex structure to hold this leaderboard, plus a lot of glue code to copy data to and from it.
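
The leaderboard point can be sketched like this: each frame the UI call walks the game's existing data directly, so there is no parallel widget structure to keep in sync. `Ui` here is an invented stand-in for an immediate-mode library, not a real API.

```rust
struct Score {
    name: String,
    points: u32,
}

#[derive(Default)]
struct Ui {
    lines: Vec<String>, // stand-in for actual draw commands
}

impl Ui {
    fn label(&mut self, text: &str) {
        self.lines.push(text.to_string());
    }
}

// Called every frame, straight off the game's own Vec<Score>.
fn leaderboard(ui: &mut Ui, scores: &[Score]) {
    ui.label("Leaderboard");
    for s in scores {
        ui.label(&format!("{}: {}", s.name, s.points));
    }
}
```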


I think that argument is “people can’t usefully agree on common abstractions, why bother forcing a translation”. And so for users of scene graphs, it’s basically “well, we can all agree there’s a basic graph and bounding boxes are a thing” so you’re not giving up much.

But for UI, is the argument that it’s often more like drawing/painting, so you spend more cycles (in both mental and processing senses) on trying to express the thing you already knew you wanted to achieve?

I don’t know. Unity is completely redoing their UI stack, so I don’t think even they are happy with the current incarnation.

I meant Unity in the “you’re my game engine / scene graph” sense. Like why are game engines and scene graphs fairly successful, yet the comment I replied to is arguing “clearly retained mode doesn’t work”.

The hierarchy in a game object scene graph tends to be much shallower than a UI hierarchy, so I’m not sure if that is a good comparison.

It is absurdly easy to make a sluggish and extremely memory inefficient UI with the old Unity API.

I've seen that made many times.

I make a retained mode-ish gui library. (I am not sure if it falls exactly into the tradition of retained mode gui, but it involves constructing trees of widget objects, so—close enough.)

Its design is fundamentally much simpler than that of the immediate mode UI libraries. There is no need for implicit state. There is no need for layout to be a core feature of the library; layout can be 'just another widget'. This opens up the possibility for different types of layouts: constraint-based, grid-based, etc.

It's even possible to implement an imgui on top of this fairly easily; but not the other way around, because imgui has implicit global state to manage.

Imgui is not slow, especially with GPUs. But it definitely loses on the simplicity front.

Immediate rendering consumes an insane amount of CPU.

Only if you render at 60Hz/120Hz/etc. like games do.

If you have a static UI, you can skip rendering frames until "something" happens. That's going to burn no more power than a retained-mode UI.
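
A minimal sketch of that frame-skipping idea: render only when an input event (or animation) has marked the UI dirty; otherwise the loop does nothing. The names are invented for illustration.

```rust
struct App {
    dirty: bool,
    frames_rendered: u32,
}

impl App {
    fn new() -> Self {
        App { dirty: true, frames_rendered: 0 } // first frame always draws
    }
    fn on_event(&mut self) {
        self.dirty = true;
    }
    fn tick(&mut self) {
        if self.dirty {
            self.frames_rendered += 1; // stand-in for the real render pass
            self.dirty = false;
        }
    }
}
```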

Sure, the immediate-mode UI may redraw 100,000 triangles vs. the retained-mode UI's 100. However, that's just not a bottleneck on practically any device nowadays. And I doubt the difference is even that stark, since rendering scaled text properly is still egregiously expensive either way.

However, everybody nowadays has to "ANIMATE! <jazzhands>" so you're rendering 60Hz anyway irrespective of type of UI.

And if animation is required by application (film editor, digital audio workstation, etc.) it's not clear that you really can do anything other than immediate mode. Do note how many of those kinds of programs wrote their own GUI over the years--that says something.

One thing that people don't mention is that immediate-mode UI's tend not to suffer from as many synchronization issues and retained glitches as retained-mode UI's do. Even if you don't get the update correct in an immediate-mode UI, the next frame the error goes away. In a "retained-mode" UI your widget needs to "refresh", which may not happen except in very rare circumstances. I have had to grab the corner and adjust a window size a couple of pixels innumerable times in order to clear UI glitches over the years.

I should probably comment about threading and the "UI Master Thread", but I don't think I've seen a good example of an immediate-mode UI running multi-threaded, yet. You probably need Vulkan to even have a hope of pulling that off (OpenGL is notoriously multi-thread hostile).

Not in my book.

I am running a prototype immediate mode gui that I built for my yet unreleased engine and the frame time is staying below 500ns for a fullscreen app with many widgets and layers.

The whole thing is built on C + Vulkan but is not entirely optimized yet.

And this is running on a decent but not extremely fast computer. (An Intel NUC from 2018)

Now take that same code and run it on a mobile device. Can you say for certain that it's making optimal use of power? And if not, what changes would you have to make so it does?

I'll do that later but I don't think it will lead to extreme CPU consumption.

There are many ways to improve the performance of an idle/almost-idle UI.

Caching can be useful for large blocks of text, as text tends to be the slowest thing to render.

But it is preferable to incrementally optimize/add complexity and check assumptions and not make large architectural decisions ahead of time.

One way to reason about it is to think in terms of pure data transformations.

If you're caching data, you now have stored state and it's no longer a pure data transformation. You will run into these performance problems with any large data set, not just blocks of text.

Can I see a screenshot?

I plan to publish several articles about my engine later this year, but I am not ready to show it yet.

Man, reading this just reinforces that DOMs and HTML are not "right" for webapps.

These frameworks, contrivances, and the resulting principles will spiral on, never solving the real issue: DOCUMENT Object Models and HYPER TEXT MARKUP Language are for representing linked documents, not stateful object based applications. It's right there, in the names.

Is that true though?

In the article he mentions how a lot of the web frameworks don't have to worry about handling tabindex because the browser does it for them.

I have also yet to find another non-web UI framework that handles styling nearly as well as CSS. Between flexbox, grid, and absolute positioning, you have so much power for creating layouts. Throw in some drop shadows and borders to help highlight the important UI pieces and bam.

I think the common critique of React, Vue, etc on HN is fair when talking about people building simple websites with these highly sophisticated and massively overkill tools. But when it comes to applications I have yet to find something nearly as flexible and good looking with so little work. And they do it by using HTML and CSS which as it turns out are pretty good abstractions to build on. Or at least they are from my perspective as someone who's been using a variety of these frameworks for the past 5 years or so, on quite a few medium sized projects.

A looong time ago, I discussed with a colleague (who was building some UI for a game as a side project) what "hierarchies" (essentially, aspects of structure) there are in a UI. We ended up with about 4:

* encapsulation - what includes what / belongs to what

* visual - what overlays/obscures what

* event-passing - if X gets clicked, or otherwise interacted with, who else would get the event passed to/notified

* ??? can't remember this one.. but i remember it took many conversations to find out we need it. Probably was dependency-linkage (in aspects other than above 3, e.g. some model-logic dependencies)

This aspect-distinguishing may or may not be usable for you. For example, a usual "scrollbar" is composed of multiple smaller elements, which are tied together in all the above aspects. But it does not have to be like that: e.g. the position-mover ("thumb") and the lower/higher buttons need not be in the same visual place, e.g. a circular volume knob and separate quieter/louder buttons elsewhere (and those may be very far away). One may argue it's not a scrollbar anymore.. except it does the same thing, just visually differently.

It would be interesting if one can really separate these aspects in the code, somehow.

That said, how would your toolkit handle this: a small "innocent" interaction that causes lots of small changes all over the tree. Think tax forms: a form, a mixture of subforms and fields, many levels deep, and I need to highlight the user-changed fields (or the subforms containing changed fields), all the way up. What I ended up with recently, in React (i.e. in a DOM-recipe tree), is that anytime the user presses a key in some text field down in the depths, the whole thing gets repainted. A few hundred fields, probably. Which isn't that fast anymore when the depth goes beyond 3-4. (From the user's perspective there are 2-3 times fewer levels than programmatic ones.) Probably I've missed some "I have not changed, don't repaint me" step somewhere.
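
The "I have not changed, don't repaint me" step the comment above guesses at can be sketched generically: compare a subtree's props to what it last rendered with, and skip the repaint when they're equal (the idea behind React.memo / shouldComponentUpdate). The Memo type here is invented for illustration.

```rust
struct Memo<T: PartialEq + Clone> {
    last: Option<T>,
    renders: u32,
}

impl<T: PartialEq + Clone> Memo<T> {
    fn new() -> Self {
        Memo { last: None, renders: 0 }
    }
    // Repaint only if the props differ from the last rendered props.
    fn render(&mut self, props: &T) {
        if self.last.as_ref() != Some(props) {
            self.renders += 1; // stand-in for repainting this subtree
            self.last = Some(props.clone());
        }
    }
}
```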

The web does it better, and the web does it first.

But I am glad to see native developers finally get a half-baked, sparsely documented reactive UI framework (still a year later!), and like a decade late. Apple is over, and it’s hilarious!

Rust is a systems language. For UIs, you want to trade a little performance for ergonomics, thus you'd want a different language. Just as one example, closures are very useful when writing UIs, and you need a garbage collector to use them to their fullest potential.

Of course, you can still write your low level bitblt functions in Rust if you need to (though these functions are probably best left to a GPU). But an entire application and its UI, then you're probably pushing it too far.

Rust solves the closure problem with its borrow checker; closures are heavily used throughout the Rust ecosystem. (I didn't downvote.)

C++ is a systems language, yet Qt seems to be quite a successful GUI toolkit.

Qt has issues with object lifetimes. It's easy to shoot yourself in the foot by accessing objects that have already been destroyed. The borrow checker can solve that, but it still shows that C++ was not really the best choice of language for building UIs.

I also think that the success of the combination of Qt/C++ is mostly a result of the evolution of programming environments. C++ was massively popular, so that could explain it.

You’re being downvoted but I agree with your opinion.

This blogpost doesn’t show any UIs built with this rust paradigm, and the github homepage only displays a few screenshots of a calculator app.

Technically, it appears interesting. Practically, it seems very far off from the tools I use to build usable (or hopefully aesthetically pleasing) UIs.

That’s… kinda the point of research work. No one claimed it was ready to replace the tools you use in production.
