pcwalton seems to be presuming that part of the contract of an "immediate mode API" is like old-school ones it actually immediately draws to the frame buffer by the end of the call.
Whereas you are talking about modern "immediate mode API"s where the calls just add things to an internal data structure that is all drawn at once, avoiding unnecessary shader switches etc. IIRC this is how Conrod (Rust's imgui library) and https://github.com/ocornut/imgui work, although with varying levels of caching.
One point to make about retained mode GUIs is I remember reading an argument that immediate mode is great for visually simple UIs, such as those in video games, but isn't as good for larger scale graphical applications and custom widgets. For example when rendering a large text box, list or table you don't want to have to recalculate the layout every frame so you need some data structure that sticks around between frames specific to the widget type, so that's what retained mode APIs like Qt do for their widgets.
Sure you can do the calculations yourself for exactly which rows of a table are currently in view and render those and the scrollbar with an immediate mode API, but the promise of toolkits like Qt is that you don't have to write calculations and data structures for every table.
Immediate mode GUI systems are allowed to keep state around between frames and the most-featureful ones do. The "immediate mode" is just about the API between the library and the user, not about what the library is allowed to do behind the scenes. The argument that retained-mode systems are inherently better at this doesn't hold water; it is kind of an orthogonal issue.
This works just as well/quickly as a retained mode API in almost all cases. There's some cases like extremely long tables with varying row heights and sortable columns, where you need an efficient diff of the table contents. Since recalculating layout and sorting every frame is inefficient. Retained mode APIs do this with methods to add and delete rows. It's possible to do with an immediate mode API, but to detect differences in the rows passed in quickly you need to use a functional persistent map data structure with log(n) symmetric diff. Or you can just have an API that is mostly immediate mode but has some kind of "TableLayout" struct that persists between frames and is modified by add and remove functions.
I'm curious what API you would use for implementing a table with varying row heights (that you only know upon rendering but can guess beforehand), sortable columns and millions of rows. I implemented this in an immediate mode GUI API a few months ago, and I did it with persistent maps and incremental computation in OCaml. Incrementally maintaining a splay tree and a sorted order by symmetric diff of the input maps. This isn't as nice of an API in languages like C++ so I'm wondering if there's a better way.
In general my policy is that when things get really complicated or specialized, the application knows a lot more about its use case than some trying-to-be-general API does, so it makes sense for the application to do most of the work of dealing with the row heights or whatever. (It's hard for me to answer more concretely since it depends on exactly what is being implemented, which I don't know.)