> And in fact we can even use kernel threads most of the time.

Well, not if you need tens-of-thousands of them or more.

> However, there is one thread you don't want to block: the UI thread.

No problem. You simply multiplex fibers onto the UI thread, where they can block all they want. Here's an example in Quasar (Java):

    FiberScheduler uiScheduler = new FiberExecutorScheduler("UI-scheduler", 
       task -> EventQueue.invokeLater(task)); // schedule on the UI thread

    new Fiber(uiScheduler, () -> {
            for (int i = 0;; i++) {
                assert EventQueue.isDispatchThread(); // I'm on the UI thread!
                uiLabel.setText("num: " + i); // I can update the ui
                Fiber.sleep(1000); // ... yet I can block all I want
            }
        }).start();



"most of the time" we don't need tens-of-thousands of threads.

Yes, I can easily dispatch something onto the UI thread using the technique you describe, but for that using a simple HOM is much more convenient:

   uiLabel onMainThread setText:'num: ', i.

Neither this nor your example is actually blocking the UI thread, because they are really just incidentally there, pushing data in, a quick in/out. (And I assume that "Fiber.sleep()" takes the Fiber off the main thread.)

However, the more difficult part is when the UI thread actually has control flow and data dependencies, let's say a table view that is requesting data lazily. Taking the blocking computation off the UI thread doesn't work there, because the return value is needed by the UI thread to continue.


> "most of the time" we don't need tens-of-thousands of threads.

Well, that depends what you mean by "most of the time". But if you have a server that serves tens-of-thousands of concurrent requests/sessions, you'd want to use as many threads as sessions, and probably many more (as each request might fan out to further requests executing in parallel). In that case you can no longer use kernel threads and have two options: switch to an asynchronous programming style, with all its problems (in imperative languages), or keep your code simple and switch from kernel threads to lightweight threads.
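
To make the second option concrete, here's a minimal sketch with Quasar fibers (co.paralleluniverse.fibers.Fiber); the `acceptConnection`/`handleRequest` names are hypothetical placeholders for whatever blocking I/O your server does:

    // One fiber per session: the handler stays in plain blocking style,
    // but a parked fiber doesn't tie up a kernel thread or its stack.
    while (true) {
        Connection conn = acceptConnection();  // hypothetical blocking accept
        new Fiber(() -> {
            Request req = conn.readRequest();  // fiber-blocking read
            conn.write(handleRequest(req));    // may itself block or fan out
        }).start();
    }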

> However, the more difficult part is when the UI thread actually has control flow and data dependencies, let's say a table view that is requesting data lazily. Taking the blocking computation off the UI thread doesn't work there, because the return value is needed by the UI thread to continue.

Again, that's not a problem. If you use lightweight threads scheduled onto the UI thread, you can block those fibers as much as you like -- synchronously read from the disk or a database -- the kernel UI thread doesn't notice it (as it's not really blocked), but from the programmer's perspective, you can intersperse UI and blocking operations (including IO) all you want.
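
Concretely, continuing the Quasar example above (a sketch; `readRecordBlocking` and `Record` are hypothetical stand-ins for any fiber-blocking call, be it disk, database, or network):

    new Fiber(uiScheduler, () -> {
        uiLabel.setText("loading...");      // runs on the UI thread
        Record r = readRecordBlocking(id);  // "blocks", but only the fiber parks;
                                            // the kernel UI thread keeps pumping events
        uiLabel.setText(r.getName());       // back on the UI thread with the result
    }).start();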


> Well, that depends what you mean by "most of the time".

It means "most of the time". As in the majority of cases, in the real world, not in hypotheticals such as...

> But if you have a server that serves tens-of-thousands of concurrent requests/sessions,

"If you have..." -- But I do not, that's the point. A server with tens-of-thousands of concurrent requests is the absolute exception and so not "most of the time". Most web-sites or blogs can be happy if they have a thousand visitors per day, and that's already optimistic. They could be served by hand, or even by a Rails app.

For example, I work for Wunderlist (Alexa rank ~1600). We have over 10 million users (of our API), so already an unusually high-load case, yet we get "only" on the order of 10K requests per minute. (Well, that was last summer, so more now :-) )

Considering that most requests take a couple of milliseconds, or maybe a few dozen, the amount of actual concurrency required to handle this throughput is orders of magnitude below what you describe. In order to keep latencies down and not just throughput up, you want to up the concurrency a bit, but to nowhere near your "but if" case. And that's an app with 10 million very active users. The case you describe is simply highly atypical. That doesn't mean it never happens; it's just not very common, even on the server.
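
To put rough numbers on that with Little's Law (L = λ × W), assuming a mean latency of around 20 ms:

    10,000 req/min ≈ 167 req/s
    L ≈ 167 req/s × 0.02 s ≈ 3-4 requests in flight on average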

That's what I mean when I write "most of the time".

Clients, on the other hand, tend to deal with at most on the order of 100 outstanding I/O requests (and that's already pushing it pretty hard). Whether you use kernel threads, user threads or another of these async mechanisms is almost entirely a matter of programmer convenience; performance will be largely indistinguishable. On the client, I have a hard time seeing your case pretty much ever.

So you have none of the clients and only a tiny percentage of servers with a need for tens of thousands of concurrent requests. The other case is what happens "most of the time". That also doesn't mean you can't use a user-thread approach in those cases, you certainly can; it's just not necessary.

---

I am not sure I am getting through to you with the UI thread. One more try: yes, I understand you can reschedule your fibers (and thus not block the UI thread). I am saying it doesn't help, because you have actual control flow and data dependencies that are essential; they are not artifacts of the technology.

Scenario: You have an iPhone app that displays Flickr images. You start the app; there are no images yet, they have to be fetched. But your UICollectionView just came into focus and is asking you, the data source, for the subviews. You know that there are 10 images, so you tell it that. It then asks you for the visible subviews. At this point, you have to return something, because the collection view wants to display something. But you don't have the image yet; it's still in transit. Still, the UI has to display something to the user. So you can return empty UIImageViews. Or you can return placeholder views.

No matter what you do, you have to do it now as the UI is being displayed, because you can't de-schedule the user that is looking at the screen.

And later, when those images do trickle in from the network connection, you have to somehow asynchronously update that UI you displayed earlier. You simply cannot do it synchronously because at the time the display is built, the data just isn't there yet.


> That's what I mean when I write "most of the time".

I absolutely agree that in the scenarios you've described, the thread implementation makes no difference (again, Little's Law), and that this is "most of the time" for traditional web apps. But we're working with people who are working on IoT, where there is constant ingest (and queries) from all sorts of devices, where we're easily surpassing 100K concurrent requests, and expect to grow beyond that by several orders of magnitude.

> No matter what you do, you have to do it now as the UI is being displayed, because you can't de-schedule the user that is looking at the screen.

Ah, now I understand what you mean (thanks for being patient)! Still, I think this pseudo code (which gets run in a fiber for every image):

   display stub image
   fetch image
   display image
is easier than any callback-based alternative. You can even make it more interesting:

   Future<Image> image = get image
   while(!image.isDone) {
       display spinner frame
       sleep 20 ms
   }
   display image
This is a lot easier than the async alternative.
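
In Quasar terms, a hedged sketch of the first variant, scheduled on the uiScheduler from earlier (`fetchImage`, `cellImageView`, and `stubImage` are made-up names):

    new Fiber(uiScheduler, () -> {
        cellImageView.setImage(stubImage);  // show the placeholder right away
        Image img = fetchImage(url);        // fiber-blocking network fetch;
                                            // the kernel UI thread is never blocked
        cellImageView.setImage(img);        // still on the UI thread
    }).start();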


By "tens-of-thousands of threads" I think he means something along the lines of how in Erlang/Elixir an object is often a thread, and a library a program. By giving so many threads "for free" you make blocking cost nothing. It's a very different approach from your typical language.

This article only uses a few threads, but it will perhaps quickly give you an impression of how this design works: https://howistart.org/posts/elixir/1
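
In Java terms, a rough sketch of that "threads are free" style using Quasar fibers (this is only an analogy, not Erlang semantics; there are no mailboxes or supervision here):

    // Spawning 100,000 blocking "processes" is unremarkable when threads are
    // this cheap; with kernel threads it would be prohibitively heavy.
    for (int i = 0; i < 100_000; i++) {
        new Fiber(() -> {
            while (true) {
                Fiber.sleep(1_000);  // block freely; only this fiber parks
                // ... do this unit's periodic work ...
            }
        }).start();
    }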


I am fully aware of the approach, especially in languages/systems like Erlang, and the freedom that very cheap threads give you.

My first point was that you actually have much more of this freedom than most people are aware of, even with (comparatively) heavy kernel threads. For example, see the "Replace user threads with ... threads" talk by Paul Turner (Linux Plumbers Conference): https://www.youtube.com/watch?v=KXuZi9aeGTw

More on that point, I see a lot of user-threading/async craziness on clients such as iOS that would have easily been handled by fewer than a dozen kernel threads, most of which would be sleeping most of the time anyhow. That's a number of kernel threads that is easily manageable and not particularly resource intensive.
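
For instance, a small fixed pool of kernel threads that spend most of their time blocked on I/O is cheap and easy to reason about (a sketch using java.util.concurrent; `fetchImage`, `url` and `show` are hypothetical):

    ExecutorService io = Executors.newFixedThreadPool(8); // a handful of kernel threads

    io.submit(() -> {
        byte[] data = fetchImage(url);             // blocking network I/O
        EventQueue.invokeLater(() -> show(data));  // hand the result to the UI thread
    });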

My second point is that there is one thread that this blocking-happy approach mostly doesn't apply to, and that is the UI thread. You really don't want the UI thread waiting on (network) I/O, and therefore must employ some sort of asynchronous mechanism for data flow and/or notifications.


The Paul Turner approach works when you have up to about 10K-20K threads. Beyond that, you lose the ability to model a domain unit-of-concurrency (request/session) as a software unit of concurrency (thread). The kernel-thread approach works as long as you don't hit against the Little's Law wall. Basically, Little's Law tells you exactly when kernel threads become the wrong approach, which depends on the level of concurrency you wish to support and the mean latency of handling each concurrency unit (i.e. request/session).
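
A hedged worked example of where that wall sits: if you want to sustain, say, 200K requests per second at a mean latency of 100 ms, Little's Law gives

    L = λ × W = 200,000 req/s × 0.1 s = 20,000 concurrent requests

i.e. about 20K threads if each request gets its own, which is right at the edge of that kernel-thread range; anything much beyond it pushes you to lightweight threads (or async).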

> My second point is that there is one thread that this blocking-happy approach mostly doesn't apply to, and that is the UI thread.

You're not allowed to block the kernel UI thread, but you can schedule lightweight threads onto the UI thread and block them all you want, so from the programmer's perspective that restriction disappears.


Sorry, I was just trying to help clarify what was being said -- not trying to argue against your points.



