Is that all that's happening here? There's an implicit limit on real threads, wh...

dboreham · on Jan 16, 2024

My long-held belief: green/user-level/M:N threading schemes never work at first, and only work reliably after extreme effort has been put into fixing all the cases where blocking code gets called underneath. Afaik there are only two modern working implementations: golang and erlang. This article is consistent with that belief.

cyberax · on Jan 16, 2024

There are many other implementations, although in less popular languages.

The trick is to include the green threads from the start, so there are no libraries that depend on real threading. That's why Go and Erlang are so successful.

kelnos · on Jan 16, 2024

The funny thing is that Java did have green threads back in v1.1, but they were dropped in v1.3.

That doesn't invalidate your point; more than 20 years of Java practice has focused on making things work well for platform threads.

andrewf · on Jan 16, 2024

I think Solaris moved from green threads to pure kernel threads at the same time (https://docs.oracle.com/cd/E19253-01/816-5137/mtintro-75924/... says Solaris 9 was the transition point).

pjmlp · on Jan 16, 2024

Go suffers the same issue when calling into native code, that is why it has APIs to deal with it.

For example, https://pkg.go.dev/runtime#LockOSThread

marwis · on Jan 17, 2024

This seems different.

It pins the goroutine to its OS thread until it is explicitly released, ensuring that multiple native calls stay on the same platform thread and nothing else uses that thread. This is critical for namespace manipulation on Linux.

Java only pins for the duration of a native call or a synchronized block.

It looks like Java does not offer an equivalent API? For now it could be achieved with synchronized, but if synchronized is changed in the future to not pin, that would break.

marwis · on Jan 17, 2024

Oh, actually one can just spawn a non-virtual thread to solve it.
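A minimal sketch of that idea, assuming the goal is to keep a sequence of thread-affine calls on one OS thread. `DedicatedThread` and `runAndReportThread` are hypothetical names for illustration, not JDK APIs. `new Thread(...)` always creates a platform thread, so a single-thread executor built on it runs every submitted task on the same thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper (not a JDK class): owns one platform thread and runs
// every submitted task on it, so thread-affine calls stay together.
final class DedicatedThread {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        // new Thread(...) creates a platform thread, never a virtual one.
        Thread t = new Thread(r, "dedicated-platform-thread");
        t.setDaemon(true); // do not keep the JVM alive for this sketch
        return t;
    });

    // Runs the task on the dedicated thread, waits for it to finish, and
    // returns the name of the thread that executed it.
    String runAndReportThread(Runnable task) {
        try {
            return executor.submit(() -> {
                task.run();
                return Thread.currentThread().getName();
            }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void close() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        DedicatedThread dedicated = new DedicatedThread();
        // Both tasks report the same thread name.
        System.out.println(dedicated.runAndReportThread(() -> {}));
        System.out.println(dedicated.runAndReportThread(() -> {}));
        dedicated.close();
    }
}
```

A caller on a virtual thread only waits on the returned future, while the thread-affine work itself never leaves the dedicated thread.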

nerdponx · on Jan 16, 2024

It works well enough in Python and NodeJS.

kaba0 · on Jan 16, 2024

That’s M-on-N, with N being 1. That’s basically a trivial problem in comparison.

samus · on Jan 16, 2024

Virtual threads were never intended as a drop-in replacement for platform threads. They offer the same API, but they are for different usage scenarios.

If you have lots of blocking I/O (meaning: waiting for things happening on other threads or processes, which offers scheduling opportunities), use virtual threads. If you compute or call native code, keep using platform threads.

The issue with synchronized is eventually going to be resolved. But long-running computations (sorting, parsing, number crunching, etc) or native calls must also in the future be offloaded to an ExecutorService with platform threads.
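One way the offloading could look, as a sketch rather than a recommended design: a fixed pool of platform threads sized to the core count, with callers handing it CPU-bound work and waiting on the result. `CpuOffload`, `CPU_POOL`, and `sortOffloaded` are hypothetical names, and sorting stands in for any long-running computation.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: keeps CPU-bound work on a small pool of platform
// threads so it does not occupy the virtual thread scheduler's carriers.
final class CpuOffload {
    // Assumption: one worker per core is a reasonable size for pure computation.
    private static final ExecutorService CPU_POOL = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            r -> {
                Thread t = new Thread(r, "cpu-worker"); // platform thread
                t.setDaemon(true); // do not keep the JVM alive for this sketch
                return t;
            });

    // Sorts a copy of the input on the platform-thread pool and returns it.
    static int[] sortOffloaded(int[] input) {
        Future<int[]> result = CPU_POOL.submit(() -> {
            int[] copy = input.clone();
            Arrays.sort(copy);
            return copy;
        });
        try {
            return result.get(); // the caller waits; the pool does the work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortOffloaded(new int[] {3, 1, 2}))); // prints [1, 2, 3]
    }
}
```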

xmcqdpt2 · on Jan 16, 2024

The change in semantics is that while in principle your OS thread will always have a turn at making progress (assuming no super heavy spin locks etc), that isn't true for virtual threads. The classic situation and the one they hit in the article is something like this,

You've got some virtual threads that encounter this code,

    synchronized (foo) {
      foo.wait();  // blocks until another thread calls foo.notify()
    }

And some other virtual threads that are in charge of awaking the waiters,

    synchronized (foo) {
      operation();
      foo.notify();
    }

This is a classic approach to the producer/consumer pattern in Java.
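For reference, a fuller sketch of that pattern with the usual guard condition around `wait`. The `WaitNotifyHandoff` class, its `ready` flag, and the method names are hypothetical additions for illustration; the fragments above leave them out. The demo runs on ordinary platform threads, where this code is safe; the hazard discussed in this thread arises when many virtual threads run it.

```java
// Hypothetical illustration of the synchronized/wait/notify handoff.
final class WaitNotifyHandoff {
    private final Object foo = new Object();
    private boolean ready = false; // guard condition, protected by foo's monitor
    private int value;

    // Consumer side: waits until a value has been published.
    int await() throws InterruptedException {
        synchronized (foo) {
            while (!ready) { // loop guards against spurious wakeups
                foo.wait();  // releases foo's monitor while waiting
            }
            return value;
        }
    }

    // Producer side: publishes a value and wakes the waiters.
    void publish(int v) {
        synchronized (foo) {
            value = v;
            ready = true;
            foo.notifyAll();
        }
    }

    // Starts a producer thread and returns the value the consumer received.
    static int demo() {
        WaitNotifyHandoff handoff = new WaitNotifyHandoff();
        Thread producer = new Thread(() -> handoff.publish(42));
        producer.setDaemon(true);
        producer.start();
        try {
            return handoff.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```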

If operation() can suspend the virtual thread, the producer can be suspended and relinquish its platform thread, which the scheduler then reuses for a consumer that blocks in Object.wait. If this happens enough, you can end up with all the platform threads blocked and none available to make progress on the producer.

The problem is that Object.wait doesn't release the carrier thread (the virtual thread stays pinned to it), which is a pretty major footgun. I think the JDK team would have liked to avoid it, but it was too hard to implement correctly in the current JDK codebase.

Groxx · on Jan 16, 2024

The only way I can see this being a problem is if the virtual threads can't be stolen from their (now pinned) carrier thread. Because otherwise that's all true of real threads too, blocking them is the whole point of Object.wait.

If there's no work-stealing from pinned carriers (or they're low-finite and normal threads are effectively infinite): yes that'd be a HUGE issue. I would be shocked if they released anything with that limitation though, that would violate some of the core expectations of mutexes and threads - independent ones need to make progress or nearly all patterns can't guarantee progress.

Groxx · on Jan 16, 2024

From Java docs for `jdk.virtualThreadScheduler.maxPoolSize`: the default is 256.

So yeah I can see that starving rather quickly, particularly with benchmarking-like workloads. Synchronized is very very common, 256 concurrent calls really doesn't seem all that abnormal.

If that were raised to like max-int32 would things be fine, semantically? That'd mimic real threads limits (no jvm limit at all afaict).

xmcqdpt2 · on Jan 16, 2024

> If there's no work-stealing from pinned carriers (or they're low-finite and normal threads are effectively infinite): yes that'd be a HUGE issue. I would be shocked if they released anything with that limitation though, that would violate some of the core expectations of mutexes and threads - independent ones need to make progress or nearly all patterns can't guarantee progress.

Correct, you can't steal the carrier thread from a virtual thread that is waiting in Object.wait(). This is apparently in the pipeline, but it is a pretty major limitation.

Most cases of synchronized/notify/wait should probably use concurrent collections instead (as message queues) so in greenfield code it's not that big of a deal. Virtual threads make writing consumers/producers using collections way easier too.
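A rough sketch of the collection-based version, assuming a `LinkedBlockingQueue` serves as the message queue. `QueueHandoff` and `sumViaQueue` are hypothetical names. The queue is built on `java.util.concurrent` locks rather than monitors, so no `synchronized` block or `Object.wait` appears in user code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration: producer/consumer over a concurrent collection
// instead of synchronized/wait/notify.
final class QueueHandoff {
    // The producer enqueues 1..count; the consumer takes them and sums them.
    static int sumViaQueue(int count) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();

        Thread producer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                queue.add(i); // unbounded queue, so add never blocks
            }
        });
        producer.setDaemon(true);
        producer.start();

        int sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += queue.take(); // blocks until an element is available
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaQueue(10)); // prints 55
    }
}
```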

Sadly, most Java projects are not greenfield projects.

Groxx · on Jan 17, 2024

>Correct you can't steal the carrier thread from an Object.wait() waiting virtual thread. This is apparently in the pipeline but it is a pretty major limitation.

I mean stealing other virtual threads from the pinned carrier thread (except for the one pinning it) so they can make progress. Normal work-stealing stuff - the queue(thread) is blocked(pinned), so process that task(virtual thread) in a different queue(thread).

It makes sense that a pinned thread remains pinned with the virtual thread that pinned it.

The 256 default carrier thread limit is going to frequently be a problem though, yeah. That's more than enough to cause all this, and it's a pretty crazy default imo.