Hacker News

> That, unfortunately, is really heavy memory wise and on the OS context switcher.

So, there was a time when a broad statement like that was pretty solid. These days, I don't think so. The default stack size (on 64-bit Linux) is 1MB, and you can make it smaller if you want. That's also the virtual memory; the actual memory usage depends on your application. There was a time when 1MB was a lot of memory, but these days, for a lot of contexts, it's kind of peanuts unless you have literally millions of threads (and even then...). Yes, you can be more memory efficient, but it wouldn't necessarily help that much. Similarly, at least in the case of blocking IO (which is normally why you'd have so many threads), the overhead on the OS context switcher isn't necessarily that significant, as most threads will be blocked at any given time, and you're already going to have a context switch from the kernel to userspace. Depending on the circumstances, using polling IO models can lead to more context switching, not less.
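For what it's worth, you can request a smaller stack per thread in plain Java as well. A minimal sketch (class and method names are mine; the stackSize argument is only a hint that the JVM may round or ignore):

```java
public class StackSizeDemo {
    // Runs a task on a thread with a requested 256 KiB stack. The default
    // on 64-bit Linux HotSpot is 1 MiB, tunable process-wide with -Xss.
    static int runOnSmallStack() throws InterruptedException {
        int[] result = new int[1];
        // Thread(ThreadGroup, Runnable, String, long stackSize)
        Thread t = new Thread(null, () -> result[0] = 42, "small-stack", 256 * 1024);
        t.start();
        t.join();
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnSmallStack());
    }
}
```

On HotSpot the process-wide default can also be changed with `-Xss`, e.g. `java -Xss512k ...`.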

There are certainly circumstances where threads significantly impede your application's efficiency, but if you are really in that situation you likely already know it. In the broad set of use cases though, switching from a thread-based concurrency model to something else isn't going to be the big win people think it will be.



> So, there was a time when a broad statement like that was pretty solid.

That time is approaching 20 years old at this point, too. Native threads haven't been "expensive" for a very, very long time now.

Maybe if you're in the camp of disabling overcommit it matters, but otherwise the application of green threads is definitely a specialized niche, not generally useful.

> In the broad set of use cases though, switching from a thread-based concurrency model to something else isn't going to be the big win people think it will be.

I'd go even further and say it'll be a net loss in most cases, especially with modern complications like heterogeneous compute. If your use case is specifically spinning up thousands of threads for IO (i.e., you're a server and nothing else), then sure. But if it isn't, there's no win here, just complications (like the times when you need native thread isolation for FFI reasons, e.g. using OpenGL).


> That time is approaching 20 years old at this point, too. Native threads haven't been "expensive" for a very, very long time now.

It depends on the context, but yes. I worked on stuff throughout the 2000s where we ran into scaling problems with thread-based concurrency models. At the time, running 100,000 threads was... challenging. But yeah, by 2010 we were talking about the C10M problem, because the C10K problem wasn't a problem any more. There are some cases where you really do need to handle tens or hundreds of millions of connections, but there aren't a lot of them.

> Maybe if you're in the camp of disabling overcommit it matters, but otherwise the application of green threads is definitely a specialized niche, not generally useful.

Yup, but everyone is still stuck on the old mental model of "threads are bad", partly driven by the assumption that whatever is being done to handle those extreme cases is what one should be doing to address their own problem space. :-(

> I'd go even further and say it'll be a net loss in most cases, especially with modern complications like heterogeneous compute.

Even more so if you're doing polling based I/O rather than a reactive model. The look on people's faces when I point out to them that there's good reason to think that for the scale they are working at, they'll likely get better performance if they just use threads to scale...

It's so weird how we talk about the context switching costs between threads without recognizing that the thread that does the poll is not the same thread that processed the IO request in the kernel.


> I'd go even further and say it'll be a net loss in most cases, especially with modern complications like heterogeneous compute. If your use case is specifically spinning up thousands of threads for IO (i.e., you're a server and nothing else), then sure. But if it isn't, there's no win here, just complications (like the times when you need native thread isolation for FFI reasons, e.g. using OpenGL).

Virtual threads are going to be an /option/, not a requirement. Threads have to be explicitly created as virtual threads. If this is not done, nothing will change.
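To illustrate the opt-in point, a minimal sketch (Java 21+; class and method names are mine): code that builds threads the old way keeps getting OS ("platform") threads, and a virtual thread only appears where you explicitly ask for one.

```java
public class LoomOptIn {
    // Returns {platformThread.isVirtual(), virtualThread.isVirtual()}.
    static boolean[] kinds() throws InterruptedException {
        // Existing-style code still gets an OS thread:
        Thread platform = Thread.ofPlatform().start(() -> {});
        // Virtual threads must be requested explicitly:
        Thread virtual = Thread.ofVirtual().start(() -> {});
        platform.join();
        virtual.join();
        return new boolean[] { platform.isVirtual(), virtual.isVirtual() };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] k = kinds();
        System.out.println("platform=" + k[0] + " virtual=" + k[1]);
    }
}
```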


Having it be optional increases, not decreases, complexity. ;-) It also increases the propensity for people to use the feature blindly.


For the JVM developers for sure. Implementing Project Loom must have been quite a ride. But even if it is used blindly, there are only three obvious issues I see:

* It's a no-no for computational workloads. As you said, they are concurrent, but not necessarily parallel.

* As you said, care has to be taken to use the right thread when interacting with certain low-level APIs.

* It becomes easier to overload upstream systems by sending too many queries concurrently.


Oh, there are a bunch of other problems as well. Developers will "solve" problems that should actually be solved in other ways by just increasing the number of virtual threads. Tons of code is suddenly going to discover that assumptions about its underlying runtime model are no longer true, leading to subtle and potentially complex problems. New software will need to either take on the burden of choosing a runtime model or adopt the complexity of having to consider a mixture of both...


Your words might be true, but the world jumped on the async wagon a long time ago and is going all in. Nobody likes threads; everyone wants lightweight threads. Emulating lightweight threads with promises (optionally hidden behind async/await transformations) is very popular. So the demand for this feature is here.

I don't know why; I personally never needed that feature, and good old threads were always enough for me. It's weird for me to watch non-JDBC drivers with async interfaces, when it was common knowledge that a JDBC data source should use something like 10-20 threads maximum (depending on the DB's CPU count), and that anything more is a sign of bad database design. And running 10-20 threads, obviously, is not an issue.

But the demand is here. And lightweight threads are probably a better approach than async/await transformations.


It's madness.


Is it? Benchmarks routinely show async native database drivers outperforming JDBC ones in Java, and evented (async) IO is king in the only other contenders for RESTful and other server apps, the C++ and Rust runtimes.


...and in how many circumstances are the database drivers the limiting factor in application performance?

As I said in the beginning, you will absolutely win in the extreme cases (and accordingly, those tend to be the drivers that are tuned more for performance). In most cases it won't really make much of a difference one way or the other, and in some cases it will actually inhibit performance.


>>> The default stack size (on 64-bit Linux) is 1MB

The default thread stack size is 8 or 10 MB on most Linux distributions.

The exception is Alpine, which is below 1 MB.


To clarify, the 1MB is the default stack size for threads with the JVM on 64-bit Linux.

Search for "-Xss": https://docs.oracle.com/en/java/javase/16/docs/specs/man/jav...


The default reserved size is 8 MB. The allocated size starts at a page (usually 4 KB) and grows in page-sized increments as you use it.


Yep.

Granted there are scenarios where you want 100,000 "threads of execution." And that clearly is going to be impractical for system threads.

But if you're worried about the overhead of your pool of 50 threads, stop it.


> Granted there are scenarios where you want 100,000 "threads of execution." And that clearly is going to be impractical for system threads.

100,000 was impractical in the 2000's. Today, even with the default Java stack size of 1MB, 100,000 * 1MB = 100 GB of virtual memory. For IO bound tasks, actual memory usage would typically be a fraction of that, possibly under 2GB. That's definitely practical for a modern server.
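To make the scale concrete, here's a sketch (Java 21+; class and method names are mine) that runs 100,000 tasks on virtual threads. Doing the same with one platform thread per task would reserve on the order of 100,000 × 1 MB = 100 GB of virtual address space, though as noted above, actual resident memory for blocked IO threads is far smaller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;

public class ManyThreads {
    // Spawns n virtual threads, one per task, and waits for all of them.
    static int countDown(int n) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(n);
        // Each submitted task gets its own virtual thread (Java 21+).
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < n; i++) {
                exec.submit(latch::countDown);
            }
        } // close() waits for submitted tasks to finish
        latch.await();
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countDown(100_000));
    }
}
```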

> But if you're worried about the overhead of your pool of 50 threads, stop it.

Yeah, people seem to misunderstand how thread pools work these days. They're more limits on concurrency than anything else.
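That limiting behavior is easy to observe: queue far more tasks than threads into a fixed pool, and the number running at once never exceeds the pool size; the rest just wait in the queue, using no extra stacks. A sketch (class and method names are mine):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolAsLimit {
    // Submits `tasks` jobs to a pool of `poolSize` threads and reports the
    // peak number of jobs observed running concurrently.
    static int maxObservedConcurrency(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record high-water mark
                try { Thread.sleep(5); } catch (InterruptedException e) { }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Peak concurrency is capped by the pool size, not the task count.
        System.out.println(maxObservedConcurrency(20, 200));
    }
}
```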



