Lots of systems/embedded programmers roll their eyes at this kind of talk. Threads aren't really that hard.
Event queues do have benefits in certain situations. They pair nicely with state machines. You can easily end up in callback hell, though, and it is often difficult to integrate long-running, atomic tasks into your event loop. You end up doing things like having a thread pool, at which point you have to wonder why you stopped using threads in the first place. Oftentimes a threaded approach is cleaner. Just get the locking granularity right; it's not that difficult.
The main problem with threads is that they're non-composable: the set of locks that a thread holds is basically an implicit dynamically-scoped global variable that can affect the correctness of the program. If you call into an opaque third-party library, you have no idea what locks it may take. If it then invokes a callback into your own code, and you then call back into the library, there is a good chance that your callback will block on some lock that a framework thread holds, that framework thread will block on a lock you hold, and then the code that releases that lock will never execute. Deadlock.
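A minimal sketch of that failure mode in Python, with hypothetical names standing in for the opaque library:

    import threading

    framework_lock = threading.Lock()  # hypothetical: taken internally by the library
    app_lock = threading.Lock()        # our own lock

    def our_callback():
        # The library invokes this while it still holds framework_lock.
        with app_lock:
            pass

    def library_operation():
        # Stand-in for an opaque third-party call that takes its own lock
        # before calling back into our code.
        with framework_lock:
            our_callback()

    def our_code():
        with app_lock:
            library_operation()

    # If a framework thread runs library_operation() while we run our_code(),
    # each thread ends up waiting on the lock the other holds: deadlock.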
If you control all of the code in your project, this does not affect you: define an order in which locks must be acquired and released and stick to it. If all of your dependencies have no shared data and never acquire locks themselves, this does not affect you (and indeed, this is recommended best practice for reusable libraries). If you never call back into third-party libraries from callbacks, this does not affect you, but it severely limits the set of programs you can write. If all of your dependencies thoroughly document the locks they take and in which order, this affects you but you can at least work around the problem areas and avoid surprise deadlocks.
Most application developers do not work under conditions where any of these are true, let alone all of them. Application development today largely consists of cobbling together third-party libraries and frameworks, many of which are undocumented, many of which are thread-unsafe, and many of which spawn their own threads and invoke callbacks on an arbitrary thread.
One technique to get a handle on this situation is to make the mutexes actual explicit global variables.
"But global variables are bad" they will say. Yeah. And it reflects the reality.
"But I need a separate mutex for each object instance like they recommended in 1995 https://docs.oracle.com/javase/tutorial/essential/concurrenc... " they will say. Have fun with that.
Python uses a single global mutex (the GIL) for access to all shared mutable state, and early Linux kernels did the same with the Big Kernel Lock. In my experience, this is an entirely reasonable design decision for the huge majority of applications.
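A sketch of what that looks like, using Python's threading module; one loudly named global lock guards everything:

    import threading

    # One explicit, global lock for ALL shared mutable state, in the
    # spirit of Python's GIL or the old Big Kernel Lock.
    STATE_LOCK = threading.Lock()

    shared_counts = {}  # example shared state

    def record(key):
        # Every access goes through the same lock, so there is no lock
        # ordering to get wrong (just don't nest it).
        with STATE_LOCK:
            shared_counts[key] = shared_counts.get(key, 0) + 1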
There are some patterns that are safe as long as you implement them correctly. The patterns that are good for IO are among the simplest, so that's where the GP was coming from. But it's not viable because he has full control of the code; it's viable because his problem domain has good options.
Open libraries tend to either be single-threaded (and should be used as such) or explicitly thread-safe.
Disclaimer: I've used threads in Java, not much in C. Love me some JSR-133 volatiles. Still confused by the Java 9 memory model updates.
I've made use of threads at some point in almost every single job of any duration. They're one of many problem solving tools and if you understand them, which isn't particularly difficult, at some point you're bound to run into a problem that's a natural fit for a multi-threaded solution.
Nowadays, especially with no shared state, they're super-easy to use on many platforms. Take, for example, the parallel support in the .NET framework, along with functionality that supports debugging multi-threaded apps in Visual Studio like the ability to freeze threads.
If you do need to share state, which is when locking becomes essential, most languages and platforms have easy to use constructs to help you do this without much in the way of drama.
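A sketch of the no-shared-state case in Python rather than .NET (the idea carries across platforms): when the work is pure, a thread pool needs no locks at all:

    from concurrent.futures import ThreadPoolExecutor

    def crunch(n):
        # Pure function: no shared state, so no locks are needed.
        return sum(i * i for i in range(n))

    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() farms the calls out to worker threads and collects results.
        results = list(pool.map(crunch, [10_000, 20_000, 30_000]))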
I'm not suggesting for a minute that there are no dangers, but there are plenty of dangers with other programming techniques, as well as lurking in any system of sufficient complexity, so I don't really understand why threads garner so much hate.
This is actually a problem. It is very easy to just slap locks around everything, which, depending on your workload, can leave your threads blocked waiting on each other instead of doing work.
I have seen many designs that used threads "for performance", but had so many locks in place that a single thread would actually perform similarly, with much less code complexity.
Once you get past a couple of locks in your code, it starts to smell.
Just because you can do Thread.New in your favorite language, doesn't mean you are using them correctly or efficiently.
But the point of this article is to say if we ditch the notion of threads entirely and go with this other thing, we won't need safety nets anymore because it will be impossible to deadlock and corrupt data (as opposed to less likely).
I don't blame Go, because I'm not convinced threads are all that bad, but having more concurrent data structures would be great.
Unmaintainable raw-pthread messes are a nightmare sequel from the director of Endless GOTOs.
No experience or comment on Python/Tornado. We don't really do a lot of web stuff.
But sure, people can screw things up in lots of ways. However, once a threaded program is screwed up, you can really only fix it by starting a new version; it's near impossible to incrementally fix race conditions and deadlocks, because you can't reliably repeat the bug to debug it. Bugs in non-threaded code can at least be tracked down one by one.
It takes a lot less expertise to make events as fast as threads than it takes to make threads as safe as events. I don't know about the rest of you, but I personally do not have a brain that can become an expert on every topic.
Nowadays, a great many GUI apps have a lot of data crunching to do in the background. You've really got two options for how to handle that:
1. Be intermittently unresponsive, like iTunes.
2. Do work on a background thread, like decent software.
3. Do the work in another process, like safe software.
Windows NT, OS/2, and BeOS were better; if only the hardware cost came down and we could ditch this legacy DOS/Win16 code…
I guess the author knew what was about to be foisted upon the world.
Kudos for trying to warn us.
(I remember reading Novell and OS/2 documentation in the late 80s / early 90s about threads and recoiling in horror. Of course, all real men must use threads, cuz they’re faster, even if stupefyingly dangerous)
And Sun was busy rolling out a certain now popular language with threading baked in around 1995.
I’ve had to fix other people’s servlet thread crosstalk bugs a few times. Wish “modern” (???) web apps used a different architecture...
(You probably already know this; just elaborating for those who might not be familiar with Tcl/Tk)
What's the go-to solution to get a UI to not block when running a computationally expensive task that takes a lot of time to finish?
One solution is processes (mentioned in the post). Fork a process which does your computationally expensive thing and then get the result when it's done. For the security-minded, we've seen this make a bit of a comeback, because separate processes can be run with more restrictions and can crash without corrupting the caller. We see this in things like Chrome, where the browser, renderers, and plugins are split up into separate processes. And many of Apple's frameworks have been refactored under the hood to use separate processes to further fortify the OS against exploits.
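A minimal sketch of that pattern with Python's multiprocessing module (expensive() is a made-up stand-in for the real work):

    from multiprocessing import Process, Queue

    def expensive(task, out):
        # Runs in a separate address space: a crash or exploit here
        # cannot corrupt the caller's memory.
        out.put(task ** 2)  # stand-in for the real computation

    if __name__ == "__main__":
        out = Queue()
        p = Process(target=expensive, args=(12345, out))
        p.start()
        result = out.get()  # collect the result when it's done
        p.join()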
More expensive to start than threads, and far more expensive and complex and restrictive to move data around. Sounds like with the exception of some specific corner cases, threads are a better solution.
> Another solution is to break up the work and processing in increments
Either the tasks are broken into ridiculously fine-grained bits that are hard to make sense of or keep track of, or you still get a blocking UI. Furthermore, the solution is computationally more expensive.
These costs, though, are generally trivial compared to the lifecycle costs of dealing with multithreaded code. Isolation in processes greatly enhances debuggability, and it's almost impossible to produce a truly bug-free threaded program. Even a heavily tested threaded program will often break mysteriously when compiled with a different compiler/libraries, or even when seemingly irrelevant code changes are made. It's a tar pit.
Maybe, but, on Linux, processes and threads are almost the same thing.
Additionally, even where a process is a bit more expensive to create, the cost is not enough to stop the UI thread from being responsive. I have first-hand experience with this on different operating systems, including Windows, and it is more than fast enough to keep the UI completely responsive.
> and far more expensive and complex and restrictive to move data around.
Not necessarily. For threading, synchronization patterns are not necessarily simple either. (This is why computer science courses spend time on these principles.)
Furthermore, some languages and frameworks provide really nice IPC mechanisms. Apple's new XPC frameworks are quite nice and make this pretty easy to do.
> Either the tasks are broken into ridiculously fine-grained bits that are hard to make sense of or keep track of, or you still get a blocking UI.
As I mentioned, coroutines make this dirt easy. In principle, this doesn't have to be hard.
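A sketch of the incremental idea with a plain Python generator; a GUI toolkit would drive it from an idle or timer callback (the names here are illustrative):

    def work_in_increments(items, chunk=100):
        # A generator-based coroutine: do a bounded slice of work,
        # then yield control back to the UI event loop.
        total = 0
        for i, item in enumerate(items):
            total += item * item  # stand-in for the real per-item work
            if (i + 1) % chunk == 0:
                yield None        # UI pumps events here, stays responsive
        yield total               # the final yield carries the result

    stepper = work_in_increments(range(10_000))
    # In real code each next(stepper) would be scheduled as an idle/timer
    # callback rather than run in a tight loop like this:
    result = [r for r in stepper if r is not None][-1]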
> Furthermore, the solution is computationally more expensive.
That doesn't really follow. The underlying task is where the computation is. You are just moving it: to a process, to a thread, dividing it up, or something else (e.g. sending it to a server to process). At the end of the day, it is the same work, just moved.
Yes, you might need some state flags for breaking up the work, but threading also requires resources: creating and running the thread, the locks protecting your shared data, and so forth. There is no free lunch any way you do this.
If you do use a lot of CPU time, spawning a process instead of a thread might not have any noticeable impact at all.
Additionally, IPC isolates the process, meaning it can be more resistant to hostile takeover (if you drop privileges correctly), and you additionally avoid any and all shared state that could possibly result in unforeseen bugs.
Presumably the work processing time overwhelms the IPC time.
The exact architecture will vary according to your needs. There was one project I was involved with where, contrary to what Joel Spolsky would say, we recommended that it be entirely rewritten. The biggest problem? Spaghetti code and threads. Or rather, the way threads were misused. You see, there was no logical module separation; they had global variables all over the place, with many threads accessing them. There were even multiple threads writing to the same file (and of course, file corruption was one of the issues). To try to contain the madness, there was a ridiculous amount of locking going on. They really only needed one thread, files, and cron jobs...
For the rewrite, since we were a temporary team and could not trust whoever picked up maintenance of the code to do the right thing, we split it into not only different modules but entirely different services. Since the only supported platform was Linux (and Ubuntu at that), we used D-Bus for messaging.
This had the not entirely unexpected side effect of allowing completely independent development and "deployment", way before microservices became a buzzword. You could also restart services independently and the UI would update accordingly when they were down.
Even then, at least one of these services used threads (as tasks). Threads are great when they are tasks, as they have well-defined inputs, outputs and lifecycle.
At another project, I had to call a library which did not have a "thread-safe" version. A group at another branch was using Java, and they were arguing that it would be "impossible" to use that library without threads. The main problem was, as expected, that the library used some shared state. We would just fork(), call the library, and let the OS handle the isolation.
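Roughly this shape, sketched in Python (legacy_call() is a hypothetical stand-in for the non-thread-safe library):

    import os

    def legacy_call(arg):
        pass  # stand-in for the non-thread-safe library function

    def isolated_call(arg):
        # fork() gives the library a private copy of all its shared
        # state; the OS, not locks, provides the isolation. POSIX only.
        pid = os.fork()
        if pid == 0:
            legacy_call(arg)
            os._exit(0)  # never return into the parent's stack
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)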
Threads are a nice tool, but that is only one of the available tools in your toolbox. Carpenters don't reach for a circular saw unless there is no other way, because it is a dangerous, messy and unwieldy tool.
(That won't work in every case, but it should be thoroughly considered first.)
If you need to deal with parallel processing (which is relatively often in the real world) you WILL have to face the problems of consistency, visibility and program order.
Many languages don't even require programmers to have much exposure to threading mechanics. It's an OS responsibility, and that's not necessarily a bad thing.
This is the case much more often now than it was in 1995.
Shared mutable state is where the problem lies, but in a lot of cases shared memory can be used to pass data without copying: a "scheduler" thread hands non-overlapping chunks to worker threads for parallel processing, and the results are then compiled back into a whole.
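Sketched with Python threads, which share the process heap, so the chunks are passed by reference rather than copied:

    import threading

    data = list(range(1_000_000))  # shared buffer, never copied
    results = [0] * 4              # one slot per worker: no overlap, no lock

    def worker(idx, lo, hi):
        # Each worker touches only its own non-overlapping slice.
        results[idx] = sum(data[lo:hi])

    step = len(data) // 4
    threads = [threading.Thread(target=worker, args=(i, i * step, (i + 1) * step))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = sum(results)  # compile the chunks back into a whole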
They manage without threads pretty well, I think. Shared state is deliberate, outside of individual processes, rather than accidental in-process. As it should be unless you are doing some serious systems level programming.
Actor model with messages (events) sent between potentially isolated processes, perhaps?
If you truly need to run a compute intensive task in the background, the effort to [de]serialize a “command object” to and from another process should not be much overhead, vs sharing almost all memory by default.
Once you reach the point where you are starting a thread pool at start up, rather than spawning threads on demand, why not just have a process pool?
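In Python, for instance, the swap is almost mechanical (a sketch; heavy() stands in for the real work):

    from concurrent.futures import ProcessPoolExecutor

    def heavy(n):
        return sum(i * i for i in range(n))  # stand-in for real work

    if __name__ == "__main__":
        # The same submit-work-collect-results shape as a thread pool,
        # but each worker is an isolated process.
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(heavy, [10_000, 20_000, 30_000]))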
Shared memory blocks can also be used to explicitly share data too large to effectively serialize as a message / event.
If you truly have something that pumps huge amounts of data between compute-intensive tasks, then threads make sense. Proceed with extreme caution, and try to encapsulate the tricky bits.
I.e. does it allow passing large parts of data structures without copying?
Under Windows, it is a different story. Threads and processes are wildly different constructs, and threads are more lightweight. Sometimes, still not lightweight enough, so they came up with fibers.
However, real threading code is just incredibly difficult to reason about just by looking at it. This makes it easy to introduce race conditions without even knowing that they are there!
There is also the fact that locks don't lock anything!
They are just a flag that any code may choose to ignore.
They are not an enforcing tool, just a cooperative one.
(More here: https://www.youtube.com/watch?v=9zinZmE3Ogk)
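A tiny Python illustration of the point: the lock protects the data only if every code path agrees to take it.

    import threading

    balance = 0
    balance_lock = threading.Lock()

    def disciplined_add(amount):
        global balance
        with balance_lock:  # cooperative: works only if everyone does this
            balance += amount

    def rogue_add(amount):
        global balance
        balance += amount   # nothing stops this code from skipping the
                            # lock entirely; the resulting race is silent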
P.S. I created a library that makes it easier to write safer multiprocessing code.
And even in this case you're probably still multi-threaded, although in most cases it won't feel like it, because your server-side threads won't share state.
At the same time, Java's threads are so easy to use (without the portability or debuggability issues of native threads) that threads don't seem that bad to Java programmers. Yeah, shared state can be a foot gun, but so can global variables. You just keep things as pure and easy to reason about as possible. And Java has had concurrency primitives that keep you from having to deal directly with threads and locks for over a decade.
I don’t think “events” and threads solve the same problem. If your program would work just as well doing all its work in a single thread then yeah, you don’t really need threads. If we’re comparing “events and callbacks” async style to async/await style where you write your code as if it were running in a thread (even if it isn’t), I think the latter wins.
Threads can be a bad idea, but if you keep in mind what variables you use and guard shared memory, it's fine. Sometimes you might prefer a process instead, for security or fault resistance.
I started a new project back around then to build a system for deep caching of web sites, to give time-consistent access offline. I implemented it as a web proxy with an online/offline button. As you browsed web sites, it would crawl recursively, following a set of rules. The intent was to precache content near what you had already explicitly accessed, to make it available offline later on (we called this the "detachable web").
While not the primary purpose of our project, I put together a demo to optimize the Alta Vista search results page, which at the bottom only had a "Next" button (unlike the "1 2 3 4 5..." you see at places like Google today). When you clicked "Next", it took Alta Vista a few seconds (4-5) to return the next page of search results. My system would prefetch the 10 pages of results by POSTing the "Next" for you, basically while you were still reading the first page of results. The result was that "Next" became instantaneous. Again, this is not why we built the system; this was just one novel approach I used it for.
I mentioned all this because the entire project was implemented in Tcl. Being influenced by the lack of thread support in Tcl and by the paper mentioned in the OP, my project utilized an event-driven model for everything, since every inbound user request could fire off dozens of background fetches, all of which needed to be done in parallel. Events (and continuations) worked well for this. I have a paper up from the 5th Tcl/Tk workshop:
I used Tcl for the project because it let me support all three prevalent platforms of the time: UNIX, Windows 95, and MacOS 9. Day-to-day work was done on FreeBSD.
I think I have some commentary in the paper on the effects of the event-driven approach. What's funny is that I was taken off the project for v2, which the team then decided would be written in Java using threads, because, well, Tcl wasn't mainstream enough. In 1997, Java was all the rage. The downside is that they could never get v2 working reliably, because of the explosion in memory and processing power it required to accomplish the same work. In Tcl, having 60 traversals active just worked, since it was just 60 continuations (events). In contrast, the Java implementation needed 2-3 threads per traversal, and it just couldn't scale up to that.
The canonical thing that works this way is Erlang (and its modern cousin, Elixir). See also the "actor model" (e.g. Akka). It is approximately how Windows and Mac GUIs used to work (back in the day; I have not looked at these APIs for ~20 years).
Python async is coroutines, a different kind of concurrency. In it, the event loop is hidden, and coroutines just yield control, implicitly or explicitly, to allow other coroutines to proceed. In Python, a CPU-bound task can only run on a single thread, due to the Global Interpreter Lock preventing concurrent modification of data. Coroutines are still useful, both for IO and as a general way to describe intertwined, mutually dependent computations. (The earliest, limited Python coroutines were generators.)
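A small example of the explicit-yield style with asyncio; the await is the point where control goes back to the hidden loop:

    import asyncio

    async def fetch(name, delay):
        # await is the explicit yield point: this coroutine suspends and
        # the hidden event loop runs whatever other coroutine is ready.
        await asyncio.sleep(delay)  # stand-in for real non-blocking I/O
        return name

    async def main():
        # Both coroutines interleave on ONE thread.
        return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))

    print(asyncio.run(main()))  # ['a', 'b']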
I'm not sure that's true, at least not in my problem space. Copying data leaves you the possibility of operating on stale data, which will result in the computation returning the wrong answer. To avoid that, you have to let the event handler know somehow when the data has changed. How are you going to do that?
The amount of experience you needed to program that, while also dealing with structuring the rest of your program, could be large, especially if you were adding threads to an existing program.
Functional programming was cumbersome to pull off in the systems programming languages available at the time (C).
Nowadays nobody should use locks anyway, as there are much better, faster, and safer alternatives for concurrency with native threads, based on actor capabilities and ownership; and avoid blocking IO like hell. No, not Rust. Rust did it wrong.
Those who do it right so far are Pony, Midori/Singularity, and Parrot, with native kernel threads. With simple green threads there are some more, but they are only usable for fast IO, not fast CPU tasks.
Functional languages really shine under parallel-processing conditions, but that wasn't a big thing at the time. Processors could handle multiple threads, but there was only one core, so only one thing could run at a time. Systems with multiple cores or multiple processors were rare and expensive.
C++ was the dominant language then and all multi-threaded programming (at least all the multi-threaded code I ever saw) used threads with locks to access mutable state shared between threads. No doubt there were some people who had better knowledge, but virtually all software developers at the time were either doing single-threaded code or writing C++ code using threads.
It wasn't until many years later (after 2010) that I slowly learned about immutable state and functional programming. Languages had evolved a lot in that time, and functional programming overhead no longer mattered now that processors had become so much faster and had started sprouting multiple cores. We also have vastly more memory at our disposal, so we no longer have to write code that is super memory-efficient at the expense of everything else. Functional languages aren't performant enough to write an operating system in (and they usually don't allow manual memory management), but they're just fine for most applications.
Functional languages and the concept of immutable state have really flourished since 1995, now that the environment is much more favorable for them. Faster processors and more cores made functional programming and immutable state much more practical to use. I'm so happy that this is the case, because writing multi-threaded code with a shared mutable state was really tough.
And they allow you to make the same mistakes you can make with threads.
Don't get me wrong, I love Go. But it does not free you from having to think about what you are doing.
A pretty valid critique, and a reasonable solution offered.
Threads are like a data super-highway, and all the incorrect uses of them arise from using them for way too little data. Akin to building a 5-lane highway for 5 cars to pass.
A thread has some amazing properties: it can switch execution very fast (built in at the CPU level) and it has memory caching/storing advantages. That is, a thread is meant for a compute-heavy task like rendering something or running a decode in the background, mainly doing heavy math. Threads provide great things, but at a cost: just like a highway, they cost a lot (a lot of memory in your RAM) and require some maintenance and management (locking mechanisms).
The problems with threads arise when people think it's OK to use them everywhere, for all tasks, parallel or async.
Example: Apache used to start a thread for each connection to the server, which at that time took 40 MB + .5 sec, and this allowed a myriad of attacks, one of them being Slowloris.
In JavaScript, if you start a new web worker thread, that's actually a new V8 instance, and it again costs you a lot in memory and startup time.
This "start a thread for everything" was definitely the prevelant thinking in the first decade of 2000, and people were not really thinking about hidden costs.
Along comes Ryan Dahl with node.js in 2009 and "OMG, everyone forgot there are such things as event loops".
An event loop is basically a much cheaper, single-threaded, async way of processing events in an event queue. The big realization was that in most other languages, threads just waited on any time-consuming I/O to the network or hard disk while other threads ran in the meantime.
Ryan combined the async nature of event loops with async I/O... rightfully a very clever move. (Also, blocking on I/O is what often causes threads to lock up in multi-threaded environments.)
This allowed the single-threaded event loop to never really lock up on any time-consuming but non-CPU-bound task, freeing the CPU to constantly process the event queue, in a way emulating multi-threading on a single thread.
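Stripped to its bones, the idea looks something like this toy sketch (a real loop gets completion events from the OS instead of enqueuing callbacks by hand):

    import collections

    queue = collections.deque()  # the event queue

    def on_request(n):
        # A cheap, non-blocking handler: instead of waiting, it enqueues
        # a continuation and returns immediately.
        if n:
            queue.append(lambda: on_request(n - 1))

    queue.append(lambda: on_request(3))
    while queue:           # the event loop: one thread, never blocks on I/O;
        queue.popleft()()  # completions would arrive here as new events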
Going back to the highway metaphor, this would be more like an elevated city bike path: it can't take heavy trucks (heavy CPU loads), but it can take a huge number of light processing requests and never lock up, freeing up your city streets from bikers and leaving them more open to run the heavy trucks.
This is how node.js can handle 600k concurrent connections - https://blog.jayway.com/2015/04/13/600k-concurrent-websocket...
something you would never be able to achieve if you started a thread for each one.
Basically, this is akin to building one dense bike path for 600k bikers, versus building 600k 5-lane highways down each of which only one biker would go.
Where node.js falls short is if you give it heavy math tasks; the event loop will lock up.
So in my analytics processing server, I had a node.js main loop with a bunch of V8 web-worker thread pools to do the heavy math and statistics, while the main thread just routed requests and served cached data.
Another consideration, however, is memory leaks. Threaded environments tend to clean up well after themselves, because if there is a leak in a thread, it gets wiped when the thread dies. But node.js is very susceptible to memory leaks.
All these things are just tools, you have to learn when to use the right tool for the right job.
But I think there are many more pitfalls in building threaded environments than there are in using event loops. I got node.js concepts within a week or two; however, I still struggle with some thread-lock concepts even after taking classes, and shit is way harder to debug properly too. It's that highly abstract level of thinking that I have a hard time visualizing in my head, and I am never sure that I thought EVERY scenario through.
How is it clever? You cannot have async IO without an event loop. Async IO was a pretty mature technology long before node.js came out. Netty did this back in 2004. The only special thing about node.js is that its culture is to be async by default.
In that regard, processes are a much better solution.
I wasn't projecting async all the way down to the IO primitives.
Commented because most of the debate seems to be threads vs. callbacks, with processes being unwisely overlooked.
That being said, if we're talking not about IO but about CPU/memory-bound problems... well, I'd be lying if I said it was uncommon in my career to come across people who assumed, to the detriment of simplicity, quality, and performance, that a calculation (e.g. processing a list with a mapping op via AsParallel/parallelStream) would be aided by parallelism. That is just ignorance from Dunning-Kruger devs who don't apply a critical eye to their own experiences.
We desperately need more isolation in software.