My guess is that people get scared off because the failure mode is so daunting. It's easy to write a threaded app that works fine for simple tests, then fails miserably when used under load. When that happens, it's nearly impossible to isolate a good test case for debugging. That's enough to scare off most people.
Another reason threads haven't become more popular is that operating system support is generally poor. It's rare for API documentation to say anything at all about multi-threading. There are common-sense rules for adding threads to Mac programs, but I discovered them more or less by trial and error, rather than from anything I read in Apple's documentation.
BeOS not only had good support for threads; their use was in fact required in all GUI programs. Every single window ran in its own thread. That forced you to get good at multi-threading in a hurry.
The biggest mistake people make is creating too many locks. The most common failure mode is where thread A is holding lock 1 and is trying to acquire lock 2, while thread B is holding lock 2 and trying to acquire lock 1. A former boss (who worked at Be, as a matter of fact) suggested you can solve this one by always acquiring the locks in the same order everywhere. I say a better method is to never hold more than one lock at a time.
The best way to do that is to share pretty much nothing between threads. Write each thread as if it were a separate program, as much as possible. Have the thread get new work out of a queue, and place finished work into a different queue. That way, you're guaranteed to never hold more than one lock at a time.
Successful multi-threaded programming requires a lot more discipline than normal. Failure to follow this rule means you'll be getting those impossible-to-debug lockups.
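For what it's worth, here is a minimal Python sketch of the "work queue in, results queue out" arrangement described above. The names worker and run_pipeline and the squaring "work" are made up for illustration; the only locks involved are the ones queue.Queue takes internally, one at a time.

```python
import queue
import threading

def worker(inbox, outbox):
    # The worker shares nothing except its two queues.
    # queue.Queue does its own locking, so no explicit lock is held here.
    while True:
        item = inbox.get()
        if item is None:          # sentinel: no more work for this worker
            break
        outbox.put(item * item)   # placeholder for the real work

def run_pipeline(items, n_workers=4):
    items = list(items)
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        inbox.put(item)
    for _ in threads:
        inbox.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    # Every item produced exactly one result, so this cannot block.
    results = [outbox.get() for _ in items]
    return sorted(results)        # arrival order depends on scheduling
```

Results come back in whatever order the workers finish, which is why the sketch sorts them before returning.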
I'd say that on Windows threads are first-class citizens and are very actively used. Windows even comes with ready-to-use thread pooling and the ability to run arbitrary code in different thread contexts. Moreover, Windows-specific documentation often has a special paragraph on thread safety for every C library/STL function.
When I moved to the Linux world I indeed found that threads aren't treated well. In fact I still don't have a full and deep understanding of threading outside of Win32, I suspect because different UNIXes implement them differently. Perhaps it's the lack of POSIX standardization that makes them scary.
I've built a few multi-threaded Windows server-like services and never found them to be particularly difficult to grasp or debug: system APIs, docs and tools were excellent.
To summarize, only after becoming a full-time Linux/OSX programmer did I finally start to see why the Internet is full of "threads are evil" articles.
I may be wrong about Windows. I recently tried a Tcl script that launched another Tcl script repeatedly and found additional memory consumption was about 100 KB per process. It would probably have been less if I had used fork. It would be interesting to know how Windows compares.
You are right, Windows processes are heavy. Moreover, context switches between processes on Windows are also a little slower, because NT-based kernels went the "maximum security" route and perform full process isolation. That's also why Windows doesn't have a fork-like syscall; only environment variables, IIRC, can be cloned. (Which is how, I suspect, Linux threads are implemented: they're probably share-everything fork calls.)
On the other hand, a thread/fiber switch on Windows is, I believe, the fastest of all popular x86 OSes.
All in all I like the Windows process model a lot more. Clear separation between processes (share nothing) and threads (share everything except the stack) is well-known and well documented, and leads to a much nicer server-like application programming experience.
Although... I really like how easily you can nuke frozen processes on Linux if there is a problem. No programming tools/knowledge required. Dealing with hung threads would have been a lot harder.
Windows would be less than that per thread, probably more per process. But I am thinking back to 2003-ish data! (I really should try some experiments again.)
He suggests events as a simpler alternative. I would add: Unless you use a language with convenient concurrency semantics like Clojure.
I would recommend that anyone using threads run their program through the Helgrind tool from the Valgrind folk. It does a nice job of noticing if you missed a lock on a shared variable by instrumenting and analyzing your program at run time. (It caught me on one in my most recent program. I knew I hadn't started any threads when I made the access, still I should have had a lock there in case I later changed the initialization order.)
Each thread gets its own Tcl interpreter, and you pass messages back and forth between them. It's a much better approach than the "one big lock". I believe Perl takes a similar approach to Tcl.
1. Tell the O/S to decide in which order all your tasks execute (Wow, great, I don't need to worry about all that).
2. Oops, now we don't know what order tasks will execute, so we have to program a whole heap of synchronization code.
Do it yourself:
1. Program everything, so you know which order things occur, and don't need to worry about synchronization.
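A rough Python sketch of the "do it yourself" option, assuming plain generators as tasks and a simple round-robin loop (task, run and demo are hypothetical names, not from any library). Because the program decides when each task runs, the interleaving is the same on every run and nothing needs a lock.

```python
from collections import deque

def task(name, steps, log):
    # A cooperative task: it does one step, then yields control
    # back to the scheduler voluntarily.
    for i in range(steps):
        log.append(f"{name}{i}")
        yield

def run(tasks):
    # Round-robin scheduler: the program, not the O/S, picks the order.
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished, drop it
        ready.append(current)      # otherwise put it at the back of the line

def demo():
    log = []
    run([task("a", 2, log), task("b", 3, log)])
    return log                     # ['a0', 'b0', 'a1', 'b1', 'b2']
```

The price is that a task that never yields stalls everything else, which is the part the O/S would otherwise handle for you.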
The conclusion is still accurate enough though, IMHO.
Locking order is important to prevent the classic deadlock, but there is nothing inherently 'too hard' for programmers in using threads.
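To make the locking-order point concrete, here is a small Python sketch. Account, transfer and demo are invented for the example, and sorting by id() is just one convenient way to get a consistent global order; any fixed ordering would do. It assumes src and dst are different accounts.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always take the two locks in the same global order (here: by id()),
    # regardless of which account is the source. Two threads can then never
    # each hold one lock while waiting for the other.
    # Assumes src is not dst: threading.Lock is not reentrant.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

def demo(rounds=10000):
    a, b = Account(100), Account(100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

If transfer instead locked src first and dst second, the two threads in demo would be taking the locks in opposite orders, which is exactly the classic deadlock setup.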
Dr. Ousterhout has one of those:
Actually it says he was a professor at UC Berkeley, where he wrote Tcl, Tk, and an OS, amongst other things.
After that, he went on to work at Sun, and then founded his own successful company.
Postfix is another great example of a high quality piece of software that is not threaded. There are many other examples.
It is entirely possible to write complex systems that perform well without threads.
Threads are very handy if they don't talk to each other, like Java servlets.
If you have a shared variable, make it a monitor.
If you have a shared resource, make it immutable.
If you have 45 shared variables, you are doing it wrong.
Use library objects like queues to communicate between threads.
Use a shared-nothing model like in Erlang.
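A quick Python sketch of the first two rules above, with made-up names (Counter, Config, hammer). The monitor keeps its lock private and takes it in every method, so callers can't forget it; the shared resource is a frozen dataclass, so there is nothing to lock.

```python
import threading
from dataclasses import dataclass

class Counter:
    """Monitor-style shared variable: the lock lives inside the object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value

@dataclass(frozen=True)
class Config:
    # Immutable shared resource: safe to read from any thread,
    # and attempts to assign to a field raise an error.
    host: str
    port: int

def hammer(counter, n_threads=8, per_thread=10000):
    # Many threads increment the same counter through the monitor.
    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

With the default arguments, hammer(Counter()) does 8 x 10000 increments and none are lost, because every access goes through the one internal lock.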