Threads Are a Bad Idea for Most Purposes (1995) [pdf]

oppositelock · on Jan 28, 2020

I started doing massively parallel programming on SGI systems around the time this paper was published. SGI's at the time could have 64 CPUs in a single system image, which was very novel. Sun was working on its early multi core workstations, and companies like Cray were pushing different models of distributed computation.

This paper came at a time when threads were really painful to work with. POSIX threads were still new and mostly unsupported, so you were stuck with whatever your OS exposed. On IRIX, you would do threads yourself by forking and setting up a shared memory pool; on Solaris, you had the best early pthread support; in Java, you used the native Thread classes, which only really worked on Solaris at the time. It was a mess!

This mess is now solved. Pthreads are everywhere, C++ has std::thread, Java threads work everywhere, and we've had many new languages come out which handle parallelism beautifully - for example, the consumer/producer model built into channels in Go is very elegant. The odd one out is Win32, but it's close enough to pthreads for the same concepts to apply.

Event-driven programming has also become threaded; this is how the whole reactive family of frameworks, node.js, etc., handle parallelism.

As someone who's been doing this for a long time, what I find confusing is higher level constructs that try to hide the notion of asynchronous operations, such as futures and promises. As a concept, they're fine, but they're difficult to debug because most developer tools seem not to care about making debugging threads easier.

pdimitar · on Jan 28, 2020

> This mess is now solved.

If you say so. I can't count the dollars I've made fixing other people's poorly written multithreaded code in my entire career, including in the last 3 years.

Thread support is [more or less] standardised in all major OS-es now, sure. Doesn't change the fact that it's an extremely bad fit for the human brain to think about parallelism.

Stuff like actors with a message inbox (Erlang/Elixir's preemptive green threads) or parallel iterators transparently multiplexing work on all CPU cores (Rust's `rayon` comes to mind) are much better abstractions for us poor humans to think in terms of. Golang's goroutines are... okay. Far from amazing. Still a big improvement over multithreaded code as you pointed out though, I fully agree with that.

I might be projecting here and please accept my sincere apologies if so, but it seems to me you are a bit elitist in your comment. Multithreaded programming is still one of the most problematic activities even for senior programmers, to this day. Multithreading bugs get written and fixed every day.

At this point I believe we should just move to hardware-enabled preemptive actors where message passing is optimised on the hardware level, and just end all parallelism disputes forever since they are an eternal distraction. (The overhead when utilising message passing today is of course not acceptable in no small amount of projects. Hence the hardware suggestion.)

travisgriggs · on Jan 28, 2020

> Doesn't change the fact that it's an extremely bad fit for the human brain to think about parallelism.

I'm curious why you feel this way? Certainly anything with multiple (more than one) thread is a lot harder than single threaded. Given.

But I think the human experience is rich with real world counterparts that can make multithreaded programming "natural" (not necessarily easy, there's a difference). Basically any process in real life you do that involves collaborating with multiple people is a collaborative multi-threaded process. When forced to grapple with multi-threaded solutions, I often ask myself, "if I had a room full of people working on this problem, what would have to be in place to make the operation flow smoothly?" For me, this anthropomorphisation of the process makes it a "good fit" for how my brain is used to solving lots of real world problems.

On the flip side, I haven't had a lot of luck finding real world experiences that I can model coroutines/async/etc on. So they may be ultimately easier, but if our rubric for "fit for the human brain to think about" is what the wealth of human experience has evolved our brain to handle well, I'm less convinced.

But I like learning new things. Maybe I just haven't seen the lightbulb yet. Help me see your point of view?

pdimitar · on Jan 28, 2020

> On the flip side, I haven't had a lot of luck finding real world experiences that I can model coroutines/async/etc on.

I feel we might have incompatible perceptions of the world, but I'll try.

Every communication (human or otherwise) is exactly actors with message inboxes. We the humans can't react to 3 incoming conversations at the same time. Our brain queues up what it hears or reads and then responds serially, one by one (not necessarily in order of arrival). Basically an actor with a message inbox, like in Erlang/Elixir.
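To make the "actor with a message inbox" picture concrete, here is a minimal Python sketch. It is only an illustration: the `Actor` class and its `send`/`stop` methods are hypothetical names, not taken from Erlang, Elixir, or any framework, and a plain OS thread stands in for a preemptive green thread.

```python
import queue
import threading


class Actor:
    """Hypothetical actor: private state, one inbox, messages handled one at a time."""

    _STOP = object()  # sentinel that tells the actor to shut down

    def __init__(self):
        self._inbox = queue.Queue()
        self.handled = []  # written only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Senders never touch the actor's state; they only enqueue a message.
        self._inbox.put(message)

    def stop(self):
        # Ask the actor to finish whatever is queued, then wait for it.
        self._inbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        # Serial processing loop: one message at a time, in queue order.
        while True:
            message = self._inbox.get()
            if message is self._STOP:
                break
            self.handled.append(message.upper())


actor = Actor()
senders = [
    threading.Thread(target=actor.send, args=(f"hello from {i}",))
    for i in range(3)
]
for t in senders:
    t.start()
for t in senders:
    t.join()
actor.stop()
# All three messages were handled serially; arrival order is not guaranteed.
print(sorted(actor.handled))
```

Three senders talk to the actor at once, yet only the actor's own thread ever modifies `handled`, so no lock around that state is needed.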

The way you assert multithreading mimics the real world to me is not convincing. One possible example in favour of your point of view I could think of is probably a table full of food and 20 people reaching and grabbing from it at the same time. But even then, two people cannot successfully get the same piece of meat. So not a true multithreading in the sense of N readers/writers competing for the same resource; it's more like a shared memory area where parallel writes are okay as long as all writers reference different parts of it.

Sure, a lot of stuff collides in real time without synchronisation out there, that much is true. But you likely noticed we the people don't cope with the Universe's chaos very well and rarely manage to grasp it properly. So is it really a good way to model deterministic and non-chaotic systems with programming? To me it's not.

I am curious why you even think multithreading mimics the real world at all. Elaborate, if you are willing? (We might think of different definitions of multithreading in this instance.)

zzzcpan · on Jan 28, 2020

> Basically any process in real life you do that involves collaborating with multiple people is a collaborative multi-threaded process.

That would be actors or processes communicating with each other, not a collaborative multi-threaded process.

Real world has pretty much no natural notion of threads, only of actors.

blackflame · on Jan 28, 2020

Have you ever seen those assembly lines where a bunch of people sit around the conveyer belt and each one grabs a product, makes an addition, and sets it back down? That's fairly close.

vishnugupta · on Jan 29, 2020

Each person on an assembly line operates independent of the rest, there’s no “coordination” as such. A person picks up an incoming object, transforms it and sets it down. For that person, for all practical purposes, the rest of the people working on conveyor belt don’t even exist. I suppose that (not having to coordinate) played a key role in the popularity of conveyor systems. It’s simple enough to debug, address bottlenecks etc.

The closest s/w equivalent I could think of is Unix pipes. When I do “cat foo.csv | sort | uniq -c”, cat/sort/uniq aren’t coordinating with each other; each takes the content from stdin, transforms it and places it on stdout.
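The same shape can be sketched in Python with generators, where each stage only sees its input stream and knows nothing about its neighbours. This is a rough analogy under stated assumptions: the in-memory list `rows` stands in for foo.csv, the function names are made up for illustration, and the stages run lazily in one thread rather than as concurrent processes like a real shell pipeline.

```python
def cat(lines):
    # Stage 1: emit each line unchanged, like `cat`.
    for line in lines:
        yield line


def sort_stage(lines):
    # Stage 2: `sort` must consume all of its input before emitting anything.
    yield from sorted(lines)


def uniq_c(lines):
    # Stage 3: count runs of adjacent equal lines, like `uniq -c`.
    previous, count = None, 0
    for line in lines:
        if line == previous:
            count += 1
        else:
            if previous is not None:
                yield (count, previous)
            previous, count = line, 1
    if previous is not None:
        yield (count, previous)


rows = ["b", "a", "b", "c", "a", "b"]  # stand-in for the contents of foo.csv
result = list(uniq_c(sort_stage(cat(rows))))
print(result)  # [(2, 'a'), (3, 'b'), (1, 'c')]
```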

blackflame · on Jan 30, 2020

That's what optimum parallelism is. They are collaborating to process N items at the same time in the most optimal way. They are also coordinating to prevent someone from grabbing the same item or placing the same item in the same place. The conveyor belt itself is a ring buffer.

The assembly line principle can also be applied to the concept of threads. Workers collaborate by handing items off to one or more workers (threads) down the line. Thread pools are just micromanagers constantly reassigning the poor workers, but at least they make a lot of friends collaborating.

zo1 · on Jan 28, 2020

> "I'm curious why you feel this way? Certainly anything with multiple (more than one) thread is a lot harder than single threaded. Given."

Not the OP, but in my experience most devs are bad enough dealing with state and flow in single-threaded code already. Including myself. But I've at least accepted that and try to mitigate it by avoiding things that make it worse (like threads). Meanwhile, I watch people around me doing code by spaghetti (throwing it on a wall and seeing if it sticks). The better devs don't apply that to their code, but still end up applying it to their "reasoning" about a problem domain, especially a business one.

E.g.: "Screw it, if this variable that I need is null I'll just return a False in this function I'm writing" ... without ever thinking "why" this variable might be null and how that explanation impacts the problem they're solving. Nevermind the fact that they're fully OK with propagating such a broken state further into the program with their False response. Now some poor other dev has to figure out why this function is returning a false result. Odds are they'll do the same thing and now we've doubled the amount of bad states.

Adding in even more complexity in terms of being able to "predict" code state by adding threads is a nightmare. I would most certainly protest and heavily advise against anyone using threads unless it's really really the thing that's needed to solve the problem. Otherwise, it's just adding a lot more complexity for very little benefit if any at all.

derefr · on Jan 28, 2020

> Basically any process in real life you do that involves collaborating with multiple people is a collaborative multi-threaded process.

I would argue that the code equivalent of people collaborating in the real world is multiple processes with IPC (or even further afield, multiple nodes in a distributed system), not threads. You don’t share memory with other human beings. Imagine how productive you could be if you did!

harikb · on Jan 28, 2020

If we read other people’s minds (say, a manager who is giving out work) and just did that, we would be in the exact mess we are in with shared mutable state.

_ikq8 · on Jan 28, 2020

Imagine reading someone else’s mind, only to discover that they are reading yours... Mind-stack overflow?

derefr · on Jan 28, 2020

For certain classes of problems, sure. But how about “embarrassingly parallel” jobs (e.g. weeding a garden) or even “idempotent OLTP” jobs (e.g. studying several subjects at once)?

(Technically these aren’t best expressed as SHM but rather as a tuplespace or greedy worker-pool abstraction, but they can be achieved starting off with “just” an SHM threading primitive, so they count, I think.)

nybble41 · on Jan 29, 2020

> You don’t share memory with other human beings.

This really depends on where you draw the boundaries. In a computer everything is "memory" of one kind or another, but that doesn't necessarily correspond to human memory. The analogy works if you think of the "actor" as merely the thread's context and dedicated working space (stack), with everything else being part of the surrounding environment. In the real world we don't have anything akin to process separation, except by convention. Even the body itself can be directly manipulated by others. The convention we use for communication between individuals looks more like a message-passing system, but physical manipulation of matter is more like operating on data in shared memory. There is no physical law preventing someone else from coming along and messing with an object I happen to be holding any more than the rules of a computer prevent one thread from messing with a data object held by another thread.

lmm · on Jan 29, 2020

Collaborating humans have nothing like the problems that require memory barriers. We rarely have anything that behaves like shared memory - shared physical objects might come close, but a physical object is inherently de facto protected by a mutex.

Async, actor-like processes are everywhere in the real world - "something comes into my inbox, I do my piece of work on it, then send it to their inbox". Or "I'll send this off, do something else for now, and then go back to working on that when I get their reply".

bnegreve · on Jan 29, 2020

When you write a sequential program, you are essentially specifying one order of instructions that will produce the correct output. With a parallel program, you need to consider every possible order and add the correct synchronization to prevent unwanted orders from happening. That's the difficult part.

As a programming model, threads don't really help you to achieve this. When there's a bug, your only option is pretty much to think very hard and try to figure out what happened. (Or to use gdb with scheduler locking, which is about as bad.)
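To get a feel for how fast "every possible order" grows, here is a small back-of-the-envelope calculation in Python. It assumes a simplified model that is not from the comment itself: two threads, each running a fixed sequence of atomic steps, with no reordering by the compiler or CPU. Under that model the number of interleavings of `n` and `m` steps is the binomial coefficient C(n + m, n).

```python
from math import comb


def interleavings(n, m):
    # Choose which n of the n + m slots belong to the first thread;
    # each thread's own steps keep their program order.
    return comb(n + m, n)


for steps in (2, 5, 10):
    print(steps, interleavings(steps, steps))
# 2 6
# 5 252
# 10 184756
```

Two threads of just ten steps each already have 184,756 possible schedules, and real hardware memory models only add to that.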

blackflame · on Jan 28, 2020

There is more than one way to parallelize things as well. For example, you could have an assembly line methodology where different threads are working on different stages at the same time. Using a common shared threadpool, as one stage slows down, it receives more workers to speed it up. Using producer/consumer queues to designate when data is being passed from one thread's control to another eliminates a lot of multi-threading bugs if this architecture is appropriate.
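A minimal Python sketch of the queue hand-off part of this idea follows. The `stage` helper, the `DONE` sentinel, and the two toy stages (parse, then square) are all hypothetical; the sketch uses one fixed worker per stage and does not attempt the shared thread pool that rebalances workers between stages.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def stage(work, inbox, outbox):
    # Each item is owned by exactly one stage at a time: take it from the
    # inbox, transform it, and give it up by putting it on the outbox.
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # pass the shutdown signal downstream
            return
        outbox.put(work(item))


raw, parsed, finished = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    threading.Thread(target=stage, args=(int, raw, parsed)),
    threading.Thread(target=stage, args=(lambda n: n * n, parsed, finished)),
]
for w in workers:
    w.start()

for text in ["1", "2", "3", "4"]:
    raw.put(text)
raw.put(DONE)

for w in workers:
    w.join()

# Drain the final queue on the main thread.
results = []
while True:
    item = finished.get()
    if item is DONE:
        break
    results.append(item)
print(results)  # [1, 4, 9, 16]
```

Because the queue is the only point of contact between stages, neither worker function needs an explicit lock.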

Taniwha · on Jan 28, 2020

I think it's a way of thinking - I was an OS hack - Unix device drivers/etc early in my career, I could think about interrupt races/etc but it was hard, I took a detour into designing chips for a decade where EVERYTHING is parallel, by the time I got back to kernel work (Linux was around by then) I found that stuff that had been hard was now obvious.

In short if it's not a good fit for your brain, change your brain!

bakul · on Jan 28, 2020

H/W components don't share state, which is why H/W parallelism is easier. Where they do interact with each other, we have to either design fully async logic or worry about metastability. In contrast, kernels are basically all about sharing state. Even in user code, and even when a language such as Go provides channels, a lot of people seem to use shared state (and mutexes and locks).

I am curious as to how your h/w experience makes it easier hacking on the linux kernel....

Taniwha · on Jan 28, 2020

There's lots of shared state, every flop is shared state, and mostly here I'm talking about synchronous parallelism in pipelines - yes we do have to deal with metastability too, and that's a whole other level of evil so that people avoid it where possible and spend lots of brainpower mitigating it (metastability is so evil you can't completely make it go away, just make it extremely unlikely).

Go is sort of like saying "we're not going to deal with parallelism directly, just throw everything into fifos to talk to other places" mostly you can't afford to waste gates like that in real hardware (there are places where it makes sense - I've built a lot of display controllers).

I think the big difference is the way you end up thinking about time and timing holes in code (dealing with interrupts), after building gates for a while i think you tend to see the timing holes more easily

ergothus · on Jan 28, 2020

Take a decade to learn threads? Then expect other devs to do the same? I think that proves the point that threads aren't a good fit, even if we are technically capable of it.

Taniwha · on Jan 28, 2020

No, it didn't take a decade, that was just how long I did it for (before it got boring) - I just mean that you have to learn to think about parallelism in a different way, we train hardware engineers to do this but not software ones

pdimitar · on Jan 28, 2020

So it eventually got boring for you. So... threads are a bad mental model for our brains? :P

Not sure what you mean by your statement that hardware engineers are trained to think differently about it. I am not sure there's much overlap between HW and SW engineers in terms of parallelism modelling. Do you have a few examples?

Taniwha · on Jan 28, 2020

No, chip making got boring - essentially it's a month a year of doing fun creative stuff and 11 months a year making sure that it makes timing and is perfect before you tape out. When you're coding on the other hand you can get something new working every day. I chose to go back to software.

However having said that I find myself designing silicon again, change is good for the brain I guess

Hardware people tend to think in terms of netlists and pipelining, so they have to worry about parallelism on every clock.

baggy_trough · on Jan 28, 2020

The greatest programmers in the world cannot write bug free C level threading code. It is a task beyond human capabilities.

mypalmike · on Jan 28, 2020

I'm not one of the greatest programmers in the world, but I don't find C threads to be particularly hard to deal with in a robust way.

Threads in C are hard if you have a sloppy codebase with global variables everywhere and a general lack of structure and abstraction. Unfortunately, this describes many many C codebases.

ringzero · on Jan 28, 2020

I don’t follow - are you asserting that, by contrast, the greatest programmers in the world can write bug-free single-threaded code?

lmm · on Jan 29, 2020

I would argue that a moderately talented programmer can write code that does exactly what they think it is supposed to do (whether you call that "bug-free" is a question of semantics) if they have a language and type system (or equivalent) that allows them to express their requirements, and they don't step outside what they can express that way (i.e. they don't write code when they can't express what its semantics should be). Languages/compilers that can do that for multithreaded code are extremely niche.

ken · on Jan 29, 2020

For all intents and purposes, Knuth has. He's certainly more careful than most but I don't think he's unique among all of humanity here.

pshc · on Jan 28, 2020

I think the implication is to stop using C.

pdimitar · on Jan 28, 2020

I'd interpret it to stop using the pthreads model of the parallel coding, in general. Because the pthreads model exists and is used in many programming languages.

But stopping to use C is a good start (for whoever has the choice)!

nineteen999 · on Jan 29, 2020

Maybe the functional programming/managed code fans should step up to the plate and rewrite the operating systems, desktop environments, and hundreds of thousands of command line tools etc that have all been implemented in C or C++.

Not as toy proofs of concept. Fully fledged replacements for all that stuff that can be used in our daily work. Build distros of the stuff so that we don't have to pick and choose this stuff and replace the bits of our systems in a piecemeal fashion.

They could show us how it's done instead of talking endlessly about it in online forums. So let's face it, it's never going to happen.

pdimitar · on Jan 29, 2020

Hm, where did I say anything about FP?

As for managed code, it's being used with huge success in a lot of places (not in OS-es or drivers, yes) but that's quite the huge topic by itself.

nineteen999 · on Jan 29, 2020

I never implied you did; they are merely one of the two major groups of programmers I see consistently deriding C-based infrastructure that presumably enables their paychecks to a large extent.

I'm not defending C, I'm merely sneering at its noisy detractors who spend more time complaining about it than supplanting it.

pshc · on Jan 29, 2020

Well, people are working on it. For example, what's been happening in Firefox. C/C++ has a head start in the millions of person-hours.

nineteen999 · on Jan 30, 2020

One could argue as a counterpoint that the Rust community (and all the other language communities) have millions of person-hours (and lines of C/C++ open source code) to reference as they RIIR, a luxury the C/C++ community didn't really have; much of what they've built since Linux appeared was developed from scratch. Outside of 386BSD, there wasn't a lot of unencumbered UNIX source code available to them.

Perhaps as the Rust community grows in size and momentum this will happen more. Right now it seems like a bit of a manpower problem.

yawaramin · on Jan 29, 2020

@nineteen999: that all happened a long time ago. Study your history: https://en.wikipedia.org/wiki/Lisp_machine

nineteen999 · on Jan 29, 2020

I mean to replace the ones that we use today. Not the ones we used 30 years ago. General purpose operating systems. Perhaps with a distribution we can download for free and install on commodity computing hardware, and actually use today for our primary areas of work/interest.

Note the section 1.7 in your own link ("End of the Lisp machines"). The Lisp software ecosystem was never as rich, broad and varied as the C/C++ software ecosystem is today, before it died.

BTW, I only stumbled across your comment by accident, since you replied to the parent poster instead of me. Parenthesis mismatch?

yawaramin · on Jan 29, 2020

HN thread depth reply limit!

nullValue · on Jan 29, 2020

I respectfully disagree. I have an embedded multi-threaded 'C' program running in 11k+ retail stores in the USA right now. It's been handling multiple client requests to a Sqlite DB since 2007 without any issues. This product has made my company a lot of revenue. The secret to using threads is all in the design. Don't share resources between threads (I only had one shared resource for 50+ threads guarded by a semaphore).

Cheers.
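The design described above (many threads, exactly one shared resource, one semaphore in front of it) can be sketched in Python as follows. This is not the commenter's C program: `shared_log` is a stand-in for the shared database handle, and the worker count and request count are assumptions chosen for illustration. In CPython a single `list.append` happens to be atomic anyway, so the semaphore here is there to show the pattern, not because this toy would break without it.

```python
import threading

NUM_WORKERS = 50           # the comment mentions 50+ threads
REQUESTS_PER_WORKER = 100  # hypothetical workload

db_guard = threading.Semaphore(1)  # binary semaphore around the one shared resource
shared_log = []                    # stand-in for the shared database handle


def worker(worker_id):
    for request in range(REQUESTS_PER_WORKER):
        row = (worker_id, request)  # thread-local work, nothing shared
        with db_guard:              # only one thread touches shared_log at a time
            shared_log.append(row)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared_log))  # 5000
```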

pshc · on Jan 29, 2020

That is a decent anecdote. Well, with respect, allow me to revise and qualify my read of @baggy_trough's comment:

It's perfectly fine to continue using an existing C codebase for a program, not exposed to the public internet, that's maintained by a focused group of maintainers. But on the other side of this spectrum, for large exposed projects like OpenSSL, Chromium, or even Linux, C/C++ has become risky.

gameswithgo · on Jan 28, 2020

Yet sometimes a simpler abstraction can't do the job well enough. Sometimes you have to do the hard thing.

pdimitar · on Jan 28, 2020

IMO nobody is arguing about the sometimes part. Many programmers out there can't use Erlang/Elixir or Rust, or any other tech that makes writing parallel code much more deterministic and safe. I am aware of that and I have a deep respect for the programmers in the trenches who fight to produce bug-less parallel code in C.

My argument at least is that, given the choice, there exist much more productive ways to make your employer money than to try and learn nuclear phys... I mean multithreaded programming. :-)

lmm · on Jan 29, 2020

Citation needed. I've yet to see a real business problem that couldn't be solved with a simpler, safer concurrency model and a bit of ingenuity.

travisgriggs · on Jan 28, 2020

I'm not sure I follow your terse comment. Are you implying that because skilled programmers have bugs in code with threads, threads are bad? Isn't that a bit of a non sequitur? Skilled programmers have bugs in code with no threads! Should we therefore infer that if only they added threads, they would be bug free?

I am not arguing in favor of (or against) threads. I am not contesting whether they lead to more bugs or not. What I was contesting was the assertion that "threads are not a natural fit." My point is that procedural programming (which nearly all programming is, at least at a small level, minus languages like Prolog) is something human beings have been doing for thousands of years. Look at a "recipe" for doing something, and you have a procedural program. Look at any recipe for lots of people working together to do something, and you have a pool of cooperating procedures. It's something we get schooled in our whole lives.

isodude · on Jan 28, 2020

When I rewrote applications from Java to Erlang I was amused by how different it was. Not only codewise but in how you attack the problem.

FP makes the programmer rethink the way that the problem is solved. I think that adding threads on non FP code makes it hard from the get go.

Don't hide it on hardware, rethink the approach instead.

pdimitar · on Jan 28, 2020

I mean, many people have the right idea but for now we have no choice but to emulate the right parallelism approaches on top of the wrong hardware architecture. Maybe this will change one day.

As for FP... I remember a guy saying "imperative/OOP shows you how it does stuff, while FP shows you what is it doing". It was a very eye-opening statement for me.

isodude · on Jan 29, 2020

I'm not saying you are wrong btw; there are multiple efforts out there to create truly parallel processors. But it's not until recently that the techniques have existed to make them good enough. Good times ahead I would say. I personally know someone who is part of such an effort.

I clearly remember the feeling of just getting it as well. It was a very binary moment :)

nickbauman · on Jan 29, 2020

> Multithreaded programming is still one of the most problematic activities even for senior programmers

I agree wholeheartedly. FWIW I've found higher level abstractions built into the language itself, such as atoms and refs in Clojure, go a long way toward mitigating the problem once and for all. This gives you an actual harpoon to hunt Amdahl's* Great White Whale of Parallelism. Instead of a toothpick (C++) or a pocket knife (Java) to do it.

The trouble is we need to stop hunting whales if we're going to move the needle in a permanent and significant way...

* https://en.wikipedia.org/wiki/Amdahl%27s_law

hinkley · on Jan 29, 2020

> Multithreaded programming is still one of the most problematic activities even for senior programmers

My second favorite class in college was the elective for distributed computing (which I took instead of compilers, but I wish I'd taken both). It felt good to me and I came out into the job market ready to concurrent all the things.

By the time I'd worked with a disparate group of coworkers, I found out that 1) my enthusiasm was a rare thing, 2) that this was with good reason (many, many burnt fingers), and 3) that just because I love something doesn't mean that it should have a central role in a team project, even if I am the lead.

In a similar way that I 'got dumber' by learning to think like a user (one of the downsides of studying UX), I sort of learned to reach for concurrency only after I'd tried several other options.

... but here I am, years and years later, debugging other people's async/await code for them. Nobody likes to have to have someone else fix their mistake, but it goes a lot better when I start by telling them this shit is hard. Which it really is. Even when (or maybe especially when) I'm confident my stuff is right, I find a few boneheaded mistakes that I have expressly warned others to avoid in stuff I wrote uncomfortably recently.

voidlogic · on Jan 28, 2020

I find Go's approach to be in the Goldilocks zone. It's not intrusive enough to slow me down (or guarantee correctness for that matter...) but it's enough that I no longer need to save the multi-threading of something I know needs to be multi-threaded for a second pass, and I very rarely have multi-threading issues or trouble debugging.

That being said, developers often have a bit of learning to get there, but that can greatly be accelerated by good team/mentor code review and discussion.

pdimitar · on Jan 28, 2020

Goroutines are definitely an improvement, no argument from me.

It's just that there exist ways to shoot yourself in the foot quite easily.

Example: I'd find it much more intuitive if writing to a closed channel returned an error value and didn't issue a panic.

But I'm guessing that the Go programmers get used to the assumptions that must be made when working with the language so such things are likely quite fine with them and cause them no grief.

loopz · on Jan 30, 2020

Such things bite everyone. But writing to a closed channel is regarded as a programming error. You potentially lose values. So the logic should be watertight, which is hard with concurrency. Taking the simplest approach to locking helps.

Better a panic than silent errors and flawed logic.

pdimitar · on Jan 30, 2020

Yeah. I'm not opposed to such idioms. As I mentioned before, you get used to them.

alfalfasprout · on Jan 28, 2020

I agree that multithreading bugs (e.g., race conditions, invalid access, etc.) are common. Thing is, a lot of problems emerge from poor programming practices.

Threads are definitely a valuable tool for performance-oriented work, and things like worker pool patterns aren't always a great fit and can leave you with more overhead.

pdimitar · on Jan 28, 2020

There's a lot more lower-hanging fruit to be picked before you have to ditch actors with message inboxes for classic multithreading in order to gain some more oomph. And it has been my experience so far that the overhead of the other parallel coding techniques is peanuts compared to the I/O bottlenecks of the 99% of the apps out there.

I know there exist a group of programmers that really don't have a choice and have to use `pthreads`. But if you do have the choice then IMO sticking to multithreading programming is very backwards and invites a lot of suffering in your future (or that of your future colleagues).

Just use actors with message inboxes. Measure, profile, tweak. If you exhausted all other possibilities and you absolutely positively can't upgrade your server ever then sure, ditch actors and write multithreading code.

cogman10 · on Jan 28, 2020

Meh, I mean, shared mutable state, to me, is the ultimate culprit of threading complexity. If you don't have that, very often many of your problems go away.

Futures and Promises go a long way in making that better. That's not to say that you can always rely on them, but it is far easier to get them right than it is to get threading with shared mutable state right.

Actors, Channels, Futures, promises, reactive programming, etc. They all have one thing in common. They kill off shared state in favor of message passing.
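As a small illustration of the futures case, the Python sketch below has each task return its result through a future instead of writing into a shared variable. The `word_count` function and the sample strings are made up for the example; only `concurrent.futures.ThreadPoolExecutor` is a real standard-library API.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    # Pure function: reads its argument, returns a value, mutates nothing shared.
    return len(text.split())


documents = [
    "threads are hard",
    "actors pass messages",
    "futures return values eventually",
]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(word_count, doc) for doc in documents]
    # Each result comes back like a message; no worker touches `counts`.
    counts = [f.result() for f in futures]

total = sum(counts)
print(counts, total)  # [3, 3, 4] 10
```

The aggregation into `total` happens on one thread after the results arrive, so there is no shared counter to protect.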

pdimitar · on Jan 28, 2020

> Meh, I mean, shared mutable state, to me, is the ultimate culprit of threading complexity. If you don't have that, very often many of your problems go away.

Yep, exactly.

Everybody can screw up syncing and message passing as well, given enough lack of experience, or schedule pressure, or simply not getting it. But it's much harder to screw that up, while conversely it's extremely easy to screw up multithreaded code with shared state.

jjgreen · on Jan 28, 2020

> Thread support is [more or less] standardised in all major OS-es now

More-or-less, but no barriers on OSX (at least I had to hack around that for a user 6 months ago)

FpUser · on Jan 28, 2020

"I can't count the dollars I've made fixing other people's poorly written multithreaded code in my entire career, including in the last 3 years."

"Multithreaded programming is still one of the most problematic activities even for senior programmers, to this day. Multithreading bugs get written and fixed every day."

Strongly disagree. I am going to claim that threads / locking are not hard to write if one has at least a bit of common sense and discipline. Problem is that we have hobbyists without any trace of knowledge on how computers work producing commercial software.

Goes like this: I do not want to deal with memory management - it is too hard, I do not want to think about types - my poor brain can not comprehend those, pointers - OMG what are those beasts, threads and locking - gonna commit Seppuku. I will advise those people to come to a logical conclusion: programming is frigging hard when one wants to produce a decent product rather than a buggy POS. No matter how many concepts the language hides / converts to something else, there will always be something else they will not be able to wrap their mind around. So how about trying gardening instead?

pdimitar · on Jan 28, 2020

A lot of things are possible for us humans. But some things your brain handles more easily and works better with, while others are a struggle even if it gets them right most of the time.

You don't have to make this about implying that others don't have "at least a bit of common sense and discipline".

FpUser · on Jan 29, 2020

I'll go the other way around. Some people are capable of comprehending an awful lot of things/concepts and using those productively. And some are not. It is definitely not the concept inventors' fault. Nothing to be ashamed of as we are all different. I am not shedding tears because I can not fathom quantum physics while others can.

dijit · on Jan 28, 2020

Your comment about threading in win32 being lackluster is surprising. It was my understanding that while the OS doesn't support fork(), that it /does/ support pthreads, and that as soon as you go outside of the process itself, Windows has "better for developers" system calls. I'm speaking specifically about I/O Completion ports.

Working with epoll or select/poll is really nasty when dealing with "some data" that's coming on a socket, whereas IOCP allows you to tell the kernel "as soon as you see some data destined for my thread, just populate it in the memory here, tell me when something is there and then tell me how much is there"; for high performance C++ programmers this is basically invaluable and as far as I understand there's no direct competitor in UNIX and unix-like land. (Although, according to the C++ devs where I work, "kqueues are less braindead than epoll".)

oppositelock · on Jan 28, 2020

I never said it was lackluster, I said that it was equivalent. The differences lie around how asynchronous thread signals are delivered and handled. Win32 is less flexible in that regard, and when you're writing any kind of cross platform threaded software which includes Win32, you must make accommodations for this difference. Pthreads can't be fully emulated on Win32, so you have to have your pthread path, and your win32 path.

And yes, you correctly point out that on *nix and Win32, interaction with the kernel is often different. Win32 does a lot right, I won't ever speak negatively of it, it is the desktop OS of choice by a huge margin. It's just that *nix and Win32 are so different from each other, that you can't generally write a single implementation of your low level thread interaction for both classes of platforms.

dboreham · on Jan 28, 2020

If you were using signals on Win32, that's the problem.

gpderetta · on Jan 29, 2020

I'm not a win32 programmer, but I think that for a long time windows didn't have os provided condition variables (events are not always an adequate replacement and had nasty footguns around signaling); turns out that it is hard to implement a cond var with posix semantics and if you managed to get a third party implementation it was often buggy. I think they were eventually added in Windows Vista.

PaulDavisThe1st · on Jan 28, 2020

The thing that is missing (or was missing; I'm not 100% sure of the current state of things) in *nix land is the generalized concept of "waitForSomething" where "something" could include any event/data that should/could wake a thread. kqueues help a bit, but I think it is still harder than it should be on *nix (even Linux) to write code whose semantics are "put this thread to sleep until anything happens".

If the thread is waiting on a specific set of objects (e.g. file descriptors, kqueues), then this can be done easily, but it's still not quite as simple as it has been under Windows (and by the way, IOCP is built on top of the underlying mechanism that makes this possible on Windows; they are not the mechanism itself)

theamk · on Jan 28, 2020

Modern kernels support signalfd, timerfd and eventfd. With those things, it is possible to wait on almost anything.
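
The general shape of "turn everything into a file descriptor, then block in one wait call" can be sketched with Python's standard `selectors` module. This is only an illustration: socket pairs stand in for eventfd/timerfd-style descriptors (an assumption made for portability), and `wait_for_any` and `demo` are hypothetical names.

```python
import selectors
import socket


def wait_for_any(sources, timeout=1.0):
    """Sleep until at least one source is readable; return the names that fired."""
    sel = selectors.DefaultSelector()
    try:
        for name, sock in sources.items():
            sel.register(sock, selectors.EVENT_READ, data=name)
        return sorted(key.data for key, _events in sel.select(timeout))
    finally:
        sel.close()


def demo():
    # Two stand-in event sources; each pair is (read end, write end).
    timer_r, timer_w = socket.socketpair()
    event_r, event_w = socket.socketpair()
    try:
        event_w.send(b"x")    # only the "event" source becomes readable
        return wait_for_any({"timer": timer_r, "event": event_r})
    finally:
        for s in (timer_r, timer_w, event_r, event_w):
            s.close()
```

The limitation discussed in this subthread still applies: anything that cannot be expressed as a descriptor (a plain mutex, say) cannot join the wait set.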

zozbot234 · on Jan 28, 2020

With io_uring this can be done on Linux as well.

StillBored · on Jan 28, 2020

Not yet, read the LWN article and look at the todo list. There are a number of things still missing, and of course there are things like semaphores and generic eventing that still aren't part of io_uring.

io_uring is "different" and definitely has some advantages due to its shared ring buffer, but NT has a core/generic set of async completion/wait/etc functions that can be applied to basically the entire API surface.

gpderetta · on Jan 29, 2020

I do not think you can wait on a critical section or a condition variable with WaitForMultipleObjects in Windows. The same is true in Linux. You can use select/poll/epoll on signalfd though.

StillBored · on Jan 29, 2020

Does IORING_OP_POLL_ADD work with signalfd's (I don't really know without messing with it, how about IORING_OP_EPOLL_CTL?)?

I wouldn't assume it does until I've verified and read the code. I know last year at Plumbers there was a fuss about the fact that a lot of operations which should have been working weren't, frequently in odd ways; for example, direct read worked but not direct write, IIRC.

linksnapzz · on Jan 28, 2020

My understanding is that as of Solaris 10 && AIX ...6.1? there are IOCP implementations available for use, although they might not be enabled in the OS at install.

trentnelson · on Jan 28, 2020

You might find this deck interesting: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-...

There is no equivalent in UNIX land to the thread-agnostic asynchronous I/O primitives available on NT. When paired with a robust threadpool API (Vista+), it is an unbeatable platform.

adrianmonk · on Jan 28, 2020

> Sun was working on its early multi core workstations

This article is from 1995, and they were out a bit earlier than that. Sun had been shipping 4-core workstations with thread support since 1993.

The SPARCstation 10 came out in 1992 and had two MBus slots. Each slot could take a CPU card with 1 or 2 CPUs on it.

Although SunOS 4.x had very limited multi-CPU support, SunOS 5.x had support for multiple CPUs. SunOS 5.1 (Solaris 2.1) came out in 1992 and supported SMP, and SunOS 5.2 (Solaris 2.2) came out in 1993 and introduced a thread API.

By 1995, they had also introduced the SPARCstation 20 and the 64-bit Ultra 2.

They also had some server systems (like the SPARCcenter 2000) that had a backplane with several boards that could themselves have CPU cards. I don't know the maximum number of CPUs, probably something like 20.

Info from my memory and:

http://mbus.sunhelp.org/index.htm

https://en.wikipedia.org/wiki/SPARCstation_10

https://en.wikipedia.org/wiki/Solaris_(operating_system)

brlewis · on Jan 28, 2020

> Java threads work everywhere

That depends how you define "work". Java has a decent concurrency library such that if you have a limited-scope project implemented by 1 or 2 highly-skilled programmers, you stand a good chance of avoiding thread safety issues. But as the project grows, the probability of thread safety issues approaches 1.

> Event driven programming has also become threaded, this is how the whole reactive family of frameworks, node.js, etc, handle parallelism.

Event-driven programming is way less painful than Java threads. With threads, in every place you mutate anything you have to ask yourself, "what happens if my thread is preempted here?" With event-driven programming you only do that sort of thinking in places where you see the `await` keyword.
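
A small asyncio sketch of that claim, under the assumption of a single-threaded event loop where tasks only switch at `await`. The function names are invented for the example.

```python
import asyncio


async def bump_without_await(state):
    # No await between the read and the write, so no other task can run in between.
    state["n"] = state["n"] + 1


async def bump_across_await(state):
    seen = state["n"]
    await asyncio.sleep(0)    # yields to the event loop; other tasks run here
    state["n"] = seen + 1     # may overwrite updates made while suspended


async def run(bump, tasks=100):
    state = {"n": 0}
    await asyncio.gather(*(bump(state) for _ in range(tasks)))
    return state["n"]


def demo():
    return asyncio.run(run(bump_without_await)), asyncio.run(run(bump_across_await))
```

The first variant counts every increment; the second loses updates, and the only place to look for the cause is the line with `await` on it.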

adev_ · on Jan 28, 2020

> Event-driven programming is way less painful than Java threads. With threads, in every place you mutate anything you have to ask yourself, "what happens if my thread is preempted here?"

Event-driven programming is a different pain in the ass but still a pain in the ass.

The current status of the Web speaks for itself: it's event-driven based JavaScript, and it's full of state/transitions bugs everywhere.

In event-driven code you trade concurrent r/w access issues for lost execution context and out-of-order issues. The result is often a mess of callback spaghetti that is close to unreadable.

The least bad option might simply be a proper coroutine system, which seems to be coming to almost every programming language after 30 years of existence.

oppositelock · on Jan 28, 2020

You can write horrible event driven code. For example, do your events need to talk to a database? Right there, you've got a synchronization point.

Thinking about asynchronous behavior is always tricky, and code designed to run on threaded systems isn't generally arbitrary code with locks thrown into the mix. You strive to write the largest possible reentrant sections, and only lock where you have to, and as infrequently as possible. With reentrant code, it doesn't matter where you get preempted.

lmm · on Jan 29, 2020

> For example, do your events need to talk to a database? Right there, you've got a synchronization point.

Not if you do it right. Interacting with sync-only systems from async systems is a problem, but it's not fair to blame that on async - if you make the whole system work async, you don't have that problem.

Bjartr · on Jan 28, 2020

> That depends how you define "work".

I don't disagree with what you've said, I just wanted to chime in that in the context of the conversation to this point the definition is something in the ballpark of "are available cross-platform without jumping through hoops".

brlewis · on Jan 28, 2020

I was replying in the context of "This paper came at a time when threads were really painful to work with," which seemed to imply that they're no longer really painful to work with.

pierrebai · on Jan 28, 2020

That may be true in some languages, but it is not generally true. The awaited task can run in parallel, and the two are only re-synced when you use the result. Thus anything can also be mutated in parallel with the await/async style of programming.

trhway · on Jan 28, 2020

>Event-driven programming is way less painful than Java threads. With threads, in every place you mutate anything you have to ask yourself, "what happens if my thread is preempted here?"

in 23 years programming Java (and in all the places where i've worked having been recognized as local expert on threads and synchronization, incl. in the current very large C++ platform project) i have never asked myself that question in the context of the mutation of data.

plapetomain · on Jan 28, 2020

Well you either did ask that question and answered it by using appropriate locking and concurrent data structures (which is basically implicitly asking and addressing that question..) or you wrote a lot of crash prone shitty software. Which one was it?

trhway · on Jan 28, 2020

Locking isn't about preemption. You can have a system without preemption and still have to use the locks and other concurrent primitives.

lowbloodsugar · on Jan 28, 2020

Not sure why the downvotes. If a system has enough CPUs, threads might never be preempted, and yet we must still use locks or concurrent primitives if we are sharing resources across multiple threads.

I don't think I've actually thought about preemption explicitly since the Nintendo64 days when thread priorities figured into our locking strategy. But then we'd also re-implemented the scheduler and, at times, turned off the official OS to go do something we needed to do without interruption. So, like I said, preemption not high on my radar these days, as it falls under the general category of "concurrency". Even an iPhone has 4 CPUs.

samatman · on Jan 28, 2020

The question shouldn't have been about preëmption, but rather, "what if some other thread mutates my data here?"

The effect is much the same, whether the race is caused by task scheduling, or just because another core got there first.
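
As a rough illustration of asking "what if some other thread mutates my data here?", the sketch below puts a lock around a read-modify-write. `Counter` and `hammer` are hypothetical names, and CPython's GIL muddies the multi-core picture, so treat this as a picture of the idiom rather than a demonstration of a race.

```python
import threading


class Counter:
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The lock makes the read-modify-write a single step from the point
        # of view of other threads, whoever is scheduled or running when.
        with self._lock:
            self.value += 1


def hammer(counter, threads=8, per_thread=10_000):
    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

Whether the interleaving comes from preemption on one core or from a second core getting there first, the lock answers the same question.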

gpderetta · on Jan 29, 2020

Or even if some random function you called fired some callback that just mutated some state you were in the middle of changing.

StillBored · on Jan 29, 2020

Locking and preemption are different but sometimes related concepts. As you mention.

Modern kernel people still have to worry about locking as well as preemption. If you look at the NT native APIs you will see dispatch levels, which control whether kernel threads can be preempted for higher priority activities; Linux is similar with the _irqsave() and preempt_ calls.

This is, for example, important when syncing with an interrupt handler (no preempt/rt config), which is why there are both spin_lock() and spin_lock_irqsave(), the latter of which implicitly blocks irq preemption. AKA if you grab a lock that is needed by, say, an interrupt handler, and then you take said interrupt, the machine will deadlock because the scheduler won't deschedule the interrupt handler.

kstenerud · on Jan 28, 2020

What purpose would locks and concurrent primitives have in a system where running code is not preempted?

trhway · on Jan 28, 2020

In addition to the systems that are parallel in all senses/levels mentioned by the other commenter, another example would be truly single-threaded hardware with cooperative or even sequential multithreading, where you'd use locking etc. to safeguard your memory model invariants against, for example, compiler, JIT and hardware optimization shenanigans.

StillBored · on Jan 29, 2020

To guarantee two different threads don't try to simultaneously mutate (or in some cases access) shared resources.

pmarreck · on Jan 28, 2020

> But as the project grows, the probability of thread safety issues approaches 1.

This is exactly why I run unit tests concurrently (other than the speed boost), and try to write the tests (and the code) so that it won't break when run concurrently (like inserting something into a DB and then assuming success only if the count has gone up by exactly 1).

brlewis · on Jan 28, 2020

Are you saying that concurrency issues can reliably be found via testing?

pmarreck · on Jan 28, 2020

Well, they can't be deterministically found since process scheduling is not deterministic (notable exception: Haskell, I believe, has a way to do deterministic concurrent scheduling!), but I have definitely seen failures that only crop up when the suite is run concurrently, and it usually turns out that the code failing in those circumstances has made assumptions about application state that do not hold true in a concurrent context.

btilly · on Jan 28, 2020

OK, intermittently your test suite fails nondeterministically 0.01% of the time. Do you imagine that this will be sufficient to find which of the last thousand commits introduced the bug?

I would recommend using specialized tools for the task like Intel Inspector or Valgrind's Helgrind.

pmarreck · on Jan 28, 2020

Or one could simply try very hard not to mutate any state that is available to another process and corral all such code that requires doing so (which most of the time ends up being only a small portion of the code; the rest can be designed functionally, just passing values in and back out with no other side-effects) into a thin I/O layer, which is (I believe) the Hexagonal Architecture and is explained quite well by Gary Bernhardt here: https://www.destroyallsoftware.com/talks/boundaries
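
A tiny sketch of that split, assuming a made-up `Account` domain: a pure function does the work, and one thin function is the only place that touches outside state. All names here are illustrative and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    owner: str
    balance: int


def apply_deposits(account, deposits):
    """Pure core: values in, a new value out, nothing shared is touched."""
    valid = [d for d in deposits if d > 0]
    return Account(account.owner, account.balance + sum(valid))


def shell(load, save, deposits):
    """Thin I/O layer: the only code that reads or writes outside state."""
    account = load()
    updated = apply_deposits(account, deposits)
    save(updated)
    return updated
```

Only `shell` needs care around concurrency; `apply_deposits` can be called from any number of threads or tests at once because it mutates nothing.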

btilly · on Jan 28, 2020

This is not an either or, it is an and.

The one being a good idea doesn't preclude the other from being a good idea as well. Do them both.

(And also avoid concurrency wherever you can. And of the available forms of concurrency, threads are one of the worse ones.)

pmarreck · on Jan 28, 2020

Agree!

zelly · on Jan 28, 2020

The big problem with native threads is its API. You have to make a callback aka function pointer. All of a sudden you have to keep track of parallel execution of these function pointers. This level of abstraction has no representation in code, only existing in the developer's mind. This doesn't scale.

There is an impedance mismatch between how code is written (sequential, top-to-bottom, left-to-right text) and how parallel code behaves. That's why people use idioms like actors or coroutines. I think to actually solve this you need a new format for writing code, like a graphical function call graph instead of a text editor.

StillBored · on Jan 28, 2020

Your win32 comment is odd. I was developing on NT during the 3.1 betas in ~1992 and we chose to go all in on multithreading. Our product launched in 1993 with the release of NT. I don't ever remember hitting an OS level thread/etc bug (fair share of footguns though).

More than a year later we were trying to port much of the codebase to solaris and it was a nightmare. Pretty much nothing worked right so we ended up bolting on the fork/mmap abstraction on top of much of it and building our own lock wrapper. While we sorta got it working, the solaris port died for internal political reasons and the NT version took off and we never looked back.

NT was designed out of the gate for heavily threaded/async workloads, that stuff wasn't bolted on like pretty much every unix clone in existence.

rcurry · on Jan 29, 2020

I worked for a major day trading software company back in 1999 or so and we built everything around NT. I had come from a Unix background, but to be honest NT was the perfect choice at the time for building the back end of a large-scale retail trading platform. And IO completion ports were amazing compared to what was available on other platforms.

stareatgoats · on Jan 28, 2020

> futures, promises. As a concept, they're fine, but they're difficult to debug

Might as well forget debugging in promise-land. I think this is partly because they are conceptually problematic as well, apparent once beyond the simplest use-cases, given the number of bloggers that still seek to explain it - and then are forced to edit their post because of errors. And the need to list common mistakes on MDN:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guid...

mc3 · on Jan 28, 2020

I don't think it is that bad. Once you learn those mistakes (maybe a linter could help too?) and, more importantly, bed down the mental model, I think promises are a fine way to do things up to a point.

I don't do intense concurrency in the browser, but I'll do non trivial stuff, like collating responses from a server then sending a request once I have all that information. I think promises are fine for this and it is possible for most programmers to write clean, mostly mistake-free promise based code.

As for difficult to debug? I've never had that issue, either in JS (browser/node) or in C# with the similar but different Task<>.

For example in JS, I can put breakpoints at any point of the promise chain and see what is going on. If they are network requests I'll also look at the network tab. If everything is happening real fast I might use console.log statements, but that is rare.

I think promises are OK for a lot of situations that most of us will encounter developing business software - this might be a reflection on the simplicity of the problems I end up solving. Obviously if you are creating a multi-threaded high-frequency market making trading thing then this might not apply.

int_19h · on Jan 29, 2020

It's all about tooling. There's no reason why a debugger can't make sense of "spaghetti" stacks that result from future-based async code, and some debuggers do just that (e.g. VS does it for C#).

dirtydroog · on Jan 28, 2020

Where did futures/promises come from? They just seemed to appear out of nowhere. I'm not particularly a fan of them either. C++'s current version of std::future is particularly useless.

mc3 · on Jan 28, 2020

Reading Wikipedia, it sounds like this stuff came out around the 1980s, with support in a particular Lisp released in 1980.

The promise pipelining technique (using futures to overcome latency) was invented by Barbara Liskov and Liuba Shrira in 1988.

Yes B. Liskov is the L in "SOLID", if you are wondering!

Happy 40th birthday promises! :-)

I wouldn't be surprised if mathematicians were thinking of this before transistors were invented though.

I think it takes time for things to trickle down. Maybe it takes someone from an academic background (or a curious paper reader) to be forced to use C++ or Javascript for something, but along the lines of "progress relies on unreasonable men/women" they get annoyed and write their own library so they can use the nice Lisp/Haskell/etc. feature they are used to, but often in a more limited way but better than nothing. Then from there it might get hyped to the shithouse, die, or just get used by insiders.

abecedarius · on Jan 28, 2020

AFAIK the E language http://erights.org/index.html influenced several other people and projects like Twisted Python and Midori, which influenced the now-popular deployments like Javascript. (I followed E in the 90s but not so much the other projects.) There's a sketch of E's history at http://erights.org/history/index.html but it's mostly stubs there. They apparently invented promises independently of Liskov, while working on Xanadu: http://erights.org/elib/distrib/pipeline.html

kragen · on Jan 28, 2020

Right, and MarkM is on the ECMAScript committee. Futures I think are about 20 years older, maybe from MIT.

P.S. sorry so late answering your mail!

abecedarius · on Jan 28, 2020

https://en.wikipedia.org/wiki/Futures_and_promises is quite detailed and looks good. I'm sure MarkM & company knew of futures, so I should've mentioned them.

(I owe you mail too.)

kragen · on Jan 29, 2020

Aha, Baker and Hewitt, 1977. I didn't realize Friedman had defined "promises" in 1976; I wonder if they're the same as E promises? (Which are almost precisely the same as ECMAScript promises.)

I was terribly remiss not to mention Dojo, a popular JS toolkit which got its promises from Twisted, which of course got them from E, though Twisted modified them a bit. I don't know how it slipped my mind.

abecedarius · on Jan 29, 2020

I haven't read the Friedman/Wise paper. Most likely those were futures? They were doing FP work around then, like https://help.luddy.indiana.edu/techreports/TRNNN.cgi?trnum=T... (I seem to remember reading another paper of theirs which included racing suspensions till one of them completed, which would be more like futures than the streams of this paper. But if so I'm forgetting where.)

I get the impression E promises had a nicer design than JS's for handling errors -- but that's also a vague memory and I never really learned JS's.

dboreham · on Jan 28, 2020

They're the lesser of two evils: before came callback hell.

Async/await has also mostly replaced bare Promises. The result almost gives you something as usable as threads.

Fwiw I find it amusing that async is promoted as the pinnacle of elegance in concurrent programming when in reality it exists because the JS interpreter wasn't thread safe.

kragen · on Jan 28, 2020

There were thread-safe JS interpreters, in early versions of Opera for example. But without locks it was going to be impossible to write thread-safe JS. The arguments of Ousterhout and Miller may or may not have been an influence in 2000 but certainly they were in 2010. You might be interested in Miller's dissertation.

newnewpdro · on Jan 28, 2020

I came along a bit after you did, arriving with Pthreads books already in bookstores, but on Linux we were just getting LinuxThreads and not yet even NPTL, so things were still pretty flaky. I remember trying to write threaded programs that used SVGAlib, and breaking my console multiple times a day because LinuxThreads used SIGUSR[12] internally and so did SVGAlib. That year magic-sysrq became my best friend.

My impression at the time was that developers had a flawed impression of threading complexity and bugginess because they were bolting threads onto existing single-process programs, and their existing hygiene was the problem, nothing inherent to threading.

If you look at the single-process C programs of the era, global variables and global state in general were extremely common. When you start adding threads to such programs to try to take advantage of SMP, trying to wrap locks around heaps of unnecessarily shared, poorly encapsulated state, of course you produce a lot of buggy programs. Then everyone starts saying "multi-threading is too hard, not worth it", instead of admitting their programs are a complete mess.

Using modern standards of hygiene, even with threads-naive languages like C and good old Pthreads, I don't find it particularly challenging at all to write threaded programs.

Like you mentioned, I also struggle with the higher level abstractions attempting to make threading easier. Pthreads makes sense to me, I find threads, mutexes, condition variables, rwlocks, all very intuitive and ergonomic to use, but it's probably because I spent a lot of time using that API at a young age.

As I started taking swe jobs in silicon valley in the early-mid 2000s, it surprised me how few people had experience with Pthreads. It blew my mind, four different startups doing C programming with experienced C programmers and nobody was ever familiar enough with Pthreads to quickly review my code without having to rtfm. It's like there was such a stigma surrounding threads being "too hard" a lot of people never even attempted it. But my being a kid learning linux and just excited about new features in my unix, when LinuxThreads arrived and then NPTL, I spent years playing with Pthreads in C and getting my hands on SMP systems just to program them with Pthreads. Those years of playing paid off unexpectedly well in silicon valley, SMP was everywhere and C was still heavily in use on Linux.

I still get a bit of happy nostalgia when the opportunity arises to write some code like:

  pthread_mutex_lock(&foo->lock);
  while (!foo->ready)
          pthread_cond_wait(&foo->cond, &foo->lock);

  /* consume from foo */

  pthread_mutex_unlock(&foo->lock);
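
For anyone without a C toolchain handy, the same wait-in-a-loop idiom can be sketched with Python's `threading.Condition`. `Slot` is a hypothetical one-item handoff invented for this example, not something from the snippet above.

```python
import threading


class Slot:
    """One-item handoff guarded by a condition variable."""

    def __init__(self):
        self.cond = threading.Condition()    # carries its own lock
        self.ready = False
        self.item = None

    def put(self, item):
        with self.cond:
            self.item = item
            self.ready = True
            self.cond.notify()

    def take(self):
        with self.cond:
            while not self.ready:            # re-check the predicate after every wakeup
                self.cond.wait()             # releases the lock while sleeping
            self.ready = False
            return self.item


def demo():
    slot = Slot()
    producer = threading.Thread(target=slot.put, args=("hello",))
    producer.start()
    value = slot.take()
    producer.join()
    return value
```

As in the C version, the `while` matters: the predicate is tested under the lock, so it does not matter whether the producer runs before or after the consumer starts waiting.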

rovolo · on Jan 28, 2020

Personally, my issues with threads and the various locks is that you have to manually manage resource ownership. It's kind of like programming without types, with manual memory management, or with null values. It's easy to program with 'null' if you do it every day, but it wears on you and there's always the chance that you make a dumb mistake. Moving that functionality into the type system is a relief because you can lean on the compiler to tell you when you're doing it wrong.

I like promises/futures because they tell me when resources will take a while to become available without blocking. I generally don't want to need to know that I need to lock the foo queue before consuming from the queue. I would rather have the queue handle that for me and encode the behavior I want in the type signature:

    next() -> T           sync, blocking
    poll() -> Optional<T> sync, non-blocking
    next() -> Future<T>   async
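
One way those three signatures might look in Python, as a sketch only: `SyncQueue` and `next_async` are invented names wrapping the standard `queue.Queue` and `asyncio.Queue`, and Python's type hints are of course not enforced by a compiler the way the comment has in mind.

```python
import asyncio
import queue
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class SyncQueue(Generic[T]):
    """The queue does its own locking; callers never see a mutex."""

    def __init__(self) -> None:
        self._q: "queue.Queue[T]" = queue.Queue()

    def put(self, item: T) -> None:
        self._q.put(item)

    def next(self) -> T:
        """sync, blocking: waits until an item is available."""
        return self._q.get()

    def poll(self) -> Optional[T]:
        """sync, non-blocking: returns None when the queue is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


async def next_async(q: "asyncio.Queue[T]") -> T:
    """async: suspends only the calling task until an item arrives."""
    return await q.get()
```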

badsectoracula · on Jan 28, 2020

> The odd one out is Win32, but it's close enough to pthreads for the same concepts to apply.

One thing that i find very convenient in Win32 that is lacking in other platforms (at least using pthreads) is that every thread has its own message queue and you can have threads communicating with each other simply via PostMessage/GetMessage (which also handles sleeping). You can implement something similar over pthreads, but it is nice that the OS provides that out of the box.

dfox · on Jan 28, 2020

The message queue is a user-space abstraction relevant to the GUI subsystem and not something that is inherent to the Win32 threading model. And in fact its implicit existence is the source of a completely unique Win32 class of hard-to-debug threading bugs.

Edit: on the other hand Win32 has one really nice feature: you can WaitForMultipleObjects() on essentially anything that has a kernel handle, which includes most of the IPC primitives. The downside is that this causes the native Win32 IPC to have significant overhead, and it is the reason why game developers often resort to userspace spin-locks and why Windows 10 introduced NPTL/Linux-style lightweight futex-based mutexes...

badsectoracula · on Jan 29, 2020

It may not be inherent but IMO that is just hair-splitting, the important part is that it is there (also FWIW i had PostThreadMessage in mind, not PostMessage) and is very convenient to use. I've used it a bunch of times whenever i worked on Win32 tools that i wanted threading by sending commands to other threads to do stuff (e.g. enumerating the directory structure for an asset browser in a game editor at the background to avoid stopping the UI) and having them reply back with messages about the results.

And honestly i never had any bugs with that, if anything i've found it the easiest approach to understand when it comes to inter-thread communication.

int_19h · on Jan 29, 2020

Win32 message queues are not only relevant to the GUI subsystem. Note how PostThreadMessage doesn't even deal with HWNDs anywhere. And COM, for example, uses those same message queues for cross-apartment calls - even if it's one GUI-less service calling another one, there's a message pump in there somewhere.

dfox · on Jan 29, 2020

I somewhat believe that the sole reason for existence of PostThreadMessage and friends is to make COM cross-apartment calls work for threads that do not own any windows. You can probably devise other uses for that, but such uses still somewhat boil down to implementing your own COM-like bidirectional IPC mechanism (GIMP's libwire comes to mind, which does essentially the same thing on top of “anything that behaves like SOCK_STREAM socket” and shared temporary directory)

BlueTemplar · on Jan 28, 2020

Yeah, as someone using JavaScript for the first time (and not being an experienced developer either), I gave a brief look at promises, recognized that I hardly understood any of it, and quickly decided to use the bleeding edge async instead ! (And haven't regretted it.)

gdubs · on Jan 28, 2020

Curious what kind of work you were doing? I’ve always been fascinated by the golden days of SGI.

oppositelock · on Jan 28, 2020

All sorts of real time graphics stuff; flight simulators, oil and gas visualization of voxel maps, general industrial visualization. These were the early days of hardware accelerated 3D and everyone thought that visualization would change the world to a greater extent than it really did.

It was great fun to write code which drove 8 displays using 8 graphics pipes, with roughly 8 cores working in concert with each pipe. All this work for something that runs faster on the latest iphone using a single thread...

I went on to work at SGI for a few years, and it was still my favorite job ever. It was pure R&D, graphics and realtime systems for their own sake. Today, this doesn't exist. 3D graphics are an applied technology that's part of an app, but not a product research area of its own.

zeusk · on Jan 29, 2020

> It was pure R&D, graphics and realtime systems for their own sake. Today, this doesn't exist. 3D graphics are an applied technology that's part of an app, but not a product research area of its own.

I'm quite certain there are teams here at Microsoft that do just this, and their counterparts in the GPU industry (AMD/Nvidia/Intel - although Nvidia seems to be the dominant one in research)

Psyladine · on Feb 3, 2020

>and everyone thought that visualization would change the world to a greater extent than it really did.

c.f.: Data Scientists

pvg · on Jan 28, 2020

> On IRIX, you would do threads yourself by forking and setting up a shared memory pool

IRIX also had that weird 'sproc' API that I think was semi-lifted from something in Sequent's DYNIX.

dfox · on Jan 29, 2020

By the way this is how threads work on Linux to this day. LinuxThreads were exactly this and NPTL is mostly about doing magic with RT signals and shadowing libc symbols such that stuff that changes global process state works reliably. Along the way the kernel got some awareness of userspace playing such tricks and got futex(), but still it is mostly implemented as a bunch of userspace magic that makes essentially unrelated processes look like threads of the same process. One nice consequence of this is that all the PTHREAD_WHATEVER_PSHARED stuff simply works (in contrast to BSD derivatives, where you have to choose whether you want to use an IPC primitive across threads or across processes, and when you want both you get -ENOSYS, -EINVAL or somewhat hilariously -ENOMEM with the manpage containing a rationale along the lines of “POSIX says that this has to be possible and that there can be a global limit on the number of such objects. It does not say that the limit should be larger than zero, so this always fails with -ENOMEM”).

yongjik · on Jan 28, 2020

I'm not exactly sure if events are easier to debug. I use tornado (event-based Python webserver library) extensively at work - when something goes wrong you don't get a nice stack trace, you get some random sampling of callback spaghetti. Also the default state of matters is that everything is serialized in a single thread and everything waits for their predecessor, even when they are totally unrelated, though that probably tells more about the particular framework than event-based programming in general.

I'd rather use a real multithread-based framework, honestly, though I concede that it also opens up different ways of making developers' lives miserable.

ailideex · on Jan 28, 2020

The programming paradigm provided by a threaded model (which is also there in async/await syntactic sugar) is a lot easier to follow and work with than the asynchronous code paradigm, IMO. I think writing code asynchronously is solving a problem all over your code base which can be solved much better by async/await or lightweight threads, like golang has with goroutines and like the JVM is getting with Project Loom. All of those solve the problem in the right place: they allow you to write code synchronously but execute it asynchronously.

Once we de-couple these two things - how we write code and how we run code - the discussions about this become clearer and easier to have.
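
To make "write synchronously, execute asynchronously" concrete, here is a minimal Python asyncio sketch. The handler name and the two sleeps are hypothetical stand-ins for real I/O and are not taken from any framework mentioned in this thread. Each handler reads as straight-line code, yet the event loop interleaves several of them on a single thread.

```python
import asyncio

async def handle_request(request_id, log):
    # Reads top to bottom like ordinary blocking code...
    log.append(("start", request_id))
    await asyncio.sleep(0.01)   # hypothetical stand-in for a database query
    await asyncio.sleep(0.01)   # hypothetical stand-in for a downstream call
    log.append(("end", request_id))
    return request_id * 2

async def serve(log):
    # ...but the event loop switches between the three handlers at each await.
    return await asyncio.gather(*(handle_request(i, log) for i in range(3)))

def run():
    log = []
    results = asyncio.run(serve(log))
    return results, log

if __name__ == "__main__":
    results, log = run()
    print(results)
    print(log)
```

All three "start" entries are logged before any "end" entry, which shows that the handlers overlap even though none of them contains a callback.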

rlpb · on Jan 28, 2020

The state of the art is improving. Trio for Python for example is much better in terms of stack traces.

> Also the default state of matters is that everything is serialized in a single thread and everything waits for their predecessor, even when they are totally unrelated...

I think this is a preferable default since bugs caused by pre-emptive multithreading are much harder to debug.

maxmalysh · on Jan 28, 2020

It's time to migrate to async/await. Check out asyncio.Protocols for TCP servers and aiohttp for HTTP servers. We get beautiful async stack traces delivered to Sentry. Debugging anything is a joy.

fjp · on Jan 28, 2020

I’m always a +1 for aiohttp (and aiopg)’s ease of use

davidw · on Jan 28, 2020

The author, in case anyone didn't recognize the name:

https://en.wikipedia.org/wiki/John_Ousterhout

Specifically, he created the Tcl programming language, which had a nice event loop way back when.

Uhhrrr · on Jan 28, 2020

It's still there! IIRC you could also have arbitrarily many nested event loops, for better or worse. And some threading support is also available [1], although it seems to be culturally disapproved of.

1. https://wiki.tcl-lang.org/page/thread

topspin · on Jan 28, 2020

I solved some tough IPC problems using the TCL event loop. A product that runs on several thousand machines across the US was designed around a set of distinct C++ processes communicating over sockets. Using the TCL event loop greatly simplified the design, improved performance and eliminated concurrency bugs present in the bespoke IPC code it replaced.

It should have been written that way on day one but some programmers default to pounding out a bunch of spaghetti instead of learning about the tools at their disposal.

davidw · on Jan 28, 2020

I just meant to say that it was already in place... uh...holy crap... 25 years ago.

kevin_thibedeau · on Jan 28, 2020

Technically it was part of Tk. It got grafted into mainline Tcl 20 years ago.

milesvp · on Jan 28, 2020

A shame that no one has mentioned cache invalidation as a further reason threaded programming is hard. One of my biggest takeaways from Martin Thompson’s talk on mechanical sympathy is that the first thing he tries when brought in as a performance consultant is to turn off threading. He mentions locking as a performance problem but that these days cache locality can be the key to speeding up slow applications.

zzzcpan · on Jan 28, 2020

Yeah, it was hard to realize in 1995, but nowadays pretty much everyone who tried experienced performance problems with threads, or rather with shared memory multithreading concurrency model. It doesn't actually scale if you idiomatically synchronize shared memory access with locks or atomics, you need some way to batch things and amortize the cost of synchronization between cores while also preserving locality, which ultimately implies an asynchronous model where threads are just a low level implementation detail.

milesvp · on Jan 28, 2020

I've been hearing rumors that AMD's current offerings have been starting to avoid even shared cache between processors. It boggles my mind that any CPU designer would think a shared L2 cache is a good idea. Makes me wonder where my model of memory starts to break down. I always just think of L2 as being slower, less expensive memory than L1. I'm wondering if there are any benefits that actually outweigh the cache eviction penalty of multiple processors accessing it...

kardos · on Jan 28, 2020

It's surely an engineering tradeoff resulting from weighing the different pros/cons. If you had dedicated cache per core at 1/Nth the size, much of it would be wasted when you're running less than full tilt -- eg, if you're using 2/4 cores then half of the cache is artificially unavailable instead of doubling the cache available to those 2 cores.

gpderetta · on Jan 29, 2020

L2 hasn't been shared for a very long time.

dfox · on Jan 29, 2020

For me this is the only relevant reason why you don't want to do multithreading or alternatively why you want to structure your multithreaded application as mostly independent “processes” that communicate by means of message queues. I don't view “Concurrency is hard to get right” as a valid argument.

amelius · on Jan 28, 2020

Yes but the cache problem also exists with asynchronous programming.

milesvp · on Jan 28, 2020

True. I wonder how much event systems by their very nature simply destroy cache locality. Still, it's likely much easier to reason about cache hits, and build event handlers such that they remain local for the duration, as opposed to threading, where it's all but impossible to predict what the cache will look like.

gdy · on Jan 28, 2020

What talk is that?

dang · on Jan 28, 2020

A thread from 2017: https://news.ycombinator.com/item?id=14547063

Way back in 2008: https://news.ycombinator.com/item?id=399670

zevv · on Jan 28, 2020

> A thread from 2017

Didn't you read the article?!

electricityUser · on Jan 28, 2020

Did you click on the links in the comment above yours? ;-)

metalliqaz · on Jan 28, 2020

Whoosh

birdyrooster · on Jan 28, 2020

The perniciousness of winky face is on full display here

cdoxsey · on Jan 28, 2020

You have two choices when it comes to utilizing multiple cores: threads and processes. Threads are hard, but sharing data between processes is hardly any easier.

Maybe they're a bad idea, but these days you have no choice but to learn how to use threads. AMD's run-of-the-mill processors have 16 cores, Intel's 8. Servers have lots more. Heck even your iPhone has 6.

The clock rate on CPUs isn't getting any better. It's just more cores from here on.

marcosdumay · on Jan 28, 2020

It being hard to share memory is a feature, not a bug.

Threads make any single thing on your program mutable without your direct control. Processes keep the mutability scoped into a few hard to extend areas.

FartyMcFarter · on Jan 28, 2020

> Threads make any single thing on your program mutable without your direct control.

No they don't. Threads don't mutate random variables by themselves, you need actual code that does the mutation (whether it's running on a separate thread or not).

I mean, how is that statement different from "calling other functions makes any single thing on your program mutable"?

marcosdumay · on Jan 29, 2020

> how is that statement different from "calling other functions makes any single thing on your program mutable"?

On those languages where functions mutate things, that's basically true. But it's much more common that functions can only mutate global variables, and people keep those in low numbers, exactly for that reason. Actually, replace "functions" with "methods" and you will get into one of the largest flaws of OOP.

But anyway, mutability is much less of a problem outside of concurrent code.

dirtydroog · on Jan 28, 2020

'const' all the things.

jovial_cavalier · on Jan 28, 2020

He says at the end that if you want true concurrency, use threads. But the point is that many people use threads where concurrency is not a requirement, or even desirable.

chrisseaton · on Jan 28, 2020

> But the point is that many people use threads where concurrency is not a requirement, or even desirable.

Why would you write a program using threads if you don't require concurrency? The only purpose of threads is to achieve concurrency. Can you give some examples?

sgerenser · on Jan 28, 2020

Not the person you were asking the question to, but I can give you tons of examples. This is particularly true in the embedded space, or among people who write C or C++, which don’t have any standard event library. Basically, threads are a lowest-common-denominator way to do blocking operations without stalling everything. I’ve seen a lot of code where there’s various threads running at different intervals (low priority background thread that sleeps for 1 second then wakes up to do stuff, another thread that does nothing but blink an LED, etc.). But in reality, in most non-CPU-bound workloads, using an event loop would make writing this kind of code much easier and you wouldn’t ever have to worry about mutexes, semaphores, deadlocks, etc. As a C++ programmer, using Qt (even for non-UI stuff) with its event loop is just so much easier than spinning up threads just to call a network endpoint.

That being said, threads definitely have their place and are the only real way to take advantage of all the power offered by modern multi-core CPUs (well that and multi-process but that’s not really any easier to get right). But for the basic stuff I think running 80 threads when your app is only using 5% cpu is insane (something I see a lot in the C++ code I’m exposed to).
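
As a rough illustration of the "periodic background jobs on one event loop" idea, here is a small sketch in Python's asyncio rather than Qt or C++. The `every` helper, the task names and the intervals are all made up for the example.

```python
import asyncio

async def every(interval, action, times):
    # Run `action` `times` times, sleeping `interval` seconds after each run.
    for _ in range(times):
        action()
        await asyncio.sleep(interval)

async def main(log):
    # Two "background threads" expressed as coroutines on one event loop.
    await asyncio.gather(
        every(0.01, lambda: log.append("blink"), 3),
        every(0.03, lambda: log.append("housekeeping"), 1),
    )

def run():
    log = []   # plain list, no mutex: only one callback runs at a time
    asyncio.run(main(log))
    return log

if __name__ == "__main__":
    print(run())
```

Because every action runs to completion on the same thread, the shared list needs no lock. The cost is that a slow action delays everything else on the loop.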

chrisseaton · on Jan 28, 2020

Aren't all your examples still examples of concurrency?

> way to do blocking operations without stalling everything

Switching between multiple tasks doing IO - that's concurrency isn't it?

> various threads running at different intervals (low priority background thread that sleeps for 1 second then wakes up to do stuff

Different tasks ready to run and switching between them as needed - that's concurrency again isn't it?

Not really sure what everyone else in this thread is seeing that I'm not.

sgerenser · on Jan 28, 2020

OK, sure those are actually examples of concurrency. I was thinking of concurrency more along the lines of multiple threads of execution doing useful work at the same time, which threads are actually good at doing but these examples are not that.

justincredible · on Jan 28, 2020

The confusion might be due to everyone using concurrency to mean concurrency and/or parallelism.

plapetomain · on Jan 28, 2020

Think of an object oriented system. You can have a thread per object at the extreme where each object has its own thread/queue to handle messages. For most cases with synchronous calls you’re not really getting any concurrency.

Maybe it’s hard to imagine now, but in the 80s and 90s there were people that pushed this sort of architecture with a straight face. Even if not this extreme, the idea of using threads for componentization rather than with a focus on concurrency (which was possibly a side benefit) was very much a thing (think COM/CORBA).

Hence the many articles from the 90s, like this one from Ousterhout, saying it was idiotic.

aidenn0 · on Jan 28, 2020

I worked on an embedded system with 1 thread per object and it was a very good ratio of code-to-expressivity (both when being used, and the implementation of the system). Each logical hardware port had its own thread, and was interacted with solely through message passing.
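
For readers who have not seen the pattern, a thread-per-object design with message passing might look roughly like the sketch below. It is in Python rather than embedded C, and the `Port` class and its methods are hypothetical, not the system described above.

```python
import queue
import threading

class Port:
    """Hypothetical 'hardware port' that owns one thread and one mailbox."""

    def __init__(self, name):
        self.name = name
        self._inbox = queue.Queue()
        self._handled = []   # touched only by this port's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # The only way other code interacts with the port: enqueue a message.
        self._inbox.put(message)

    def _run(self):
        while True:
            message = self._inbox.get()
            if message is None:          # sentinel: stop the thread
                return
            self._handled.append(message)

    def close(self):
        # Ask the thread to stop, wait for it, then hand back what it handled.
        self._inbox.put(None)
        self._thread.join(timeout=5)
        return list(self._handled)

if __name__ == "__main__":
    uart = Port("uart0")
    uart.send(b"hello")
    uart.send(b"world")
    print(uart.close())
```

The port's private state is never touched from outside its thread while it is running, so the only synchronized object is the queue itself.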

brandmeyer · on Jan 28, 2020

This is pretty much exactly the kind of robust system architecture that Erlang advocates employ.

In embedded systems, our kernels and threads are lightweight enough that we can go very fine-grained without paying a steep context-switching penalty. I'm not convinced that the penalty in Linux is all that high, either. It's only when you're going after the C10K (or C1M?) problem that you start to notice.

aidenn0 · on Jan 29, 2020

> In embedded systems, our kernels and threads are lightweight enough that we can go very fine-grained without paying a steep context-switching penalty.

Right, this system would run through the entire runlist at several kHz when idle and 0.5-1kHz under load on a PowerPC 405 that ran at about 200MIPS. Our shortest deadline was 10ms so it was plenty fast enough. Context switch was swapping out 12 machine words.

jovial_cavalier · on Jan 29, 2020

>why would you write a program to use threads if you don't need concurrency?

... exactly. There is no good reason. That's what I'm saying the point of the article is. However, this doesn't mean that people don't do it. He is saying that the tasks that people typically use threads to solve can actually be solved with an event loop and handlers, thus eliminating all of the messy issues with shared state, race conditions, etc. that true concurrency introduces.

jandrese · on Jan 28, 2020

I think that is referring to the case of "select() is too confusing, I'll use threads instead so I won't have to think about it."
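
For comparison, the select()-style alternative is not much code. The sketch below uses Python's `selectors` module, with local socket pairs standing in for real network connections; the `multiplex` function and its argument are invented for illustration.

```python
import selectors
import socket

def multiplex(messages):
    """Read one message from each of several sockets using a single thread."""
    sel = selectors.DefaultSelector()
    pairs = []
    for name, payload in messages.items():
        writer, reader = socket.socketpair()
        reader.setblocking(False)
        # Remember which logical connection this socket belongs to.
        sel.register(reader, selectors.EVENT_READ, data=name)
        writer.sendall(payload)
        pairs.append((writer, reader))

    received = {}
    while len(received) < len(messages):
        # Block until at least one socket is readable, then service it.
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)

    for writer, reader in pairs:
        writer.close()
        reader.close()
    sel.close()
    return received

if __name__ == "__main__":
    print(multiplex({"a": b"ping", "b": b"pong"}))
```

The per-connection state that a thread would keep on its stack lives here in the selector's `data` field and the `received` dict. That bookkeeping is arguably the part people find confusing.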

chrisseaton · on Jan 28, 2020

I think I'd still call that concurrency - you want multiple continuations to preserve state between IO calls.

mjpuser · on Jan 28, 2020

I think that programmers are quick to jump to concurrency when they want to make something performant as opposed to using other structures, like a cache, or researching other possible solutions, like the HN favorite bloom filter, or other algorithm.

didibus · on Jan 28, 2020

That said, threads don't give you quite the boost you might be hoping for. Even with things that can parallelize well, it seems at around 8 or 12 threads you start to hit a wall.

dirtydroog · on Jan 28, 2020

How so? How many cores on the machine? Maybe you have some false-sharing, maybe thermal throttling is kicking in.

didibus · on Jan 30, 2020

Some of what I heard is memory access bottlenecks and cache coherency bottlenecks.

redisman · on Jan 28, 2020

It's good but not quite as good. 16 core is the high end of consumer CPUs, mid-range is probably more like 6 or 8 cores.

ychen306 · on Jan 29, 2020

I think he's referring to threads as a programming model, not the physical implementation.

awinter-py · on Jan 28, 2020

> Threads should be used only when true CPU concurrency is needed

> Scalable performance on multiple CPUs

the exceptions to 'when to use threads' in 1995 sound like SOP these days

jeffdavis · on Jan 28, 2020

Threads have some very practical advantages:

* Standard, easy way to get a backtrace

* Standard, easy way to get a list of active things going on

* In many cases, threads make it easy to follow control flow

chriswarbo · on Jan 28, 2020

Yet threads make backtraces and control flow much less useful, since they miss out important context from concurrent threads.

I wouldn't personally say that threads make control flow easier to follow: we might gain a little by disentangling separate activities into threads, but we lose a lot when these get interleaved in arbitrary, non-deterministic ways.

white-flame · on Jan 28, 2020

> Yet threads make backtraces and control flow much less useful, since they miss out important context from concurrent threads.

Threads' contexts should be independent from each other. If reading your stack trace relies on the state of other threads, you've got a very brittle design.

What is lost is the history of the state the current thread is working with, but the state should be fully encapsulated within the thread, except in the case of large shared read-only input buffers that are being processed in parallel. But those latter buffers aren't hidden from the current thread nor its debugging. Debug information can also be logged on state objects to show its provenance.

jrochkind1 · on Jan 28, 2020

> but the state should be fully encapsulated within the thread, except in the case of large shared read-only input buffers

Sure. The trick is that the thread design does not ENFORCE that, threads as an abstraction involve shared memory.

> If reading your stack trace relies on the state of other threads, you've got a very brittle design.

Or a bug. And a bug or a brittle design is exactly when you need a debugger the most, right?

derefr · on Jan 28, 2020

> Yet threads make backtraces and control flow much less useful, since they miss out important context from concurrent threads.

I mean, when a program panics, you get a stacktrace from (a consistent snapshot of) all the threads, so what's the problem?

erik_seaberg · on Jan 28, 2020

If a highly concurrent service gets a query'o'death, it's hard to tell which threads were working on it and which were merely within the blast radius. Frameworks tend to roll their own notion of "request context" without it being strongly typed or pervasive across the language and libraries.

derefr · on Jan 28, 2020

Now I’m curious what backtraces from Postgres look like when parallel scans are enabled. IIRC, you’ve got one isolated fork(2)ed master for the connection, which then has threads to divide work. Not too bad to debug.

gardnerbickford · on Jan 28, 2020

A snapshot doesn't show the shared memory changes that may have caused the panic. There needs to be a synchronous log of all changes to shared memory to debug an issue.

Similar to having a snapshot of network traffic vs a recording of network traffic.

gpderetta · on Jan 28, 2020

Also a way to make use of more than a tiny bit of silicon in that expensive CPU you just bought.

anaphor · on Jan 28, 2020

It's not that threads are necessarily a bad idea (though they can be for performance reasons), but that programming with most synchronization primitives is a bad idea. If you program with message passing, then it's not much different from the "events" model except that in the event-driven model you're trying to hide the underlying abstraction more (you still have concurrency, it's just baked into the I/O library).

I honestly think this presentation is confused. "Concurrency is fundamentally hard; avoid whenever possible" seems to go against their own argument. Event-driven models (which rely on message passing) are still doing concurrency, except instead of using locks and semaphores to synchronize things, you're using mailboxes and channels.

Even CPU interrupts are a form of concurrency that is similar to event-driven models. Just because you're not spawning a thread and acquiring a lock, doesn't mean you're not doing concurrency.

pkolaczk · on Jan 28, 2020

Event driven concurrency is not really easier than threads with primitive blocking synchronisation. Races are still possible, instead of deadlocks you can have livelocks, resource control and backpressure are non-trivial etc.
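
A small sketch of such a race, using Python's asyncio with no threads involved. The account dict and function names are made up for the example, and `asyncio.sleep(0)` stands in for any awaited I/O between the check and the update.

```python
import asyncio

async def unsafe_withdraw(account, amount):
    # Check, suspend, then act: another coroutine can run in the gap.
    if account["balance"] >= amount:
        await asyncio.sleep(0)           # stand-in for any awaited I/O
        account["balance"] -= amount

async def safe_withdraw(account, amount, lock):
    # Holding the lock across the await makes check-and-update atomic again.
    async with lock:
        if account["balance"] >= amount:
            await asyncio.sleep(0)
            account["balance"] -= amount

async def demo():
    unsafe = {"balance": 100}
    await asyncio.gather(unsafe_withdraw(unsafe, 100),
                         unsafe_withdraw(unsafe, 100))

    safe = {"balance": 100}
    lock = asyncio.Lock()
    await asyncio.gather(safe_withdraw(safe, 100, lock),
                         safe_withdraw(safe, 100, lock))
    return unsafe["balance"], safe["balance"]

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both unsafe withdrawals pass the balance check before either one subtracts, so the account ends up overdrawn. The fix is a lock, which brings back the synchronization the event model was supposed to avoid.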

anaphor · on Jan 28, 2020

That's true, but at least you can handle backpressure at the runtime level and just choose a predetermined strategy for dealing with it (i.e. start dropping messages, exponential backoff, etc)
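
One way to read "choose a predetermined strategy" is a bounded mailbox that drops messages when full. Here is a minimal sketch with Python's `asyncio.Queue`; the `offer` helper and the queue size are assumptions for illustration, and a real runtime would likely do this internally.

```python
import asyncio

def offer(mailbox, item):
    # Drop-on-full strategy: never block the producer, report whether it fit.
    try:
        mailbox.put_nowait(item)
        return True
    except asyncio.QueueFull:
        return False

async def demo():
    mailbox = asyncio.Queue(maxsize=2)      # bounded: this is the backpressure
    accepted = [offer(mailbox, i) for i in range(5)]
    drained = []
    while not mailbox.empty():
        drained.append(mailbox.get_nowait())
    return accepted, drained

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Swapping the drop for `await mailbox.put(item)` would instead make the producer wait, which is the other common strategy.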

TheFiend7 · on Jan 28, 2020

IMHO I think one large part of it is synchronization. If you're having to synchronize things all the time, you're probably misusing threads and should be using a different execution model.

desc · on Jan 28, 2020

Another way to look at it is that only infrastructure should be locking stuff, and infrastructure should be a very tiny part of the codebase. That infrastructure should probably be responsible for layering a different concurrency paradigm on top of threads...

Most platforms these days provide such things as part of the language, or in the standard library, or as a freely-available package. Writing one's own concurrency infrastructure is usually unnecessary, but when it is needed, it needs to be kept as small and as easily-auditable as possible.

A bit like `unsafe` in Rust, in fact.

Locking all over the place generally indicates that someone's trying to shotgun-debug concurrency bugs. I've had to use libraries which did that, and wished horrible things upon those responsible.

bcrosby95 · on Jan 28, 2020

> Threads should be used only when true CPU concurrency is needed.

Which is basically any program running on any modern processor.

AnimalMuppet · on Jan 28, 2020

Not at all. There are multiple cores, sure, but why does my program have to use them? If it performs adequately using only one core, and if the nature of the problem doesn't require threads, why should I make it multithreaded just because the processor has multiple cores?

rongenre · on Jan 28, 2020

Raw threads are really hard to program correctly, however sinking parallel code into an executor or queuing framework tends to really reduce complexity and in a lot of cases get all the cores working.
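
As a sketch of what "sinking parallel code into an executor" can look like, here is Python's standard `concurrent.futures`. The `word_count` task and the worker count are placeholders chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    # A pure function of its input: no shared state, so no locks needed.
    return len(text.split())

def count_all(documents, workers=4):
    # The executor owns thread creation, work distribution and joining;
    # map() returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(word_count, documents))

if __name__ == "__main__":
    print(count_all(["threads are hard", "events too", ""]))
```

One caveat: in CPython with the GIL, a thread pool will not spread pure-Python CPU work across cores. `ProcessPoolExecutor` offers the same `map` interface for that case.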

bjourne · on Jan 28, 2020

In my life I must have written at least a few million lines of code. Probably more. How many of those lines have been explicitly multithreaded? A few thousand lines, at most.

bluejekyll · on Jan 28, 2020

I think this was truly the most amazing thing about learning Rust. After having experienced the pain (the issues brought up in the linked slide deck) of threads in C, then C++ and Java, it was wild to work with a language that provided some significant safety rails for working with data across threads.

Now async/await gives us even better options on top of that, but it’s truly what made me enjoy the language so much. This article is what resonated with me and got me to invest so much spare time over the last 5 years in working with Rust: https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.h...

gmfawcett · on Jan 28, 2020

Rust does a great job here, but so do languages that make multithreading safer at a higher level -- Erlang/OTP and Haskell are two great examples. Immutable data may not solve every concurrency problem, but it sure goes a long way.

I also have a lot of respect for Ada's task-based concurrency approach (independent actors, communicating by rendezvous). You don't get the flexibility to roll your own concurrency strategy in Ada, but the language's support for its chosen mechanism is truly excellent. Even if you'll never use Ada, this part of the language is worth studying just as an example of great engineering design.

DougBTX · on Jan 28, 2020

There's a nice Kevlin Henney talk where he lays out this diagram:

                                 |
       non-shared mutable state  |  shared mutable state
                                 |
     ----------------------------+------------------------
                                 |
      non-shared immutable state | shared immutable state 
                                 |

Each quadrant is safe, except the one in the top right: shared mutable state.

Functional languages are great at the bottom two, since they strongly encourage immutable state. Message-passing based concurrency strongly encourages non-shared state, the safe two on the left.

Rust is the only language I've seen which encourages all three safe quadrants, while making the fourth a compile-time error.

kccqzy · on Jan 28, 2020

> Each quadrant is safe, except the one in the top right: shared mutable state.

No. Your database is one giant shared mutable state.

How is using a database safe then? Transactions.

Haskell has had software transactional memory for fifteen years now. Microsoft tried to copy it in .NET but it's nearly impossible to do right in a language without clear separation of pure and impure code.

astine · on Jan 28, 2020

"> Each quadrant is safe, except the one in the top right: shared mutable state.
No. Your database is one giant shared mutable state.

How is using a database safe then? Transactions."

Transactions are less an occasion where shared mutable state is made 'safe' during concurrency and more a situation where concurrent processes are forced to temporarily interact and operate in a sequential, non-concurrent, manner. They use locks under the covers. Databases manage to be parallel because different processes operate on different sets of data at the same time; they lock when two or more processes attempt to access the same row on a table. Transactional state access is still vulnerable to deadlocks and other difficulties of concurrent programming. This means that transactional state is not 'safe' in the same way that immutable and non-shared state are safe. It's just a much easier way of managing the kind of difficulties you have with shared mutable state than say, raw locks.