
I've been writing massively threaded C++ code since the 1990s, so I can comment with some serious experience here. My code used to run on 256-CPU SGI machines and had high CPU utilization across all of them.

When designing C++ code for threading, you have to keep a few things in mind, but it's not particularly harder than in other languages with thread support. You generally want to create re-entrant functions which take all of their state as input and return all results as output, without modifying any kind of global state. You can run any number of these in parallel without thread safety issues or locks. When using multiple threads with shared data, you have to protect it with locks, but I generally use a single thread to access data of this sort; that thread produces work for other threads to do, which can be queued or distributed any other way. Producer/consumer queues show up a lot in good designs, and your single-threaded choke points are usually a producer or a consumer, which allows thread fan-in or fan-out. (As a side note, Go really got this right with channels.)
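
As a rough sketch of the re-entrant style described above (the function names and the chunking scheme are illustrative assumptions, not code from any real project): each worker gets its input as arguments and writes only to its own result slot, so no locks are needed and joining the threads is the only synchronization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Re-entrant: reads only its arguments and returns its result by value.
// (Hypothetical example function.)
long long sum_of_squares(const std::vector<int>& data, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= data.size());
    long long total = 0;
    for (std::size_t i = begin; i < end; ++i) {
        total += static_cast<long long>(data[i]) * data[i];
    }
    return total;
}

// Fan-out: each thread handles one chunk and writes only to its own slot of
// `partial`, so no two threads ever touch the same memory location.
long long parallel_sum_of_squares(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = sum_of_squares(data, begin, end);
        });
    }
    // Fan-in: after join() the partial results are safe to read.
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The shared input is only ever read, which is why it can be passed by const reference to every thread.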

What leads to problems in threaded C++ programs is unprotected, shared state. Some C++ design patterns make life particularly hard, such as singletons, which are just a funny name for global variables. C++ gives you lots of ways to make something global accidentally - static member variables, arguments by reference, for instance.

True C++ hell is taking a single threaded C++ program and making it multi-threaded after the fact, but that's hell in any language which allows you to have globals of any sort and doesn't perform function calls within some kind of closure.

The article in this post is being a little reductionist. You don't need to pick a threading model - some of your threads can pass messages, others can use shared data, all in the same program. The lack of thread local storage is not an issue, because you partition the data so that no two threads are working on the same thing at the same time.

I agree with all your points.

I've written a lot of multithreaded C++. I did quite a lot of it for ten-plus years starting around 1994, much of it middle-tier application code on Windows NT (back in the day when it was all 3-tier architectures; now we'd call them services). It's totally fine if you know what you are doing.

Work is usually organised in mutex-protected queues: worker threads consume from these queues, results are placed in a protected data structure, and receivers poll and sleep, waiting for the result.

Other tricks to remember: establish a hierarchy of mutexes. If you have to take several locks, they must always be taken in the same order and released in reverse order; this should guarantee an absence of deadlocks. A second trick, which guarantees that locks are always released (and released in the correct order), is to strictly follow an RAII pattern, where the destructors of stack-based lock objects unlock your mutexes as you exit stack frames.
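
Both tricks can be sketched in a few lines of modern C++. This is a minimal illustration under assumptions: the `Account` type and its `rank` field are hypothetical, and `std::lock_guard` stands in for whatever hand-written lock object you would have used before the standard library had one.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type: `rank` is this mutex's fixed position in the hierarchy.
struct Account {
    Account(int rank_, long balance_) : rank(rank_), balance(balance_) {}
    const int rank;
    long balance;
    std::mutex mutex;
};

// Always lock the lower-ranked account first, whatever the transfer direction.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;      // locking one mutex twice would deadlock
    assert(from.rank != to.rank);  // distinct accounts need distinct ranks
    Account& first = (from.rank < to.rank) ? from : to;
    Account& second = (from.rank < to.rank) ? to : from;
    std::lock_guard<std::mutex> hold_first(first.mutex);
    std::lock_guard<std::mutex> hold_second(second.mutex);
    from.balance -= amount;
    to.balance += amount;
}   // RAII: hold_second is released first, then hold_first, on every exit path

// Two threads transfer in opposite directions; without the ordering rule this
// is the classic deadlock. Returns the combined balance afterwards.
long run_opposing_transfers(Account& a, Account& b, int rounds) {
    std::thread forward([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread backward([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    forward.join();
    backward.join();
    return a.balance + b.balance;
}
```

With C++17, `std::scoped_lock` can take several mutexes at once and applies a deadlock-avoidance algorithm for you, but an explicit hierarchy is still useful when the locks are taken in different stack frames.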

Of course, in later periods, you started to see formal recognition of these design patterns, in Java and C# libraries which had WorkerPools, Java and its lock() primitive, but these design patterns were prevalent in my code at the time, because it was the only obvious and simple way to use multi-threading in a conceptually simple manner. KISS...

Nothing particularly hellish about any of it - but I remember it was not a development task for all developers, and without common libraries in the period (this is pre-STL), you had to work a lot of it out for yourself.

I do remember in the period you would get grandiose commentary from some public developers who would proclaim such things as, "it is impossible to have confidence in/write a bug-free threaded program."

I always felt that said more about the developer than multithreading though.

The D programming language has a 'pure' annotation which can be applied to functions. This enables the compiler checking that the function does not read or write any global mutable state, including checking the functions called.

After using it for a while, it's amazing how much global variables tend to creep unannounced into code :-)

> The D programming language has a 'pure' annotation which can be applied to functions. This enables the compiler checking that the function does not read or write any global mutable state, including checking the functions called.

Are foreign calls assumed to be non-pure? Can they be marked pure in the case that the foreign (likely C) function doesn't reference global mutable state?

> Are foreign calls assumed to be non-pure?


> Can they be marked pure in the case that the foreign (likely C) function doesn't reference global mutable state?

Yes. Here's an example:


It's true that the C Standard does not actually guarantee that they don't access mutable global state, but in practice they don't. We're not aware of one that does, and don't know why anyone would write one that does.

great feature, sir. thank you!

You're welcome! It's a 'little' feature with a surprisingly large impact.

> True C++ hell is taking a single threaded C++ program and making it multi-threaded after the fact, but that's hell in any language which allows you to have globals of any sort and doesn't perform function calls within some kind of closure.

That's not true. In Rust you can have globals, but you have to declare the synchronization semantics for every global when you do. (That is, they have to be read only, or thread local, or atomic, or protected by a lock.)

This property makes it very easy to take a single-threaded Rust program and make it multithreaded later, and in fact this is how most Rust programs end up being parallelized in my experience.

> you have to declare the synchronization semantics for every global when you do. (That is, they have to be read only, or thread local, or atomic, or protected by a lock.)

That doesn't sound like it helps you determine which one of those is appropriate. So it seems rather like a system that will lead to unexpected bottlenecks when you parallelize. (Somebody deep inside the stack arbitrarily decided locking was the most appropriate - now you've got a contended lock or a potential lock ordering bug later on.)

Granted it does seem like it allows for more reasonable defaults than a default-unsafety policy.

This is usually not a problem because using globals and taking locks unnecessarily is unidiomatic Rust. You have to go out of your way to write more code when you use mutexes, so most people don't unless they truly need to.

We have experience with this. For a long time, WebRender rasterized glyphs and other resources sequentially. Switching it to parallelize was painless: Glenn just added a parallel for loop from Rayon and we got speedups.

> a system that will lead to unexpected bottlenecks when you parallelize

This is a funny comment. You are implying that performance is of higher value than correctness. Speed without correctness is dangerous, and leads to significant bugs, especially when you're talking about concurrent modification of state across threads.

I'll take correct and need to improve performance over incorrect and fast where the cost of tracking down incorrect concurrent code is so extremely high, let alone dangerous for actual data being stored.

> This is a funny comment. You are implying that performance is of higher value than correctness.

Of course it is. Tony Hoare noticed it as far back as 1993: given a safe program and a fast program, people would always choose the fast one. Correctness in a mathematical sense does not always map to correctness in the business sense; it's sometimes much more cost-effective to reboot a computer every day and not free any memory than try to be memory-correct which will cost at least a few thousand dollars more in employee time.

In a case where it just crashes, that's probably a reasonable tradeoff.

What really bothers me though, is that you might actually store incorrect data somewhere. That could have hugely negative implications for the business.

Let's not equivocate here. Threads don't exist merely as a fun exercise to introduce more interesting bugs. They are for performance. If you don't care about that I may advise to stay away from threads.

> This is a funny comment.

Funny would be an understatement.

So much nicer to change the type and chase compiler errors though. Reading the code and just being real smart is tough to get right.

Obviously it is possible to write threaded C++ programs. And there are experienced people like you who can do it well, with enough discipline, good design, and so on. But I think the point of the article is that for most programmers it is very easy to make mistakes and shoot themselves in the foot with threads and shared memory. It could even be something like using a 3rd party library whose initialization context can't be shared, but a mistake was made and it ended up being shared by accident, say by being passed to some worker threads.

> What leads to problems in threaded C++ programs is unprotected, shared state.

That's one of the main problems, but I don't think people start by saying "we'll just have this unprotected shared state and hope for the best"; that shared state ends up being shared by accident or as a bug. I've seen enough of those and they are not fun to debug (the hardware environment is slightly different, say cache sizes are a bit off, which makes it more likely to happen at a customer's site, for example on Wednesday evening at 9pm, but never during QA testing).

Another thing I've seen bugs in is mixing non-blocking (select / epoll / etc.) callbacks with threads. It's pretty easy to get tangled there. Throw signal handlers in and now it is very easy to end up with a spaghetti mess.

Even worse, there is a difference between "we'll start with a clean, sane threaded design from the start" vs "we'll add a bit of threading here in this corner for extra performance". That second case is much worse and can result in subtle and tricky bugs. Sometimes it is not easy to determine if the code is re-entrant in a large code-base.

Interestingly, and kind of tongue in cheek, someone (I think Joe Armstrong, but I may be wrong) said to try and think about your programming environment as an operating system. It is 2017, most sane operating systems have processes with isolated heaps, startup supervision (so services can be started / stopped as groups), preemption based concurrency (processes don't have to explicitly yield) and so on. So everyone agrees that's sane and normal and say putting their latest production release on a Windows 3.1 would not be a good idea. Why do we then do it with our programming environment? A bunch of C++ threads sharing memory are a bit like that Windows 3.1 environment where the word processor crashes because the calculator or a game overwrote its memory.

> So everyone agrees that's sane and normal and say putting their latest production release on a Windows 3.1 would not be a good idea. Why do we then do it with our programming environment?

Because of people. Convincing devs to adopt new ways is an uphill battle unless they have tried it themselves.

Joe Duffy's keynote at RustConf just went live, and one of the reasons for Midori's adoption failure was the difficulty of convincing Windows kernel devs of better ways of programming, in spite of them having Midori running in front of them.

I think the OP's attitude is still correct. Threaded programs are hard. It takes experience, discipline, and good design. There is no band-aid for that. The technology is helping with things like local storage, built in actor models, borrow checkers, etc. but those are only helping the design aspect. It still takes discipline, like knowing to not mix up async callbacks with/across threads. You are basically writing your own little kernel, and writing a kernel is hard, but once you know the rules it can be done.

I think rdtsc's point was that writing your own kernel in the present day is a terrible mistake unless you genuinely need something not available otherwise.

> So everyone agrees that's sane and normal and say putting their latest production release on a Windows 3.1 would not be a good idea.

And yet unikernels are here and used.

thank you

I did it for many years. It is not for the weak. The Helgrind tool for Valgrind saved my ass big time here and there. Without threads a program needs to have zero bugs; with threads you need a negative bug count. You will get that joke when you try to add threads to a normally bug-free program and suddenly discover the rest of the real bugs you didn't know about yet.

> lack of thread local storage is not an issue

We've had it since C++11. http://en.cppreference.com/w/cpp/language/storage_duration
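
For anyone who hasn't used it, a minimal sketch of the C++11 `thread_local` keyword (the variable and function names here are made up for illustration): each thread gets its own independently initialized copy of the variable.

```cpp
#include <cassert>
#include <thread>

// One separate, zero-initialized copy of this counter exists per thread.
thread_local int calls_on_this_thread = 0;

int bump() { return ++calls_on_this_thread; }

// Calls bump() `times` times on a fresh thread and returns the last value
// that thread saw. The caller's own counter is not affected.
int bump_in_new_thread(int times) {
    assert(times >= 0);
    int last = 0;
    std::thread worker([&last, times] {
        for (int i = 0; i < times; ++i) last = bump();
    });
    worker.join();
    return last;
}
```

If the calling thread has already called `bump()` twice, `bump_in_new_thread(5)` still returns 5, and the caller's next `bump()` returns 3.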

I think the Android NDK still barfs on C++11 TLS and on the earlier language extensions that came before it, like __thread. It's not uncommon to be working with language implementations that are missing things like that. (I remember MS's version of it produced binaries that would not run on XP - maybe less important today but I had a run-in with that circa 2009.)

I don't know about TLS, but NDK is already using clang 5.0 and my code is C++14.

Yes I know they have recent clang. But does tls work? Last I knew, no.

Note clang is not the only variable here. You need support from the dynamic linker.

Edit: some googling suggests they may have added this support last year with some caveats.

Really? That's disappointing. Good old bad old NDK strikes again. I really don't understand why Google doesn't seem interested in fixing it.

They are fixing it, but judging from the commits and GitHub issues, the NDK team seems to be very small.


You can already use clang 5.0 on the NDK, so even early C++17 support is possible. I don't know about TLS support.

The NDK is only there for high performance graphics (Vulkan), realtime audio, SIMD and bringing native libraries from other platforms.

So apparently their motivation is that devs should touch the NDK as little as possible.

Yeah, it just seems like they have the resources to support a bigger NDK team if they cared to do so.

There are plenty of apps that could really use a well-supported NDK. Games and audio apps for a start.

Google released ARCore last week.

"ARCore works with Java/OpenGL, Unity and Unreal and focuses on three things:"


Which makes quite clear that even in AR, the role of the NDK is to support Java, Unity and Unreal applications, not to be used alone.

I am fine with it: given the security issues with native code, we should actually minimize its use. I just wish they would provide better tooling to call framework APIs instead of forcing everyone to write JNI wrappers by themselves.

As a counter-example, they also just announced AAudio, and that's a C API: https://developer.android.com/ndk/guides/audio/aaudio/aaudio...

So they are still using C as a lingua franca for some low-level parts of the system. But without putting much effort into improving the C toolchain.

I'm not very familiar with Unreal but I understand it's a C++ library, so it seems like that would benefit from a solid C++ toolchain too.

The thing I see glossed over a lot in multithreading is the actual implementation of the messaging people are talking about. What is being used to actually just 'send' messages to the main thread while the sender thread continues on? I know Qt has signals and slots, but I haven't found a good native method that does something like that if I wanted to, for example, use a std::future and also receive messages back that can update a GUI.

If you're using a GUI toolkit like Qt or wxWidgets, then you have to use whatever event system the toolkit provides, since it controls the main thread's event loop. For wxWidgets, for instance, the doc is here: http://docs.wxwidgets.org/trunk/overview_thread.html

I worked on a C++ trading app in the past and our approach was "services" (actors) talking via message passing. Basically a bunch of long-lived threads, each with a queue (nowadays that could be something like boost::lockfree::queue [1]), running an event loop. Message passing worked by getting a pointer to the target service's queue via a service registry (singleton initialized at startup, you could also keep a copy per service though), then pushing a message to it. If you needed to do something with the result, then that would come back via another message sent via the same system.

It does mean if there's back-and-forth between services the code would be harder to follow as you'd have to read through a few event handlers to get the whole flow, but with well-designed service boundaries it worked quite well. I don't think futures would have been useful in our context as blocking any service for an extended time would have been very bad.

[1] If you don't want boost, you can wrap std::queue yourself, this is a simple example where you can replace the boost mutex etc. with std ones: https://www.justsoftwaresolutions.co.uk/threading/implementi...

There isn't a good, native method. I always begin by implementing a thread safe producer/consumer queue with non-blocking peek, which checks to see if there's something at the head of the queue. You can use something like this for other threads to block on, or to poll, etc. It's very useful.
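
A minimal sketch of the kind of queue I mean, written against the standard library as an assumption (the class and method names are illustrative, not from any particular codebase). It offers a non-blocking peek and pop for pollers and a blocking pop for workers.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical thread-safe producer/consumer queue.
template <typename T>
class WorkQueue {
public:
    // Producer side: enqueue one item and wake one blocked consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Non-blocking peek: copies the head into `out` without removing it.
    // With several consumers the head may be gone by the time you act on it.
    bool try_peek(T& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        return true;
    }

    // Non-blocking pop, for consumers that poll.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

    // Blocking pop, for worker threads that sleep until work arrives.
    T wait_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};
```

A GUI thread would typically call `try_pop` from its event loop or a timer, while background workers sit in `wait_pop`.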

Futures are somewhat useful if you spawn off a thread to do a single task and wait for it to finish, however, on many platforms, the thread creation overhead is high enough that you don't want to do that, you want to have existing worker threads being given work to do.
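
For the single-task case, a small sketch using `std::async` (the `count_vowels` task is a made-up placeholder for real work). Note that `std::launch::async` runs the task as if on a new thread, which is exactly the creation overhead mentioned above.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>

// Placeholder task (hypothetical): counts lowercase ASCII vowels.
std::size_t count_vowels(const std::string& text) {
    std::size_t count = 0;
    for (char c : text) {
        switch (c) {
            case 'a': case 'e': case 'i': case 'o': case 'u': ++count; break;
            default: break;
        }
    }
    return count;
}

// Runs the task in the background and waits for its result.
std::size_t count_vowels_in_background(std::string text) {
    std::future<std::size_t> pending =
        std::async(std::launch::async, count_vowels, std::move(text));
    // The calling thread is free to do unrelated work here.
    return pending.get();  // blocks until the task has finished
}
```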

Once again, this is something Go does really well. Goroutines are almost free, so you can use them with a futures model very easily.

I'm not 100% sure what's meant by "native" method in this context, but if you're programming for macOS/iOS, try Grand Central Dispatch. It's a very nice task parallelism library with some extremely smart design choices.

If you use DPDK, it has some of the fastest (if not the fastest) ring implementations available.

In case anyone else is wondering, a "ring" is a queue data structure which provides a different set of trade-offs. The documentation specifically compares it to a linked list:

    The advantages of this data structure over a linked list queue are as follows:

    * Faster; only requires a single Compare-And-Swap instruction of sizeof(void *) instead of several double-Compare-And-Swap instructions.
    * Simpler than a full lockless queue.
    * Adapted to bulk enqueue/dequeue operations. As pointers  are stored in a table, a dequeue of several objects will not produce as many cache misses as in a linked queue. Also, a bulk dequeue of many objects does not cost more than a dequeue of a simple object.

    The disadvantages:

    * Size is fixed
    * Having many rings costs more in terms of memory than a linked list queue. An empty ring contains at least N pointers.
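
To make those trade-offs concrete, here is a minimal single-producer/single-consumer ring sketch in C++. It is not DPDK's implementation (which also handles multiple producers and consumers and bulk operations); the class name and methods are invented for illustration, and it assumes exactly one thread pushes and exactly one thread pops.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical fixed-size SPSC ring. The slot table is allocated up front,
// which is the "size is fixed" and "at least N slots" cost from the list above.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Called only by the producer thread. Returns false if the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head % Capacity] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the consumer thread. Returns false if the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[tail % Capacity];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next index to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next index to read (consumer-owned)
};
```

Because consecutive items sit in adjacent array slots, draining several of them touches far fewer cache lines than walking a linked list would.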


Technique-wise, it's not too hard, but it only takes one little screw-up to bring the whole thing crashing down. Back in an older version of the MS C++ compiler/library, their std::string had a bug where strings weren't thread safe because they relied on global state. That took a while to find.

So, functional programming then?

I'm tired of thinking this in my head but

- skimming through Doug Lea's concurrency articles: FP

- thread safe C: pure functions, shared nothing: FP

- comment above: FP

at what point will people start to call it what it is ?

Because thinking that functional programming is going to solve all your problems is simple, easy, and wrong. Anyone who has written high performance software knows this. First, garbage collection without locks is extremely exotic, so likely all your memory allocations and deallocations will lock and with typical functional programming you will be doing a lot of them.

Then you can get into what you mean by FP - if you share nothing then your communication is done by copying. This is not a magic bullet and isn't an option for many scenarios. If you do shared state for join parallelism you can cover other scenarios, but now you are sharing data.

Atomics are very fast and work very well when they line up with the problem at hand. Then again, you are creating some sort of data structure that is made to have its state shared.
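
A small sketch of a case where an atomic lines up with the problem: a shared event counter (the function name and parameters are made up for illustration). No mutex is needed because the increment itself is the entire critical section.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread increments one shared counter `per_thread` times.
long count_events(unsigned num_threads, long per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (long i = 0; i < per_thread; ++i) {
                // Relaxed ordering is enough here: only the final total
                // matters, and join() below orders it before the read.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```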

If the problem was so easy to solve, it wouldn't be nearly as much a problem. Handwaving with 'just use FP' is naive and is more of a way for people to feel that they have the answer should anyone ask the question, but reality will quickly catch up.

What about immutability and shared structure? You can stream your computation in a way to avoid copying, avoid locking (unless you have to synchronize on a change). Persistent data structures are rarely mentioned; I only know of Demaine's course.

> What about immutability and shared structure

> You can stream your computation in a way to avoid copying

Where is the synchronization in this scenario? You either have to decide how to split up the read only memory to different threads (fork join) or you have one thread make copies of pieces and 'send' them to other threads somehow. Arguably these are the same thing. This is one technique, but again, it doesn't cover every scenario.

I don't know if calling it 'immutability' changes anything.

> avoid locking (unless you have to synchronize on a change)

Synchronizing on changes is the whole problem, you can't just hand wave it away as if it is a niche scenario. Anyone can create a program that has threads read memory and do computations. If you can modify the memory in place with no overlap between threads, even better. These however are the real niche scenarios, because the threads eventually need to do something with their results whether it's sending to video memory, writing to disk, or preparing data for another iteration or state in the pipeline. Then you have synchronization and that's the whole issue.

What I meant is that an FP language could go its own way and let the side effects happen... well, on the side. Sure, you will have to communicate computations to other systems, but you can have these parts clearly segregated and synchronized.

I must confess, I have no experience there, it's just years of reading about and writing functional code and seeing a potential trail here.

How exactly does functional programming solve this problem though? How does it differ from imperative programming?

Yeah, immutability was something that got kind of glossed over in that article (I'm the author). And Functional Programming was a lot less mainstream then. Maybe I should do an update.

Well you're excused, 2006 was a really different world. FP was still alien and multithreading probably a lot less mainstream.

That said if you have new thoughts on the subject, please write them :)

Yup, basically.

It's not exactly functional programming. It can be a traditional OOP, just structured in independent blocks, and you execute each block on different threads. But the block internally can have as much shared state as it wants.

You get the simplicity and familiarity of OOP and most of the benefits of multi-threading.

I think the key words are "reentrant function", which in this context means atomic, pure functions. He didn't say anything about OO, like DI or multiple inheritance done in some clean way, or anything about classes or structs. This is more FP than not. By far.

I am not as experienced as you are and I don't even know the term re-entrant functions but accidentally this is how I am now designing programs, mostly writing self contained functions that don't alter any global state. This approach has made my programming life hell because I am finding it incredibly hard to get rid of the old shared variables habit. What I have experienced is that shared data access is more or less inevitable and I design to restrict the shared data access at the database.

Re-entrant is basically what you said, self-contained functions that don't depend on external state. The "re-entrant" part just means that, basically, the function can be "re-entered." If it gets interrupted in the middle of execution, it will still produce the same output no matter what, no matter how long it sleeps for.

No, that's not what re-entrant means. A re-entrant function can be called multiple times at the same time.

That could be from different threads or from an interrupt handler. It can even come up in single-threaded code: you call function A() which internally calls B() which results in a nested call to A(). If A is re-entrant that's safe.
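
A tiny C++ sketch of the difference (both functions are hypothetical examples): the first keeps its result in a static buffer shared by every call, so it is not re-entrant; the second keeps all of its state in its arguments and return value.

```cpp
#include <cstdio>
#include <string>

// NOT re-entrant: every call writes into the same static buffer, so a nested,
// overlapping, or concurrent call clobbers a result that is still in use.
const char* label_static(int id) {
    static char buffer[32];
    std::snprintf(buffer, sizeof buffer, "item-%d", id);
    return buffer;
}

// Re-entrant: no hidden state, each call builds and returns its own string.
std::string label(int id) {
    return "item-" + std::to_string(id);
}
```

If you call `label_static(1)` and then `label_static(2)`, both returned pointers refer to the same buffer, which now reads "item-2". The results of `label(1)` and `label(2)` stay independent, whether the calls are nested, interleaved, or made from different threads.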

a lot of the time you can restrict shared data access to at least being read-only, which helps a lot.

In my limited experience of the single- to multi-threading circle of hell, the function calls performed outside of an effective form of closure are a bigger problem than globals / statics / singletons. Whenever an object is accessed concurrently by multiple threads, its state becomes a potential problem. Some programs may contain so many pathological dependencies that they cannot be multithreaded without introducing so much mutual exclusion that they cannot perform better than the single-threaded original, even if the task they implement is a candidate for multithreading. Putting those cases aside, the hardest part of conversion is probably that of understanding the existing code well enough to be able to find and analyze all the operations that cut across the thread boundaries created by the conversion, which are aspects of the dynamic behavior of the program.

I agree with pretty much everything you say!

I’ve been taking single threaded code in JavaScriptCore and making it thread-safe for a while now and it’s challenging but I wouldn’t call it hell. I used to think this was harder than it turns out to be.

Sounds a lot like Erlang BEAM

this comment reeks of experience, expertise and pragmatism. thank you! -- programming for 30 years

Got a repo with some example code?
