When designing C++ code for threading, you have to keep a few things in mind, but it's not particularly harder than in other languages with thread support. You generally want to create re-entrant functions which take all of their state as input and return all results as output, without modifying any kind of global state. You can run any number of these in parallel without thread-safety issues or locks. When using multiple threads with shared data, you have to protect it with locks, but I generally use a single thread to access data of this sort, which produces work for other threads to do; that work can be queued or distributed any other way. Producer/consumer queues show up a lot in good designs, and your single-threaded choke points are usually a producer or a consumer, which allows thread fan-in or fan-out. (As a side note, Go really got this right with channels.)
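As a minimal sketch of that style (the function names are mine, not from any particular codebase): a function that takes all of its state as arguments and returns its result can be called from any number of threads with no locks at all, provided each thread works on a disjoint slice.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Re-entrant: reads only its arguments, touches no global state,
// so any number of threads can call it concurrently without locks.
long sum_range(const std::vector<long>& data, std::size_t begin, std::size_t end) {
    return std::accumulate(data.begin() + begin, data.begin() + end, 0L);
}

// Fan-out: each thread gets its own disjoint slice and its own output
// slot, so no synchronization beyond join() is required.
long parallel_sum() {
    std::vector<long> data(1000);
    std::iota(data.begin(), data.end(), 0);  // 0, 1, ..., 999

    long lo = 0, hi = 0;
    std::thread t1([&] { lo = sum_range(data, 0, 500); });
    std::thread t2([&] { hi = sum_range(data, 500, 1000); });
    t1.join();
    t2.join();
    return lo + hi;  // 0 + 1 + ... + 999 = 499500
}
```

The key design choice is that each thread writes to its own output slot; the only synchronization is the join itself.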
What leads to problems in threaded C++ programs is unprotected, shared state. Some C++ design patterns make life particularly hard, such as singletons, which are just global variables with a funny name. C++ gives you lots of ways to make something global accidentally: static member variables and arguments passed by reference, for instance.
True C++ hell is taking a single threaded C++ program and making it multi-threaded after the fact, but that's hell in any language which allows you to have globals of any sort and doesn't perform function calls within some kind of closure.
The article in this post is being a little reductionist. You don't need to pick a threading model - some of your threads can pass messages, others can use shared data, all in the same program. The lack of thread local storage is not an issue, because you partition the data so that no two threads are working on the same thing at the same time.
I've written a lot of multithreaded C++. I did quite a lot starting around 1994 and for the next 10+ years, much of it middle-tier application code on Windows NT in particular (back in the day when it was all 3-tier architectures; now we'd call them services). It's totally fine if you know what you are doing.
Work is usually organised in mutex-protected queues: worker threads consume from these queues, results are placed in a protected data structure, and receivers poll and sleep, waiting for the result.
Other tricks to remember: establish a hierarchy of mutexes. If you have to take several locks, they must be taken in order and unlocked in reverse order; this guarantees an absence of deadlocks. A second trick, which guarantees that locks are released (so bugs from never unlocking a mutex cannot occur) and released in the correct order, is to strictly follow an RAII pattern, where the destructors of stack-based lock objects unlock your mutexes as you exit stack frames.
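In modern C++ (C++17 here) both tricks are available off the shelf; a sketch with illustrative names: `std::scoped_lock` locks multiple mutexes using a deadlock-avoidance algorithm, and its destructor releases them as the stack frame exits, even if an exception is thrown.

```cpp
#include <mutex>

std::mutex checking_mtx, savings_mtx;
int checking = 100, savings = 0;

void transfer(int amount) {
    // RAII: both mutexes are acquired together (std::scoped_lock uses a
    // deadlock-avoidance algorithm, so two threads taking the locks in
    // different orders cannot deadlock) and released automatically when
    // `guard` is destroyed at the end of this stack frame.
    std::scoped_lock guard(checking_mtx, savings_mtx);
    checking -= amount;
    savings += amount;
}
```

The destructor-based release means there is no code path, early return or exception included, that leaves a mutex locked.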
Of course, in later periods you started to see formal recognition of these design patterns, in Java and C# libraries which had WorkerPools, and Java with its lock() primitive; but these design patterns were prevalent in my code at the time, because they were the only obvious and simple way to use multi-threading in a conceptually simple manner. KISS...
Nothing particularly hellish about any of it, but I remember it was not a development task for all developers, and without common libraries in the period (this was pre-STL), you had to work a lot of it out for yourself.
I do remember in the period you would get grandiose commentary from some public developers who would proclaim such things as, "it is impossible to have confidence in/write a bug-free threaded program."
I always felt that said more about the developer than multithreading though.
After using it for a while, it's amazing how much global variables tend to creep unannounced into code :-)
Are foreign calls assumed to be non-pure? Can they be marked pure in the case that the foreign (likely C) function doesn't reference global mutable state?
> Can they be marked pure in the case that the foreign (likely C) function doesn't reference global mutable state?
Yes. Here's an example:
It's true that the C Standard does not actually guarantee that they don't access mutable global state, but in practice they don't. We're not aware of one that does, and don't know why anyone would write one that does.
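The example referred to above isn't reproduced here, but as a rough C-side illustration (this relies on a GCC/Clang extension, and the function name is made up): a function that touches no global state at all can be annotated so the compiler may treat it as pure.

```cpp
// __attribute__((const)) promises the function reads no global state:
// its result depends only on its arguments, so the compiler is free to
// fold, reorder, or hoist repeated calls. (GCC/Clang extension; the
// weaker __attribute__((pure)) permits reads of global memory.)
__attribute__((const))
int clamp_to_byte(int v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}
```

A foreign-function interface marking a call pure is making essentially the same promise from the other side of the boundary.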
That's not true. In Rust you can have globals, but you have to declare the synchronization semantics for every global when you do. (That is, they have to be read-only, thread-local, atomic, or protected by a lock.)
This property makes it very easy to take a single-threaded Rust program and make it multithreaded later, and in fact this is how most Rust programs end up being parallelized in my experience.
That doesn't sound like it helps you determine which one of those is appropriate. So it seems rather like a system that will lead to unexpected bottlenecks when you parallelize. (Somebody deep inside the stack arbitrarily decided locking was the most appropriate - now you've got a contended lock or a potential lock ordering bug later on.)
Granted it does seem like it allows for more reasonable defaults than a default-unsafety policy.
We have experience with this. For a long time, WebRender rasterized glyphs and other resources sequentially. Parallelizing it was painless: Glenn just added a parallel for loop from Rayon and we got speedups.
This is a funny comment. You are implying that performance is of higher value than correctness. Speed without correctness is dangerous, and leads to significant bugs, especially when you're talking about concurrent modification of state across threads.
I'll take correct and in need of performance work over incorrect and fast; the cost of tracking down incorrect concurrent code is extremely high, let alone dangerous for the actual data being stored.
Of course it is. Tony Hoare noticed it as far back as 1993: given a safe program and a fast program, people would always choose the fast one. Correctness in a mathematical sense does not always map to correctness in the business sense; it's sometimes much more cost-effective to reboot a computer every day and never free any memory than to try to be memory-correct, which will cost at least a few thousand dollars more in employee time.
What really bothers me though, is that you might actually store incorrect data somewhere. That could have hugely negative implications for the business.
Funny would be an understatement.
> What leads to problems in threaded C++ programs is unprotected, shared state.
That's one of the main problems, but I don't think people start by saying "we'll just have this unprotected shared state and hope for the best"; that shared state ends up being shared by accident, or as a bug. I've seen enough of those, and they're not fun to debug (the hardware environment is slightly different, say the cache sizes are a bit off, making it more likely to happen at the customer's site, on a Wednesday evening at 9pm, but never during QA testing).
Another thing I've seen bugs in is mixing non-blocking (select/epoll/etc.) callbacks with threads. Pretty easy to get tangled there. Throw signal handlers in and now it is very easy to end up with a spaghetti mess.
Even worse, there is a difference between "we'll start with a clean, sane threaded design from the start" vs "we'll add a bit of threading here in this corner for extra performance". That second case is much worse and can result in subtle and tricky bugs. Sometimes it is not easy to determine if the code is re-entrant in a large code-base.
Interestingly, and somewhat tongue in cheek, someone (I think Joe Armstrong, but I may be wrong) said to try to think about your programming environment as an operating system. It is 2017; most sane operating systems have processes with isolated heaps, startup supervision (so services can be started/stopped as groups), preemption-based concurrency (processes don't have to explicitly yield), and so on. Everyone agrees that's sane and normal, and would say that putting their latest production release on Windows 3.1 would not be a good idea. Why then do we do it with our programming environment? A bunch of C++ threads sharing memory is a bit like that Windows 3.1 environment, where the word processor crashes because the calculator or a game overwrote its memory.
Because of people. Convincing devs to adopt new ways is an uphill battle unless they have tried them themselves.
Joe Duffy's keynote at RustConf just went live, and one of the reasons for Midori's adoption failure was the difficulty of convincing Windows kernel devs of better ways of programming, in spite of them having Midori running in front of them.
And yet unikernels are here and used.
We've had it since C++11. http://en.cppreference.com/w/cpp/language/storage_duration
Note clang is not the only variable here. You need support from the dynamic linker.
Edit: some googling suggests they may have added this support last year with some caveats.
You can already use clang 5.0 on the NDK, so even early C++17 support is possible. I don't know about TLS support.
The NDK is only there for high-performance graphics (Vulkan), realtime audio, SIMD, and bringing native libraries over from other platforms.
So apparently their motivation is that devs should touch the NDK as little as possible.
There are plenty of apps that could really use a well-supported NDK. Games and audio apps, for a start.
"ARCore works with Java/OpenGL, Unity and Unreal and focuses on three things:"
Which makes quite clear that even in AR, the role of the NDK is to support Java, Unity and Unreal applications, not to be used alone.
I am fine with it, given the security issues with native code; we should actually minimize its use. I just wish they would provide better tooling to call framework APIs, instead of forcing everyone to write JNI wrappers by themselves.
So they are still using C as a lingua franca for some low-level parts of the system. But without putting much effort into improving the C toolchain.
I'm not very familiar with Unreal but I understand it's a C++ library, so it seems like that would benefit from a solid C++ toolchain too.
I worked on a C++ trading app in the past, and our approach was "services" (actors) talking via message passing: basically a bunch of long-lived threads, each with a queue (nowadays that could be something like boost::lockfree::queue), running an event loop. Message passing worked by getting a pointer to the target service's queue via a service registry (a singleton initialized at startup; you could also keep a copy per service), then pushing a message to it. If you needed to do something with the result, that would come back via another message sent through the same system.
It does mean that if there's back-and-forth between services, the code is harder to follow, as you have to read through a few event handlers to get the whole flow; but with well-designed service boundaries it worked quite well. I don't think futures would have been useful in our context, as blocking any service for an extended time would have been very bad.
If you don't want boost, you can wrap std::queue yourself. Here is a simple example, where you can replace the boost mutex etc. with std ones: https://www.justsoftwaresolutions.co.uk/threading/implementi...
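A minimal sketch of such a wrapper, using only the standard library (C++11; the class name is illustrative):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            q_.push(std::move(value));
        }
        cv_.notify_one();  // wake one waiting consumer
    }

    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return !q_.empty(); });  // no lost wakeups
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

private:
    std::queue<T> q_;
    std::mutex mtx_;
    std::condition_variable cv_;
};
```

In the actor setup described above, each service's event loop would block in pop(), handle the message, and go back to waiting, while producers call push() from any thread.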
Futures are somewhat useful if you spawn off a thread to do a single task and wait for it to finish. However, on many platforms the thread-creation overhead is high enough that you don't want to do that; you want existing worker threads being given work to do.
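For the one-off case, a sketch of the future model with the standard library (function names are mine): std::async with std::launch::async may spin up a fresh thread for the single task, which is exactly the overhead being described.

```cpp
#include <future>

// Stand-in for real work: sum 1..n.
int expensive_task(int n) {
    int acc = 0;
    for (int i = 1; i <= n; ++i) acc += i;
    return acc;
}

int run_with_future() {
    // std::launch::async typically creates a brand-new thread for this
    // one task: fine occasionally, too costly in a hot path, which is
    // why worker pools are usually preferred.
    std::future<int> result = std::async(std::launch::async, expensive_task, 100);
    // ... the caller could do other work here ...
    return result.get();  // blocks until the task finishes
}
```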
Once again, this is something Go does really well. Goroutines are almost free, so you can use them with a futures model very easily.
The advantages of this data structure over a linked list queue are as follows:
* Faster; only requires a single Compare-And-Swap instruction of sizeof(void *) instead of several double-Compare-And-Swap instructions.
* Simpler than a full lockless queue.
* Adapted to bulk enqueue/dequeue operations. As the pointers are stored in a table, a dequeue of several objects will not produce as many cache misses as in a linked queue. Also, a bulk dequeue of many objects does not cost more than a dequeue of a single object.
The disadvantages:

* The size is fixed.
* Having many rings costs more in terms of memory than a linked list queue; an empty ring contains at least N pointers.
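A minimal single-producer/single-consumer ring of this general shape (a sketch, not any particular library's implementation): because the slots live in a fixed table, each side just advances its own index with one atomic store per operation, with no CAS loop and no per-node allocation.

```cpp
#include <atomic>
#include <cstddef>

// Fixed-size single-producer/single-consumer ring. The producer owns
// head_, the consumer owns tail_; one slot is always left empty so
// that "full" and "empty" can be told apart.
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& v) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // full
        buf_[head] = v;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    bool pop(T& out) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;  // empty
        out = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

Note the fixed-size trade-off from the list above in miniature: an SpscRing<T, N> always occupies N slots, full or empty, and can hold at most N - 1 elements.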
- skimming through Doug Lea's concurrency articles: FP
- thread-safe C: pure functions, shared nothing: FP
- the comment above: FP
At what point will people start to call it what it is?
Then you can get into what you mean by FP - if you share nothing then your communication is done by copying. This is not a magic bullet and isn't an option for many scenarios. If you do shared state for join parallelism you can cover other scenarios, but now you are sharing data.
Atomics are very fast and work very well when they line up with the problem at hand. Then again, you are creating some sort of data structure that is made to have its state shared.
If the problem was so easy to solve, it wouldn't be nearly as much a problem. Handwaving with 'just use FP' is naive and is more of a way for people to feel that they have the answer should anyone ask the question, but reality will quickly catch up.
Where is the synchronization in this scenario? You either have to decide how to split up the read only memory to different threads (fork join) or you have one thread make copies of pieces and 'send' them to other threads somehow. Arguably these are the same thing. This is one technique, but again, it doesn't cover every scenario.
I don't know if calling it 'immutability' changes anything.
> avoid locking (unless you have to synchronize on a change)
Synchronizing on changes is the whole problem; you can't just hand-wave it away as if it were a niche scenario. Anyone can create a program that has threads read memory and do computations. If you can modify the memory in place with no overlap between threads, even better. These, however, are the real niche scenarios, because the threads eventually need to do something with their results, whether that's sending them to video memory, writing to disk, or preparing data for another iteration or stage in the pipeline. Then you have synchronization, and that's the whole issue.
I must confess, I have no experience there, it's just years of reading about and writing functional code and seeing a potential trail here.
That said if you have new thoughts on the subject, please write them :)
You get the simplicity and familiarity of OOP and most of the benefits of multi-threading.
That could be from different threads or from an interrupt handler. It can even come up in single-threaded code: you call function A(), which internally calls B(), which results in a nested call to A(). If A is re-entrant, that's safe.
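A toy single-threaded illustration of the difference (all names hypothetical): the static counter keeps hidden state across calls, while the second pair of functions carries every piece of state through arguments and return values, so the nested A -> B -> A chain is safe.

```cpp
// NOT re-entrant: the static variable survives between calls, so a
// nested (or concurrent) call observes and mutates the same hidden state.
int count_calls() {
    static int calls = 0;
    return ++calls;
}

int A(int value, int depth);  // forward declaration

// Re-entrant pair: all state lives in arguments and return values,
// so each invocation is independent of every other.
int B(int value, int depth) {
    return depth > 0 ? A(value + 1, depth - 1) : value;  // nested call back into A
}

int A(int value, int depth) {
    return B(value + 10, depth);
}
```

Calling A(0, 2) walks A -> B -> A -> B -> A -> B, and every frame carries its own copy of the state, so no call can corrupt another.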