Hacker News new | comments | show | ask | jobs | submit login
Implementing a lightweight task scheduler in C++ (enkisoftware.com)
47 points by dougbinks on Aug 22, 2015 | hide | past | web | favorite | 21 comments



Using "volatile" for synchronization is wrong and won't work in some architectures and compilers. MSVC does guarantee that in x86 by default (/volatile:ms), but any other compiler is free to do whatever it wants with that code.

Suggest either std::atomic or intrinsics, but don't give false information in a tutorial.


And in fact, VC's volatile behavior provides only acquire-release, not full sequential consistency. SC is required for the correctness of some algorithms, and it's what programmers intuitively expect, so <atomic> provides it by default.


volatile is a pre-requisite for implementing atomic types, as used in the post and enkiTS. It is not on its own sufficient however, and so I use both atomic intrinsics and memory barriers where appropriate. This is mentioned in the article, though obviously I need to emphasize the role of volatile more.


For those interested in the implementation of atomic operations see Jeff Preshing's "How Ubisoft Develops Games for Multicore - Before and After C++11", especially after 15m55s in on the video, or slide 37 of the pdf on github.

Video: https://www.youtube.com/watch?v=X1T3IQ4N-3g&feature=youtu.be...

PDF: https://github.com/CppCon/CppCon2014/tree/master/Presentatio...


Do you know which platforms or compilers /volatile won't work?


Volatile works correctly on all platforms.

Don't use it as a synchronization primitive, that's not what it's meant to do. You only should use it:

a. If you're reading from HW.

b. If you want to use it like a const marker for member functions.


Here are some other places where volatile is appropriate:

1. Reading or writing shared memory

2. Values which may be modified in signal handlers

3. Implementing atomic types (e.g. boost::atomic)

4. RCU (e.g. ACCESS_ONCE in Linux kernel)


Pretty sure the lack of ordering(memory and execution) semantics for volatile invalidates all of those cases.

Really you should be using the appropriate platform memory and execution barriers in place of volatile 99% of the time.


Barriers are for ordering as you say, which is important, but orthogonal to volatile. Volatile is about enforcing that a particular access occurs.

An illustration:

    static volatile sig_atomic_t interrupted = 0;
    void sigint_handler(int s) { interrupted = 1; }

    void run()
    {
        signal(SIGINT, sigint_handler);
        for (;;)
        {
           if (interrupted) break;
           stuff();
        }
    }
"volatile" is the most precise way to prevent the compiler from hoisting the 'interrupted' access out of the loop. Memory barriers aren't necessary, because the code is single threaded. They may effectively defeat the optimization too, but it's by confusing the compiler instead of informing it.

edit: Oh, and std::atomic is a very bad solution to the above. std::atomic may take locks, which can lead to a deadlock if used in a signal handler!


The way I think of it is that volatile is for making sure the instructions are there and in the right order in the instruction stream. You may need to do other things to get around the processor's side of optimizations like store queues and caches depending on the context.


Thanks for responding here - this is correct. In the case of implementing atomic types (the use case for enkiTS) volatile is a prerequisite but not on its own sufficient - hence I use atomic intrinsics and memory barriers where appropriate. The repository on github notes that I've only implemented these for Windows, Linux and OSX on Intel x86 / x64, but a C++11 branch exists for those who want.


> Reading or writing shared memory

Yeah I was bit by that when writing a shared memory low latency library. When compiled under debug it worked, in release mode values were not being written to the shared memory area. Added some "volatile"-s in there and it worked.


And to be exceedingly pedantic, a signal handler should only modify a 'volatile sig_atomic_t' variable. I assume this is just typedef'd to a machine word on all typical POSIX platforms though.


/volatile the flag is only supported by MSVC as far as I'm aware.

volatile the keyword was not written with the intent of being a memory barrier (edit: or atomicity for that matter) in mind.

In practice, I've created unit tests to catch and fix issues on mobile (specifically iOS - ARM arch, under both GCC and later Clang compilers IIRC) with our "lock free" circular buffer.

It was indeed lock free. And also just plain broken in multi-core scenarios.


Putting a developer blog post out on a Saturday just before you head off for the night (I'm on European time) is about as irresponsible as lockless multithreaded programming without a code review. So please feel very free to comment here, on the blog or to @dougbinks on twitter and I'll get back to you asap!


That's an original response haha.


Would be nice to see, in contrast, a C implementation.


The task library enkiTS has a C interface, though this simply wraps the C++ implementation. Writing a C implementation from that interface would be pretty much the same, however the virtual function and inheritance would be replaced by a function pointer and plain struct.


Just use the cilkplus stuff? edit: thanks for the downvotes, but please explain why you wouldn't use cilkplus for this stuff? You do realize it's built into gcc and available for the other compilers?


Cilkplus is a valid approach if you are interested in using a task and data parallel language extension available on Intel's compiler, newer variants of gcc and Intel's fork of clang. As mentioned in the article a more comparable library for standard C++ is Intel's Threaded Building Blocks.

I used to work at Intel and was a technical lead for games developer architecture feedback and ran the multicore gaming initiative for a while. The feedback on cilkplus was that it didn't allow sufficient control, and as a non-standard language feature posed issues for porting to some systems. Intel's TBB however has been used by some games developers, and its API has evolved features in response to game developer feedback. Most games developers however prefer to roll their own.

Note that I didn't downvote this, but if you wanted to implement a task scheduler (the subject of the post) using cilkplus wouldn't be an obvious starting point.


Perhaps you're being a bit dismissive, but Cilk is an option people should explore. For those interested in how such things work, there are extensive papers from the Cilk group at MIT: http://supertech.csail.mit.edu/papers.html

In particular, I recommend "Cilk: An Efficient Multithreaded Runtime System" (http://supertech.csail.mit.edu/papers/PPoPP95.pdf) and "The Implementation of the Cilk-5 Multithreaded Language" (http://supertech.csail.mit.edu/papers/cilk5.pdf).




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: