
Implementing a lightweight task scheduler in C++ - dougbinks
http://www.enkisoftware.com/devlogpost-20150822-1-Implementing_a_lightweight_task_scheduler.html
======
Matheus28
Using "volatile" for synchronization is wrong and won't work on some
architectures and compilers. MSVC does guarantee it on x86 by default
(/volatile:ms), but any other compiler is free to do whatever it wants with
that code.

Suggest either std::atomic or intrinsics, but don't give false information in
a tutorial.
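
A minimal sketch of the std::atomic route, for illustration only: several threads bump one shared counter. The function name `count_with_atomic` and its parameters are hypothetical and are not taken from the article. With a plain or volatile int the increments could be lost, since volatile makes no atomicity guarantee.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: each of num_threads threads increments a shared
// counter per_thread times; returns the final count.
int count_with_atomic(int num_threads, int per_thread)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t)
    {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
            {
                // Atomic read-modify-write: no increment is lost.
                // Relaxed ordering is enough for a pure counter.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    // join() synchronizes with each thread's completion.
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Calling `count_with_atomic(4, 10000)` should always yield 40000; the same loop over a `volatile int` carries no such guarantee.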

~~~
danieltillett
Do you know which platforms or compilers volatile won't work on?

~~~
vvanders
Volatile works correctly on all platforms.

Don't use it as a synchronization primitive, that's not what it's meant to do.
You should only use it:

a. If you're reading from HW.

b. If you want to use it like a const marker for member functions.

~~~
ridiculous_fish
Here are some other places where volatile is appropriate:

1\. Reading or writing shared memory

2\. Values which may be modified in signal handlers

3\. Implementing atomic types (e.g. boost::atomic)

4\. RCU (e.g. ACCESS_ONCE in Linux kernel)

~~~
vvanders
Pretty sure the lack of ordering (memory and execution) semantics for volatile
invalidates all of those cases.

Really you should be using the appropriate platform memory and execution
barriers in place of volatile 99% of the time.

~~~
ridiculous_fish
Barriers are for ordering as you say, which is important, but orthogonal to
volatile. Volatile is about enforcing that a particular access occurs.

An illustration:

    
    
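        #include <signal.h>
    
        void stuff();  // the loop's real work, defined elsewhere
    
        // volatile forces a fresh read of the flag on every iteration.
        static volatile sig_atomic_t interrupted = 0;
        void sigint_handler(int s) { (void)s; interrupted = 1; }
    
        void run()
        {
            signal(SIGINT, sigint_handler);
            for (;;)
            {
                if (interrupted) break;
                stuff();
            }
        }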
    

"volatile" is the most precise way to prevent the compiler from hoisting the
'interrupted' access out of the loop. Memory barriers aren't necessary,
because the code is single threaded. They may effectively defeat the
optimization too, but it's by confusing the compiler instead of informing it.

_edit_: Oh, and std::atomic is a very bad solution to the above. std::atomic
may take locks, which can lead to a deadlock if used in a signal handler!
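
The lock concern applies when the atomic type is not lock-free. A hedged sketch of how one might guard against that case: a compile-time check that `std::atomic<int>` is always lock-free on the target, so no lock can be taken inside the handler. The names `interrupted_flag`, `on_sigint` and `run_until_interrupted` are made up for this example, and `std::raise` stands in for a real Ctrl-C.

```cpp
#include <atomic>
#include <csignal>

// Refuse to compile where std::atomic<int> might fall back to a lock.
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "std::atomic<int> is not always lock-free on this target");

static std::atomic<int> interrupted_flag{0};

// Signal handler: a single lock-free atomic store.
extern "C" void on_sigint(int)
{
    interrupted_flag.store(1, std::memory_order_relaxed);
}

// Hypothetical loop: runs until the flag is set or max_iterations is hit,
// and returns how many iterations were executed.
int run_until_interrupted(int max_iterations)
{
    std::signal(SIGINT, on_sigint);
    int iterations = 0;
    while (!interrupted_flag.load(std::memory_order_relaxed) &&
           iterations < max_iterations)
    {
        ++iterations;
        // Simulate the user pressing Ctrl-C during the third iteration.
        if (iterations == 3) std::raise(SIGINT);
    }
    return iterations;
}
```

On targets where the static_assert fails, `volatile sig_atomic_t` as in the illustration above remains the portable choice.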

~~~
monocasa
The way I think of it is that volatile is for making sure the instructions are
there and in the right order in the instruction stream. You may need to do
other things to get around the processor's side of optimizations like store
queues and caches depending on the context.
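
One way to picture the "other things" needed across threads is a release/acquire pair on a std::atomic flag. This is only a sketch under that assumption; `publish_and_consume` is a hypothetical name, not something from the article or enkiTS.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: the main thread writes a plain int, then publishes
// it through an atomic flag; a second thread waits for the flag and reads
// the int. Returns the value the second thread observed.
int publish_and_consume(int value)
{
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // publication flag
    int seen = -1;

    std::thread consumer([&] {
        // Acquire load: once this reads true, the earlier write to
        // payload is guaranteed to be visible here.
        while (!ready.load(std::memory_order_acquire))
        {
            std::this_thread::yield();
        }
        seen = payload;
    });

    payload = value;
    // Release store: orders the payload write before the flag write,
    // for both the compiler and the processor.
    ready.store(true, std::memory_order_release);

    consumer.join();
    return seen;
}
```

A volatile flag would keep the loads and stores in the instruction stream, but would not by itself give the ordering between `payload` and `ready` that the release/acquire pair provides.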

------
dougbinks
Putting a developer blog post out on a Saturday just before you head off for
the night (I'm on European time) is about as irresponsible as lockless
multithreaded programming without a code review. So please feel very free to
comment here, on the blog or to @dougbinks on twitter and I'll get back to you
asap!

~~~
nickpsecurity
That's an original response haha.

------
Keyframe
Would be nice to see, in contrast, a C implementation.

~~~
dougbinks
The task library enkiTS has a C interface, though this simply wraps the C++
implementation. Writing a C implementation from that interface would be pretty
much the same; however, the virtual function and inheritance would be replaced
by a function pointer and a plain struct.
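
As a rough sketch of that replacement (written C-style but compiled as C++; the struct layout and all names here are hypothetical and are not enkiTS's actual C interface):

```cpp
#include <cstdint>

// A task described by a plain struct and a function pointer,
// in place of a base class with a virtual function.
struct Task
{
    void (*run)(std::uint32_t start, std::uint32_t end, void* user_data);
    void*         user_data;  // caller-owned state handed to run()
    std::uint32_t set_size;   // number of work items in the task
};

// Example task body: adds the indices in [start, end) to a running total.
static void sum_range(std::uint32_t start, std::uint32_t end, void* user_data)
{
    std::uint64_t* total = static_cast<std::uint64_t*>(user_data);
    for (std::uint32_t i = start; i < end; ++i) *total += i;
}

// Stand-in for a scheduler: runs the whole range on the calling thread.
std::uint64_t run_task_inline(std::uint32_t set_size)
{
    std::uint64_t total = 0;
    Task task = { sum_range, &total, set_size };
    task.run(0, task.set_size, task.user_data);
    return total;
}
```

A real scheduler would split `[0, set_size)` into sub-ranges and call `run` on worker threads; here `run_task_inline(10)` just sums 0 through 9 on the caller.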

------
a8da6b0c91d
Just use the cilkplus stuff? edit: thanks for the downvotes, but please
explain why you wouldn't use cilkplus for this stuff? You do realize it's
built into gcc and available for the other compilers?

~~~
dougbinks
Cilkplus is a valid approach if you are interested in using a task and data
parallel language extension available on Intel's compiler, newer variants of
gcc and Intel's fork of clang. As mentioned in the article, a more comparable
library for standard C++ is Intel's Threading Building Blocks.

I used to work at Intel and was a technical lead for games developer
architecture feedback and ran the multicore gaming initiative for a while. The
feedback on cilkplus was that it didn't allow sufficient control, and as a
non-standard language feature posed issues for porting to some systems.
Intel's TBB however has been used by some games developers, and its API has
evolved features in response to game developer feedback. Most games developers
however prefer to roll their own.

Note that I didn't downvote this, but if you wanted to implement a task
scheduler (the subject of the post), using cilkplus wouldn't be an obvious
starting point.

