Nicely written article - clarified some points for me.

Isn't the wait for the unordered collection to be "finalized" a huge bottleneck, though? I can see this working nicely for low-level parallelisation, but not so well for more traditional producer/consumer threads or I/O operations.

