(Here is a plug for our C version: https://github.com/redjack/varon-t).
Our implementation competes with boost::lockfree::queue and it's much faster since Boost implementation uses more heavy synchronization techniques. The benchmark on GitHub also has Boost implementation, so you can compare both the queues.
Unfortunately, there is no adequate algorithm description for disruptor queue, only some indistinct descriptions mostly suitable for business people rather than engineers. So it was not easy to dig it's source code. However, I learned it and there are some notes about the implementation.
The implementation is bit inaccurate: there are a lot of branches without branch prediction information available at compile time, to avoid cache line bouncing it wastes two cache lines instead of simple align an item on cache line (I mean vrt_padded_int). I didn't pay too much attention to memory barriers usage, but giving that X86-64 provides relatively strict memory ordering, probably some of them also could be eliminated. Disruptor uses very cleaver ideas, but I believe its performance can be improved after good code review.
One more point is that while our queue implementation is C++, it's still self sufficient and can be easily ported to C for using in kernel space. It's doubtful (in my humble opinion) that generic container depends on non-standard libcork and moreover logging library (clogger).