As always, be sure to performance test your code if you're getting into obfuscating your algorithms to boost speeds.
That said, there are still plenty of real gems in there. And on that note, here's one of my favorite bit-twiddling hacks:
(These functions are for RGB555, but you can easily extend them to RGB888 ops.)
Agreed that performance testing is absolutely necessary, preferably with something that can tell you not only exactly where the processor is spending its time but also whether it's spending that time computing or waiting on memory, whether branches are being predicted well, and so on.
http://link.springer.com/chapter/10.1007/978-3-540-77535-5_5 (I deeply apologize that I can't find the free PDF version, though sci-hub may help here.)
Next favorite is "compress selected bits", because I figured out the inverse, which Henry Warren then published as "expand" in the 2nd edition: http://www.hackersdelight.org/hdcodetxt/expand.c.txt
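For anyone who hasn't met these two operations, here is a minimal sketch of what they compute. It uses a naive one-bit-at-a-time loop, not the parallel-prefix code from the book, and the names `compress_bits` and `expand_bits` are made up for illustration. Compress gathers the bits of `x` selected by the mask `m` into the low end of the result; expand scatters the low bits of `x` back out to the positions set in `m`.

```c
#include <stdint.h>

/* compress: gather the bits of x selected by mask m into the low-order
   end of the result, preserving their order. Naive loop, one pass per
   set bit of m. */
uint32_t compress_bits(uint32_t x, uint32_t m)
{
    uint32_t result = 0;
    uint32_t out = 1;                    /* next output bit to fill */
    for (; m != 0; m &= m - 1) {         /* drop the lowest set bit of m */
        uint32_t lowest = m & (~m + 1);  /* isolate the lowest set bit of m */
        if (x & lowest)
            result |= out;
        out <<= 1;
    }
    return result;
}

/* expand: the inverse direction. Take the low-order bits of x one at a
   time and place them at the positions of the set bits of m. */
uint32_t expand_bits(uint32_t x, uint32_t m)
{
    uint32_t result = 0;
    uint32_t in = 1;                     /* next input bit to consume */
    for (; m != 0; m &= m - 1) {
        uint32_t lowest = m & (~m + 1);
        if (x & in)
            result |= lowest;
        in <<= 1;
    }
    return result;
}
```

With these definitions, `expand_bits(compress_bits(x, m), m)` gives back `x & m`, which is the sense in which one is the inverse of the other.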
I think you'll probably enjoy it more if you do low-level programming, but there's a lot of cool mind-expanding stuff in there--the treatment of Hilbert Curves is quite fun.
If you're interested in this kind of thing, you should go read that book. It actually is a delight.
The quick and dirty version is still useful (portable, works for all values) for unsigned min and max on operands that are half the word size:
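unsigned max(unsigned short a, unsigned short b)
{
    /* assumes 16-bit unsigned short and 32-bit unsigned */
    unsigned q = (unsigned)a - b;    /* wraps, setting the high bits, when a < b */
    unsigned r = q & ((~q) >> 16);   /* a - b if a >= b, otherwise 0 */
    return r + b;
}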
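Only max is shown above, so here is a sketch of both with the width assumption spelled out through `<stdint.h>` types. The names `max16` and `min16` are mine, and this is just one way to write it: the mask `(~q) >> 16` is 0xFFFF when `a >= b` (the difference fits in the low half, so the upper half of `~q` is all ones) and 0 when the subtraction wrapped.

```c
#include <stdint.h>

/* Branch-free max of two 16-bit unsigned values, computed in 32 bits. */
uint32_t max16(uint16_t a, uint16_t b)
{
    uint32_t q = (uint32_t)a - (uint32_t)b;  /* wraps when a < b */
    uint32_t mask = (~q) >> 16;              /* 0xFFFF if a >= b, else 0 */
    return b + (q & mask);                   /* b + (a - b) or b + 0 */
}

/* Branch-free min, using the same mask. */
uint32_t min16(uint16_t a, uint16_t b)
{
    uint32_t q = (uint32_t)a - (uint32_t)b;
    uint32_t mask = (~q) >> 16;
    return a - (q & mask);                   /* a - (a - b) or a - 0 */
}
```

The trick depends on having spare high bits to catch the borrow, which is why it only covers operands of half the word size. As usual, measure before assuming it beats a plain compare on your compiler and target.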