Talking to some people here, they think your DSSIM tool isn't what I should use. Specifically, they said it runs blur and downscale steps that aren't part of the SSIM metric. They suggested using Mehdi's C++ implementation, which I understand yours is a rewrite of.
Presumably you think I should use your tool instead? What makes the (D)SSIM numbers from yours a better match for human perception than those from Mehdi's? Or should they be giving the same numbers?
I have two issues with Mehdi's implementation:
* It works on raw RGB data, which is a poor model for measuring perceptual difference (e.g. the black-to-green range is very sensitive, while green-to-cyan is almost indistinguishable, yet numerically the two spans are the same size in RGB). Some benchmarks sidestep this by testing grayscale only, but that lets encoders cheat by encoding color as poorly as they want.
* It's based on OpenCV, and when I tested it I found OpenCV didn't apply gamma correction. This makes a huge difference on images with noisy dark areas (and photos have plenty of underexposed areas). Maybe it's a matter of OpenCV version or settings; you can verify this by dumping `ssim_map` and checking whether it shows high difference in dark areas that look fine to you on a well-calibrated monitor.
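To make both points above concrete, here's a minimal pure-Python sketch (not the actual dssim code, and the conversion constants are the standard sRGB/D65 ones, not necessarily what any particular tool uses). It shows that black-to-green and green-to-cyan are the same distance in raw RGB but very different in CIE Lab, and that ignoring the sRGB transfer function makes an identical 5-code step correspond to a far larger relative light change in the darks than in the highlights:

```python
import math

def srgb_to_linear(c8):
    """Undo the sRGB transfer function for one 8-bit channel value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_to_lab(rgb):
    """8-bit sRGB triple -> CIE Lab (D65 white point)."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    # linear sRGB -> CIE XYZ
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

black, green, cyan = (0, 0, 0), (0, 255, 0), (0, 255, 255)

# Both pairs differ by 255 in exactly one RGB channel -> identical RGB distance,
# but the Lab distances (a rough proxy for perceived difference) are not equal.
rgb_bg = dist(black, green)
rgb_gc = dist(green, cyan)
lab_bg = dist(srgb_to_lab(black), srgb_to_lab(green))
lab_gc = dist(srgb_to_lab(green), srgb_to_lab(cyan))
print(rgb_bg, rgb_gc)   # identical in RGB
print(lab_bg, lab_gc)   # black->green is considerably larger in Lab

# Same +5 code-value step, very different relative change in linear light:
dark_ratio = srgb_to_linear(15) / srgb_to_linear(10)     # ~1.57 (+57% light)
bright_ratio = srgb_to_linear(205) / srgb_to_linear(200) # ~1.06 (+6% light)
print(dark_ratio, bright_ratio)
```

That last ratio is why noise in shadows gets mis-weighted if the metric treats gamma-encoded values as if they were linear.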
I've tried to fix both issues by working in a gamma-corrected Lab colorspace and including the score from the color channels, but computed at a lower resolution (since the eye is much more sensitive to luminance than to color).
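A toy sketch of that channel weighting, with two loud caveats: `ssim_global` uses a single window over the whole channel (a real implementation slides a Gaussian window), and the 0.8 luma weight is a made-up illustration, not the tool's actual value:

```python
def ssim_global(x, y, value_range=100.0):
    """Single-window SSIM between two equal-length value lists."""
    c1, c2 = (0.01 * value_range) ** 2, (0.03 * value_range) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((a - my) ** 2 for a in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def halve(img):
    """2x2 box-filter downscale of a 2-D list; chroma is compared at half res."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]

def flat(img):
    return [v for row in img for v in row]

def lab_ssim(lab1, lab2, luma_weight=0.8):
    """Weighted SSIM: L channel at full resolution, a/b at half resolution."""
    (L1, a1, b1), (L2, a2, b2) = lab1, lab2
    luma = ssim_global(flat(L1), flat(L2))
    chroma = (ssim_global(flat(halve(a1)), flat(halve(a2))) +
              ssim_global(flat(halve(b1)), flat(halve(b2)))) / 2
    return luma_weight * luma + (1 - luma_weight) * chroma

# Tiny 4x4 example: identical inputs score 1.0; a luma error lowers the score.
L = [[20, 30, 40, 50], [25, 35, 45, 55], [30, 40, 50, 60], [35, 45, 55, 65]]
a = [[0, 5, 0, 5], [5, 0, 5, 0], [0, 5, 0, 5], [5, 0, 5, 0]]
b = [[10, 10, 12, 12], [10, 10, 12, 12], [14, 14, 16, 16], [14, 14, 16, 16]]
same = lab_ssim((L, a, b), (L, a, b))
L2 = [row[:] for row in L]
L2[0][0] = 60  # introduce a visible luma error
diff = lab_ssim((L, a, b), (L2, a, b))
print(same, diff)
```

A DSSIM-style distance can then be derived from the combined score, e.g. `1/ssim - 1`, so that 0 means identical and larger means worse.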
However, when I tested my tool against the TID2008 database, I got an overall score lower than expected for SSIM (0.73 instead of 0.80), though still better than most of the other tools they've tested.
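For context, that kind of TID2008 number is typically a rank correlation between the metric's outputs and the database's mean opinion scores (MOS). A minimal Spearman's rho in pure Python, with average ranks for ties:

```python
def average_ranks(xs):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical numbers purely for illustration (not real TID2008 data).
# A DSSIM-style distance grows as quality drops, so against MOS the
# correlation comes out negative and benchmarks report its magnitude.
mos = [6.1, 5.4, 4.8, 3.9, 2.5, 1.7]          # higher = better quality
dssim = [0.01, 0.02, 0.05, 0.08, 0.20, 0.45]  # higher = worse
rho = spearman(mos, dssim)
print(abs(rho))
```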