Testing now with this image, to bring the DSSIM-measured distortion on the JPEG up to 0.015 I need to encode it at quality=69, which gets me a file size of 24317 bytes, compared to the 20158 I had for the WebP. This is a 17.1% improvement for WebP over JPEG, as opposed to the 47.7% improvement I found before. This is all with libjpeg-turbo as included by PageSpeed 1.9.
Running mozjpeg's jpegtran (built at commit f46c787) with no arguments on this image, I get 23870 bytes with no change to the SSIM. This is a 15.6% WebP-over-JPEG improvement.
It looks like:
1) We should run some timing tests on the mozjpeg encoder and, if it's in the same range as the WebP encoder or not too much worse, switch PageSpeed from libjpeg-turbo to mozjpeg.
2) We should check that quality-80 with WebP is correct for getting similar levels of distortion as quality-85 with JPEG. Is this image just a poor case for WebP or is it typical and something's wrong with our defaults?
Technically, the DSSIM values are 0.014956 for the JPEG compared to 0.015060 for the WebP.
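For anyone checking the arithmetic, the savings figures above can be reproduced from the byte counts quoted in this comment. This is only a small sketch; the function name `savings_percent` is made up for illustration and is not part of any tool mentioned here.

```python
def savings_percent(baseline_bytes, candidate_bytes):
    """Size reduction of the candidate relative to the baseline, in percent."""
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes

WEBP_BYTES = 20158           # WebP size quoted above
LIBJPEG_TURBO_BYTES = 24317  # JPEG at quality=69 via libjpeg-turbo
MOZJPEG_BYTES = 23870        # same JPEG after mozjpeg's jpegtran

print(round(savings_percent(LIBJPEG_TURBO_BYTES, WEBP_BYTES), 1))  # 17.1
print(round(savings_percent(MOZJPEG_BYTES, WEBP_BYTES), 1))        # 15.6
```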
If you only ran mozjpeg's jpegtran on a file created with another JPEG library, you won't get the benefit of trellis quantization. Try creating JPEGs with mozjpeg's cjpeg (and -sample 2x2 to match WebP's limitation).
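A rough sketch of what that invocation might look like when scripted. The helper name, the file names, and the assumption that mozjpeg's `cjpeg` is on `PATH` are all hypothetical; `-quality`, `-sample` and `-outfile` are standard cjpeg switches, and the quality value is just the one mentioned earlier in this thread, not a recommendation.

```python
def build_cjpeg_command(src, dst, quality, cjpeg="cjpeg"):
    """Build an argv list for cjpeg with 2x2 chroma subsampling.

    `cjpeg` is assumed to point at mozjpeg's build of the tool;
    `src` is assumed to be in a format cjpeg can read (e.g. PPM).
    """
    return [
        cjpeg,
        "-quality", str(quality),
        "-sample", "2x2",   # match WebP's chroma subsampling
        "-outfile", dst,
        src,
    ]

cmd = build_cjpeg_command("input.ppm", "output.jpg", 69)
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`, then measure the result with whichever (D)SSIM tool you settle on.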
Here are the files I've been testing (one is same size, one is same quality based on my DSSIM tool v0.5):
Talking to some people here, they think your DSSIM tool isn't what I should use. Specifically, they said it runs blur and downscale steps that aren't part of the SSIM metric. They suggested using Mehdi's C++ implementation, which I understand yours is a rewrite of.
Presumably you think I should use your tool instead? What makes the (D)SSIM numbers from yours a better match for human perception than those from Mehdi's? Or should they be giving the same numbers?
I have two issues with Mehdi's implementation:
* It works on raw RGB data, which is a poor model for measuring perceptual difference (e.g. the black-to-green range is very sensitive while green to cyan is almost indistinguishable, yet numerically they're the same distance in RGB). Some benchmarks solve that by testing grayscale only, but that allows encoders to cheat by encoding color as poorly as they want to.
* It's based on OpenCV, and when I tested it I found OpenCV didn't apply gamma correction. This makes a huge difference on images with noisy dark areas (and photos have plenty of underexposed areas). Maybe it's a matter of OpenCV version or settings. You can verify this by dumping `ssim_map` and seeing if it shows high difference in dark areas that look fine to you on a well-calibrated monitor.
I've tried to fix those issues by using a gamma-corrected Lab colorspace and including the score from the color channels, with those channels tested at lower resolution (since the eye is much more sensitive to luminance).
However, I have tested my tool against the TID2008 database and got an overall score lower than expected for SSIM (0.73 instead of 0.80), but still better than most other tools they've tested.
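To make the raw-RGB point concrete, here is a minimal sketch (not taken from either tool) that compares plain RGB distance with the difference in CIE L* lightness for the black/green/cyan example. It assumes 8-bit sRGB input, uses the standard sRGB transfer function and Rec. 709 luminance weights, and looks only at lightness, ignoring the a/b color channels entirely.

```python
import math

def srgb_to_linear(c8):
    """Undo the sRGB gamma curve for one 8-bit channel value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def lightness(rgb):
    """CIE L* (0..100) of an sRGB color, via linear relative luminance."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return (29 / 3) ** 3 * y  # linear segment near black

def rgb_distance(a, b):
    """Euclidean distance between two colors in raw 8-bit RGB."""
    return math.dist(a, b)

black, green, cyan = (0, 0, 0), (0, 255, 0), (0, 255, 255)

# Both pairs are exactly the same distance apart in raw RGB...
print(rgb_distance(black, green), rgb_distance(green, cyan))

# ...but the lightness steps differ widely (about 88 versus about 3).
print(lightness(green) - lightness(black))
print(lightness(cyan) - lightness(green))
```

A metric computed directly on RGB values treats the two steps as equal, which is the mismatch described in the first bullet above.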