My guess is that, at typical mobile CPU speeds, decoding is a minor part of the time it takes to get an image from the server to the client's screen.
If this is the case, then putting in a hardware decoder would be a needless expense.
I used to own the JPEG decoder at Mozilla, and this was certainly the case last time I looked at it. Even without SIMD, JPEG decoding is pretty cheap compared to most of the things the browser does. I certainly wasn't clamoring for someone to burn JPEG into silicon; I had other things keeping me up at night.
Strange: a compressed JPG is many times smaller than the uncompressed image, so if the graphics card could accept the compressed image, memory bandwidth usage could drop significantly. I also believe the graphics card can achieve more parallelisation and could therefore handle much bigger JPG images, which we produce all the time with ever-higher-resolution cameras.
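A rough back-of-envelope sketch of the bandwidth argument. It assumes the image is decoded to 4 bytes per pixel (RGBA) and uses an illustrative 10:1 ratio between the decoded buffer and the JPEG file; both numbers are assumptions, and real ratios vary a lot with image content and quality settings. The function name `transfer_sizes` is hypothetical.

```python
BYTES_PER_PIXEL = 4  # assumption: image decoded to 8-bit RGBA


def transfer_sizes(width, height, compression_ratio=10.0):
    """Return (decoded_bytes, compressed_bytes) for one image.

    compression_ratio is an assumed decoded-size : JPEG-size ratio,
    not a measured value.
    """
    decoded = width * height * BYTES_PER_PIXEL
    compressed = decoded / compression_ratio
    return decoded, compressed


# A hypothetical 12-megapixel camera photo.
decoded, compressed = transfer_sizes(4000, 3000)
print(f"decoded: {decoded / 1e6:.0f} MB, compressed: {compressed / 1e6:.1f} MB")
# prints: decoded: 48 MB, compressed: 4.8 MB
```

Under these assumptions, handing the card the compressed bytes would move roughly a tenth of the data across the bus, which is the saving the comment is pointing at.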
I believe some browsers actually keep all images uncompressed in RAM, which may be why the potential for optimization is less obvious.
> I also believe that the graphics card can achieve more parallelisation and therefore handle much bigger JPG images
Very large images are fairly rare on the web. In particular, phones don't have giant displays, so it's usually a waste of bandwidth to send a huge picture to a phone as part of a webpage. And, speaking only for mobile Firefox at the time I worked on it, we had lots of bigger problems with large images than decoding speed. For example, Firefox would keep the whole decoded image in memory while it was onscreen, even if it was displaying only part of the image or only a downsampled version of it. If it's a giant image, the decoded image could use up a large fraction of the device's memory.
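To put numbers on the memory problem, here's a rough sketch comparing a fully decoded image with one downsampled to fit the space it actually occupies on screen. It assumes 4 bytes per decoded pixel (RGBA); the 8000x6000 photo and the 1080x1920 display slot are hypothetical figures, not measurements from Firefox, and the helper names are made up for illustration.

```python
BYTES_PER_PIXEL = 4  # assumption: decoded surface is 8-bit RGBA


def decoded_size(width, height):
    """Bytes needed to hold a fully decoded width x height image."""
    return width * height * BYTES_PER_PIXEL


def downsampled_size(width, height, max_w, max_h):
    """Bytes needed if the image is scaled down (never up) to fit
    within max_w x max_h, preserving its aspect ratio."""
    scale = min(1.0, max_w / width, max_h / height)
    w = max(1, round(width * scale))
    h = max(1, round(height * scale))
    return decoded_size(w, h)


full = decoded_size(8000, 6000)
small = downsampled_size(8000, 6000, 1080, 1920)
print(f"{full / 1e6:.0f} MB full vs {small / 1e6:.1f} MB downsampled")
# prints: 192 MB full vs 3.5 MB downsampled
```

Holding the full-resolution buffer costs over 50x the memory of the version that's actually shown, which on a phone is a much bigger deal than how fast the decode ran.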
I'm also not aware of any major phones that have GPGPUs. I'm not an expert in this stuff, but I think you'd need one to get any serious acceleration of JPEG decoding.
As transistors shrink, accelerators that sit idle most of the time become cheaper to include. Your power budget doesn't let you run all of the chip's parts at the same time anyway.
Maybe it has something to do with browsers displaying images progressively as they load. Maybe hardware decoders don't support that? That's just a guess.