There is in fact a very high correlation between CPU cycles and energy efficiency, since compression algorithms don't sit idle and use roughly the same instructions. In fact, Yann Collet's Zstd uses the same principles as LZFSE, as both were sprouted from Jarek Duda's research: http://arxiv.org/abs/1311.2540.
The reference LZ4 implementation is absolutely more energy efficient than LZFSE, and in fact Apple does push for its use by offering it in its compression library. However, it tends not to compress as well as both LZFSE and Zstd. For 4G or WiFi (or even broadband), the time lost by transferring more data is not compensated by the time won by decompressing it faster, resulting in much slower downloads than even zlib. LZ4 is still relevant for higher speeds, such as those offered by magnetic hard drives. (Beyond a certain speed, such as for SSDs, compression no longer offers a benefit, but you might be ok with the slowdown given that you win drive space.)
There is a separate discussion to be had about the fact that the open-sourced LZFSE reference implementation is not the one they use (which explains how little they touched it since), as it does not even have ARM-specific code. Also, LZFSE does not claim to be patent-unencumbered. LZ4 and Zstd do have optimized code for ARM.
All in all, it is not a stretch to assume that Apple benefits from this FUD, which explains why there is no comparative benchmark anywhere to be found on their GitHub or in their documentation. It really looks like Zstd is better all around.
I think once Zstd gets a bit more mainstream the best they would do is add it as an option.