Boosting zopfli performance (roartindon.blogspot.com)
104 points by jsnell on April 19, 2016 | 13 comments


> Now that I have a 32-bit floating point representation to compare to, I can go one step further: Since mincostaddcostj is always greater than or equal to 0, and because non-negative floats are monotonically ascending when the bits are interpreted as an integer, I can use an integer comparison to do the check!

Good points. It sounds like we could benefit from specialized "non-negative" or "non-NaN" types, implemented either in a library (with limited operator overloading) or at the compiler level (via static range analysis).
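
As a quick sketch of the trick such a type would encapsulate (the helper names here are made up for illustration, not from the article or zopfli): for non-negative, non-NaN IEEE-754 floats, the raw bit patterns sort in the same order as the values, so a float comparison can be replaced by an integer comparison:

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    static uint32_t float_bits(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);  /* well-defined way to reinterpret the bits */
        return u;
    }

    /* Behaves like (x <= y), provided both values are non-negative and not
       NaN (and neither is -0.0f, whose bit pattern is huge as an integer). */
    static int le_nonneg(float x, float y)
    {
        return float_bits(x) <= float_bits(y);
    }

    int main(void)
    {
        assert(le_nonneg(0.0f, 1.5f) == (0.0f <= 1.5f));
        assert(le_nonneg(2.0f, 1.5f) == (2.0f <= 1.5f));
        return 0;
    }

A "non-negative" type would establish that precondition once at construction, instead of relying on it implicitly at every comparison site.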


Very nice work. I just wish that compilers did some of these things automatically. Seems like even with modern optimising compilers it's still up to the programmer to do a lot of the work manually.


That has been my experience. The conventional wisdom is often "don't worry about optimization; the compiler will do a better job of it than you," and while it's true that the compiler does many things I wouldn't have thought to do, my experience is that one can still often improve on the compiler's results through careful micro-optimization, as illustrated in TFA.


Very interesting. Unfortunately, he doesn't say whether he has submitted these improvements to the upstream developers so that other users of zopfli will benefit from them.


Hi, author of the article here.

Yes, I'm talking to Lode (from Google) about getting the changes into the main development branch.


Is

    if (costs[j+k] - costs[j] <= mincost)
really the same as

    if (costs[j+k] <= mincost + costs[j])
even with floating point math?


My first guess is that it should be (because addition on FP gives the nearest representable value), but I'd also guess that any small differences here are irrelevant.

I wonder how many of those optimizations (e.g. maybe this exact one) would be done by compilers with -ffast-math.


This is a good question. I pondered it myself.

I could prove that a - b < c is not the same as a < b + c (take a large a with a = c, and b the smallest value that causes a - b to change: the first comparison comes out true while the second comes out false).

But I couldn't think of a failure case where a - b <= c differs from a <= b + c.

Would love it if someone could verify/prove this.

To further complicate things, the LHS of the code in question is evaluated in float and then promoted to double for the comparison with the RHS (which is a double).
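
Not a proof either way, but here's a minimal brute-force sketch someone could run to hunt for counterexamples (the probe range is my own arbitrary choice): it steps a, b and c a few ulps apart around 1.0, where halfway-rounding cases are easiest to hit, assuming the default round-to-nearest mode:

    #include <float.h>
    #include <stdio.h>

    int main(void)
    {
        /* a and c step in whole ulps at 1.0; b steps in half ulps. */
        for (int i = 0; i <= 8; i++)
            for (int j = 0; j <= 8; j++)
                for (int k = 0; k <= 8; k++) {
                    double a = 1.0 + i * DBL_EPSILON;
                    double b = j * (DBL_EPSILON / 2);
                    double c = 1.0 + k * DBL_EPSILON;
                    if ((a - b <= c) != (a <= b + c))
                        printf("a=%a b=%a c=%a: (a-b<=c)=%d (a<=b+c)=%d\n",
                               a, b, c, a - b <= c, a <= b + c);
                }
        return 0;
    }

(It only covers one tiny neighbourhood, so finding nothing would prove nothing.)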


I keep a list of the most efficient lossless PNG optimizers, and Zopfli is definitely one of the best.

https://olegkikin.com/png_optimizers/


Unsurprisingly, PNGOut, ZopfliPNG and AdvanceCOMP all use non-zlib DEFLATE implementations (respectively pkzip, zopfli, and 7zip or zopfli depending on setting).

Interestingly, as far as I know OptiPNG uses the standard zlib DEFLATE and still closes in on the lead optimisers for all but picture 3.

It would also be good to include more information about your compression parameters, though: while OptiPNG -o7 doesn't do anything to picture 3, a complete exhaustive search (-o7 -zm1-9) does manage to compress it down to 8724 bytes (8471 IDAT).

edit: in fact, despite the help text's claim, optipng shows a very slight improvement on the first three pictures when using the strongest, most exhaustive search parameters (compared to those you list; using optipng 0.7.6):

* Picture 1: 9787 bytes (51.80%) versus 9979 (50.85%)

* Picture 2: 20513 bytes (10.63%) versus 20671 (9.94%)

* Picture 3: 8724 bytes (10.09%) versus 9703 (0%)

Pictures 4 and 5 compress to the sizes you listed even using exhaustive searching.


For the change that reorders the additions so that int is added to int and float to float, removing one conversion, I wonder whether the compiler would do that itself if allowed to by -ffast-math (included in -Ofast). Normally, the compiler cannot (even with -O3) perform changes that might alter exact floating-point results.

EDIT: No, looks like gcc can't do this even with -ffast-math or -Ofast. Given the following C code:

    double test1(int lbits, double lsym, int dbits, double dsym)
    {
        return lsym + lbits + dsym + dbits;
    }

    double test2(int lbits, double lsym, int dbits, double dsym)
    {
        return lbits + dbits + lsym + dsym;
    }
GCC, with either -O3 or -Ofast, with or without -ffast-math, always produces two cvtsi2sd instructions for test1 and one cvtsi2sd instruction for test2.

Filed as a GCC feature request: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70731


Hi, author of article here.

I tested -ffast-math on clang and it won't reorder the additions to omit the conversion here.

I'm actually VERY wary of -ffast-math due to a bad experience some years ago where it completely messed up my calculations and I couldn't find a way to stop it from doing so.

Even a 1 ulp change (not accounting for the round down) in the final floating-point comparison changed my compression results, so I'm not entirely confident about using -ffast-math without a huge test corpus -- but turning it on does improve the run time.


I would be curious to know whether profile-guided optimisation would also have helped.



