
Optimized Go gzip/zip packages, 30-50% faster - sajal83
http://blog.klauspost.com/optimized-gzipzip-packages-30-50-faster/
======
andrewstuart2
Is there any reason you're not approaching the team for inclusion into stdlib
(for more than just the crc32 bit)? If it's really drop-in and passes all
tests then I can imagine it would be an easy sell.

[https://groups.google.com/forum/#!forum/golang-dev](https://groups.google.com/forum/#!forum/golang-dev)

------
jasode
I'm not criticizing the programmer's effort, but whenever I see a new project
billed as a "drop-in replacement" and/or "50% to 3x faster", the first thing I
always look for is the _test data_ to verify that new bugs weren't introduced.

I went to the test subfolder:

[https://github.com/klauspost/compress/tree/master/testdata](https://github.com/klauspost/compress/tree/master/testdata)

I saw 3 files.

The kind of test data I'd like to see for something like a compression library
would be:

0 byte files

1 byte files

2,147,483,647 byte file

2,147,483,648 byte file

2,147,483,649 byte file

(to check for bugs around signed 32-bit integers)

4,294,967,295 byte file

4,294,967,296 ...

4,294,967,297 ...

(to check for bugs around unsigned 32-bit integers and see if 64bit was used
when necessary for correctness)

Also add files of sizes that straddle the boundaries of the algorithm's
internal block buffers (e.g. 64 KB or whatever). Add permutations of the above
files filled with all zeros (0x00) and all ones (0xFF). I'm sure I'm
forgetting a bunch of other edge cases.

The programmer may have done a wonderful job and all the code may be 100%
correct. Unfortunately, I can't trust a library replacement unless I also
trust the test bed of data that was used to check for defects. It's very
common for performance optimizations to introduce new bugs so there has to be
an extensive suite of regression tests to help detect them. Test data for bug
detection has a different purpose than benchmark data showing speed
improvements.

Those multi-gigabyte files are not git repository friendly so perhaps a
compromise would be a small utility program to generate the test files as
necessary.

~~~
pdq
I agree he should use more test cases, but it looks like he used the same
ones as Go's compression tests [1]. Note he has other test cases in the
subdirectories (also the same as Go's) [2].

However, I believe even better testing would be to fuzz it for a few hours [3].

[1]
[http://golang.org/src/compress/testdata/](http://golang.org/src/compress/testdata/)

[2]
[https://github.com/klauspost/compress/tree/master/zip/testda...](https://github.com/klauspost/compress/tree/master/zip/testdata)

[3] [https://github.com/dvyukov/go-fuzz](https://github.com/dvyukov/go-fuzz)
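A go-fuzz harness for this kind of library is tiny; the entry point is a `Fuzz(data []byte) int` function. Here's a minimal round-trip sketch written against the standard `compress/flate` (you'd swap the import for `github.com/klauspost/compress/flate` to fuzz the optimized package; the harness itself is my assumption, not something from either repo):

```go
package main

import (
	"bytes"
	"compress/flate"
	"io"
)

// Fuzz has the signature go-fuzz expects. It compresses the mutated
// input, decompresses the result, and panics if the round trip does not
// reproduce the input byte-for-byte. (go-fuzz wants this in a library
// package; package main is used here only to keep the sketch
// self-contained and runnable.)
func Fuzz(data []byte) int {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write(data); err != nil {
		panic(err)
	}
	if err := w.Close(); err != nil {
		panic(err)
	}

	out, err := io.ReadAll(flate.NewReader(&buf))
	if err != nil {
		panic(err)
	}
	if !bytes.Equal(data, out) {
		panic("round-trip mismatch")
	}
	return 1 // tell go-fuzz this input was interesting; keep it in the corpus
}

func main() {
	// Smoke test outside the fuzzer.
	Fuzz([]byte("smoke test: abc abc abc"))
}
```

Running `go-fuzz-build` and then `go-fuzz` on a package like this mutates inputs for hours and tends to find exactly the boundary-size bugs discussed above.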

------
tosseraccount
Don't they really need to compare this with the baseline C implementation,
and with the Intel-optimized and CloudFlare-optimized zlib?

------
mc_hammer
Curious what flags the C version was compiled with? Optimized or not? I guess
x86 or 64-bit also?

~~~
oofabz
He didn't benchmark any C implementations, just Go.

~~~
mc_hammer
Oh, I thought it was versus the C implementation.

thanks

