
Lossless compression with Brotli - nuriaion
https://blogs.dropbox.com/tech/2016/06/lossless-compression-with-brotli/
======
mmastrac
Somewhat interesting for those who missed it: another Rust Brotli project was
a target for afl.rs. IIRC it didn't find any memory-safety issues, but it
definitely exposed a few panic crashes:

[https://github.com/frewsxcv/afl.rs](https://github.com/frewsxcv/afl.rs)
(discussion
[https://news.ycombinator.com/item?id=11936983](https://news.ycombinator.com/item?id=11936983))

I don't think they mentioned fuzzing their Brotli decompressor in this
article, but I hope they try afl.rs.

~~~
sanxiyn
Apparently this ([https://github.com/dropbox/rust-brotli](https://github.com/dropbox/rust-brotli)) is much faster than the one ([https://github.com/ende76/brotli-rs](https://github.com/ende76/brotli-rs)) fuzzed by afl.rs. Like 10x faster. See [https://github.com/ende76/brotli-rs/issues/24](https://github.com/ende76/brotli-rs/issues/24).

~~~
seeekr
Relevant comment from ende76 on why brotli-rs is relatively slow and what the
goals of the project are/were:

"Yes, performance has not been a focus point at all so far. I've started this
project as a way to get familiar with Rust, and also with the Brotli spec.
For the implementation, I have made it a point to work only from the spec,
while avoiding looking at the reference implementation, because another goal
was helping to improve the Brotli spec itself. I do believe that there are a
number of places where copies could be avoided, and other places where I used
clone() simply because at the time I was unable otherwise to appease the
borrow checker.

For a performance-oriented implementation, I would probably take the test
cases and experiences so far, and start over with a new implementation."

------
AceJohnny2
On a purely compression-algorithm front, the recently format-stabilized [1]
zstd [2] beats the pants off Brotli in terms of compression speed,
decompression speed, and compression ratio. Zstd isn't some random effort
either, but a more flexible (in terms of compression time-for-ratio) effort by
the creator of the popular and insanely fast LZ4 compressor [3].

[1] [http://fastcompression.blogspot.com/2016/06/zstandard-reaches-final-format.html](http://fastcompression.blogspot.com/2016/06/zstandard-reaches-final-format.html)

[2] [https://github.com/Cyan4973/zstd](https://github.com/Cyan4973/zstd)

[3] [https://github.com/Cyan4973/lz4](https://github.com/Cyan4973/lz4)

~~~
JyrkiAlakuijala
Brotli's fastest compression is slightly faster than zstd's. Zstd decompresses
faster, but neither is slow. Zstd can use a sliding window larger than 16 MB;
in brotli the window is capped at 16 MB to guarantee bounded resource use at
decoding time. Zstd's longer sliding window helps with the longest files
(16 MB+), and benchmarking is often done with 100 MB or even 1 GB files.

Brotli usually compresses more, and quite a lot more on shorter files. Try
cp.html, sparc sum, or xargs.1: brotli compresses these 9-19% more densely on
the following benchmark:

[https://quixdb.github.io/squash-benchmark/unstable/](https://quixdb.github.io/squash-benchmark/unstable/)

Note that on this benchmark brotli is always limited to a 4 MB sliding window,
while the other algorithms are also run with wider windows. This makes brotli
look worse on large files (4 MB+).

~~~
blank_state
> Brotli's fastest compression is slightly faster than zstd's.

Come on, this is not serious.

Brotli's fastest compression algorithm is still significantly slower than
zstd's. And more importantly, it compresses _much worse_.

For a third-party evaluation, one can try
[TurboBench](https://github.com/powturbo/TurboBench) or even
[lzbench](https://github.com/inikep/lzbench), both of which are open source.
Squash introduces a wrapper layer with distortions that make it less reliable,
and it is more complex to use and install; quite a pity, given that the
graphical presentation is very good. I'm interested in speed, and in this
area all benchmarks point in the same direction: for a given speed budget,
Zstandard offers a better ratio (and decompresses much faster).

~~~
JyrkiAlakuijala
TurboBench didn't compile straight after git clone; lzbench did.

I got a test file by running: wget
[https://web.archive.org/web/20151222062543/http://www.microsoft.com/surface/th-th/devices/surface-pro-4?ocid=OCTEVENT_MSCOM](https://web.archive.org/web/20151222062543/http://www.microsoft.com/surface/th-th/devices/surface-pro-4?ocid=OCTEVENT_MSCOM)

The test file is 267253 bytes.

    $ ./lzbench -ebrotli,0,1,2,5,7,9,11/zstd,1,22 testfile

brotli 0.4.0 -0 compresses at 783 MB/s and decompresses at 809 MB/s

zstd 0.7.1 -1 compresses at 586 MB/s and decompresses at 1691 MB/s

brotli 0.4.0 -7 compresses at 57 MB/s and decompresses at 873 MB/s, to 28185 bytes

brotli 0.4.0 -11 compresses to 25413 bytes

zstd 0.7.1 -22 compresses at 4.01 MB/s to 28363 bytes

Of course this is an unfair example because of the static dictionary that
brotli uses, but it is not a pathological one: Thai is not part of the static
dictionary. The numbers are from an i7-4790K @ 4.00 GHz.

Brotli's fastest compression is faster than zstd's, at least as shown with
lzbench and this file. Brotli also wins in compression density: on this file,
brotli -11 produces 10.5% fewer bytes than zstd -22.

~~~
blank_state
> Of course it is an unfair example because of the static dictionary that
> brotli uses

It is certainly favorable ground for Brotli. Brotli claims an advantage in
HTML compression thanks to its integrated specialized dictionary. The real
problem, though, is the suggested conclusion that these favorable results are
broadly applicable everywhere else. That's a terrible suggestion. We need more
examples, not just HTML files, which happen to be Brotli's best case.

> brotli 0.4.0 -11 compresses to 25413 bytes
>
> zstd 0.7.1 -22 compresses in 4.01 MB/s to 28363 bytes

Why don't you disclose the compression time of brotli? Of course it matters:
everyone understands that an algorithm that spends 10x more CPU has the budget
to compress more.

> brotli 0.4.0 -0 compresses 783 MB/s and decompresses 809 MB/s
>
> zstd 0.7.1 -1 compresses 586 MB/s and decompresses 1691 MB/s

Here, you don't disclose the compression ratio of either algorithm, implying
they are equal. By that standard, LZ4 is probably the best: it's so much
faster! Of course, they do not compress to the same size...

I was initially thrilled by your detailed answer, but now, quite frankly, I
feel cheated. Grossly so.

This is really disappointing. I was so vexed that I decided to run the tests
myself.

Downloading and using __the same html file__, the same lzbench, and the same
library versions, just a different computer and compiler, here is what it
produced:

    | Algo      | compressed size | compression speed | decompression speed |
    | --------- | --------------- | ----------------- | ------------------- |
    | brotli -2 | 36223           | 220 MB/s          |  670 MB/s           |
    | zstd -1   | 36655           | 480 MB/s          | 1400 MB/s           |
    | brotli -1 | 38292           | 360 MB/s          |  650 MB/s           |
    | brotli -0 | 41141           | 560 MB/s          |  610 MB/s           |

__Conclusion__: brotli -0 is indeed fast, faster than in my previous tests.
It seems to be tuned to reach this objective, but it throws away a lot of
compression ratio to get there.

Consequently, brotli -0 is _not comparable_ to zstd -1; __it takes brotli -2
to produce an equivalent compressed size__. At that level, though, zstd is
much, much faster.

Which is exactly the question I'm trying to answer: which algorithm
compresses better for a given speed budget? That's what matters, at least in
our datacenter.
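That "ratio for a given speed budget" comparison can be sketched with Python's stdlib codecs standing in for zstd and brotli (no bindings assumed here; absolute numbers are machine-dependent):

```python
import time
import zlib
import bz2
import lzma

# A repetitive test payload, loosely analogous to an HTML file.
payload = b"the quick brown fox jumps over the lazy dog " * 4096

def measure(name, compress):
    """Return (name, compressed size, compression speed in MB/s)."""
    t0 = time.perf_counter()
    out = compress(payload)
    dt = max(time.perf_counter() - t0, 1e-9)
    return name, len(out), len(payload) / dt / 1e6

rows = [
    measure("zlib -1", lambda b: zlib.compress(b, 1)),
    measure("bz2 -9", lambda b: bz2.compress(b, 9)),
    measure("lzma", lzma.compress),
]
for name, size, mbps in rows:
    print(f"{name:8s} {size:8d} bytes  {mbps:8.1f} MB/s")
```

For a fixed speed budget, the codec with the smallest output among those fast enough wins; this is the comparison that lzbench automates across many libraries and levels.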

I'm not interested in the ultra-slow modes, but while at it, I wanted to
complete the picture with the missing compression speed of brotli -11. It
produced:

    zstd -22   : 2.95 MB/s
    brotli -11 : 0.53 MB/s

So that's > 5x difference. It surely helps to reach better compression ratios.

I also wanted an answer to "by how much does the dictionary help?".

Fortunately, TurboBench can help, thanks to a special mode that turns off
dictionary compression. Using it on the very same sample, brotli -11's
compressed size increases from 25413 to 26639 bytes. That's 5% larger, clearly
not negligible. Still good, but it cuts the advertised size difference in half.

Anyway, I'm clearly disappointed to have had to redo the tests myself, because
some inconvenient results were intentionally undisclosed (or not produced).
This really undermines my trust in future publications.

That taught me something: only trust benchmarks you run yourself. And now I
should probably benchmark even more...

~~~
blank_state
The formatting is just so bad, let me retry it here:

    
    
       | Algo      | compressed size | compression speed | decompression speed |
       | --------- | --------------- | ----------------- | ------------------- |
       | brotli -2 |  36223          |  220 MB/s         |  670 MB/s           |
       | zstd -1   |  36655          |  480 MB/s         | 1400 MB/s           |
       | brotli -1 |  38292          |  360 MB/s         |  650 MB/s           |
       | brotli -0 |  41141          |  560 MB/s         |  610 MB/s           |

~~~
JyrkiAlakuijala
If you read your own results with care, you will notice that they also show
that the fastest brotli is faster than the fastest zstd. Your results also
show that brotli compresses more than zstd at the higher settings, even with
brotli's static dictionary turned off.

If you are interested in zstd 0.7.1 -22, you can reach the same compression
density with brotli 0.4.0 at quality setting -7 (at least for this file, with
the static dictionary). Then you are comparing brotli's compression speed of
57 MB/s to zstd's 4.01 MB/s. Brotli is 14x faster at this compression density.

For a compress-once, decompress-once roundtrip at this density, brotli
achieves 53 MB/s and zstd 4 MB/s. Brotli's roundtrip is 13x faster.
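The roundtrip figures follow from adding the per-byte times of the two stages: if a file compresses at c MB/s and decompresses at d MB/s, the roundtrip runs at 1/(1/c + 1/d). A quick check against the numbers quoted above (the 1691 MB/s zstd decompression speed is taken from the level-1 run earlier; at that magnitude it barely affects the result):

```python
def roundtrip_mbps(compress_mbps, decompress_mbps):
    # Total time per MB is the sum of the two stage times; invert to get MB/s.
    return 1.0 / (1.0 / compress_mbps + 1.0 / decompress_mbps)

print(int(roundtrip_mbps(57, 873)))     # brotli -7: ~53 MB/s
print(int(roundtrip_mbps(4.01, 1691)))  # zstd -22: ~4 MB/s
```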

In decompression, brotli is 2x slower on Intel, but decompression times at
800+ MB/s are going to be negligible in most uses (think < 1% of cycles in
your datacenter) if the data is parsed or processed somehow afterwards.

Brotli's entropy coding is simpler (no 64-bit operations), and because of
this, on 32-bit ARM the decompression speeds of zstd and brotli are about the
same.

I acknowledge that there can be use cases where zstd 0.7.1 is favorable over
brotli 0.4.0 -- particularly those where a 32+ MB file is compressed at once,
in the 150-500 MB/s compression speed range -- but even this simple
compression test shows that brotli can compress significantly (5-10+%) more
at the higher quality settings.

------
vvanders
Awesome to see they contributed back their custom allocator that lets you
allocate from a fixed block: [https://github.com/dropbox/rust-alloc-no-stdlib](https://github.com/dropbox/rust-alloc-no-stdlib)

Solid stuff from the looks of it.

~~~
saidajigumi
Now I have a huge hankering to whip up a frame-based allocator, as used to be
popular in game and graphics programming. For those unfamiliar, the idea is
that you segregate allocation of memory objects of different sizes into their
own fixed blocks. This provides a means to control memory fragmentation, and
avoids hitting the typically slow system allocator with a lot of small
allocations/deallocations.

Fancy frame allocators would actually use a fairly standard malloc-style
interface, but transparently place allocations of different sizes in their own
regions. The allocator would internally use the slow system malloc to create
and expand the fixed blocks it uses for its own allocations. During
development, the frame allocator would be used to profile and optimize the
actual object allocation sizes being made. E.g. sometimes it might make sense
to pad some objects to consolidate the number of allocation-size regions. As
the project progresses, the initial block allocations for each allocation size
would be adjusted so that the system malloc was called very infrequently,
usually only at application startup.
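A minimal sketch of that size-class scheme (Python standing in as executable pseudocode; the size classes are invented for illustration, and a real allocator hands out raw memory rather than bytearrays):

```python
class PoolAllocator:
    """Toy size-class allocator: each size class keeps a free list of
    fixed-size blocks, so small alloc/free cycles stop hitting the slow
    system allocator once the pools are warm."""

    SIZE_CLASSES = (16, 32, 64, 128, 256)  # illustrative, not tuned

    def __init__(self):
        self.free_lists = {c: [] for c in self.SIZE_CLASSES}
        self.system_allocs = 0  # would be calls to system malloc in C

    def _class_for(self, size):
        for c in self.SIZE_CLASSES:
            if size <= c:
                return c
        raise ValueError("large allocation: fall through to system malloc")

    def alloc(self, size):
        c = self._class_for(size)
        if self.free_lists[c]:
            return self.free_lists[c].pop()  # fast path: reuse a block
        self.system_allocs += 1              # slow path: grow the pool
        return bytearray(c)                  # padded up to the class size

    def free(self, block):
        self.free_lists[len(block)].append(block)
```

Profiling which size classes actually get used, then pre-filling those free lists at startup, is exactly the feedback loop described above.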

~~~
Manishearth
It might be that jemalloc already does this? Rust uses it by default, and
while I don't recall the specifics, jemalloc does a bunch of things in
userspace, wrapped around the system allocator, that make it better than
plain malloc.

Rust makes it easy to swap out the allocator too, so I'd love to see an
allocator lib specifically focused on size management :)

~~~
Jweb_Guru
Yes, this is precisely how jemalloc works.

~~~
vvanders
Does jemalloc allow you to report telemetry and feed that back into the
initial pool sizes? That's what makes the gamedev oriented allocators
interesting.

FWIW, a "frame allocator" was always a block of memory where malloc returned
a pointer into the block and advanced the pointer. At the end of rendering a
frame you just reset the pointer to the start of the block. The downside is
you need to guarantee that nothing in it has a lifetime longer than a frame,
something Rust should be good at :).
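That bump-pointer scheme fits in a few lines (again a sketch in Python; a real one returns raw pointers into the block):

```python
class FrameAllocator:
    """Toy bump allocator: allocation advances an offset into a fixed
    block, and end-of-frame reset is a single assignment."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.offset = 0

    def alloc(self, size):
        if self.offset + size > len(self.buf):
            raise MemoryError("frame budget exceeded")
        view = memoryview(self.buf)[self.offset:self.offset + size]
        self.offset += size  # bump the pointer
        return view

    def reset(self):
        # End of frame: every allocation is invalidated at once.
        self.offset = 0
```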

I think what the GP is referring to is a pooling allocator that pools similar
sized allocations.

~~~
Jweb_Guru
> Does jemalloc allow you to report telemetry and feed that back into the
> initial pool sizes? That's what makes the gamedev oriented allocators
> interesting.

Not AFAIK. There are a lot of optimizations that are quite difficult to do
with dynamic pool sizes and that aren't worthwhile in the general case.

------
sams99
On the topic of Brotli, don't forget:

\- It is available today in close to 50% of web browsers

\- The vast majority of CDNs block it today

I wrote about it here: [https://samsaffron.com/archive/2016/06/15/the-current-state-of-brotli-compression](https://samsaffron.com/archive/2016/06/15/the-current-state-of-brotli-compression)

------
saynsedit
I'm having a hard time understanding the motivations for porting to rust...

If they were going to run the whole thing in a SECCOMP container anyway, there
is little damage a compromised C library could do.

If reasoning about uninitialized memory would take a review of the entire
brotli code base, didn't the rust port require that anyway? (speaking as
someone who has done a couple of cross-language rewrites)

~~~
Jweb_Guru
> If they were going to run the whole thing in a SECCOMP container anyway,
> there is little damage a compromised C library could do.

Not really. For instance, a compromised library could still rewrite the file
you're storing in Dropbox to contain different contents, which could
ultimately result in remote code execution when you redownload it. Seccomp
only makes sure the _decompression_ server is safe from a rogue process.

~~~
daniel_rh
The file's sha256sum can be verified before the file is sent to any user, so
there's no chance of RCE there, even with a hypothetical C brotli -- but
reproducible decompression is key.

Additionally, if you want to do the decompression client-side, facilities
like SECCOMP simply may not be available on that platform, and in that case,
having a language like Rust to guard against RCE is an excellent idea. It is
also easier to maintain the same code running on all platforms than C code
where SECCOMP is available and Rust code where it is not.
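The digest check described above can be sketched with stdlib pieces (zlib standing in for brotli here; the data and the shape of the pipeline are assumptions for the example):

```python
import hashlib
import zlib

original = b"user file contents " * 100
# The digest would be recorded at upload time, before compression.
expected_digest = hashlib.sha256(original).hexdigest()

compressed = zlib.compress(original)  # brotli in the real pipeline

# Before the file is served to any user, verify the roundtrip:
restored = zlib.decompress(compressed)
assert hashlib.sha256(restored).hexdigest() == expected_digest
```

If decompression is not bit-for-bit reproducible, this check fails closed, which is why reproducible decompression is the key property.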

~~~
saynsedit
Is the brotli decompression step really the most dangerous vector on the
client-side? What about all the non-verified client-side native code that
actually interprets file data? From a practical perspective using the C code
doesn't deteriorate existing conditions and using the Rust code doesn't
improve them.

------
maxst
Somebody, make "image format powered by brotli", please.

~~~
niftich
Since most of Brotli's improvements over its LZ77 ancestry come from its
large, hardcoded text-corpus dictionary [1], most of the algorithm's
strengths would be wasted on binary data like images.

Zopfli, from the same people, is a DEFLATE encoder, so it can be used in PNG
[2], and this has already been added to some optimizers, e.g. AdvanceCOMP [3].

[1] [https://gist.github.com/klauspost/2900d5ba6f9b65d69c8e](https://gist.github.com/klauspost/2900d5ba6f9b65d69c8e)

[2] [https://github.com/google/zopfli/commit/337d27f25ef15a6cf34fef2acd0613fddc411cb1](https://github.com/google/zopfli/commit/337d27f25ef15a6cf34fef2acd0613fddc411cb1)

[3] [http://www.advancemame.it/doc-advpng.html](http://www.advancemame.it/doc-advpng.html)

~~~
est
First time saw the Brotli dictionary. It has duplicates.

Line 3131 and 8704 both are "操作"

~~~
niftich
Interestingly, there are also 121 different transformations [1] you can apply
to each dictionary word, from adding various prefixes and suffixes to
trimming letters, plus some more complex ones [2].

If the plain-text dictionary linked earlier [3] is accurate, it would appear
that the dictionary contains a lot of redundant forms.

[1] [https://tools.ietf.org/html/draft-alakuijala-brotli-11#page-27](https://tools.ietf.org/html/draft-alakuijala-brotli-11#page-27)

[2] [https://tools.ietf.org/html/draft-alakuijala-brotli-11#appendix-B](https://tools.ietf.org/html/draft-alakuijala-brotli-11#appendix-B)

[3] [https://gist.github.com/klauspost/2900d5ba6f9b65d69c8e](https://gist.github.com/klauspost/2900d5ba6f9b65d69c8e)
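A rough illustration of how such transforms work (the entries below are invented for the example; the real table of 121 lives in the draft's Appendix B): each transform combines a prefix, a word operation, and a suffix, so one stored word can cover many surface forms.

```python
# Illustrative (prefix, operation, suffix) transforms in the spirit of
# Brotli's word transforms; not the spec's actual table.
TRANSFORMS = [
    ("", "identity", " "),         # word followed by a space
    (" ", "identity", " "),        # word wrapped in spaces
    ("", "uppercase_first", ""),   # "time" -> "Time"
    ("", "omit_last_1", ""),       # "times" -> "time"
]

def apply_transform(word, transform):
    prefix, op, suffix = transform
    if op == "identity":
        out = word
    elif op == "uppercase_first":
        out = word[:1].upper() + word[1:]
    elif op == "omit_last_1":
        out = word[:-1]
    else:
        raise ValueError(f"unknown operation: {op}")
    return prefix + out + suffix

# One dictionary entry thus stands in for many surface forms:
forms = [apply_transform("time", t) for t in TRANSFORMS[:3]]
```

This is also why duplicate or near-duplicate entries in the raw dictionary look redundant: many variant forms are already derivable via transforms.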

------
mozumder
Can Brotli compress streamed blocks, with SYNC_FLUSH or FULL_FLUSH the way
zlib can?

~~~
Jweb_Guru
Brotli compression is fully streamable, as is mentioned several times in the
article.
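For reference, the zlib behavior the question alludes to, where a SYNC_FLUSH makes everything buffered so far decodable mid-stream, can be demonstrated with Python's stdlib (brotli's streaming API offers a flush operation with similar intent, not shown here):

```python
import zlib

comp = zlib.compressobj()
decomp = zlib.decompressobj()
received = []
for chunk in [b"hello ", b"world"]:
    # Z_SYNC_FLUSH emits all pending output on a byte boundary, so the
    # receiver can decode each piece as soon as it arrives.
    packet = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
    received.append(decomp.decompress(packet))

assert received == [b"hello ", b"world"]
```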

------
artur_makly
go PiedPiper go!

~~~
artur_makly
oof. no humor allowed. got it.

~~~
conceit
I don't even get it, and maybe the downvoters don't either.

~~~
artur_makly
Pied Piper = name of the startup in Silicon Valley TV series

"... The team rushes to produce a feature-rich cloud storage platform based on
their compression technology.."

[1][https://en.wikipedia.org/wiki/Silicon_Valley_(TV_series)](https://en.wikipedia.org/wiki/Silicon_Valley_\(TV_series\))

