The static dictionary is not why Brotli reaches excellent compression density. Much of it comes from more advanced context modeling and generally more dynamic entropy modeling.
Brotli wins over Zstd on web compression even when you fill the static dictionary with zeros.
No Mountain View based engineers participated in the development of Brotli; it was built in Zurich. The static dictionary was distilled from a language corpus that originally covered 40 languages. To reduce the binary size while keeping effectiveness, I reduced it to six: English, Spanish, Chinese, Russian, Hindi, and Arabic. What went into the dictionary and in which order (both words and transforms) was chosen by how much entropy each entry reduced in a relatively large test corpus of diverse material (including, for example, over 110 natural languages).
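A toy version of that selection process can be sketched as a greedy loop: score each candidate string by the bytes it would save across a corpus and pick the best until the size budget runs out. This is a simplification (the real pipeline measured entropy reduction, not a flat 2-byte reference cost), and all names below are made up:

```python
def greedy_dictionary(corpus_docs, candidates, budget):
    """Toy greedy selection: repeatedly pick the candidate string that
    saves the most bytes across the corpus until the size budget is spent.
    Crude savings model: each occurrence of a chosen word is replaced by
    a 2-byte reference, and storing the word costs its own length."""
    chosen = []
    docs = list(corpus_docs)
    while budget > 0:
        best, best_gain = None, 0
        for cand in candidates:
            if cand in chosen or len(cand) > budget:
                continue
            occurrences = sum(doc.count(cand) for doc in docs)
            gain = occurrences * (len(cand) - 2) - len(cand)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:
            break
        chosen.append(best)
        budget -= len(best)
        # Blank out covered occurrences so later picks don't double-count.
        docs = [doc.replace(best, "\x00\x00") for doc in docs]
    return chosen

corpus = ["the quick brown fox", "the lazy dog", "the quick dog"]
cands = ["the ", "quick", "dog", "zebra"]
print(greedy_dictionary(corpus, cands, 12))
```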
Brotli was originally designed as a faster-to-decode replacement for LZMA for fonts, to enable WOFF2; LZMA was too slow for that use. WOFF2 is just geometry, with no natural language in it. There, the W3C observed that Brotli matched LZMA in density but was significantly (~5x) faster to decode -- which enabled the WOFF2 launch.
Unlike SDCH, Brotli's performance does not degrade as the static dictionary ages. This is because Brotli uses otherwise-invalid LZ77 commands to encode static dictionary references: if you don't use them, you don't pay for them. Just like with Zstd, users can bring their own dictionaries when they prefer -- functionality that existed in Brotli before it was introduced in Zstd.
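The bring-your-own-dictionary idea is easy to demo with zlib's preset-dictionary support from the Python standard library (used here as a stand-in; Brotli and Zstd expose the same concept through their own APIs):

```python
import zlib

# A shared dictionary both sides agree on out of band.
shared = b"Content-Type: text/html; charset=utf-8\r\nCache-Control: max-age="
msg = b"Content-Type: text/html; charset=utf-8\r\nCache-Control: max-age=3600\r\n"

# Without a dictionary.
plain = zlib.compress(msg, 9)

# With a preset dictionary the compressor can reference it as prior history.
c = zlib.compressobj(level=9, zdict=shared)
with_dict = c.compress(msg) + c.flush()

d = zlib.decompressobj(zdict=shared)
assert d.decompress(with_dict) == msg
print(len(plain), len(with_dict))  # the dictionary version is much smaller
```

The dictionary mostly pays off on short messages, where the shared prefix would otherwise have to be coded as literals.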
Compared to Zstd, Brotli degrades less on languages heavy in multi-byte UTF-8 (Korean, Vietnamese, Chinese, Japanese, etc.), and especially on mixed data such as HTML markup interleaved with such UTF-8 text.
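The advantage on UTF-8-heavy text comes from conditioning the entropy model on preceding bytes; in multi-byte encodings the previous byte strongly predicts the next. A quick order-0 vs. order-1 entropy comparison sketches the effect (a simplification of Brotli's actual context modeling, which derives a small context ID from the previous two bytes):

```python
import math
from collections import Counter

def order0_entropy(data: bytes) -> float:
    """Bits per byte with a single static distribution."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def order1_entropy(data: bytes) -> float:
    """Bits per byte when each byte is predicted from the previous byte."""
    pairs = Counter(zip(data, data[1:]))
    prev = Counter(data[:-1])
    n = len(data) - 1
    return -sum(c / n * math.log2(c / prev[a]) for (a, _), c in pairs.items())

# HTML markup mixed with multi-byte UTF-8 text (Korean here).
sample = ("<p>압축은 재미있다</p>" * 40).encode("utf-8")
print(order0_entropy(sample), order1_entropy(sample))
```

The conditional entropy comes out far lower: knowing the previous byte tells you whether you are inside a multi-byte character or inside markup.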
Unlike LZMA, Brotli's context modeling works for very short data, too -- Brotli is about 0.6 % worse on gigabyte corpora, but can be 15+ % better on the shortest documents.
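Part of the short-document problem is visible with the xz/LZMA container from the Python standard library: fixed headers and model warm-up mean tiny inputs can come out larger than they went in (this demonstrates container overhead, not the context-modeling difference itself):

```python
import lzma

short = b"<html><body>hi</body></html>"
long_doc = short * 2000

# Tiny input: the xz container overhead alone exceeds the input size.
print(len(lzma.compress(short)), "vs", len(short))
# Long repetitive input: LZMA wins big, as expected.
print(len(lzma.compress(long_doc)), "vs", len(long_doc))
```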
Unlike Zstd, Brotli delivers the data in a streamable order, i.e., it hides less data during the transfer. The user can decode much more out of a partial Brotli stream than out of the same fraction of a Zstd stream (the shadowed amount is tens of bits in Brotli vs. tens of kilobytes in Zstd). This is because Brotli does not reshuffle the data within blocks to save CPU work at decoding time. Any shadowing of data makes dependent resource loads start later and dependent processing (JavaScript execution or HTML parsing) happen later.
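Streamability can be illustrated with any symbol-serial codec; here zlib from the Python standard library stands in: feeding a prefix of the compressed bytes already yields a large prefix of the output, which is the property the paragraph above describes for Brotli:

```python
import zlib

data = b"".join(b"<li>item %d</li>" % i for i in range(5000))
comp = zlib.compress(data, 9)

# Feed only the first half of the compressed stream.
d = zlib.decompressobj()
partial = d.decompress(comp[: len(comp) // 2])

print(len(partial), "of", len(data), "bytes already decodable")
assert data.startswith(partial)
```

A renderer sitting on top of such a stream can start parsing long before the transfer finishes; a codec that defers data to the end of a block cannot offer that.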
Most of Brotli's advances come from algorithms other than LZ77; the LZ77 parts of Zstd and Brotli are essentially the same. LZ77 is proven optimal as the data and the sliding window grow infinitely long -- meaning the longer the data, the smaller the difference you see. If you benchmark Brotli vs. Zstd on real-life data (like 75 kB HTML pages), you see different performance behaviour than on a 100 MB or 10 GB file. There, most of Brotli's benefits disappear.
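For reference, the shared LZ77 core that both codecs build on is conceptually simple. A toy greedy matcher (nothing like the hash-chain and binary-tree match finders production encoders use) might look like:

```python
def lz77_compress(data: bytes, window: int = 4096, min_len: int = 3):
    """Greedy LZ77: emit (distance, length) back-references or literal bytes."""
    out, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Naive O(n * window) search; real encoders use hash chains etc.
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            out.append((best_dist, best_len))
            i += best_len
        else:
            out.append(data[i])
            i += 1
    return out

def lz77_decompress(tokens) -> bytes:
    buf = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):  # byte-by-byte copy handles overlaps
                buf.append(buf[-dist])
        else:
            buf.append(t)
    return bytes(buf)

sample = b"abracadabra abracadabra"
assert lz77_decompress(lz77_compress(sample)) == sample
```

Everything on top of this -- context modeling, entropy coding, block splitting -- is where the two formats actually diverge.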
Zstd's encoder has seen more work over the years, and some features (like long-distance matching) are missing from Brotli. Reaching parity in the encoder could reduce the gap. An interesting option would be a single encoder for both formats.
Compiling is also difficult: one continual topic has been performance degradations when moving from GCC/Clang to MSVC. Brotli was never optimized for MSVC, and it seems something goes badly wrong there. Also, in the past several benchmarks compared a non-optimized build of Brotli against a release build of Zstd. This was because, until summer 2020, compiling Brotli without options produced a non-optimized build -- you had to configure a release build explicitly to get it right.
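If you benchmark it yourself, make sure both codecs are optimized builds; with a CMake-based checkout that means requesting a release configuration explicitly (directory names here are illustrative):

```shell
# Build Brotli with optimizations on. Older checkouts defaulted to an
# unoptimized build when no build type was given, skewing benchmarks.
cmake -B out -DCMAKE_BUILD_TYPE=Release .
cmake --build out
```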
In a gigabyte multi-corpus comparison, Brotli still compresses ~5 % more than Zstd. See: https://github.com/google/brotli/issues/642 -- aggregated here: https://encode.su/threads/2947-large-window-brotli-results-a...