
Broccoli: Syncing Faster by Syncing Less - daniel_rh
https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less
======
daniel_rh
Hi folks, I'm Daniel from Dropbox, and I am happy to answer any questions
about this tech.

~~~
Osiris
When are you going to offer a cheaper plan with less storage for people that
only need <50GB?

I lucked out and have 2 free plans that have bonus storage from various
promotions. I get about 25 GB per account. I haven't maxed either one.

I absolutely love the product. My wife scans a file, I can grab it right away.
I'm at work and need some document (e.g., my driver's license photo), I hop on
the website and download it.

I pay $5 for Backblaze to back up 5TB. I don't want to spend $10 a month for
storage I'll never use (I couldn't even keep that much synced on most of my
devices), but I'd gladly pay $3-5 a month for 50-100GB.

For now, I'll keep mooching with my free plan.

~~~
daniel_rh
There's the Family plan, which gives up to 6 members their own accounts at a
great monthly price.

[https://help.dropbox.com/accounts-billing/plans-upgrades/dro...](https://help.dropbox.com/accounts-billing/plans-upgrades/dropbox-family-plan)

With Dropbox Family, each member of the plan has their own Dropbox account. A
single person, the Family manager, will manage the billing and memberships for
the entire Family plan.

------
manigandham
This is why I continue to use Dropbox for daily work and constantly changing
files. The syncing is unmatched. It’s surprising how bad the others, like
OneDrive and Google Drive, are in comparison.

~~~
signal11
OneDrive completed its rollout of differential sync in April 2020[1], after
beginning in Sep 2019. This should improve OneDrive’s sync speed
substantially.

[1]
[https://techcommunity.microsoft.com/t5/office-365/onedrive-c...](https://techcommunity.microsoft.com/t5/office-365/onedrive-completes-roll-out-of-differential-sync/td-p/1343279)
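The comment doesn't say how OneDrive's differential sync works internally. A minimal, generic sketch of the idea, assuming files are split into fixed-size blocks and only blocks whose hash changed need to be uploaded (the 4 MiB block size and both function names are hypothetical):

```python
import hashlib

BLOCK_SIZE = 4 << 20  # hypothetical fixed block size (4 MiB)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """SHA-256 digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Indices of blocks in `new` that differ from `old` or extend past its end."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]
```

With fixed offsets, inserting a single byte near the start of a file shifts every later block and marks them all as changed. rsync's rolling checksum or content-defined chunking avoid that at the cost of extra complexity.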

~~~
manigandham
They already had this for Office files, it's just finally extended to all file
types after several years. It's still nowhere near as fast as Dropbox,
especially for complex directories, and the fact that it took until 2020 to
finish this feature shows how far behind they are.

------
AaronFriel
I'm more of a security-focused engineer, so I'm most interested in the
"specially crafted low-privilege jail". What protocol gets data in and out,
not shared memory I'm sure? Do the jail processes also have to implement an
RPC server (protobuf/gRPC/HTTP?) or is there another mechanism for giving them
work and receiving results?

~~~
daniel_rh
Dropbox uses a toolbox similar to
[https://chromium.googlesource.com/chromiumos/docs/+/master/s...](https://chromium.googlesource.com/chromiumos/docs/+/master/sandboxing.md)

And yes, much of the overhead stems from the RPC server that needs to be
implemented. For Lepton we used a raw TCP server (a simple fork/exec server)
to answer compression requests: we would establish a connection, send a raw
file on the socket, and await the compressed file on the same socket. A strict
SECCOMP filter was used for Lepton. It was nice to avoid this for Broccoli,
since it was implemented in the safe subset of Rust.
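A minimal Python sketch of the request/response exchange described above: connect, send the raw file, read the compressed file back on the same socket. It assumes the client marks the end of the raw file by half-closing its side of the connection (the comment doesn't say how the end of input is delimited), and it uses zlib purely as a stand-in for Lepton's codec. The function names are hypothetical, and the jail and SECCOMP setup is not shown.

```python
import socket
import zlib


def read_until_eof(sock: socket.socket) -> bytes:
    """Read from sock until the peer shuts down its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(65536)
        if not chunk:  # b"" means the peer sent EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_one_request(conn: socket.socket) -> None:
    """Worker side: read one raw file, reply with its compressed form."""
    raw = read_until_eof(conn)
    # zlib is a stand-in; the real worker would run its own codec here.
    conn.sendall(zlib.compress(raw))
    conn.shutdown(socket.SHUT_WR)  # tell the client the reply is complete


def compress_remotely(conn: socket.socket, data: bytes) -> bytes:
    """Client side: send the raw file, half-close, then await the result."""
    conn.sendall(data)
    conn.shutdown(socket.SHUT_WR)  # EOF marks the end of the raw file
    return read_until_eof(conn)
```

In a fork-per-connection server, for example one built on Python's `socketserver.ForkingTCPServer`, each forked child would call `serve_one_request` on its accepted connection. Keeping the protocol this small is what keeps the attack surface inside the jail small.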

~~~
AaronFriel
Thank you for the technical answer!

------
rspoerri
In my opinion broccoli does not go so well with bread (brötli = bread roll in
Swiss German), so here are some better-matching name suggestions: gipfeli
(Croissant), weggli, pfünderli (500g bread), bürli, zöpfli

:-)

~~~
daniel_rh
Savory with a touch of sweetness, Broccoli Bread cooks up like cornbread but
offers fiber and calcium. The original name was Brot-cat-li (since files could
be concatenated and compressed in parallel), but when we said it fast it
sounded like "Broccoli" and the name stuck.

------
vmchale
Surprised they didn't look more at zstd.

IME it's faster than brotli and often has a better compression ratio.

~~~
repiret
It looks like they did, but having an implementation in a memory-safe language
was one of their requirements. Learning _that_ was for me the most fascinating
part of the article.

~~~
JosephRedfern
Surely Dropbox would have the engineering power to re-implement zstd in a
memory safe language if it was sufficiently beneficial.

~~~
nawgz
I'm sure they could implement it technically speaking, but if a compression
protocol is not widespread enough to have others doing such a thing, they can
probably take that as a sign of how well supported it is.

------
kevincox
The header on the page keeps hiding and reappearing as I scroll making it
incredibly difficult to read.

------
lifthrasiir
> Maintaining a static list of the most common incompressible types within
> Dropbox and doing constant time checks against it in order to decide if we
> want to compress blocks

There is also a format-agnostic and adaptable heuristic to stop compression if
the initial part (say, first 1MB) of the file seems incompressible. I'm not
sure whether this is widespread, but I've seen at least one program doing
that and it worked well. This can be combined with other kinds of heuristics
like entropy estimation.
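A minimal sketch combining the three checks discussed here: a constant-time lookup against a static list of incompressible types, an order-0 entropy estimate, and a trial compression of the first 1 MiB. The extension list, the 7.9 bits/byte cutoff, the 5% savings threshold, and zlib as the trial codec are all hypothetical choices, not Dropbox's actual values.

```python
import math
import os
import zlib
from collections import Counter

# Hypothetical static list; a real one would come from measuring actual traffic.
INCOMPRESSIBLE_EXTENSIONS = frozenset(
    {".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4", ".zip", ".gz", ".7z"}
)
SAMPLE_SIZE = 1 << 20  # inspect only the first 1 MiB
MAX_ENTROPY = 7.9      # hypothetical cutoff in bits per byte (8.0 = random)
MIN_SAVINGS = 0.05     # hypothetical: the trial must save at least 5%


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of non-empty data, in bits per byte."""
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def should_compress(path: str, head: bytes) -> bool:
    """Decide whether a file is worth compressing from its name and first bytes."""
    # 1. Constant-time (hash set) check against known incompressible types.
    if os.path.splitext(path)[1].lower() in INCOMPRESSIBLE_EXTENSIONS:
        return False
    sample = head[:SAMPLE_SIZE]
    if not sample:
        return False
    # 2. Cheap entropy estimate: near 8 bits/byte means near-random bytes.
    if byte_entropy(sample) > MAX_ENTROPY:
        return False
    # 3. Trial-compress the sample at a fast level and check the savings.
    return len(zlib.compress(sample, 1)) <= (1 - MIN_SAVINGS) * len(sample)
```

Order-0 entropy ignores repeated multi-byte patterns, so it only rejects data that looks random byte by byte. The trial compression in step 3 catches the remaining cases at the cost of actually running a fast compressor on the sample.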

------
no_wizard
This is a really interesting write up of their use of Brotli! Makes me wonder
if there might be a novel way I could leverage it beyond HTTP Responses.

I never realized the advantages of Brotli over zlib could be so extensive. In
particular, it appears they're getting a huge speed boost (I think also partly
because it's written in Rust):

> we were able to compress a file at 3x the rate of vanilla Google Brotli using
> multiple cores to compress the file and then concatenating each chunk.

Side note: I admit, at first I thought they were talking the Broccoli build
system[0]

[0][https://github.com/broccolijs/broccoli](https://github.com/broccolijs/broccoli)
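The quoted multi-core speedup relies on compressing chunks independently and concatenating the results, which Broccoli makes possible for Brotli output. Python's standard library has no Brotli, so this sketch swaps in gzip, whose format already allows complete members to be concatenated (`gzip.decompress` reads all of them). The chunk size, worker count, and function name are hypothetical, and it shows the technique rather than Broccoli's implementation.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 << 20  # hypothetical chunk size (4 MiB)


def parallel_gzip(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the results.

    Each chunk becomes a complete gzip member. The gzip format allows
    members to be concatenated, so the joined output is one valid stream.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # map keeps input order
    return b"".join(members)
```

The trade-off is that each chunk starts with an empty history, so matches that span chunk boundaries are lost and the ratio is usually slightly worse than compressing the whole file in one pass.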

------
jeffbee
The tradeoff between client CPU time and upload speed is interesting. If they
need to be able to output compressed text at 100 Mbps, that gives a budget of
~100ns/byte, or pretty much what they would have been spending with zlib in
the first place. But on my fiber connection I only have a budget of 10ns/byte.
Does that mean you'd use the equivalent of `brotli -q 1` for me? If so,
doesn't the march of progress continually erode the advantages of compression
in this use case?
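The budget arithmetic can be made explicit. At 8 bits per byte with no protocol overhead, 100 Mbps works out to 80 ns per output byte, in line with the ~100 ns/byte estimate above. The second function is a simplified model, assuming compression and upload run in a pipeline so that the slower stage sets the throughput; the function names and figures are illustrative only.

```python
def ns_per_byte_budget(link_mbps: float) -> float:
    """Time the compressor may spend per output byte to keep the link busy."""
    bytes_per_second = link_mbps * 1e6 / 8  # 8 bits per byte, no overhead
    return 1e9 / bytes_per_second


def effective_mb_per_s(link_mbps: float, compressor_mb_per_s: float, ratio: float) -> float:
    """Original (uncompressed) MB/s moved when compression and upload overlap.

    ratio = compressed size / original size. The bottleneck is either the
    compressor's input speed or the link draining compressed bytes.
    """
    link_mb_per_s = link_mbps / 8
    return min(compressor_mb_per_s, link_mb_per_s / ratio)


def compression_helps(link_mbps: float, compressor_mb_per_s: float, ratio: float) -> bool:
    """True if compressing beats sending the raw bytes over the same link."""
    return effective_mb_per_s(link_mbps, compressor_mb_per_s, ratio) > link_mbps / 8
```

With hypothetical figures, a compressor running at 50 MB/s that halves the data moves 25 MB/s of original data over a 100 Mbps link (versus 12.5 MB/s uncompressed). On a 1 Gbps link it caps out at 50 MB/s while the raw link carries 125 MB/s, which is the erosion described in the comment.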

------
shadykiller
Is it possible to use this as an rsync replacement?

~~~
zmj
They aren't on the same level of abstraction. Rsync currently uses zlib for
block compression on the wire. Brotli/broccoli would be an alternative option.

~~~
celias
New compression options were added in rsync 3.2. From
[https://download.samba.org/pub/rsync/NEWS#3.2.0](https://download.samba.org/pub/rsync/NEWS#3.2.0)

> Various compression enhancements, including the addition of zstd and lz4
> compression algorithms and a negotiation heuristic that picks the best
> compression option supported by both sides.
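A generic sketch of "picks the best compression option supported by both sides", not rsync's actual negotiation code. It assumes each side advertises codec names and that the local side's best-first preference order decides; rsync's real rules for whose order wins may differ. The codec names come from the NEWS entry, and the `"none"` fallback is hypothetical.

```python
NO_COMPRESSION = "none"  # hypothetical fallback label


def negotiate(local_prefs: list, remote_supported: list) -> str:
    """Return the first of our preferred codecs that the peer also supports."""
    remote = set(remote_supported)
    for name in local_prefs:  # local_prefs is ordered best-first
        if name in remote:
            return name
    return NO_COMPRESSION
```

Because the choice falls back gracefully, a newer client can prefer zstd while still working against an older peer that only offers zlib.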

------
lanius
Is there a pun between Broccoli and Brotli I'm not aware of? There's another
Brotli compression tool called Broccoli (written in Go). Is that just a coincidence?

~~~
nerdponx
_We codenamed the Brotli compressor in Rust “Broccoli” because of the
capability to make Brotli files concatenate with one another (brot-cat-li)._

------
tyingq
Curious if there's enough of any one type of file that a specialty compression
for it would be worth the added complexity.

~~~
daniel_rh
Great question! We developed and deployed Lepton to losslessly encode JPEG
image files. Lepton continues to deliver substantial storage and cost savings
every year. You can read more about it here:
[https://dropbox.tech/infrastructure/lepton-image-compression...](https://dropbox.tech/infrastructure/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs)
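Using a specialty codec for one file type implies a dispatch step in front of the general-purpose compressor. A minimal sketch, assuming files are routed by their leading bytes (JPEG files begin with the `FF D8` Start Of Image marker); the codec labels and function name are hypothetical, and a real system would also need a fallback for files that start with the marker but that the specialized codec cannot handle.

```python
JPEG_SOI = b"\xff\xd8"  # JPEG files start with the Start Of Image marker


def choose_codec(head: bytes) -> str:
    """Pick a hypothetical codec label for a file from its leading bytes."""
    if head.startswith(JPEG_SOI):
        return "lepton"  # specialized lossless JPEG recompression
    return "brotli"      # general-purpose fallback
```

Checking magic bytes rather than the file extension means misnamed files still reach the right codec.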

------
andrewshadura
I wonder whether Syncthing can use it.

------
Scaevolus
None of the images are loading. :(

~~~
jainr
Should be fixed now :)

------
rmhorn
Good supporting data

~~~
ksoong2
Yeah, I really like how well the performance is quantified

------
myrloc
Middle-out compression has shown considerable performance gains over the
options investigated in the article. I wonder why it was not mentioned?

Just kidding :) great article. As others have said, supporting data was very
informative.

