
Zsync: Differential file downloading over HTTP using the rsync algorithm (2010) - pmoriarty
http://zsync.moria.org.uk/
======
Doman
We are using this for addon synchronization in our community through
Arma3Sync. On the server side we need to "build" the repository - this just
generates .zsync files; clients then download only the diff. Update sizes
came down from 10-15 GB to under 3 GB.
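
The workflow above can be sketched in a few lines of Python. This is a simplified model, not zsync's actual implementation - the MD5 checksum and block-aligned matching are assumptions here; real zsync stores a rolling weak checksum plus a truncated strong checksum per block so matches need not be block-aligned:

```python
import hashlib

BLOCK = 4096  # assumed block size; real zsync picks one per file

def build_control(remote: bytes):
    """Server-side "repository build": per-block checksums, a simplified
    stand-in for the .zsync control file."""
    return [hashlib.md5(remote[i:i + BLOCK]).digest()
            for i in range(0, len(remote), BLOCK)]

def plan_download(local: bytes, control):
    """Client side: return indices of remote blocks not already present
    in the local copy.  Only these blocks need to be downloaded."""
    have = {hashlib.md5(local[i:i + BLOCK]).digest()
            for i in range(0, len(local), BLOCK)}
    return [idx for idx, csum in enumerate(control) if csum not in have]

# Old addon vs. new release: only the changed middle block is fetched.
old = b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK
new = b"A" * BLOCK + b"X" * BLOCK + b"C" * BLOCK
print(plan_download(old, build_control(new)))  # -> [1]
```

The asymmetry is the point: the server precomputes the control data once, and each client decides for itself which byte ranges it still needs.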

------
blacksqr
Fossil ([https://www.fossil-scm.org](https://www.fossil-scm.org)) is an SCM
tool that uses the rsync algorithm for syncing repositories. It has a built-in
web server, and can also be accessed via CGI from any CGI-capable web server.
It also has an SSH option.

Its features make it very handy for a number of file transfer/sync tasks, over
and above its chief SCM role.

------
just_astounded
If you like rsync and JavaScript, I wrote
[https://github.com/claytongulick/bit-sync](https://github.com/claytongulick/bit-sync),
which is kinda fun.

------
js2
2005. So I guess this didn't catch on.

~~~
cbhl
Ubuntu still provides zsync of its installation media:
[http://cdimage.ubuntu.com/ubuntu/releases/17.04/release/](http://cdimage.ubuntu.com/ubuntu/releases/17.04/release/)

That said, I'm not sure I know of any other major users of it -- most people
just use a .torrent (which similarly has checksums of each piece so you know
which pieces need to be downloaded).

~~~
dividuum
Not a major user, but we're using zsync for system updates of our Raspberry Pi
based digital signage operating system
([https://info-beamer.com/hosted](https://info-beamer.com/hosted)). It's
pretty great and offers a few things we couldn't do with BitTorrent: every
time we have a new release we put together an install.zip file of everything
required (kernel, firmware files, initrd, squashfs). Users can download this
file directly, unzip it onto their SD card, and it will boot our OS. For
updates we use a previous (see below) version of our install.zip already
available on the device and only download the changes. We then unzip that into
a new partition for A/B booting.

Zsync is awesome as we can specify any number of existing files already
available on the device (with the -i command line option) and zsync will try
to make use of them to minimize downloads. We really use this feature to our
advantage: zsync by default will keep the previous version of a file if it's
going to overwrite it. So we have two versions of install.zip on a device.
When switching between OS releases (stable / testing...) we can switch back
and forth with zero additional downloads as both versions are available and
zsync makes use of that. Similarly, after a user installs our OS, we just have
the unpacked artefacts (kernel, etc.) on the SD card. We can quickly recreate
an initial version of the install.zip file on the device by seeding the
download with those files. It usually takes just 500 KB to construct an
initial install.zip, which we then use to minimize all future updates.
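
The multi-seed trick described above (zsync's `-i` option) can be modelled roughly like this. A hypothetical sketch, not zsync's code - real zsync also matches at unaligned offsets via a rolling checksum:

```python
import hashlib

BLOCK = 4096  # assumed fixed block size

def usable_blocks(control, seeds):
    """Mimic `zsync -i seed1 -i seed2 ...`: index every block found in any
    local seed file, then split the remote file's blocks into "reuse
    locally" and "must download".  `control` stands in for the per-block
    checksums of the .zsync control file."""
    have = {}
    for name, data in seeds.items():
        for i in range(0, len(data), BLOCK):
            have.setdefault(hashlib.md5(data[i:i + BLOCK]).digest(), (name, i))
    reuse, fetch = {}, []
    for idx, csum in enumerate(control):
        if csum in have:
            reuse[idx] = have[csum]   # copy this block from a seed file
        else:
            fetch.append(idx)         # only these go over HTTP
    return reuse, fetch

# A new install.zip that shares most blocks with artefacts already on the SD:
kernel = b"K" * BLOCK
rootfs = b"R" * (2 * BLOCK)
remote = kernel + rootfs + b"N" * BLOCK   # one genuinely new block
control = [hashlib.md5(remote[i:i + BLOCK]).digest()
           for i in range(0, len(remote), BLOCK)]
reuse, fetch = usable_blocks(control, {"kernel": kernel, "rootfs": rootfs})
print(fetch)  # -> [3]: only the new block needs downloading
```

With several seeds (the previous install.zip, the unpacked kernel, the squashfs), most of a new release is already on disk and only the remainder travels over the network.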

~~~
alex_hirner
Did any IoT device management service fit the bill too (back then or now)? We
are heading towards a similar use case.

~~~
dividuum
OS development for info-beamer started in 2013. I'm fairly sure nothing even
close was available back then. I'm not sure about today. So far I don't regret
the NIH approach we took.

------
iFire
I'm not related to Itchio.

If you are looking for a maintained system for delivering software updates
online, I would look into
[https://github.com/itchio/wharf-spec](https://github.com/itchio/wharf-spec).

Wharf is used by itch.io to sync folder structures differentially /
incrementally. It uses the latest compression algorithms and has a reference
server.

Alternatively, a port of zsync written in Java is
[https://github.com/salesforce/zsync4j](https://github.com/salesforce/zsync4j).

I had trouble compiling zsync for Windows.

~~~
eeZah7Ux
How is the wharf protocol better than zsync?

~~~
iFire
[https://itch.io/docs/wharf/design-goals.html](https://itch.io/docs/wharf/design-goals.html)
describes the goals.

[https://itch.io/docs/wharf/algorithms/diff.html](https://itch.io/docs/wharf/algorithms/diff.html)
and
[https://itch.io/docs/wharf/algorithms/apply.html](https://itch.io/docs/wharf/algorithms/apply.html)
describe patching.

The most important thing is that Itchio runs a business on the usability of
this system.

~~~
eeZah7Ux
Is it outperforming zsync in terms of bandwidth? That's the most important
question for me.

------
bruxis
This makes the claim that you don't need to run any additional software on the
server, other than having an HTTP/1.1 compliant host.

How does zsync diff against the local file without downloading the contents
from the server?

~~~
deelowe
It uses a special file that has to be installed on the server. Perhaps you
thought (as did I) that this was supposed to work with any file over HTTP?
That does not appear to be the case.

[http://zsync.moria.org.uk/server](http://zsync.moria.org.uk/server)

~~~
oliwarner
Installed on? Uploaded to.

It's just a pair of files. The big thing you're trying to transfer and the
zsync file that details the content of that first file, to guide the
downloader.
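
To illustrate why a stock HTTP/1.1 server suffices: once the client has compared the .zsync metadata against its local file, all it needs from the server are byte ranges, which any Range-capable server can serve. A rough sketch of turning missing blocks into a Range header (hypothetical helper names; real zsync does this internally):

```python
def byte_ranges(missing_blocks, block_size):
    """Coalesce missing block indices into byte ranges, merging
    adjacent blocks into single runs to keep the request small."""
    ranges = []
    for idx in sorted(missing_blocks):
        start, end = idx * block_size, (idx + 1) * block_size - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1][1] = end          # adjacent block: extend the run
        else:
            ranges.append([start, end])  # start a new run
    return ranges

def range_header(ranges):
    """Format the runs as a standard HTTP/1.1 Range header."""
    return "Range: bytes=" + ", ".join(f"{s}-{e}" for s, e in ranges)

print(range_header(byte_ranges([1, 2, 5], 4096)))
# -> Range: bytes=4096-12287, 20480-24575
```

That is the whole server-side requirement: static file hosting plus Range support, which Apache, Nginx, etc. provide out of the box.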

~~~
deelowe
I'm not sure I see the difference?

~~~
DrJosiah
You can upload the .zsync file to anywhere Apache, Nginx, etc., can read, and
it all just works.

You don't need a plugin, you don't need to change your httpd.conf, ...

~~~
deelowe
Yeah. I assumed that.

------
takeda
This reminds me of the demo utility that comes with librsync, called rdiff,
except it approaches this problem in a less practical way, although it shows
off librsync better.

It works as follows:

- Let's say you already have a file that is an older version, or perhaps
corrupted; you use rdiff to generate its signature

- You then go to the machine that has the correct file and use the signature
file to generate a patch file

- Then you use the patch file to fix your local file
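
The three steps above can be mimicked with a toy block-based sketch. This is not librsync's actual algorithm - real rdiff uses a rolling weak checksum plus a strong checksum so matches need not be block-aligned - but the signature/delta/patch flow is the same:

```python
import hashlib

BLOCK = 1024  # assumed block size

def signature(basis: bytes):
    """Step 1 (machine with the old/corrupted file): checksum each block."""
    return {hashlib.md5(basis[i:i + BLOCK]).digest(): i
            for i in range(0, len(basis), BLOCK)}

def delta(sig, correct: bytes):
    """Step 2 (machine with the correct file): emit a copy instruction for
    every block the other side already has, literal data otherwise."""
    ops = []
    for i in range(0, len(correct), BLOCK):
        chunk = correct[i:i + BLOCK]
        csum = hashlib.md5(chunk).digest()
        if csum in sig:
            ops.append(("copy", sig[csum], len(chunk)))
        else:
            ops.append(("literal", chunk))
    return ops

def patch(basis: bytes, ops):
    """Step 3 (back on the first machine): rebuild the correct file."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, n = op
            out += basis[off:off + n]
        else:
            out += op[1]
    return bytes(out)

old = b"a" * BLOCK + b"b" * BLOCK   # local, outdated copy
new = b"a" * BLOCK + b"Z" * BLOCK   # correct file on the other machine
ops = delta(signature(old), new)    # only the changed block travels as literal data
assert patch(old, ops) == new
```

Note the round trip this implies: the signature must travel to the machine with the good file, and the delta must travel back - which is exactly the extra coordination zsync avoids by publishing the control data alongside the file.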

~~~
feelin_googley
Not that I would know any better, but I always saw a user-controlled approach
built around rdiff as a better alternative to surrendering files to a non-
transparent third party such as Dropbox (which, go figure, used librsync
originally).

There is an elegant simplicity to rdiff, IMHO.

~~~
teraflop
[http://duplicity.nongnu.org/](http://duplicity.nongnu.org/) implements this
approach, FWIW.

~~~
feelin_googley
I am aware of another project called "rdiff-backup", also at nongnu.org:

rdiff-backup backs up one directory to another, possibly over a network. The
target directory ends up a copy of the source directory, but extra reverse
diffs are stored in a special subdirectory of that target directory, so you
can still recover files lost some time ago. The idea is to combine the best
features of a mirror and an incremental backup. rdiff-backup also preserves
subdirectories, hard links, dev files, permissions, uid/gid ownership (if it
is running as root), and modification times. Finally, rdiff-backup can operate
in a bandwidth efficient manner over a pipe, like rsync. Thus you can use
rdiff-backup and ssh to securely back a hard drive up to a remote location,
and only the differences will be transmitted.

~~~
Bromskloss
Yes! I use this and like that it's so simple, and that the latest version of
the backup is easily available as plain files. (Any metadata that the
filesystem doesn't support is stored separately in files, so it works across
different types of filesystems and operating systems.) There is even a FUSE
filesystem, "rdiff-backup-fs", for mounting the whole backup history, with
each backup point in a subdirectory of its own, like it should be!

Unfortunately, it seems not to be developed any longer, and it has a few
things that would need ironing out:

* You can't pause a backup and continue later.

* Some operations (notably recovery after an aborted backup run) are excruciatingly slow. It takes tens of hours for me with a backup of 40 GB or so (on a low-powered computer as the server, though). I think rdiff-backup-fs is resource-hungry as well, which is perhaps partly understandable, since it has to go through a series of reverse diffs to present old versions of a file.

* I tried it on Windows once, and it apparently could not handle paths longer than a few hundred characters (due to using that older Windows API, whatever it's called).

* You can't delete intermediate backups, only the oldest one.

~~~
deivid
Have you tried rsnapshot?

------
a4dev
Lennart Poettering has an interesting tool, casync, which overlaps with
zsync. It claims to be a better solution for image syncing. Reasons given
here:
[http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html](http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html)

------
e12e
There's also xdelta which is just an algorithm / program for calculating and
applying binary diffs. I suppose the advantage of zsync is that you can always
point "new" users to "/current.file", and use zsync to patch "up" to the
latest version - with xdelta people would need to explicitly get
"my..current.xdelta".

[http://xdelta.org/](http://xdelta.org/)

~~~
rakoo
The big difference is that xdelta (much like any diff program) needs both the
old and the new versions to create a patch. With zsync, the server only needs
to have the new version (which is the only one of interest), and the clients
then fetch only the parts they need, because only _they_ have the old version.

zsync also does all the fetching stuff directly.

------
feelin_googley
On a LAN or UDP-based "Layer 2 overlay", there is mrsync for this purpose. One
could efficiently distribute regular updates to some data that everyone on the
network needs, e.g., "domain names" and IP addresses.

------
Bromskloss
I've been thinking that we should replace "cp" with "scp", and then replace
both of them with "rsync". Is that a bad idea?

~~~
Piskvorrr
For what _use case_? We do use `rsync` instead of `cp` in some capacity, even
for local-to-local file copy - as there is slightly more verification of a
successful copy, and the destination is a quirky flash medium. Not sure how
SCP would help here.

~~~
Bromskloss
For all use cases.

> Not sure how SCP would help here.

I mean that my train of thought was first "scp, it seems, can do everything cp
can, and more, so let's drop cp and use scp instead", then "but hey, rsync, it
seems, can do everything that scp can, and more, so let's drop scp too and use
rsync instead".

~~~
Piskvorrr
Possibly. `cp` is ancient and rather basic; OTOH, it is everywhere (as opposed
to `rsync`, as I found out the hard way) and it is tiny (fewer toggles to
push - less stuff to break).

