KySync is an efficient way to distribute new data which makes use of older (but similar) data that you may already have present locally. KySync supports HTTP v1.1, but can easily be extended to support any server protocol which supports range queries.
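To make the "range queries" point concrete, here is a minimal sketch of how a client could ask an HTTP/1.1 server for only the byte ranges it is missing. The function name `range_header` and the `(offset, length)` input format are assumptions made for illustration; this is not KySync's actual API.

```python
def range_header(blocks):
    """Build an HTTP/1.1 Range header value from (offset, length) pairs.

    Hypothetical helper, not part of KySync. Overlapping or touching
    ranges are merged so the request stays compact.
    """
    merged = []
    for start, length in sorted(blocks):
        if length <= 0:
            continue  # nothing to fetch for an empty block
        end = start + length - 1  # HTTP byte ranges are inclusive
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    if not merged:
        raise ValueError("no non-empty blocks to request")
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in merged)


# Two adjacent 1 KiB blocks plus one 512-byte block further into the file.
print(range_header([(0, 1024), (1024, 1024), (8192, 512)]))
```

The resulting string would go into the `Range` request header; a server that supports range requests answers with `206 Partial Content`.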
KySync is [released](https://github.com/kyotov/kysync/releases) under the MIT Open Source License (see [COPYING](https://github.com/kyotov/ksync/blob/master/COPYING) in the root of the repository).
KySync is a full rewrite of [Zsync](http://zsync.moria.org.uk/) in modern C++. While no code was reused from Zsync, the awesome [Zsync technical paper](http://zsync.moria.org.uk/paper200503/) was the major resource used for the implementation of KySync.
The value proposition of KySync over Zsync is that it takes advantage of modern architecture features (multi-core multi-CPU systems as well as exceedingly fast IO subsystems, e.g. NVMe SSDs). KySync is 3x-10x (or more) faster than Zsync on such commonly available modern hardware. We have not spent much time optimizing KySync single-thread performance, so there are cases where, with sufficiently high similarity, Zsync is faster when fewer than 4 threads are used in KySync.
I skimmed through kysync docs and I don't see any meaningful discussion of this aspect.
If you are to start with B and then parallelize its bottlenecks (where possible), you will normally see performance gradually increasing with the thread count, then plateauing and only then, possibly, starting to drop.
But if A with a single thread is outright worse than B, it means that A's per-thread overhead is higher. In case of kysync it appears to be 3-5 times as high as zsync's. That's immense!
Similarly, if A with a low thread count is worse than B, then it may be an issue with A's sync/threading model (prolonged contended waits, etc.).
In either case, it's a sign that there's something inherently inefficient with A's implementation... even if spawning 16 threads helps offsetting it and beating B's single thread.
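The argument above can be put into a rough back-of-the-envelope model. This is an idealized sketch under stated assumptions (Amdahl-style scaling, no contention), not a measurement of either tool; the function name and parameters are made up for illustration.

```python
import math


def break_even_threads(overhead_ratio, parallel_fraction=1.0):
    """Smallest thread count at which A matches single-threaded B.

    Idealized model (an assumption, not a measurement): A's runtime with
    n threads is overhead_ratio * ((1 - p) + p / n), in units of B's
    single-threaded runtime, where p is the parallelizable fraction.
    Returns None if A can never catch up under this model.
    """
    serial = overhead_ratio * (1.0 - parallel_fraction)
    if serial >= 1.0:
        return None  # the serial part alone is already slower than B
    needed = overhead_ratio * parallel_fraction / (1.0 - serial)
    return max(1, math.ceil(needed))


# With the 3-5x per-thread overhead estimated above and perfect scaling:
for ratio in (3, 4, 5):
    print(ratio, break_even_threads(ratio))
```

Under these assumptions an implementation with 3-5x the per-thread cost needs roughly 3-5 threads just to draw level, and more if any part of the work is serial.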
KySync started as a pet project to brush up on my C++ skills which I had not used for 20 years and was going to need for a job that I started last week! I had spent my fair share of time optimizing single threaded performance back in grad school, and was looking for different experiences.
That said, and as we mentioned in the write up, we had some improvements cooking that we did not put in v1.0. I spent some time tonight to merge them in and run some performance experiments.
Single threaded performance is now much improved and practically on par with Zsync. I released it as v1.1.
We will of course keep looking for more performance, but if you have any suggestions for further improvements, please share -- or join us!
P.S. Both of these are contributed by Chaim Mintz, whom I worked with in a previous life.
Within a single physical system ("multi-core multi-CPU" or with NVMe), you rarely use something like rsync, zsync or KySync: your files are already local, on your system. If you want a second copy of a file on the same system, you would symlink or hard-link, or use other filesystem mechanisms. It's not even clear a clean copy would be slower than doing a bunch of comparisons.
On the other hand, between remote systems, the "modern day architecture features" mostly don't apply. I suppose a more clever use of modern kernels could help performance somewhat. Maybe.
The primary reason to do the performance comparison on a single system is so that the results are easy to replicate with as little setup as possible. Because we do this for both KySync and Zsync, it is an apples-to-apples comparison.
HTTP bandwidth and storage cost money, and this is a self-funded project, so I can't afford to put up test files of data publicly visible to the world.
One thing we can look into is leveraging AWS/S3 to upload some data and use it for a performance experiment, but that will need some logistics for the developer to set up their AWS account properly. Will look into it.
Of course, the more similar the files are, the closer the remote results will be to this first set we published.
* Within the same system?
* Over close-by systems on a LAN segment?
* To typical far machines over the Internet?
Intuitively, the more similar the files are, the more representative 3x-10x+ will be in the real world. As the files become more and more different, one is of course at the mercy of the bandwidth between the computers. If you need to transfer 1 GiB of differences over a 1 MiB/s connection, it will take ~1000 seconds -- no magical way around it.
The comparison so far is practically on the computational cost of the sync, which can be significant as differences pile up.
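The arithmetic behind this can be sketched as follows. The compute times used in the example are hypothetical numbers chosen only to illustrate the effect, and the model assumes computation and transfer do not overlap.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def transfer_seconds(diff_bytes, bandwidth_bytes_per_s):
    """Time needed just to move the differing bytes over the wire."""
    return diff_bytes / bandwidth_bytes_per_s


def end_to_end_speedup(compute_old_s, compute_new_s, transfer_s):
    """Overall speedup when the same transfer time is added to both tools.

    Assumes compute and transfer happen one after the other.
    """
    return (compute_old_s + transfer_s) / (compute_new_s + transfer_s)


wire = transfer_seconds(1 * GIB, 1 * MIB)
print(wire)  # 1024.0 seconds, i.e. the "~1000 seconds" above

# Hypothetical compute costs: 100 s vs 10 s (a 10x compute-only speedup).
print(end_to_end_speedup(100.0, 10.0, 0.0))   # nothing to transfer
print(end_to_end_speedup(100.0, 10.0, wire))  # transfer dominates
```

With nothing to transfer, the full 10x shows up; once the ~1000 second transfer is added to both sides, the same compute advantage shrinks to under 10% end to end.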
Assuming zsync is single threaded, in an apples/apples comparison (1 thread) zsync wins by over 10x. I can't help but wonder how much faster KySync would be if they focused more on single threaded performance. The way things stand, KySync's implementation sounds rather wasteful in terms of CPU resources, in spite of what this announcement claims. Still, kudos on this impressive project!
One thing most client/home systems have in abundance most of the time is spare cores.
The huge speed-ups for KySync are when it has 16+ threads on a 96 core system. But if my computer has two cores, it is not in fact going to efficiently run 16 CPU banging threads, and actually it might choke badly and I should rather run the dual thread KySync, which is much less impressive.
Some problems in this area (data synchronization) look embarrassingly parallel. That means that, sure, if you can run more at once you win, but there's no interesting problem: you could also run four copies of Zsync. "Embarrassingly parallel" means the concurrency problem is trivial. In this case each of those four maybe handles a separate file or a separate region of the file. It's embarrassing because all the clever techniques you learned in class aren't important; the problem is so easy.
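A minimal sketch of what "each worker handles a separate region" looks like, assuming the work per region is an independent checksum. This is a simplification for illustration only: it ignores matches that straddle region boundaries, and it is not how KySync or Zsync is actually structured. The helper names are made up.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_regions(size, parts):
    """Return up to `parts` half-open (start, end) regions covering [0, size)."""
    if size == 0:
        return []
    step = -(-size // parts)  # ceiling division
    return [(s, min(s + step, size)) for s in range(0, size, step)]


def region_checksums(data, parts=4):
    """Checksum each region in its own worker; no coordination is needed."""
    regions = split_regions(len(data), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return list(pool.map(lambda r: zlib.adler32(data[r[0]:r[1]]), regions))


data = bytes(range(256)) * 64  # 16 KiB of sample data
print(split_regions(len(data), 4))
print(region_checksums(data, 4))
```

The workers share no state and never wait on each other, which is exactly what makes this kind of problem "embarrassing".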
I can see it wasn't the point of this work, but from an inquisitiveness point of view my question even after just one chart is, "What happens if we run this on a much more modest VM?". Do the extra threads trample on each other? Are they automatically suppressed?
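For what it's worth, capping the worker count is simple to do in principle. The sketch below shows one way a tool could clamp a requested thread count to the cores actually available; whether KySync does anything like this is not stated in this thread, and `effective_threads` is a hypothetical name.

```python
import os


def effective_threads(requested, available=None):
    """Clamp a requested thread count to the number of available cores.

    Hypothetical helper: `available` defaults to os.cpu_count(), which can
    return None on some platforms, in which case 1 is assumed.
    """
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


print(effective_threads(16, available=2))   # modest VM: capped at 2
print(effective_threads(16, available=96))  # big server: all 16 are used
```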
Only if you cherry pick the results and look only at 128MB files. At 512MB files, there is parity at 4 threads. At 2GB files there is almost parity at 2 threads, and significant gains at 4 threads.
> But if my computer has two cores
Then your CPU wasn't sold to you in recent years, as entry level desktop CPUs ship with 4 cores now and have for a couple years.
That said, if you're actually in this situation, then you just run zsync if it's faster. The existence of kysync isn't going to make it disappear or prevent you from using it, any more than zsync prevented rsync from being used.
I'm not sure what point you're trying to make. A new implementation of an existing protocol came out that allows for speed gains if certain criteria are met, which includes throwing more compute at it. It feels like you think this is a bad thing for some reason?
Zsync improves on rsync by:
* supporting some compressed formats;
* allowing the target to be HTTP;
* offloading more work to the client.
Please consider not breaking how many people browse web sites.
Since the content of the page is pretty stable now, I can probably host it in a more normal way than Notion.
It is so convenient to edit in Notion, but I share the frustration about paging and navigation for reading!
In summary, we see:
- 10%-100%+ performance improvement with v1.1 on 2 GiB data and 16 threads;
- A significantly improved single thread performance, now on par with Zsync for 2GiB data.
Check out the whole write-up at https://kyall.notion.site/KySync-v1-1-dd9931f330f241469d3e60...
Zsync has a comparison with rsync in isolated cases where it makes sense and it is very close in performance!