zsync – transfer large files efficiently

A few days ago I stumbled upon zsync, a tool for fast transfer of very large files. I noticed it because Ubuntu started using it for its daily live images. Being curious, I studied it and realized that zsync is great! 🙂 I also ran some tests to see how well it works.

So what is zsync?

zsync is a file transfer program. It allows you to download a file from a remote server, where you have a copy of an older version of the file on your computer already. zsync downloads only the new parts of the file. It uses the same algorithm as rsync. However, where rsync is designed for synchronising data from one computer to another within an organisation, zsync is designed for file distribution, with one file on a server to be distributed to thousands of downloaders. zsync requires no special server software — just a web server to host the files — and imposes no extra load on the server, making it ideal for large scale file distribution.

How does it work?

You simply generate a small .zsync file on the server for each big file you offer for download. This .zsync file contains a description of the contents of the big file. The user then runs the “zsync” tool with your .zsync file as an argument and picks an arbitrary local file as the base for the new big file. It can really be any file: a previous version of the big file, a pre-pre-pre-previous version, or even a newer version of it! The more similar it is, the better. The zsync tool then compares the local file against the description and downloads only the parts needed to assemble the new big file.
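To make the idea more concrete, here is a minimal sketch in Python. It is not zsync's actual implementation: the function names are made up for illustration, SHA-256 stands in for whatever checksums the real tool uses, the block size is tiny, and every window of the local file is hashed naively instead of using an efficient rolling checksum. It only shows the principle: the server describes the new file block by block, and the client works out which blocks it already has.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose; an assumption for this demo only


def block_checksums(data, block_size=BLOCK_SIZE):
    """Server side: one checksum per fixed-size block of the new file."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def plan_download(local, checksums, block_size=BLOCK_SIZE):
    """Client side: decide which target blocks can be reused from the local file."""
    # Hash every full-size window of the local file, at any byte offset.
    available = {}
    for offset in range(max(len(local) - block_size + 1, 0)):
        digest = hashlib.sha256(local[offset:offset + block_size]).hexdigest()
        available.setdefault(digest, offset)

    reuse = {}    # target block index -> offset in the local file
    missing = []  # target block indices that must be downloaded
    for index, digest in enumerate(checksums):
        if digest in available:
            reuse[index] = available[digest]
        else:
            missing.append(index)
    return reuse, missing


def assemble(local, reuse, downloaded, block_count, block_size=BLOCK_SIZE):
    """Build the new file from reused local blocks and downloaded blocks."""
    parts = []
    for index in range(block_count):
        if index in reuse:
            offset = reuse[index]
            parts.append(local[offset:offset + block_size])
        else:
            parts.append(downloaded[index])
    return b"".join(parts)


# Hypothetical example data: the "old" file contains two of the four blocks.
new_file = b"AAAABBBBCCCCDDDD"
old_file = b"xxAAAACCCCzz"

sums = block_checksums(new_file)
reuse, missing = plan_download(old_file, sums)
print(missing)  # → [1, 3]
```

Only blocks 1 and 3 would have to be fetched here; blocks 0 and 2 are found in the old file even though they sit at different offsets.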

Usages

zsync is a great tool for developers who regularly download updated big files such as daily CD/DVD images. It can save you a lot of time and bandwidth if the files haven’t changed much. You will appreciate it especially if you or the server are on a slow line. So what are you waiting for?

Pitfalls

One thing I haven’t told you yet: zsync is currently not available in Fedora because of some license problems 😦 But you can always find RPMs elsewhere.

Testing the efficiency

Here are the tests I performed to see how well it works.

Ubuntu Karmic development images:

Base image          Resulting image     Download size   Saved bandwidth
20090830 (703 MB)   20090831 (687 MB)   57 MB           91.7%
20090831 (687 MB)   20090901 (687 MB)   1 MB            99.9%
20090830 (703 MB)   20090901 (687 MB)   57 MB           91.7%
20090901 (687 MB)   20090830 (703 MB)   73 MB           89.6%

As you can see, zsync works great for Ubuntu images. When updating from a very recent image, you can save about 90% of the bandwidth (and time). Unfortunately, I didn’t have any older images at hand, so I couldn’t test updating from a not-so-recent image. Note that with zsync you can also go “back in time” from a more recent image to an older one (the last row).
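The “saved bandwidth” figures can be reproduced, at least for the rows above, as one minus the download size divided by the size of the resulting image. The small helper below is my own sketch with a made-up name. Because the table lists sizes rounded to whole megabytes, a recomputed value may differ from the listed one in the last digit.

```python
def saved_bandwidth(download_mb, target_mb):
    """Percentage of the resulting image that did not have to be downloaded."""
    return round(100.0 * (1.0 - download_mb / target_mb), 1)


# Rows from the Ubuntu Karmic table: (download size, resulting image size) in MB.
rows = [(57, 687), (1, 687), (57, 687), (73, 703)]
for download_mb, target_mb in rows:
    print(f"{saved_bandwidth(download_mb, target_mb)}%")
```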

Fedora 12 Alpha images:

Base image       Resulting image   Download size   Saved bandwidth
RC1 (701 MB)     RC2 (705 MB)      241 MB          65.8%
RC2 (705 MB)     Final (705 MB)    <1 MB           100.0%
RC1 (701 MB)     Final (705 MB)    241 MB          65.8%
Final (705 MB)   RC1 (701 MB)      237 MB          66.2%

For Fedora 12 Alpha images you still save about two thirds of the download when updating from RC1 to RC2, and you will be extremely pleased when updating from RC2 to Alpha Final. Those two files are not identical, but they are nearly the same (probably just some system label has changed), so you get the image almost instantly.

Fedora 12 nightly composes:

Base image          Resulting image     Download size   Saved bandwidth
20090809 (703 MB)   20090818 (702 MB)   287 MB          59.1%
20090818 (702 MB)   20090824 (630 MB)   404 MB          35.8%
20090824 (630 MB)   20090825 (636 MB)   361 MB          43.3%
20090825 (636 MB)   20090827 (640 MB)   340 MB          46.9%
20090809 (703 MB)   20090827 (640 MB)   423 MB          33.9%
20090827 (640 MB)   20090809 (703 MB)   486 MB          30.8%

For Fedora 12 nightly composes the savings are clearly not as great as for the Ubuntu images. Interestingly, some long-term updates can be more efficient than some short-term ones; it depends on how much the contents have changed over time. You still save around 45% of the bandwidth on average, which is very nice. Why do the Fedora images differ so much more than the Ubuntu images? You can read Oxf13’s explanation from the Fedora QA IRC meeting: the whole ISO is a squashfs image, which is not well suited to this kind of difference comparison. Ubuntu is probably using another method that is more compatible with the zsync algorithm.
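One plausible way to picture why compressed images compare badly is the toy experiment below. It is only a sketch under my own assumptions: it uses zlib on generated text rather than a real squashfs image, SHA-256 block hashes rather than zsync's checksums, and a made-up helper name. It inserts a few bytes near the start of a file and then measures how many blocks of the new version can still be found in the old one, first for the raw data and then for the compressed streams.

```python
import hashlib
import random
import zlib

BLOCK = 1024  # block size chosen for this demo only


def reusable_fraction(old, new, block=BLOCK):
    """Share of full blocks of `new` that occur at any byte offset in `old`."""
    seen = {
        hashlib.sha256(old[i:i + block]).digest()
        for i in range(len(old) - block + 1)
    }
    total = len(new) // block
    if total == 0:
        return 0.0
    hits = sum(
        hashlib.sha256(new[k * block:(k + 1) * block]).digest() in seen
        for k in range(total)
    )
    return hits / total


def make_sample(seed=0):
    """Generate compressible pseudo-text from a fixed random vocabulary."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
        for _ in range(500)
    ]
    return " ".join(rng.choice(vocab) for _ in range(20000)).encode("ascii")


old_raw = make_sample()
# A small change near the beginning of the file.
new_raw = old_raw[:100] + b"CHANGED " + old_raw[100:]

raw_share = reusable_fraction(old_raw, new_raw)
packed_share = reusable_fraction(zlib.compress(old_raw), zlib.compress(new_raw))

print(f"reusable blocks, raw data:        {raw_share:.1%}")
print(f"reusable blocks, compressed data: {packed_share:.1%}")
```

In the raw data only the block containing the change is lost. In the compressed streams the change tends to ripple through everything that follows it, so far fewer blocks can be reused. How closely this matches what happens inside the Fedora images is an assumption on my part.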

Conclusion

So what do you think? Should there be .zsync files for ISO images in Fedora? I hope this article will help the infrastructure team to decide 🙂