restoring files after doing rsync with compression


Can someone tell me whether these options affect the files being backed up in any way, should they ever need to be restored to their original folder?

2015-01-26 18:11:36 gstlouis

No, the -z switch does not in any way affect the data written to the target location, and -a is simply meant to ensure a more faithful copy is made.

more info



-a is simply a shorthand that enables switches meant to ensure that the data at the target location is in all respects identical to the data at the source location once the transfer finishes. It affects file metadata, but not file contents; the contents of the copied files will be the same whether you used -a or not.

Notice that the description for -z says compression during the transfer. That is the important part, but it may not be easy to understand without some background knowledge.

The main piece of that background is that rsync uses a client/server architecture, even for local transfers: one side reads data from the source location, and the other side writes to the target location. Between the two there may be a network connection, or they may be running on the same host. This architecture allows rsync to use the exact same protocol and essentially the same implementation for copying locally or copying over a network; the only part that needs changing is an intermediary layer that actually forwards data back and forth between the rsync instances.

When your throughput is limited by the network, or you are charged by the amount of network data transferred, and you have CPU time to spare on the source and target systems, then compressing the data stream that flows across the network can help the copying process finish sooner (or cost less), because you trade some additional CPU time for less data flowing across the network.

The target rsync process will then uncompress the data before processing it further and eventually writing it to storage.
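The round trip described above can be illustrated with a minimal sketch. This is not rsync's actual code: Python's zlib module stands in for whatever compression rsync uses on the wire, and the function names `send` and `receive` are made up for the illustration. The point is only that compression applied in transit is lossless, so the bytes written at the target match the bytes read at the source.

```python
import hashlib
import zlib


def send(data: bytes) -> bytes:
    """Sender side (hypothetical name): compress before the data goes on the wire."""
    return zlib.compress(data, 6)


def receive(wire_bytes: bytes) -> bytes:
    """Receiver side (hypothetical name): decompress before writing to storage."""
    return zlib.decompress(wire_bytes)


# Made-up, highly compressible stand-in for a file's contents.
source = b"some file contents, repeated to be compressible\n" * 1000

wire = send(source)        # what would cross the network
written = receive(wire)    # what would land on the target disk

print("bytes read at source:   ", len(source))
print("bytes sent over the wire:", len(wire))
print("target identical to source:",
      hashlib.sha256(source).digest() == hashlib.sha256(written).digest())
```

Only the middle figure changes when compression is switched on or off; what ends up at the target is the same either way, which is why a restore needs no special handling.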

It follows from this that turning compression on when copying files locally using rsync is essentially wasting CPU time, as the connection between the two rsync instances involved is much faster than any other I/O involved and the same CPU would be doing both compression and decompression of the data stream. In such a case, ignoring caching for a moment, the data would be read from disk (slow) into RAM, possibly copied within RAM (fast), and then written out to disk again (slow). The slow components are going to dominate, and copying less data within RAM is not going to speed up the process noticeably (and may very well slow it down due to the additional processing required, which in itself quite possibly requires in-RAM copying of data). If you are really unlucky, enabling compression will put you over the limit where swap space needs to be used to fit all required data in memory, which will basically kill performance.

Regular RAM these days can handle sustained transfers of multiple gigabytes per second without breaking a sweat. A 7200 rpm spinning disk drive tends to top out at about 120-150 MB/s in sequential operation, and file I/O of the kind rsync does is virtually never sequential for more than short bursts. SSDs can do better both in terms of latency and throughput, but are still far slower than RAM. Hence, when copying locally, you will almost always be I/O bound, and compressing the in-transit data stream at best makes no difference, because the same amount of data is still read and written, which as we saw above are the slow parts of the process.
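To put the trade-off into rough numbers, here is a back-of-the-envelope sketch. Every figure in it is hypothetical (file size, link speeds, compression ratio, compressor throughput), and the function `transfer_seconds` is made up for this illustration. It models only the link between the two rsync instances, assumes compression overlaps with sending, and ignores disk I/O entirely.

```python
def transfer_seconds(size_mb, link_mb_s, ratio=1.0, compress_mb_s=None):
    """Rough time estimate for moving size_mb megabytes over a link.

    Assumes compression and sending overlap (pipelined), so the slower
    of the two stages sets the pace. Disk I/O is deliberately ignored.
    """
    wire_time = size_mb * ratio / link_mb_s
    if compress_mb_s is None:
        return wire_time  # no compression: only the link matters
    cpu_time = size_mb / compress_mb_s
    return max(wire_time, cpu_time)


# All numbers are hypothetical, chosen only to show the shape of the trade-off.
SIZE_MB = 1000
RATIO = 0.5          # compressed size divided by original size
COMPRESS_MB_S = 100  # how fast the CPU can compress

# Slow network link (10 MB/s): the link is the bottleneck.
slow_plain = transfer_seconds(SIZE_MB, link_mb_s=10)
slow_z = transfer_seconds(SIZE_MB, link_mb_s=10,
                          ratio=RATIO, compress_mb_s=COMPRESS_MB_S)

# Local copy: the "link" is an in-memory pipe (say 5000 MB/s).
local_plain = transfer_seconds(SIZE_MB, link_mb_s=5000)
local_z = transfer_seconds(SIZE_MB, link_mb_s=5000,
                           ratio=RATIO, compress_mb_s=COMPRESS_MB_S)

print(f"slow link, no -z: {slow_plain:.1f} s   with -z: {slow_z:.1f} s")
print(f"local,     no -z: {local_plain:.1f} s   with -z: {local_z:.1f} s")
```

Under these assumptions compression halves the time on the slow link, while on the local "link" the compressor itself becomes the slowest stage. In a real local copy the disk reads and writes would dominate both local figures, which compression does nothing to reduce.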

2015-01-26 18:12:34