Rsync

From Christoph's Personal Wiki

rsync is a command-line tool that synchronizes files and directories from one location to another while minimizing data transfer by using delta encoding when appropriate. An important feature of rsync not found in most similar programs and protocols is that the mirroring takes place with only one transmission in each direction.

rsync can copy or display directory contents and copy files, optionally using compression and recursion.

When run as a daemon, rsync listens on TCP port 873 by default.

The rsync algorithm

Abstract: This report presents an algorithm for updating a file on one machine to be identical to a file on another machine. We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link. The algorithm identifies parts of the source file which are identical to some part of the destination file, and only sends those parts which cannot be matched in this way. Effectively, the algorithm computes a set of differences without having both files on the same machine. The algorithm works best when the files are similar, but will also function correctly and reasonably efficiently when the files are quite different.[1]

Some features of rsync include:

  • can update whole directory trees and filesystems
  • optionally preserves symbolic links, hard links, file ownership, permissions, devices and times
  • requires no special privileges to install
  • internal pipelining reduces latency for multiple files
  • can use rsh, ssh, or direct sockets as the transport
  • supports anonymous rsync which is ideal for mirroring

Usage

Examples

Let us say you wish to rsync your home directory (e.g. /home/bob) with a backup directory/disk (e.g. /backup). The following command will accomplish this:

$ rsync -avz --delete /home/bob/ /backup/

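where -a means archive mode (recurse and preserve permissions, times, ownership, and symbolic links), -v means verbose output, -z means compress the data during transfer, and --delete means delete a file from the backup if the original file (i.e., in /home/bob) has been deleted since the last rsync.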
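Because --delete removes files from the destination, it can be worth previewing a run first. The following is a minimal sketch, assuming a helper function named preview_sync (a hypothetical name, not part of rsync) that adds rsync's --dry-run option so that nothing is actually changed:

```shell
# Preview a mirror run. --dry-run lists what would be copied or deleted
# without touching the destination. preview_sync is a hypothetical helper.
preview_sync() {
    src=$1
    dst=$2
    rsync -avz --delete --dry-run "$src" "$dst"
}

# Example call with the paths used above:
# preview_sync /home/bob/ /backup/
```

Files that would be removed are reported on lines beginning with "deleting". Once the listing looks right, the same command without --dry-run performs the real transfer.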

The same can be done over ssh when the backup disk is on a remote machine.

  • Copy files to/from computers in your local area network

Consider a case where you have two computers plugged into your home router. To copy files/directories between the two, first find out their local IPv4 addresses (e.g., on the eth0 interface) using `ifconfig` or `ip a`. The following command will copy files from one machine to the other (make sure your firewall rules allow connections via port 22 for SSH):

$ rsync -e 'ssh -p 22' -avl --stats --progress /home/bob/source bob@192.168.0.2:/home/bob/destination
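Note that the source path in the command above has no trailing slash, so rsync creates a directory named source inside the destination. With a trailing slash, only the contents of the directory are copied. A small sketch using throwaway local directories (the directory names are made up for illustration) shows the difference:

```shell
# Demonstrate the trailing-slash rule with temporary local directories.
demo=$(mktemp -d)
mkdir -p "$demo/source" "$demo/with_slash" "$demo/without_slash"
echo data > "$demo/source/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash: copy the contents of source/ into the destination.
    rsync -a "$demo/source/" "$demo/with_slash/"
    # No trailing slash: copy the directory itself, creating .../source/.
    rsync -a "$demo/source" "$demo/without_slash/"
    # List both destinations to compare the resulting layouts.
    ls -R "$demo/with_slash" "$demo/without_slash"
fi
```

The first destination ends up containing file.txt directly, while the second contains source/file.txt.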
  • Copy multiple files at the same time

Consider a case where you have multiple image files (e.g., foo-1.jpg, foo-2.jpg, and foo-3.jpg) and you wish to copy them from a source host to a destination host. You can use standard shell brace expansion like so:

$ rsync -e 'ssh -p 22' -avl --stats --progress /home/source/foo-{1..3}.jpg bob@192.168.0.2:/home/destination
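The `{1..3}` pattern in the command above is expanded by the local shell (e.g., bash or zsh) before rsync starts, and it is not available in a plain POSIX sh. As a rough sketch, the same argument list can be built with a portable loop. The function name build_sources is hypothetical, and the command is only printed here, not run:

```shell
# Build the source list with a loop instead of foo-{1..3}.jpg.
build_sources() {
    list=""
    for i in 1 2 3; do
        list="$list /home/source/foo-$i.jpg"
    done
    # Strip the leading space before printing.
    echo "${list# }"
}

# Print the resulting command rather than running it.
# $(build_sources) is left unquoted on purpose so that it splits into
# three separate arguments (safe only because these names have no spaces).
echo rsync -e 'ssh -p 22' -av $(build_sources) bob@192.168.0.2:/home/destination
```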
  • Exclude certain directories:
$ rsync -e ssh -a --exclude '/dev' --exclude '/proc' --exclude '/sys' / bob@192.168.0.2:/arc/2010-03-24

rsyncd.conf

The rsyncd.conf file is the runtime configuration file for rsync when run as an rsync daemon. The rsyncd.conf file controls authentication, access, logging, and available modules. See man rsyncd.conf for more information.

  • Sample rsyncd.conf file:
motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

[simple_path_name]
   path = /rsync_files_here
   comment = My Very Own Rsync Server
   uid = nobody
   gid = nobody
   read only = no
   list = yes
   auth users = username
   secrets file = /etc/rsyncd.scrt
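
The secrets file named in the sample holds one user:password pair per line, and by default the daemon refuses to use it if it is readable by other users. The following is a minimal sketch of creating such a file. The function name write_secrets, the placeholder password, and the host name server are assumptions for illustration:

```shell
# Write a one-entry secrets file in the "user:password" format,
# readable and writable only by its owner.
# Arguments: target file, user name, password.
write_secrets() {
    file=$1
    user=$2
    pass=$3
    # The restrictive umask applies only inside the subshell.
    (umask 077 && printf '%s:%s\n' "$user" "$pass" > "$file")
    chmod 600 "$file"
}

# Example (run as root on the server; placeholder password):
# write_secrets /etc/rsyncd.scrt username XXXXXX
#
# A client could then push files to the module with:
# rsync -av /some/dir/ username@server::simple_path_name/
```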

Examples (extended)

Note: The following were taken directly from the rsync website (with some modifications).

Backup to a central backup server with 7 day incremental

#!/bin/sh

# This script does personal backups to a rsync backup server. You will end up
# with a 7 day rotating incremental backup. The incrementals will go
# into subdirectories named after the day of the week, and the current
# full backup goes into a directory called "current"
# tridge@linuxcare.com

# directory to backup
BDIR=/home/$USER

# excludes file - this contains a wildcard pattern per line of files to exclude
EXCLUDES=$HOME/cron/excludes

# the name of the backup machine
BSERVER=owl

# your password on the backup server
export RSYNC_PASSWORD=XXXXXX

########################################################################

BACKUPDIR=`date +%A`
OPTS="--force --ignore-errors --delete-excluded --exclude-from=$EXCLUDES 
      --delete --backup --backup-dir=/$BACKUPDIR -a"

export PATH=$PATH:/bin:/usr/bin:/usr/local/bin

# the following line clears last week's incremental directory
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -a $HOME/emptydir/ $BSERVER::$USER/$BACKUPDIR/
rmdir $HOME/emptydir

# now the actual transfer
rsync $OPTS $BDIR $BSERVER::$USER/current
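
The script above reads its exclude patterns from the file named in EXCLUDES, one pattern per line. The original excludes file is not shown, so the following is only a sketch of what one might contain. The patterns and the function name write_excludes are illustrative assumptions:

```shell
# Create a sample excludes file, one rsync pattern per line.
# These patterns are examples, not the original author's list.
write_excludes() {
    cat > "$1" <<'EOF'
# lines starting with '#' or ';' are treated as comments
*.o
*~
core
.cache/
EOF
}

# Example: write_excludes "$HOME/cron/excludes"
```

A trailing slash, as in .cache/, restricts a pattern to directories.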

Backup to a spare disk

I do local backups on several of my machines using rsync. I have an extra disk installed that can hold all the contents of the main disk. I then have a nightly cron job that backs up the main disk to the backup. This is the script I use on one of those machines.

#!/bin/sh

export PATH=/usr/local/bin:/usr/bin:/bin

LIST="rootfs usr data data2"

for d in $LIST; do
    mount /backup/$d
    rsync -ax --exclude fstab --delete /$d/ /backup/$d/
    umount /backup/$d
done

DAY=`date "+%A"`
    
rsync -a --delete /usr/local/apache /data2/backups/$DAY
rsync -a --delete /data/solid /data2/backups/$DAY

The first part does the backup on the spare disk. The second part backs up the critical parts to daily directories. I also back up the critical parts using rsync over ssh to a remote machine.

Mirroring vger CVS tree

The vger.rutgers.edu cvs tree is mirrored onto cvs.samba.org via anonymous rsync using the following script.

#!/bin/bash

cd /var/www/cvs/vger/
PATH=/usr/local/bin:/usr/freeware/bin:/usr/bin:/bin

RUN=`lps x | grep rsync | grep -v grep | wc -l`
if [ "$RUN" -gt 0 ]; then
    echo already running
    exit 1
fi

rsync -az vger.rutgers.edu::cvs/CVSROOT/ChangeLog $HOME/ChangeLog

sum1=`sum $HOME/ChangeLog`
sum2=`sum /var/www/cvs/vger/CVSROOT/ChangeLog`

if [ "$sum1" = "$sum2" ]; then
    echo nothing to do
    exit 0
fi

rsync -az --delete --force vger.rutgers.edu::cvs/ /var/www/cvs/vger/
exit 0

Note in particular the initial rsync of the ChangeLog to determine if anything has changed. This could be omitted but it would mean that the rsyncd on vger would have to build a complete listing of the cvs area at each run. As most of the time nothing will have changed, I wanted to save the time on vger by only doing a full rsync if the ChangeLog has changed. This helped quite a lot because vger is low on memory and generally quite heavily loaded, so doing a listing on such a large tree every hour would have been excessive.
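
The script above detects a running instance by counting processes in the ps output. A different technique, not used in the original script, is a lock directory: mkdir either creates the directory or fails, so only one caller can succeed. A minimal sketch, with hypothetical function names and lock path:

```shell
# Use an atomic mkdir as a lock instead of counting rsync processes.
acquire_lock() {
    # mkdir fails if the directory already exists, so only one caller wins.
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1"
}

# Usage in a mirror script (LOCKDIR is a hypothetical path):
# LOCKDIR=/tmp/vger-mirror.lock
# if ! acquire_lock "$LOCKDIR"; then
#     echo already running
#     exit 1
# fi
# trap 'release_lock "$LOCKDIR"' EXIT
# ... rsync commands go here ...
```

One trade-off is that a lock left behind by a killed script must be removed by hand before the next run.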

Automated backup at home

The cron job looks like this:

#!/bin/sh
cd ~stine
{
    echo
    date
    dest=~/backup/`date +%A`
    mkdir $dest.new
    find . -xdev -type f \( -mtime 0 -or -mtime 1 \) -exec cp -aPv "{}" $dest.new \;
    cnt=`find $dest.new -type f | wc -l`
    if [ $cnt -gt 0 ]; then
        rm -rf $dest
        mv $dest.new $dest
    fi
    rm -rf $dest.new
    rsync -Cavze ssh . samba:backup
} >> ~/backup/backup.log 2>&1

Note that most of this script has nothing to do with rsync; it just creates a daily backup of Stine's work in a ~stine/backup/ directory so she can retrieve any version from the last week. The last line does the rsync of her directory across the modem link to the host samba. Note that I am using the -C option, which allows me to add entries to .cvsignore for files that do not need to be backed up.

Fancy footwork with remote file lists

One little-known feature of rsync is that, when run over a remote shell (such as rsh or ssh), you can give any shell command as the remote file list. The shell command is expanded by your remote shell before rsync is called. For example, see if you can work out what this does:

rsync -avR remote:'`find /home -name "*.[ch]"`' /tmp/

Note that the command is in backquotes enclosed by single quotes (some browsers do not display this correctly).
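
A related approach that does not depend on the remote shell expanding backquotes is to generate the file list separately and pass it to rsync with --files-from, which reads paths relative to the source directory and preserves them at the destination. The sketch below uses local directories so it can be tried safely. The function name copy_c_sources and the host name remote are assumptions:

```shell
# Copy every *.c and *.h file under a source tree, preserving relative
# paths, by feeding a find listing to rsync on standard input.
copy_c_sources() {
    src=$1
    dst=$2
    (cd "$src" && find . -name '*.[ch]') |
        rsync -a --files-from=- "$src" "$dst"
}

# Remote form of the same idea (hypothetical host name "remote"):
# ssh remote 'cd / && find home -name "*.[ch]"' |
#     rsync -av --files-from=- remote:/ /tmp/
```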

Rsync exit values

Note: Obtained via the man page. Capture exit value with `$?`.

0      Success
1      Syntax or usage error
2      Protocol incompatibility
3      Errors selecting input/output files, dirs 
4      Requested action not supported: an attempt was made to manipulate 64-bit 
       files on a platform that cannot support them; or an option was specified
       that is supported by the client and not by the server.
5      Error starting client-server protocol
6      Daemon unable to append to log-file
10     Error in socket I/O
11     Error in file I/O
12     Error in rsync protocol data stream
13     Errors with program diagnostics
14     Error in IPC code 
20     Received SIGUSR1 or SIGINT
21     Some error returned by waitpid()
22     Error allocating core memory buffers
23     Partial transfer due to error
24     Partial transfer due to vanished source files
25     The --max-delete limit stopped deletions
30     Timeout in data send/receive
35     Timeout waiting for daemon connection

For example, if one were using the "--max-delete" option for rsync(1), one could check a call's return value to see whether rsync(1) hit the threshold for deleted file count and write a message to a logfile appropriately:

$ rsync --archive --delete --max-delete=5 source destination
$ if (($? == 25)); then
     printf '%s\n' 'Deletion limit was reached' >"$logfile"
  fi
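
Building on the example above, a script can translate exit values into log messages. The following is a small sketch covering a few values from the table. The function name describe_rsync_exit is hypothetical:

```shell
# Map a few rsync exit values from the table above to short messages.
describe_rsync_exit() {
    case $1 in
        0)     echo "Success" ;;
        23)    echo "Partial transfer due to error" ;;
        24)    echo "Partial transfer due to vanished source files" ;;
        25)    echo "The --max-delete limit stopped deletions" ;;
        30|35) echo "Timeout" ;;
        *)     echo "rsync failed with exit value $1" ;;
    esac
}

# Usage: save $? immediately after rsync, before running anything else.
# rsync --archive --delete --max-delete=5 source destination
# status=$?
# printf '%s\n' "$(describe_rsync_exit "$status")" >>"$logfile"
```

Saving `$?` in a variable right away matters because every later command overwrites it.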

Variations

rdiff and rdiff-backup

There also exists a utility called rdiff, which uses the rsync algorithm to generate delta files with the difference from file A to file B (like the utility diff, but in a different delta format). The delta file can then be applied to file A, turning it into file B (similar to the patch utility).

Unlike diff, the process of creating a delta file has two steps: first a signature file is created from file A, and then this (relatively small) signature and file B are used to create the delta file. Also unlike diff, rdiff works well with binary files.
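
The steps described above can be sketched with rdiff's signature, delta, and patch subcommands. This is only an illustration: the function name rdiff_roundtrip and the intermediate file names are assumptions, and the output file should not already exist:

```shell
# Run the rdiff workflow on two local files.
# Arguments: file A, file B, path for the reconstructed copy of B.
# The body runs in a subshell so the cleanup trap stays local to it.
rdiff_roundtrip() (
    work=$(mktemp -d) || exit 1
    trap 'rm -rf "$work"' EXIT
    # Step 1: create a signature of A.
    # Step 2: create a delta from that signature and B.
    # Step 3: apply the delta to A, producing a copy of B.
    rdiff signature "$1" "$work/a.sig" &&
        rdiff delta "$work/a.sig" "$2" "$work/a-to-b.delta" &&
        rdiff patch "$1" "$work/a-to-b.delta" "$3"
)

# Example: rdiff_roundtrip old.bin new.bin rebuilt.bin
```

In practice the signature would be computed on the machine holding A and sent to the machine holding B, so that only the signature and the delta cross the network.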

Using rdiff, a utility called rdiff-backup has been created, capable of maintaining a backup mirror of a file or directory over the network, on another server. rdiff-backup stores incremental rdiff deltas with the backup, with which it is possible to recreate any backup point.

See Automated Backups With rdiff-backup for example usage.

See also

  • mt
  • ssh
  • Unison — allows bidirectional synchronization
  • Xdelta — alternative implementation of file differencing and delta encoding
  • duplicity — encrypted bandwidth-efficient backup using the rsync algorithm

References

  1. Tridgell A, Mackerras P (1998). "The rsync algorithm". Department of Computer Science, Australian National University, Canberra, ACT 0200, Australia.

External links

Examples