Rsync

From Christoph's Personal Wiki
Revision as of 22:44, 9 January 2014 by Christoph (Talk | contribs) (Examples)

Jump to: navigation, search

rsync is a command line tool which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction.

rsync can copy or display directory contents and copy files, optionally using compression and recursion.

rsync has the default TCP port of 873.

The rsync algorithm

Abstract: This report presents an algorithm for updating a file on one machine to be identical to a file on another machine. We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link. The algorithm identifies parts of the source file which are identical to some part of the destination file, and only sends those parts which cannot be matched in this way. Effectively, the algorithm computes a set of differences without having both files on the same machine. The algorithm works best when the files are similar, but will also function correctly and reasonably efficiently when the files are quite different.[1]

Some features of rsync include:

  • can update whole directory trees and filesystems
  • optionally preserves symbolic links, hard links, file ownership, permissions, devices and times
  • requires no special privileges to install
  • internal pipelining reduces latency for multiple files
  • can use rsh, ssh, or direct sockets as the transport
  • supports anonymous rsync which is ideal for mirroring

Usage

Examples

Let us say you wish to rsync your home directory (e.g. /home/bob) with a backup directory/disk (e.g. /backup). The following command will accomplish this:

$ rsync -avz --delete /home/bob/ /backup/

where a means archive, v means do it verbosely, z means compress the data, and delete means to delete the backup file if the original file (i.e. in /home/bob) has been deleted since the last rsync.

The same can be done where the backup disk is on a remote machine via ssh. See here for more information.

  • Copy files to/from computers in your local area network

Consider a case where you have two computers plugged into your home router. To copy files/directories between the two, first find out their local IPv4 addresses (e.g., configured to use eth0) using `ifconfig` or `ip a`. The following command will copy files between the two machines (make sure your firewall rules allow connections via port 22 for SSH):

$ rsync -e 'ssh -p 22' -avl --stats --progress /home/bob/source bob@192.168.0.2:/home/bob/destination
  • Copy multiple files at the same time

Consider a case where you have multiple image files (e.g., foo-1.jpg, foo-2.jpg, and foo-3.jpg) and you wish to copy them from a source host to a destination host. You can you standard globing like so:

$ rsync -e 'ssh -p 22' -avl --stats --progress /home/source/foo-{1..3}.jpg bob@192.168.0.2:/home/destination

rsyncd.conf

The rsyncd.conf file is the runtime configuration file for rsync when run as an rsync daemon. The rsyncd.conf file controls authentication, access, logging, and available modules. See man rsyncd.conf for more information.

  • Sample rsyncd.conf file:
motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

[simple_path_name]
   path = /rsync_files_here
   comment = My Very Own Rsync Server
   uid = nobody
   gid = nobody
   read only = no
   list = yes
   auth users = username
   secrets file = /etc/rsyncd.scrt

Examples (extended)

Note: The following were taken directly from the rsync website (with some modifications).

Backup to a central backup server with 7 day incremental

#!/bin/sh

# This script does personal backups to a rsync backup server. You will end up
# with a 7 day rotating incremental backup. The incrementals will go
# into subdirectories named after the day of the week, and the current
# full backup goes into a directory called "current"
# tridge@linuxcare.com

# directory to backup
BDIR=/home/$USER

# excludes file - this contains a wildcard pattern per line of files to exclude
EXCLUDES=$HOME/cron/excludes

# the name of the backup machine
BSERVER=owl

# your password on the backup server
export RSYNC_PASSWORD=XXXXXX

########################################################################

BACKUPDIR=`date +%A`
OPTS="--force --ignore-errors --delete-excluded --exclude-from=$EXCLUDES 
      --delete --backup --backup-dir=/$BACKUPDIR -a"

export PATH=$PATH:/bin:/usr/bin:/usr/local/bin

# the following line clears the last weeks incremental directory
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -a $HOME/emptydir/ $BSERVER::$USER/$BACKUPDIR/
rmdir $HOME/emptydir

# now the actual transfer
rsync $OPTS $BDIR $BSERVER::$USER/current

Backup to a spare disk

I do local backups on several of my machines using rsync. I have an extra disk installed that can hold all the contents of the main disk. I then have a nightly cron job that backs up the main disk to the backup. This is the script I use on one of those machines.

#!/bin/sh

export PATH=/usr/local/bin:/usr/bin:/bin

LIST="rootfs usr data data2"

for d in $LIST; do
    mount /backup/$d
    rsync -ax --exclude fstab --delete /$d/ /backup/$d/
    umount /backup/$d
done

DAY=`date "+%A"`
    
rsync -a --delete /usr/local/apache /data2/backups/$DAY
rsync -a --delete /data/solid /data2/backups/$DAY

The first part does the backup on the spare disk. The second part backs up the critical parts to daily directories. I also backup the critical parts using a rsync over ssh to a remote machine.

Mirroring vger CVS tree

The vger.rutgers.edu cvs tree is mirrored onto cvs.samba.org via anonymous rsync using the following script.

#!/bin/bash

cd /var/www/cvs/vger/
PATH=/usr/local/bin:/usr/freeware/bin:/usr/bin:/bin

RUN=`lps x | grep rsync | grep -v grep | wc -l`
if [ "$RUN" -gt 0 ]; then
    echo already running
    exit 1
fi

rsync -az vger.rutgers.edu::cvs/CVSROOT/ChangeLog $HOME/ChangeLog

sum1=`sum $HOME/ChangeLog`
sum2=`sum /var/www/cvs/vger/CVSROOT/ChangeLog`

if [ "$sum1" = "$sum2" ]; then
    echo nothing to do
    exit 0
fi

rsync -az --delete --force vger.rutgers.edu::cvs/ /var/www/cvs/vger/
exit 0

Note in particular the initial rsync of the ChangeLog to determine if anything has changed. This could be omitted but it would mean that the rsyncd on vger would have to build a complete listing of the cvs area at each run. As most of the time nothing will have changed I wanted to save the time on vger by only doing a full rsync if the ChangeLog has changed. This helped quite a lot because vger is low on memory and generally quite heavily loaded, so doing a listing on such a large tree every hour would have been excessive.

Automated backup at home

The cron job looks like this:

#!/bin/sh
cd ~stine
{
    echo
    date
    dest=~/backup/`date +%A`
    mkdir $dest.new
    find . -xdev -type f \( -mtime 0 -or -mtime 1 \) -exec cp -aPv "{}"
    $dest.new \;
    cnt=`find $dest.new -type f | wc -l`
    if [ $cnt -gt 0 ]; then
        rm -rf $dest
        mv $dest.new $dest
    fi
    rm -rf $dest.new
    rsync -Cavze ssh . samba:backup
} >> ~/backup/backup.log 2>&1

Note that most of this script isn't anything to do with rsync, it just creates a daily backup of Stine's work in a ~stine/backup/ directory so she can retrieve any version from the last week. The last line does the rsync of her directory across the modem link to the host samba. Note that I am using the -C option which allows me to add entries to .cvsignore for stuff that doesn't need to be backed up.

Fancy footwork with remote file lists

One little known feature of rsync is the fact that when run over a remote shell (such as rsh or ssh) you can give any shell command as the remote file list. The shell command is expanded by your remote shell before rsync is called. For example, see if you can work out what this does:

rsync -avR remote:'`find /home -name "*.[ch]"`' /tmp/

note that that is backquotes enclosed by quotes (some browsers don't show that correctly).

Variations

rdiff and rdiff-backup

There also exists a utility called rdiff, which uses the rsync algorithm to generate delta files with the difference from file A to file B (like the utility diff, but in a different delta format). The delta file can then be applied to file A, turning it into file B (similar to the patch utility).

Unlike diff, the process of creating a delta file has two steps: first a signature file is created from file A, and then this (relatively small) signature and file B is used to create the delta file. Also unlike diff, rdiff works well with binary files.

Using rdiff, a utility called rdiff-backup has been created, capable of maintaining a backup mirror of a file or directory over the network, on another server. rdiff-backup stores incremental rdiff deltas with the backup, with which it is possible to recreate any backup point.

See Automated Backups With rdiff-backup for example usage.

See also

  • mt
  • ssh
  • Unison — allows bidirectional synchronization
  • Xdelta — alternative implementation of file differencing and delta encoding
  • duplicity — encrypted bandwidth-efficient backup using the rsync algorithm

References

  1. Tridgell A, Paul Mackerras P (1998). "The rsync algorithm". Department of Computer Science, Australian National University, Canberra, ACT 0200, Australia.

External links

Examples