Rsync

From Christoph's Personal Wiki

Revision as of 22:11, 26 April 2007

rsync is a command line tool which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction.

rsync can copy or display directory contents and copy files, optionally using compression and recursion.

When run as a daemon, rsync listens on TCP port 873 by default.

The rsync algorithm

Abstract: This report presents an algorithm for updating a file on one machine to be identical to a file on another machine. We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link. The algorithm identifies parts of the source file which are identical to some part of the destination file, and only sends those parts which cannot be matched in this way. Effectively, the algorithm computes a set of differences without having both files on the same machine. The algorithm works best when the files are similar, but will also function correctly and reasonably efficiently when the files are quite different.[1]
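
The key ingredient described in the report is a weak "rolling" checksum. The side that already has the old file splits it into fixed-size blocks and sends a weak and a strong checksum for each block. The side with the new file then slides a window over it one byte at a time, updating the weak checksum in constant time and computing the expensive strong checksum only when the weak one matches. The bash sketch below implements only the weak checksum, following the report's formulas (two 16-bit sums a and b, modulus M = 2^16). The function names are hypothetical and this is an illustration, not rsync's source code.

```bash
#!/usr/bin/env bash
# Sketch of the weak rolling checksum from the rsync technical report.
# Needs bash >= 4.3 (namerefs). Bytes are held in a bash array; M = 2^16.
M=65536

# to_bytes STRING ARRAY_NAME: store the byte value of each (ASCII) character.
# (ARRAY_NAME must not itself be called "bytes_out".)
to_bytes() {
    local s=$1 i c
    local -n bytes_out=$2
    bytes_out=()
    for (( i = 0; i < ${#s}; i++ )); do
        printf -v c '%d' "'${s:i:1}"
        bytes_out+=("$c")
    done
}

# weak_sum ARRAY_NAME START LEN: print "a b s" for the window [START, START+LEN)
#   a = sum of X_i,  b = sum of (LEN - j) * X_(START+j),  s = a + 2^16 * b
weak_sum() {
    local -n win_buf=$1
    local start=$2 len=$3 a=0 b=0 j x
    for (( j = 0; j < len; j++ )); do
        x=${win_buf[start + j]}
        a=$(( (a + x) % M ))
        b=$(( (b + (len - j) * x) % M ))
    done
    echo "$a $b $(( a + 65536 * b ))"
}

# roll_sum A B LEN X_OUT X_IN: slide the window one byte to the right in O(1)
#   a' = a - X_out + X_in,   b' = b - LEN * X_out + a'   (both mod M)
roll_sum() {
    local a=$1 b=$2 len=$3 xout=$4 xin=$5
    a=$(( ((a - xout + xin) % M + M) % M ))
    b=$(( ((b - len * xout + a) % M + M) % M ))
    echo "$a $b $(( a + 65536 * b ))"
}
```

Rolling the window with roll_sum gives the same values as recomputing weak_sum from scratch at each offset. That equivalence is what lets every byte position of the new file be checked against the block list without re-summing the whole block.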

Usage

Examples

Let's say you wish to rsync your home directory (e.g. /home/bob) with a backup directory/disk (e.g. /backup). The following command will accomplish this:

rsync -avz --delete /home/bob/ /backup/

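where -a means archive mode (recurse and preserve permissions, times, symbolic links, owner and group), -v means verbose output, -z compresses the data during transfer, and --delete removes files from the backup that have been deleted from the original (i.e. from /home/bob) since the last rsync.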
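
Two details of this command are easy to get wrong. First, the trailing slash on the source matters: /home/bob/ copies the contents of the directory, while /home/bob would create /backup/bob. Second, --delete removes files from the destination. A cautious habit is to preview with -n (--dry-run) first. The sketch below wraps both points in a hypothetical helper, mirror_dir. Its name and arguments are illustrative and not part of rsync.

```bash
#!/usr/bin/env bash
# mirror_dir SRC DEST [--really]
# Hypothetical helper: mirror the *contents* of SRC into DEST.
# Without --really it only lists what rsync would do (-n = --dry-run).
mirror_dir() {
    local src=${1%/} dest=${2%/}
    local -a opts=(-a --delete)
    if [ "${3:-}" != "--really" ]; then
        opts+=(-n -v)      # dry run: report changes, touch nothing
    fi
    # The trailing slash on "$src/" copies the directory's contents,
    # not the directory itself.
    rsync "${opts[@]}" "$src/" "$dest/"
}
```

For example, run `mirror_dir /home/bob /backup` to see what would be copied and deleted, then `mirror_dir /home/bob /backup --really` to do it.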

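The same can be done when the backup disk is on a remote machine, via ssh. See the HowtoForge article on incremental snapshot backups with rsync (http://www.howtoforge.com/rsync_incremental_snapshot_backups) for more information.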
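
Over ssh, the only change to the command above is a host prefix on the destination (host:path, or user@host:path). A minimal sketch follows. The wrapper name is hypothetical, and bob and backuphost are placeholders.

```bash
#!/usr/bin/env bash
# remote_backup SRC [USER@]HOST:DEST
# Hypothetical wrapper around the command from the Examples section,
# with the destination on another machine reached through ssh.
remote_backup() {
    local src=${1%/} target=$2
    # -e ssh: use ssh as the remote shell (recent rsync versions default to it)
    # -z: compress file data while it crosses the network
    rsync -avz --delete -e ssh "$src/" "$target"
}

# Example (placeholder user and host, not run here):
#   remote_backup /home/bob bob@backuphost:/backup/
```

If the destination has no host: prefix, rsync simply copies locally and the remote-shell option is not used. This means the wrapper can be tried without a second machine.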

Examples (extended)

Note: The following examples were taken from the rsync website (http://rsync.samba.org/examples.html), with some modifications.

Backup to a central backup server with 7 day incremental

#!/bin/sh

# This script does personal backups to an rsync backup server. You will end up
# with a 7 day rotating incremental backup. The incrementals will go
# into subdirectories named after the day of the week, and the current
# full backup goes into a directory called "current"
# tridge@linuxcare.com

# directory to back up
BDIR=/home/$USER

# excludes file - this contains a wildcard pattern per line of files to exclude
EXCLUDES=$HOME/cron/excludes

# the name of the backup machine
BSERVER=owl

# your password on the backup server
export RSYNC_PASSWORD=XXXXXX

########################################################################

BACKUPDIR=`date +%A`
OPTS="--force --ignore-errors --delete-excluded --exclude-from=$EXCLUDES 
      --delete --backup --backup-dir=/$BACKUPDIR -a"

export PATH=$PATH:/bin:/usr/bin:/usr/local/bin

# the following lines clear last week's incremental directory
[ -d $HOME/emptydir ] || mkdir $HOME/emptydir
rsync --delete -a $HOME/emptydir/ $BSERVER::$USER/$BACKUPDIR/
rmdir $HOME/emptydir

# now the actual transfer
rsync $OPTS $BDIR $BSERVER::$USER/current
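
In this script the rotation comes from --backup together with --backup-dir. Whenever rsync is about to overwrite or delete a file in current, it first moves the old copy into the weekday directory. The emptydir step then wipes that directory when the same weekday comes round again. The mechanism can be tried locally. In this sketch a plain directory stands in for the owl::$USER daemon module, and the function name and directory layout are hypothetical.

```bash
#!/usr/bin/env bash
# rotating_backup SRC BACKUP_ROOT
# Local sketch of the weekday-incremental scheme: BACKUP_ROOT/current holds
# the latest mirror, BACKUP_ROOT/<Weekday> the files replaced or deleted today.
rotating_backup() {
    local src=${1%/} root=${2%/} day
    day=$(date +%A)
    # Local equivalent of the emptydir trick: start today's directory afresh.
    rm -rf "$root/$day"
    mkdir -p "$root/current"
    # Files that are overwritten or removed by --delete in current/ are
    # moved into $root/$day instead of being lost.
    rsync -a --delete --backup --backup-dir="$root/$day" "$src/" "$root/current/"
}
```

Like the original, running it twice on the same day discards that day's earlier incrementals, because the weekday directory is cleared at the start of each run.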

Backup to a spare disk

I do local backups on several of my machines using rsync. I have an extra disk installed that can hold all the contents of the main disk. I then have a nightly cron job that backs up the main disk to the backup. This is the script I use on one of those machines.

#!/bin/sh

export PATH=/usr/local/bin:/usr/bin:/bin

LIST="rootfs usr data data2"

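for d in $LIST; do
    # if the backup disk cannot be mounted, skip it rather than letting
    # rsync --delete write into the bare mount point on the main disk
    mount /backup/$d || continue
    rsync -ax --exclude fstab --delete /$d/ /backup/$d/
    umount /backup/$d
done

DAY=`date "+%A"`
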
rsync -a --delete /usr/local/apache /data2/backups/$DAY
rsync -a --delete /data/solid /data2/backups/$DAY

The first part backs up each filesystem to the spare disk. The second part copies the critical parts into daily directories. I also back up the critical parts to a remote machine using rsync over ssh.
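
Each daily directory in the second part holds a complete copy. A widely used variant, not part of the original script, passes --link-dest pointing at the previous snapshot. Files that have not changed are then hard-linked rather than copied, so each extra day costs only the space of the changed files. The sketch below uses a hypothetical helper name and layout: SNAPROOT/<weekday> plus a "latest" symlink.

```bash
#!/usr/bin/env bash
# snapshot SRC SNAPROOT
# Hypothetical helper: writes SNAPROOT/<Weekday> as a full-looking copy of SRC,
# hard-linking files that are unchanged since SNAPROOT/latest.
snapshot() {
    local src=${1%/} root=${2%/} day
    local -a link=()
    day=$(date +%A)
    mkdir -p "$root" || return 1
    if [ -d "$root/latest" ]; then
        # an absolute path avoids --link-dest being read relative to the
        # destination directory
        link=(--link-dest="$(cd "$root/latest" && pwd -P)")
    fi
    # build under a temporary name so a failed run leaves older snapshots alone
    rsync -a --delete "${link[@]}" "$src/" "$root/$day.tmp/" || return 1
    rm -rf "$root/$day"
    mv "$root/$day.tmp" "$root/$day"
    # point "latest" at the snapshot just written (relative symlink)
    ln -sfn "$day" "$root/latest"
}
```

If rsync fails, the function returns before touching the existing snapshots, so the previous copy stays intact.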

Mirroring vger CVS tree

The vger.rutgers.edu cvs tree is mirrored onto cvs.samba.org via anonymous rsync using the following script.

#!/bin/bash

cd /var/www/cvs/vger/
PATH=/usr/local/bin:/usr/freeware/bin:/usr/bin:/bin

RUN=`lps x | grep rsync | grep -v grep | wc -l`
if [ "$RUN" -gt 0 ]; then
    echo already running
    exit 1
fi

rsync -az vger.rutgers.edu::cvs/CVSROOT/ChangeLog $HOME/ChangeLog

sum1=`sum $HOME/ChangeLog`
sum2=`sum /var/www/cvs/vger/CVSROOT/ChangeLog`

if [ "$sum1" = "$sum2" ]; then
    echo nothing to do
    exit 0
fi

rsync -az --delete --force vger.rutgers.edu::cvs/ /var/www/cvs/vger/
exit 0

Note in particular the initial rsync of the ChangeLog to determine whether anything has changed. This could be omitted, but then the rsyncd on vger would have to build a complete listing of the cvs area at each run. As most of the time nothing will have changed, I wanted to save time on vger by doing a full rsync only if the ChangeLog had changed. This helped quite a lot because vger is low on memory and generally quite heavily loaded, so listing such a large tree every hour would have been excessive.
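
The same "cheap check first" structure can be sketched with a lock instead of grepping the process list, and with cmp instead of comparing sum output. This sketch assumes the util-linux flock(1) utility is available. The lock file path, function names and arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical lock file; keep it outside the mirrored tree, because
# rsync --delete would otherwise remove it.
LOCK=${LOCK:-/tmp/vger-mirror.lock}

# needs_full_sync FETCHED_CHANGELOG MIRROR_CHANGELOG
# Succeeds when the two files differ or the mirror copy is missing.
needs_full_sync() {
    ! cmp -s "$1" "$2"
}

# mirror_if_changed SRC DEST, e.g. vger.rutgers.edu::cvs /var/www/cvs/vger
mirror_if_changed() {
    local src=${1%/} dest=${2%/} tmp rc
    tmp=$(mktemp) || return 1
    (
        # -n: give up at once if another copy already holds the lock
        flock -n 9 || { echo "already running" >&2; exit 1; }
        # fetch only the small ChangeLog first
        rsync -az "$src/CVSROOT/ChangeLog" "$tmp" || exit 1
        if ! needs_full_sync "$tmp" "$dest/CVSROOT/ChangeLog"; then
            echo "nothing to do"
            exit 0
        fi
        # walk the whole tree only when something has changed
        rsync -az --delete --force "$src/" "$dest/"
    ) 9>"$LOCK"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```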

Automated backup at home

The cron job looks like this:

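#!/bin/sh
cd ~stine || exit 1
{
    echo
    date
    dest=~/backup/`date +%A`
    mkdir "$dest.new"
    # copy files modified in the last two days, keeping their directory
    # layout (--parents is GNU cp); ./backup itself is skipped
    find . -xdev -path ./backup -prune -o \
        -type f \( -mtime 0 -o -mtime 1 \) \
        -exec cp -av --parents "{}" "$dest.new" \;
    cnt=`find "$dest.new" -type f | wc -l`
    if [ "$cnt" -gt 0 ]; then
        rm -rf "$dest"
        mv "$dest.new" "$dest"
    fi
    rm -rf "$dest.new"
    rsync -Cavze ssh . samba:backup
} >> ~/backup/backup.log 2>&1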

Note that most of this script has nothing to do with rsync; it just creates a daily backup of Stine's work in the ~stine/backup/ directory so she can retrieve any version from the last week. The last line does the rsync of her directory across the modem link to the host samba. Note that I am using the -C (--cvs-exclude) option, which allows me to add entries to .cvsignore for files that don't need to be backed up.
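
The effect of -C (--cvs-exclude) can be tried on a scratch directory. Besides a built-in list of CVS-style patterns such as *.o and core, rsync skips files that match an entry in a .cvsignore file located in the same directory. The helper names in this sketch are hypothetical.

```bash
#!/usr/bin/env bash
# ignore_in_backup DIR PATTERN
# Hypothetical helper: add PATTERN to DIR/.cvsignore so that rsync -C
# skips matching files in that directory.
ignore_in_backup() {
    printf '%s\n' "$2" >> "${1%/}/.cvsignore"
}

# cvs_exclude_copy SRC DEST
# Copy SRC's contents with -C: rsync's built-in CVS-style patterns
# (e.g. *.o, core) plus any per-directory .cvsignore entries are skipped.
cvs_exclude_copy() {
    rsync -aC "${1%/}/" "${2%/}/"
}
```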

Fancy footwork with remote file lists

One little-known feature of rsync is that, when it runs over a remote shell (such as rsh or ssh), you can give any shell command as the remote file list. The shell command is expanded by the remote shell before rsync is called. For example, see if you can work out what this does:

rsync -avR remote:'`find /home -name "*.[ch]"`' /tmp/

Note that these are backquotes enclosed in single quotes (some browsers do not display them correctly).
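
Here is what that command does. The remote shell runs find, and its output (every .c and .h file under /home) becomes rsync's list of source files. The -R (--relative) option keeps each file's full path, so the copies end up under /tmp/home/. A more explicit way to get the same effect is --files-from, which reads the list of paths (relative to the source directory) from a file or, with -, from standard input. Recent rsync releases may also escape shell metacharacters in remote arguments by default, which would defeat the backquote trick; see the --old-args option in rsync(1). The sketch below runs locally so it can be tried without a remote host. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# copy_sources SRC_ROOT DEST
# Hypothetical helper: copy every *.c and *.h file under SRC_ROOT into DEST,
# keeping paths relative to SRC_ROOT. Over ssh the same idea would be:
#   ssh remote 'cd / && find home -name "*.[ch]"' |
#       rsync -av --files-from=- remote:/ /tmp/
copy_sources() {
    local root=${1%/} dest=${2%/}
    # --files-from implies -R; listed paths are relative to "$root/"
    ( cd "$root" && find . -type f -name '*.[ch]' | sed 's|^\./||' ) |
        rsync -a --files-from=- "$root/" "$dest/"
}
```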

Variations

rdiff and rdiff-backup

There also exists a utility called rdiff, which uses the rsync algorithm to generate delta files with the difference from file A to file B (like the utility diff, but in a different delta format). The delta file can then be applied to file A, turning it into file B (similar to the patch utility).

Unlike diff, the process of creating a delta file has two steps: first a signature file is created from file A, and then this (relatively small) signature and file B are used to create the delta file. Also unlike diff, rdiff works well with binary files.
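
The three steps map directly onto rdiff's signature, delta and patch subcommands. This sketch rebuilds file B from file A plus a delta, as a remote side holding only A would. The helper name and the temporary file names are arbitrary.

```bash
#!/usr/bin/env bash
# rdiff_roundtrip OLD NEW OUT
# Hypothetical helper using librsync's rdiff: rebuild NEW as OUT from OLD
# plus a delta computed against OLD's signature.
rdiff_roundtrip() {
    local old=$1 new=$2 out=$3 work rc
    work=$(mktemp -d) || return 1
    rdiff signature "$old" "$work/old.sig" &&             # 1: signature of A
        rdiff delta "$work/old.sig" "$new" "$work/delta" && # 2: sig + B -> delta
        rdiff patch "$old" "$work/delta" "$out"              # 3: A + delta -> B
    rc=$?
    rm -rf "$work"
    return "$rc"
}
```

Only the signature has to travel from the holder of A to the holder of B, and only the delta has to travel back.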

Building on rdiff, a tool called rdiff-backup maintains a backup mirror of a file or directory on another server over the network. rdiff-backup stores incremental rdiff deltas alongside the mirror, from which any earlier backup point can be recreated.

See Automated Backups With rdiff-backup (http://www.howtoforge.com/linux_rdiff_backup) for example usage.

See also

  • mt
  • ssh
  • Unison — allows bidirectional synchronization
  • Xdelta — alternative implementation of file differencing and delta encoding

References

  1. Tridgell A, Mackerras P (1996). "The rsync algorithm". Department of Computer Science, Australian National University, Canberra, ACT 0200, Australia.

External links

  • rsync homepage (http://rsync.samba.org)
  • Tutorial: Using rsync (http://everythinglinux.org/rsync/)
  • Tutorial: Mirroring with rsync (http://www.howtoforge.com/mirroring_with_rsync)
  • Tutorial: Backing up files with rsync (http://www.linux.com/article.pl?sid=04/09/15/1931240)
  • Easy Automated Snapshot-Style Backups with Linux and Rsync (http://www.mikerubel.org/computers/rsync_snapshots/)
  • rsync algorithm (http://rsync.samba.org/tech_report/node2.html)
  • RsyncX - Frontend for rsync under Mac OS X (http://archive.macosxlabs.org/rsyncx/rsyncx.html)
  • Wikipedia: rsync (http://en.wikipedia.org/wiki/Rsync)