Fdupes

The correct title of this article is fdupes. The initial letter is capitalized due to technical restrictions.

fdupes is a command-line tool used to find duplicate files in a given set of directories on Linux.

It searches the given paths for duplicate files. Duplicates are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison. In other words, filenames and locations are ignored; if the contents of two files are identical, fdupes reports them as duplicates. It was written by Adrian Lopez.
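The following is a minimal sketch, in Python, of the general approach described above (group candidate files by size, then by MD5 signature, then confirm byte-by-byte). It is only an illustration of the technique; the function names are invented for the example and this is not fdupes' actual implementation.

 import hashlib
 import os
 from collections import defaultdict
 
 def md5_of(path, chunk=1 << 20):
     # Hash the file contents in chunks so large files do not fill memory.
     h = hashlib.md5()
     with open(path, 'rb') as f:
         while block := f.read(chunk):
             h.update(block)
     return h.hexdigest()
 
 def same_bytes(a, b, chunk=1 << 20):
     # Final byte-by-byte comparison of two files.
     with open(a, 'rb') as fa, open(b, 'rb') as fb:
         while True:
             ba, bb = fa.read(chunk), fb.read(chunk)
             if ba != bb:
                 return False
             if not ba:
                 return True
 
 def find_duplicates(paths):
     # 1. Group candidate files by size (cheap to compute).
     by_size = defaultdict(list)
     for p in paths:
         by_size[os.path.getsize(p)].append(p)
     duplicates = []
     for group in by_size.values():
         if len(group) < 2:
             continue
         # 2. Within a size group, group files by MD5 signature.
         by_md5 = defaultdict(list)
         for p in group:
             by_md5[md5_of(p)].append(p)
         # 3. Confirm each candidate set with a byte-by-byte comparison
         #    against the first file in the set.
         for cand in by_md5.values():
             confirmed = [cand[0]] + [p for p in cand[1:] if same_bytes(cand[0], p)]
             if len(confirmed) > 1:
                 duplicates.append(confirmed)
     return duplicates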

Options

-r --recurse 
include files residing in subdirectories
-s --symlinks 
follow symlinked directories
-H --hardlinks 
normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behaviour
-n --noempty 
exclude zero-length files from consideration
-f --omitfirst 
omit the first file in each set of matches
-1 --sameline 
list each set of matches on a single line
-S --size 
show size of duplicate files
-q --quiet 
hide progress indicator
-d --delete 
prompt user for files to preserve, deleting all others (see CAVEATS below)
-v --version 
display fdupes version
-h --help 
displays help
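
For example, typical invocations combining these options might look like the following (the directory name is only a placeholder):

 fdupes /home/user/docs            # list duplicate files directly inside the directory
 fdupes -r /home/user/docs         # also descend into subdirectories
 fdupes -r -S -n /home/user/docs   # recurse, show file sizes, and skip zero-length files
 fdupes -r -d /home/user/docs      # recurse and prompt which copy of each set to keep (see Caveats)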

Notes

Unless -1 or --sameline is specified, duplicate files are listed together in groups, each file displayed on a separate line. The groups are then separated from each other by blank lines.
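For instance, with two sets of duplicates the default output might look like this (filenames invented for illustration):

 /home/user/notes.txt
 /home/user/backup/notes.txt
 
 /home/user/img/photo.jpg
 /home/user/img/photo (copy).jpg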

When -1 or --sameline is specified, spaces and backslash characters (\) appearing in a filename are preceded by a backslash character.
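The same two sets, listed with -1 or --sameline, might then appear as:

 /home/user/notes.txt /home/user/backup/notes.txt
 /home/user/img/photo.jpg /home/user/img/photo\ (copy).jpg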

Caveats

If fdupes returns an error message such as "fdupes: error invoking md5sum", it means the program has been compiled to use an external program to calculate MD5 signatures (otherwise, fdupes uses internal routines for this purpose) and an error occurred while attempting to execute it. If this is the case, the specified program should be properly installed prior to running fdupes.

When using -d or --delete, care should be taken to guard against accidental data loss.

When used together with -s or --symlinks, a user could accidentally preserve a symlink while deleting the file it points to.

Furthermore, when a particular directory is specified more than once, all files within that directory will be listed as their own duplicates, leading to data loss should a user preserve a file while deleting its "duplicate" (the file itself!).
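
For example, an invocation such as (the directory name is again a placeholder)

 fdupes /home/user/docs /home/user/docs

would list every file in /home/user/docs as a duplicate of itself; keeping only one entry from such a pair during an interactive delete would remove the file's only copy.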
