Findutils

Latest revision as of 23:28, 21 May 2015

The GNU Find Utilities (or Findutils) are the basic directory searching utilities of the GNU operating system. These programs are typically used in conjunction with other programs to provide modular and powerful directory search and file locating capabilities to other commands.

Findutils tools

The tools supplied with this package are:

  • find - search for files in a directory hierarchy
  • locate - list files in databases that match a pattern
      ◦ slocate (secure locate): http://slocate.trakker.ca/
      ◦ mlocate (faster updates): http://carolina.mff.cuni.cz/~trmac/blog/mlocate/
  • updatedb - update a file name database
  • xargs - build and execute command lines from standard input

Examples

Note: Adapted from the GNU Project homepage with extra additions by me.

  • Here is an example operation to make all HTML files in the subdirectory htdocs readable by all using find and xargs. This is a typical example of how find and xargs are used with other utilities to provide powerful directory traversal capability:
find htdocs -name '*.html' -print0 | xargs -0 chmod a+r 
  • Find all .htm files in a given path and print their names to STDOUT (one file per line):
find . -name '*.htm' -printf '%f\n'
  • Find all files in a given path containing the word "EXAMPLE" (time the difference):
time find /home/foo/bar -type f -exec grep EXAMPLE {} \; -print
time find /home/foo/bar -type f -print0 | xargs -0 grep EXAMPLE   # <- faster
  • Find all foo.pdb files in a given path and execute a command within the directory in which each one is found:
find /path/to/files/ -name foo.pdb -execdir some_command \;
  • Find (list) all files in a given path whose names match the pattern "Fo*":
find bar/ -name 'Fo*' -exec ls -l {} +
  • Search for files modified within the last day that match a certain file-naming pattern and remove them (i.e., execute rm):
find . -name "[Tt][Ee][Ss][Tt]*" -mtime -1 -exec rm {} \;  # CAREFUL!!
  • Find all PNG files and place them into a new tarball:
find . -name '*.png' -print0 | tar -c --null --files-from=- | bzip2 > foo.tar.bz2
  • Find by inode number:
% ls -il /bin/gzip
  971575 -rwxr-xr-x 3 root root 58712 2006-11-25 13:57 /bin/gzip
% find / -inum 971575 -xdev -ls 2>/dev/null
  971575   64 -rwxr-xr-x   3 root     root        58712 Nov 25  2006 /bin/zcat
  971575   64 -rwxr-xr-x   3 root     root        58712 Nov 25  2006 /bin/gzip
  971575   64 -rwxr-xr-x   3 root     root        58712 Nov 25  2006 /bin/gunzip

The output shows that /bin/gzip, /bin/gunzip, and /bin/zcat are three different filenames (hard links) pointing to the same file on disk, inode number 971575. The link count of 3 in the ls output says the same thing.
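The timing difference in the "EXAMPLE" search above comes down to how many times grep is started. As a rough sketch (the scratch directory and file names are made up for illustration), the three invocations below return the same matches; the last two hand many files to each grep call, and all of them cope with spaces in file names:

```shell
# Scratch directory with one matching and one non-matching file.
demo=$(mktemp -d)
printf 'an EXAMPLE line\n' > "$demo/with space.txt"
printf 'nothing here\n'    > "$demo/plain.txt"

# One grep process per file (slow on large trees).
per_file=$(find "$demo" -type f -exec grep -l EXAMPLE {} \;)

# "+" appends as many file names as fit onto each grep command line.
batched=$(find "$demo" -type f -exec grep -l EXAMPLE {} +)

# The same batching through xargs, with NUL-delimited names.
piped=$(find "$demo" -type f -print0 | xargs -0 grep -l EXAMPLE)

printf '%s\n' "$per_file" "$batched" "$piped"
rm -rf "$demo"
```

With only two files the three forms take about the same time; the gap shows up on trees with thousands of files.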

Bulk image resize

If you are like me and have a high resolution digital camera, it is often necessary to resize the images before emailing them to friends and family. It is, of course, possible to manually resize them using Adobe Photoshop, The Gimp, or any other image editing programme. However, it is possible to automate this task using simple command line tools.

For example, say you want to resize all of the JPEG images in your current directory to 800x600 and place them in a sub-directory called "resized". Create that sub-directory first, then execute the following command:

find . -maxdepth 1 -name '*.jpg' -type f -exec convert -resize 800x600 {} resized/{} \;

It is also possible to run the above command recursively through a directory and its sub-directories like so (-follow also follows symbolic links, and every output sub-directory under ../resized must already exist):

find . -follow -name '*.jpg' -type f -exec convert -resize 800x600 {} ../resized/{} \;

Note that the programme convert is part of the ImageMagick suite and you will need to have it installed to use the above commands (it is, by default, in SuSE Linux).
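The recursive command only succeeds when the output tree already mirrors the source tree. One possible way around that is sketched below; resize_tree and the CONVERT variable are hypothetical names invented for this example, not part of findutils or ImageMagick, and the sketch assumes the destination lies outside the source directory:

```shell
# resize_tree SRC DEST: resize every *.jpg under SRC into the same relative
# path under DEST, creating sub-directories as needed.
# CONVERT (hypothetical) overrides the resize program; default is "convert".
resize_tree() {
  src=${1%/}          # drop a trailing slash so prefix stripping works
  dest=$2
  find "$src" -type f -name '*.jpg' -exec sh -c '
    src=$1 dest=$2 converter=$3
    shift 3
    for img do
      rel=${img#"$src"/}                     # path relative to SRC
      mkdir -p "$(dirname "$dest/$rel")"     # mirror the sub-directory
      "$converter" "$img" -resize 800x600 "$dest/$rel"
    done
  ' sh "$src" "$dest" "${CONVERT:-convert}" {} +
}
```

It could be called as resize_tree . ../resized from the directory holding the photos.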

Tracking down large files

Sometimes it is necessary to find files over a certain size and it can be somewhat tedious ls-ing through your many directories. The following command will list only those files over a certain size and only within the specified directory (and sub-directories):

find some_directory/ -size +2000k -ls

which will only list files over 2000 KiB (roughly 2 MB); find's "k" suffix counts units of 1024 bytes.
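When many files pass the size threshold it can help to sort them. The following is a small sketch, assuming GNU find (for -printf); largest_files is a made-up helper name:

```shell
# largest_files DIR MIN_KIB [COUNT]
# Print "size-in-bytes<TAB>path" for regular files larger than MIN_KIB
# kibibytes, biggest first, limited to COUNT lines (default 10).
largest_files() {
  find "$1" -type f -size +"$2"k -printf '%s\t%p\n' \
    | sort -rn \
    | head -n "${3:-10}"
}
```

For instance, largest_files some_directory/ 2000 5 would show the five biggest files over 2000 KiB.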

Finding files containing a string in a directory hierarchy

In this example, all .php files will be searched for the string "MySQL" (case-insensitive with -i) and the line numbers will also be returned (using -n):

find . -name '*.php' -type f | xargs grep -n -i 'MySQL'

Permissions on (sub)directories and files

  • Find files with a given permission
find /path/ -type f -perm 755
  • Change the permissions of all sub-directories:
find /path/ -type d -exec chmod 755 {} \;
#~OR~
find /path/ -type d -exec chmod u=rwx,go=rx {} \;
  • Change the permissions of all files in sub-directories:
find /path/ -type f -name "*.txt" -exec chmod 644 {} \;
#~OR~
find /path/ -type f -exec chmod u=rw,go=r {} \;

Other useful find tests:

-nouser: file belongs to a user ID with no matching user account
-nogroup: file belongs to a group ID with no matching group
-links n: file has n links
-newer file: file was modified more recently than file
-perm mode: file's permission bits are exactly mode
-type c
       File is of type c:
       b  block (buffered) special
       c  character (unbuffered) special
       d  directory
       p  named pipe (FIFO)
       f  regular file
       l  symbolic link; this is never true if the -L option or the -follow option is
          in effect, unless the symbolic link is broken. If you want to search for 
          symbolic links when -L is in effect, use -xtype.
       s  socket
       D  door (Solaris)
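The -perm test has three forms in GNU find, and they are easy to mix up: a bare mode must match exactly, a leading "-" requires all of the given bits, and a leading "/" requires any of them. A small sketch in a scratch directory (the file names a, b, and c are arbitrary):

```shell
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"
chmod 755 "$demo/a"
chmod 750 "$demo/b"
chmod 644 "$demo/c"

# Exact match: only the file whose mode is precisely 755.
exact=$(find "$demo" -type f -perm 755 -printf '%f\n' | sort)

# All bits set: group has both read and execute (modes 755 and 750).
group_rx=$(find "$demo" -type f -perm -050 -printf '%f\n' | sort)

# Any bit set: others have read or execute (modes 755 and 644).
other_any=$(find "$demo" -type f -perm /005 -printf '%f\n' | sort)

printf 'exact: %s\n' $exact
printf 'all:   %s\n' $group_rx
printf 'any:   %s\n' $other_any
rm -rf "$demo"
```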

Log rotate

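If you do not want run-away logging to fill up your /var partition, you can compress old logs with bzip2 (e.g., those not modified for more than a day; strictly, -mtime +1 matches files last modified at least two days ago, because fractional days are discarded):

find /var/log/ -name "*.log" -mtime +1 -exec bzip2 -z '{}' \;

and then delete old compressed logs (e.g., older than 30 days):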

find /var/log -name "*.bz2" -mtime +30 -exec rm '{}' \;
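How -mtime rounds ages is a common source of surprises. The sketch below, which assumes GNU touch (for relative dates) and GNU find, builds three dummy logs of different ages in a scratch directory and shows which ones each test selects:

```shell
demo=$(mktemp -d)
touch -d '3 days ago'   "$demo/a.log"
touch -d '36 hours ago' "$demo/b.log"
touch                   "$demo/c.log"

# Ages are truncated to whole days before comparing:
# 36 hours counts as 1 day, which is not "more than 1".
older_than_1=$(find "$demo" -name '*.log' -mtime +1 -printf '%f\n' | sort)

# +0 selects everything at least one full day old.
older_than_0=$(find "$demo" -name '*.log' -mtime +0 -printf '%f\n' | sort)

printf -- '-mtime +1: %s\n' $older_than_1
printf -- '-mtime +0: %s\n' $older_than_0
rm -rf "$demo"
```

Running the selection with -print first, as here, is a cheap way to check the age cut-off before switching to -exec bzip2 or -exec rm.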

Misc

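$ find . -name "rc.conf" -print                          # list every rc.conf below the current directory
$ find . -name "rc.conf" -exec chmod o+r '{}' \;         # make each rc.conf readable by others
$ find /usr/src -not \( -name "*,v" -o -name ".*,v" \) -print   # everything except RCS ",v" files
$ find . -exec grep "bob and alice" '{}' \; -print       # print matching lines, then the file name
$ find . -exec grep -q "bob and alice" '{}' \; -print    # print only the names of matching files
$ find /path/ \( -name "foo*" -or -name "bar*" \) -type f -ls   # regular files named foo* or bar*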
  • Find all files whose names are at least a given number of characters long (e.g., 64+ characters):
$ find /path/you/wish/to/search -regextype posix-extended -regex '.*/[^/]{64,}'
  • Find all files/directories at any depth in a given path that have spaces and replace them with underscores:
$ find /foo/bar/ -depth -name "* *" -execdir rename 's/ /_/g' "{}" \;
#~OR~
$ find . -depth -name '* *' \
    | while IFS= read -r f ; do mv -n "$f" "$(dirname "$f")/$(basename "$f"|tr ' ' _)" ; done  # -n: never overwrite; an -i prompt would read from the pipe
#~OR~
$ for f in *\ *; do mv "$f" "${f// /_}"; done  # not recursive, but fast
  • Find all files with ampersands ("&") in their filenames and replace with "and":
$ find . -type f -name "*&*" -execdir rename 's/\&/and/g' "{}" \;
  • Find all files with strange (i.e., non-standard for Linux) filenames and rename them:
# read -d '' splits on the NUL bytes written by -print0 (requires bash)
find . -print0 | while IFS= read -r -d '' f ;
do 
  orig_f="$f"
  # Below is pure bash. You can replace with tr if you like
  # f="$( echo $f | tr -d ,\' | tr "$'&'@- " "ya__" )"
  f="${f// /_}"  # Replace spaces with _
  f="${f//\'}"   # Remove single quote
  f="${f//-/_}"  # Replace - with _
  f="${f//,}"    # Remove commas
  f="${f//&/y}"  # Replace ampersand with y
  f="${f//@/a}"  # Replace at sign with a
  f=$( iconv -f UTF-8 -t ASCII//TRANSLIT <<< "$f" )
  [ "$f" = "$orig_f" ] && continue  # Nothing to rename
  new_dir="$(dirname "$f")"
  new_f="$(basename "$f")"
  mkdir -p "$new_dir"
  mv -n "$orig_f" "$new_dir/$new_f"  # -n: never overwrite (an -i prompt would read from the pipe)
done
  • Find files that do not contain a given extension/suffix (e.g., ".py"):
$ find . -type f -not -name '*.py'
$ find . -type f -not -name '*.py' -not -name '*.pyc'
$ find . -type f ! \( -name '*.py' -o -name '*.pyc' \)
  • Find all files in a given directory that do not have an extension/suffix:
$ ls -1 !(*.*)  # <- needs bash's extglob (shopt -s extglob); does not work with subdirectories
$ find . -type f ! -name "*.*"
  • Find all files of a given extension/suffix and return just the basename/prefix (i.e., without the full path or suffix. E.g., "/path/to/file.txt" => "file"):
$ find /path -type f -name '*.txt' -exec basename {} .txt \;
#~OR~
$ find /path -type f -name '*.txt' -exec basename -s .txt {} \;
  • Find executable files without extensions:
$ find $HOME/bin/ -type f ! -name "*.*" -perm -og+rx
  • Remove all zero size files from current directory (not recursive):
$ find . -maxdepth 1 -size 0c -delete
#~OR~
$ find . -maxdepth 1 -empty -delete
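A detail behind the long-filename search above is that -regex is matched against the whole path find prints, not just the file name, so the pattern has to account for the leading directories. A minimal sketch in a scratch directory, assuming GNU find (the file names are invented):

```shell
demo=$(mktemp -d)
long_name=$(printf '%064d' 0)          # a 64-character name made of zeros
touch "$demo/$long_name" "$demo/short.txt"

# ".*/" swallows the directories; "[^/]{64,}" then requires the last
# path component to be at least 64 characters long.
matches=$(find "$demo" -regextype posix-extended -type f -regex '.*/[^/]{64,}')

printf '%s\n' "$matches"
rm -rf "$demo"
```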

Using find to fix corrupted timestamps

I had an ext4 partition (/dev/sda7) get corrupted by a dead motherboard battery. This caused a forced fsck.ext4 -a -C0 /dev/sda7 on this partition at reboot.

Also note that:

dmesg |grep -i battery

yielded

[   24.226281] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
[   24.226294] ACPI: Battery Slot [C198] (battery absent)

Unfortunately, all of this caused some of the files on my /dev/sda7 to have a timestamp way in the future (the year 2037). I was able to use fsck.ext4 (without the -a option) and then debugfs, etc. to fix/clean this partition and mount it (as /home), etc. However, the timestamps were still wrong (i.e., still set at the year 2037).

I found the following commands to do the trick nicely:

touch --date "2020-01-01" /tmp/foo
find /home -newer /tmp/foo -exec touch {} \;

The two commands do the following:

  1. Create a temporary file with a timestamp in the future, but not all the way into the year 2037 (2020-01-01 was still in the future when this was written; choose a date later than today); and
  2. Find all files that are newer than this temporary file and re-timestamp them to the present.
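The two steps can be tried out safely in a scratch directory first. The sketch below assumes GNU touch and GNU find; it uses "tomorrow" as the reference date rather than a fixed year, so it does not go stale, and the file names are made up:

```shell
demo=$(mktemp -d)
touch "$demo/ok"                         # normal timestamp
touch -d '2037-01-01' "$demo/broken"     # simulated corrupted timestamp

ref=$(mktemp)
touch -d 'tomorrow' "$ref"               # reference point just ahead of now

# Files newer than the reference are the ones stamped in the future.
before=$(find "$demo" -type f -newer "$ref" -printf '%f\n')

# Reset them to the current time, then check again.
find "$demo" -type f -newer "$ref" -exec touch {} +
after=$(find "$demo" -type f -newer "$ref" -printf '%f\n')

printf 'before: %s\nafter: %s\n' "$before" "$after"
rm -rf "$demo" "$ref"
```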

See also

  • Coreutils
  • Binutils
  • Textutils
  • Diffutils
  • xargs
  • GNU parallel

External links