Difference between revisions of "GNU parallel"

From Christoph's Personal Wiki

Latest revision as of 02:36, 20 March 2015

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.

Usage examples / tutorial

  • Install GNU parallel from the CLI (or just use your distro's repository):
$ wget pi.dk/3 -qO - | bash -x
  • Basic usage:
$ find . -name "*.foo" | parallel grep bar

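The above is the parallel counterpart of the following command (note that parallel starts one grep per file by default, whereas -exec ... {} + passes many files to each grep):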

$ find . -name "*.foo" -exec grep bar {} +

This searches all files in the current directory and its subdirectories whose names end in .foo for occurrences of the string bar. The parallel command works as expected unless a file name contains a newline. To avoid this limitation, one may use:

$ find . -name "*.foo" -print0 | parallel -0 grep bar

The above command uses the null character to delimit file names.
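To see the difference in practice, the following sketch creates two files in a temporary directory, one of which has a newline in its name, and searches both using NUL-delimited input. The directory, the file names, and the xargs fallback (used only when GNU parallel is not installed) are assumptions made for illustration.

```shell
# Sketch: NUL-delimited file names survive a newline inside a name.
# The temporary directory and both file names are hypothetical.
demo_dir=$(mktemp -d)
printf 'bar\n' > "$demo_dir/plain.foo"
printf 'bar\n' > "$demo_dir/new
line.foo"

# Use GNU parallel when present; otherwise fall back to xargs -0, which
# accepts the same NUL-delimited input (an assumption made for portability).
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  runner="parallel -0"
else
  runner="xargs -0 -n 1"
fi

# grep -c prints one count per invocation and each invocation gets one file,
# so the number of output lines equals the number of files searched.
searched=$(find "$demo_dir" -name '*.foo' -print0 | $runner grep -c bar | wc -l | tr -d ' ')
echo "files searched: $searched"

rm -rf "$demo_dir"
```

Both files should be counted, including the one whose name contains a newline.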

$ find . -name "*.foo" | parallel -X mv {} /tmp/trash

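In the above command, {} is the replacement string: parallel substitutes it with the input arguments. With -X, as many arguments as fit on the command line are inserted into each job.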
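A harmless way to see what {} expands to is to put echo in front of the command, so that each command line is printed instead of executed. The sketch below assumes GNU parallel is installed and uses three made-up file names. The -j1 option is added here because, with several job slots, -X spreads the arguments over the slots rather than placing them all in one command.

```shell
# Sketch: preview how -X fills in {}. a.foo, b.foo and c.foo are made-up names.
preview_mv() {
  # echo turns the job into a preview: the mv command line is printed, not run.
  # -j1 keeps all arguments in a single job so the output is one line.
  parallel -j1 -X echo mv {} /tmp/trash
}

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  preview=$(printf '%s\n' a.foo b.foo c.foo | preview_mv)
  echo "$preview"
else
  preview=""
  echo "GNU parallel not found; skipping preview" >&2
fi
```

With GNU parallel available, this should print a single line of the form mv a.foo b.foo c.foo /tmp/trash.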

$ find . -maxdepth 1 -type f -name "*.ogg" | parallel -X -r cp -v -p {} /home/media

The command above does the same as:

$ cp -v -p *.ogg /home/media

However, the former command, which uses find/parallel/cp, is more resource-efficient and will not halt with an "Argument list too long" error if the expansion of *.ogg exceeds the system's limit on argument size.
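The size limit mentioned above is, on most systems, the kernel's maximum for the arguments passed to a new process (ARG_MAX), which getconf can report. The following sketch prints that limit and then uses xargs, which batches arguments in much the same way as parallel -X, to show a long argument list being split over several command lines. The figure of 200000 arguments is arbitrary.

```shell
# Sketch: inspect the argument-size limit and watch a long list get batched.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# Each echo invocation produces one output line, so counting lines counts
# how many command lines xargs needed for 200000 arguments (arbitrary number).
batches=$(seq 1 200000 | xargs echo | wc -l | tr -d ' ')
echo "command lines used: $batches"
```

The second number is typically greater than one, which is the batching that keeps each command line under the limit.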

  • Multiple commands as arguments:
$ cat a.txt | xargs -I % sh -c 'command1 %; command2 %; ...'
# ~OR~
$ cat a.txt | parallel 'command1 {}; command2 {}; ...; '
  • Example of using --pipe and "records" to split STDIN among jobs:
$ cat foo.fasta
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
$ cat foo.fasta | parallel --pipe --recstart '>' -N1 cat';' echo =====
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
=====
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
=====
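# Several ways to print a separator line of 79 "=" characters:
$ printf '=%.0s' {1..79}
$ printf %79s | tr " " "="
$ seq -s= 80 | tr -d '[:digit:]'
$ perl -E 'say "=" x 79'
$ head -c 79 < /dev/zero | tr '\0' '='
# Split STDIN into blocks of roughly 500 kB and run wc on each block:
$ cat /usr/share/dict/words | parallel --pipe --blocksize 500k wc
# Sort in chunks of 3 records, write each sorted chunk to a temporary file (--files), then merge the files:
$ seq 1 10 |shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm
# The temporary files are left behind; remove them by hand:
$ rm -f /tmp/*.par
# ~OR~ merge and remove the temporary files in one step:
$ seq 1 10 |shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm {} ";"rm {}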
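
Record splitting with --recstart can be checked on a small, made-up FASTA file. The sketch below assumes GNU parallel is installed; the file contents and record names are invented for illustration. Each job receives exactly one record on STDIN and counts its lines.

```shell
# Sketch: one job per FASTA record. The two short records are made up.
fasta=$(mktemp)
printf '>REC1\nACGT\nTTGA\n>REC2\nGGCC\n' > "$fasta"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # --recstart '>' marks where a record begins; -N1 sends one record per job;
  # -k keeps the output in the same order as the input.
  counts=$(parallel -k --pipe --recstart '>' -N1 wc -l < "$fasta" | tr -d ' ' | tr '\n' ' ')
  echo "lines per record: $counts"
else
  counts=""
  echo "GNU parallel not found; skipping" >&2
fi

rm -f "$fasta"
```

The first record has three lines (header plus two sequence lines) and the second has two, so the counts should reflect that split.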

See also

  • find
  • xargs

External links