GNU parallel


GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.

Usage examples / tutorial

  • Install GNU parallel from the command line (or simply use your distribution's package repository):
$ wget pi.dk/3 -qO - | bash -x
  • Basic usage:
$ find . -name "*.foo" | parallel grep bar

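The above is the parallel counterpart of the following command, except that parallel starts one grep per file by default (so matches are not prefixed with the file name), whereas -exec ... {} + passes many files to each grep: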

$ find . -name "*.foo" -exec grep bar {} +

This searches all files in the current directory and its subdirectories whose names end in .foo for occurrences of the string bar. The parallel command works as expected unless a file name contains a newline, because parallel reads one argument per input line. To avoid this limitation, use:

$ find . -name "*.foo" -print0 | parallel -0 grep bar

The above command uses the null character to delimit file names.
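
The difference between newline-delimited and null-delimited input can be checked in a scratch directory. The following is a minimal sketch, assuming GNU parallel and a find that supports -print0; the file names plain.foo and "odd<newline>name.foo" are made up for illustration:

```shell
# Scratch directory with one ordinary file and one whose name contains a newline.
tmp=$(mktemp -d)
printf 'bar\n' > "$tmp/plain.foo"
printf 'bar\n' > "$tmp/odd
name.foo"

# Newline-delimited input: the odd name arrives as two bogus arguments,
# so grep reports missing files and only one file is actually searched.
nl_hits=$(find "$tmp" -name '*.foo' | parallel grep -c bar 2>/dev/null | grep -c '^1$')

# NUL-delimited input: both files are searched.
nul_hits=$(find "$tmp" -name '*.foo' -print0 | parallel -0 grep -c bar | grep -c '^1$')

echo "newline-delimited: $nl_hits file(s) searched; NUL-delimited: $nul_hits"
rm -rf "$tmp"
```

Each grep -c prints 1 for a file that contains bar, so counting those lines shows how many files were really searched in each variant.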

$ find . -name "*.foo" | parallel -X mv {} /tmp/trash

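In the above command, {} marks where parallel inserts the arguments. With -X, as many file names as fit on one command line are inserted at once, so mv runs far fewer times than once per file.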
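
The effect of -X is easiest to see with echo in place of mv. This is a small sketch, assuming GNU parallel; -j1 is added here only so that all arguments end up in a single job (with more job slots, parallel spreads the arguments across several jobs):

```shell
# Default: one job per input line, so echo runs five times (-k keeps input order).
per_line=$(seq 1 5 | parallel -k echo)

# -X with a single job slot: all arguments are packed into one echo call.
packed=$(seq 1 5 | parallel -j1 -X echo)

# -X also repeats the surrounding context for every argument.
context=$(parallel -j1 -X echo 'pre-{}-post' ::: A B C)

printf '%s\n' "$per_line" "$packed" "$context"
```

The first variable holds five lines, while the other two each hold a single line built from all arguments.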

$ find . -maxdepth 1 -type f -name "*.ogg" | parallel -X -r cp -v -p {} /home/media

The command above does the same as:

$ cp -v -p *.ogg /home/media

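However, the find/parallel/cp pipeline does not fail with an "Argument list too long" error when the expansion of *.ogg exceeds the system's command-line length limit: -X splits the file names across as many cp invocations as needed, which can also run in parallel, and -r skips cp entirely when find produces no input.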
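
To see the limit in question and preview how the file names would be batched, something like the following can be used. It is a sketch that assumes GNU parallel's --dry-run option, which prints the commands instead of running them; the scratch directory and the files a.ogg and b.ogg are placeholders, and nothing is copied:

```shell
# Kernel limit (in bytes) on the combined size of arguments and environment.
getconf ARG_MAX

# Placeholder source directory with two empty .ogg files.
src=$(mktemp -d)
dest=/home/media
touch "$src/a.ogg" "$src/b.ogg"

# Print the cp command lines that would be run, without executing them.
plan=$(find "$src" -maxdepth 1 -type f -name '*.ogg' |
  parallel -j1 -X -r --dry-run cp -v -p {} "$dest")
echo "$plan"

rm -rf "$src"
```

With only two files, the plan is a single cp command; with more file names than fit on one command line, several cp lines would be printed.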

  • Multiple commands as arguments:
$ cat a.txt | xargs -I % sh -c 'command1 %; command2 %; ...'
# ~OR~
$ cat a.txt | parallel 'command1 {}; command2 {}; ...; '
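
As a concrete instance of the pattern, the sketch below substitutes echo for command1 and command2 and feeds two made-up input lines (a and b) instead of a.txt. The -k option is an addition here so that parallel prints the output in input order:

```shell
# xargs: % is replaced by each input line inside the sh -c script.
x_out=$(printf 'a\nb\n' | xargs -I % sh -c 'echo "start %"; echo "end %"')

# parallel: {} is replaced by each input line; -k keeps the output in input order.
p_out=$(printf 'a\nb\n' | parallel -k 'echo "start {}"; echo "end {}"')

printf '%s\n' "$x_out"
printf '%s\n' "$p_out"
```

Both variants run the two commands once per input line; xargs does so sequentially, parallel concurrently.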
  • Example of using --pipe to split STDIN into records (each starting with ">") and pipe every record to its own job:
$ cat foo.fasta
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
$ cat foo.fasta | parallel --pipe --recstart '>' -N1 cat';' echo =====
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
=====
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
=====
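
The same record splitting can feed any filter, not just cat. As a hedged sketch, the following computes the sequence length of each record in a tiny made-up FASTA stream (records r1 and r2 are illustrative); -k is added to keep the results in input order:

```shell
# Each job receives exactly one record (-N1) that starts with '>'.
# Inside the job: drop the header line, join the sequence lines, count the bytes.
lengths=$(printf '>r1\nACGT\nAC\n>r2\nTT\n' |
  parallel -k --pipe --recstart '>' -N1 'grep -v "^>" | tr -d "\n" | wc -c')

# One length per record, printed on a single line.
echo $lengths
```

Because every job sees a whole record, the header line and its sequence lines are never split between two jobs.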
  • Several ways to print a row of 79 "=" characters, e.g., as a separator line:
$ printf '=%.0s' {1..79}
$ printf %79s | tr " " "="
$ seq -s= 80 | tr -d '[:digit:]'
$ perl -E 'say "=" x 79'
$ head -c 79 < /dev/zero | tr '\0' '='
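  • Split STDIN into blocks of roughly 500 kB (cut at line boundaries) and run wc on each block in parallel:
$ cat /usr/share/dict/words | parallel --pipe --blocksize 500k wc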
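
Since every block produces its own wc result, the per-block counts usually have to be combined afterwards. A minimal sketch, assuming GNU parallel and using generated numbers instead of a dictionary file so that the expected total is known:

```shell
# 100000 input lines are split into blocks of about 100 kB;
# each block is counted by its own wc -l, and awk adds up the partial counts.
total=$(seq 1 100000 |
  parallel --pipe --blocksize 100k wc -l |
  awk '{ sum += $1 } END { print sum }')

echo "$total"
```

Blocks are cut at line boundaries, so no line is counted twice or lost when the partial results are summed.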
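  • Sort chunks of STDIN in parallel into temporary files (--files prints the names of the output files instead of their contents), then merge the sorted files with a single sort -m:
$ seq 1 10 | shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm
  • The command above leaves its temporary files behind; remove them afterwards (this assumes they were written to /tmp):
$ rm -f /tmp/*.par
  • Alternatively, delete the temporary files as part of the merge job:
$ seq 1 10 | shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm {} ";"rm {}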
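
To avoid a wildcard rm in /tmp, the temporary files can be kept in a private directory. The sketch below assumes GNU parallel's --tmpdir option and a scratch directory created with mktemp; the variable names work and sorted are illustrative:

```shell
# Private scratch directory for the intermediate sorted chunks.
work=$(mktemp -d)

# Stage 1: sort chunks of three lines each into files under $work.
# Stage 2: merge all chunk files with one sort -m, then delete them.
sorted=$(seq 1 10 | shuf |
  parallel --tmpdir "$work" --pipe --files -N 3 sort -n |
  parallel -mj1 sort -nm {} ';' rm {})

echo $sorted

# Remove the scratch directory and anything that may be left in it.
rm -rf "$work"
```

Removing the whole scratch directory at the end cleans up even if the merge step is interrupted before its rm runs.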
