GNU parallel
Latest revision as of 02:36, 20 March 2015
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
Usage examples / tutorial
- Install GNU parallel from the command line (or simply use your distribution's package repository):
$ wget pi.dk/3 -qO - | bash -x
- Basic usage:
$ find . -name "*.foo" | parallel grep bar
The above is roughly the parallel equivalent of the following, which runs grep sequentially:
$ find . -name "*.foo" -exec grep bar {} +
This searches all files in the current directory and its subdirectories whose names end in .foo for occurrences of the string bar. The parallel command works as expected unless a file name contains a newline. To avoid this limitation, use:
$ find . -name "*.foo" -print0 | parallel -0 grep bar
The above command uses the null character to delimit file names.
$ find . -name "*.foo" | parallel -X mv {} /tmp/trash
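In the above command, {} marks where parallel inserts the arguments; with -X, it inserts as many file names as fit on one command line, so each mv call moves several files at once.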
$ find . -maxdepth 1 -type f -name "*.ogg" | parallel -X -r cp -v -p {} /home/media
The command above does the same as:
$ cp -v -p *.ogg /home/media
However, the find/parallel/cp version runs the copies in parallel and does not fail with an "Argument list too long" error when *.ogg expands to more file names than fit on a single command line.
- Multiple commands as arguments:
$ cat a.txt | xargs -I % sh -c 'command1 %; command2 %; ...'
# ~OR~
$ cat a.txt | parallel 'command1 {}; command2 {}; ...'
- Using --pipe and --recstart to split STDIN into records (here, one FASTA record per job):
$ cat foo.fasta
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
$ cat foo.fasta | parallel --pipe --recstart '>' -N1 cat';' echo =====
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
=====
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
=====
Ways to print a line of 79 = characters:
$ printf '=%.0s' {1..79}
$ printf %79s | tr " " "="
$ seq -s= 80 | tr -d '[:digit:]'
$ perl -E 'say "=" x 79'
$ head -c 79 < /dev/zero | tr '\0' '='
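The separator commands can be checked by counting characters with wc -c. Note that seq -s= N prints N numbers and therefore only N-1 separators, so it needs N = 80 to yield 79 characters. This sketch assumes GNU coreutils and a POSIX sh; the bash-only brace expansion and the perl one-liner are left out.

```shell
#!/bin/sh
set -eu
# Length of a string in bytes; $(...) has already stripped any trailing newline.
len() { printf '%s' "$1" | wc -c | tr -d ' '; }

a=$(printf %79s | tr " " "=")
b=$(seq -s= 80 | tr -d '[:digit:]')   # 80 numbers -> 79 separators
c=$(head -c 79 < /dev/zero | tr '\0' '=')
d=$(seq -s= 79 | tr -d '[:digit:]')   # 79 numbers -> only 78 separators

echo "$(len "$a") $(len "$b") $(len "$c") $(len "$d")"   # 79 79 79 78
```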
$ cat /usr/share/dict/words | parallel --pipe --blocksize 500k wc
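$ seq 1 10 | shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm
$ rm -f /tmp/*.par
$ seq 1 10 | shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm {} ";"rm {}
The first command leaves its temporary /tmp/*.par files behind (hence the rm); the last command deletes them itself once the merge is done.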
See also
- find
- xargs
External links
- Official website
- GNU parallel tutorial (same as running `man parallel_tutorial`)
- GNU Parallel videos on YouTube