GNU parallel

From Christoph's Personal Wiki
Revision as of 02:28, 20 March 2015 by Christoph (Talk | contribs) (Usage examples / tutorial)

Jump to: navigation, search

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.

Usage examples / tutorial

  • Install GNU parallel from the CLI (or, just use your distro's repo):
$ wget pi.dk/3 -qO - | bash -x
$ cat foo.fasta
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
$ cat foo.fasta | parallel --pipe --recstart '>' -N1 cat';' echo =====
>RECORD1
ATGGCTGTCTTCTTGCTTGCCACTTCCACCATAATGTTCCCAACGAAGATAGAAGCAGCA
GATTGTAATGGTGCATGTTCACCTTTCGAGGTGCCACCGTGCCGCTCAAGTGATTGTCGT
TGTGTCCCTATAGGACTATTTGTTGGTTTCTGCATACATCCAACTGGACTTTCATCTGTT
=====
>RECORD2
GCGAAGATGGTCGACGAACATCCCAACTTATGTCAATCTGATGATGAATGCATGAAGAAA
GGAAGTGGCAATTTTTGCGCTCGTTACCCTAATAATTATATCGATTATGGATGGTGTTTT
GACTCTGATTCTGAAGCACTGAAAGGCTTCTTGGCCATGCCTAGGGCAACCACCAAGTAA
=====
$ printf '=%.0s' {1..79}
$ printf %79s | tr " " "="
$ seq -s= 79 | tr -d '[:digit:]'
$ perl -E 'say "=" x 79'
$ head -c 79 < /dev/zero | tr '\0' '='
$ cat /usr/share/dict/words | parallel --pipe --blocksize 500k wc
$ seq 1 10 |shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm
$ rm -f /tmp/*.par
$ seq 1 10 |shuf | parallel --pipe --files -N 3 sort -n | parallel -mj1 sort -nm {} ";"rm {}

External links