BLAST+

In bioinformatics, the Basic Local Alignment Search Tool (BLAST) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotide sequences of DNA.

This article focuses on the NCBI "new" BLAST, known as blast+, starting from version 2.2.26+ (released on 3 March 2012).

The latest stable version at the time of writing is 2.2.26+ (2012-03-03).

See BLAST for legacy ("old") versions.

Utilities

Programs contained in the blast+ package:

blastdbcheck: Checks database integrity
blastdbcmd: Retrieves sequences or other information from a BLAST database
blastdb_aliastool: Creates database aliases
blastn: Searches a nucleotide query against a nucleotide database
blastp: Searches a protein query against a protein database
blastx: Searches a nucleotide query, dynamically translated in all six frames, against a protein database
blast_formatter: Formats a web blast result using its assigned request ID (RID)
convert2blastmask: Converts lowercase masking into makeblastdb-readable data
dustmasker: Masks the low complexity regions in the input nucleotide sequences
legacy_blast.pl: Converts a legacy blast search command line into its blast+ counterpart and executes it
makeblastdb: Formats input FASTA file(s) into a BLAST database
makembindex: Indexes an existing nucleotide database for use with megablast
psiblast: Finds members of a protein family, identifies proteins distantly related to the query, or builds a position-specific scoring matrix for the query
rpsblast: Searches a protein against a conserved domain database (CDD) to identify functional domains present in the query
rpstblastn: Searches a nucleotide query, dynamically translated in all six frames, against a conserved domain database (CDD)
segmasker: Masks the low complexity regions in input protein sequences
tblastn: Searches a protein query against a nucleotide database dynamically translated in all six frames
tblastx: Searches a nucleotide query, dynamically translated in all six frames, against a nucleotide database similarly translated
update_blastdb.pl: Downloads preformatted blast databases from NCBI
windowmasker: Masks repeats found in input nucleotide sequences
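
As a minimal sketch of how two of these programs fit together, the following writes a tiny FASTA file, formats it with makeblastdb, and searches it against itself with blastn. The sequences, the file name demo.fa, and the database name demo_db are made up for illustration. The search step only runs if the blast+ executables are found on the PATH.

```shell
# Work in a throwaway directory so nothing is left in the current one.
workdir=$(mktemp -d)

# Two made-up nucleotide sequences, for illustration only.
cat > "$workdir/demo.fa" <<'EOF'
>seq1 made-up sequence
ACGTTGCATGCCGATAGCTAGCTAGGATCCGATCGATTACGCGTATAGCCGATCGGATTA
>seq2 made-up sequence
TTGACCGATAGGCTAGCTTAGGCGATATCGCGATTAGCCGATCGATCGGCTATAGCGCAT
EOF

if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1; then
    # Format the FASTA file into a nucleotide BLAST database.
    makeblastdb -in "$workdir/demo.fa" -dbtype nucl -out "$workdir/demo_db"
    # Search the same sequences against that database; -outfmt 6 is plain tabular output.
    blastn -query "$workdir/demo.fa" -db "$workdir/demo_db" -outfmt 6
else
    echo "blast+ not found on PATH; skipping the search step" >&2
fi
```

For a protein FASTA file, the analogous sketch would use -dbtype prot with makeblastdb and blastp for the search.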

Legacy utilities

Programs contained in the legacy blast package:

bl2seq [1]: Directly compares two FASTA sequences
blastall [1]: Legacy blast program providing the blastn, blastp, blastx, tblastn, and tblastx subfunctions
blastclust [2]: Clusters input FASTA sequences into related groups
blastpgp [1]: Standalone PSI-BLAST for searching distantly related protein sequences and generating position-specific matrices
copymat [2]: Copies blastpgp output for input to makemat
fastacmd [1]: Retrieves specific sequences or dumps the sequences from a formatted blast database
formatdb [1]: Converts a FASTA-formatted sequence file into a BLAST database
formatrpsdb [2]: Formats scoremat files into an RPSBLAST database
impala [2]: Protein profile search program, mostly replaced by rpsblast
makemat [2]: Converts the copymat files into scoremat format; no longer needed with new blastpgp output
megablast [1]: Faster batch blastn program that uses a greedy algorithm. Works in contiguous or more sensitive discontiguous mode
rpsblast [1]: Reverse PSI-BLAST program for searching against a conserved domain database
seedtop [2]: Pattern search program

Notes:

[1] These programs are reorganized into blastn, blastp, blastx, tblastn, tblastx, rpsblast, rpstblastn, psiblast, blastdbcmd and makeblastdb.
[2] These programs have no blast+ counterpart at this time.

The commands for legacy blast, comparable to those given for blast+ in section 6, are:

blastall -
fastacmd -d refseq_rna -s nm_000249 -o test_query.fa
blastall -p blastn -i test_query.fa -d refseq_rna -F F -m 9 -b 2 -v 2
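
To make the correspondence with blast+ concrete, here is a minimal sketch that translates the blastall command above into a blast+ blastn command line and prints it without running it. The function translate_blastall is a hypothetical helper written for this illustration, not part of any BLAST package. It handles only the flags used above, under the assumption that they map to the usual blast+ options (-i to -query, -d to -db, -F to -dust, -m 8/9 to -outfmt 6/7, -b to -num_alignments, -v to -num_descriptions). For real conversions, legacy_blast.pl from the blast+ package is the supported route.

```shell
# translate_blastall: hypothetical helper, not shipped with BLAST.
# Maps a small subset of blastall flags to blast+ blastn options and
# prints the resulting command line. Runs in a subshell so that
# "exit" only leaves the function.
translate_blastall() (
    prog=""
    opts=""
    while [ "$#" -ge 2 ]; do
        case "$1" in
            -p) prog="$2" ;;
            -i) opts="$opts -query $2" ;;
            -d) opts="$opts -db $2" ;;
            -b) opts="$opts -num_alignments $2" ;;
            -v) opts="$opts -num_descriptions $2" ;;
            -F) # Low-complexity filter switch; blastn filters with DUST.
                case "$2" in
                    F) opts="$opts -dust no" ;;
                    T) opts="$opts -dust yes" ;;
                    *) echo "unsupported -F value: $2" >&2; exit 1 ;;
                esac ;;
            -m) # Only the two tabular output formats are handled here.
                case "$2" in
                    8) opts="$opts -outfmt 6" ;;
                    9) opts="$opts -outfmt 7" ;;
                    *) echo "unsupported -m value: $2" >&2; exit 1 ;;
                esac ;;
            *) echo "unsupported flag: $1" >&2; exit 1 ;;
        esac
        shift 2
    done
    # A leftover argument means a flag was given without a value.
    if [ "$#" -ne 0 ]; then
        echo "flag without value: $1" >&2
        exit 1
    fi
    if [ "$prog" != "blastn" ]; then
        echo "only -p blastn is handled in this sketch" >&2
        exit 1
    fi
    # blast+ blastn defaults to megablast, so the classic algorithm
    # is requested explicitly with -task blastn.
    echo "blastn -task blastn$opts"
)

translate_blastall -p blastn -i test_query.fa -d refseq_rna -F F -m 9 -b 2 -v 2
```

This prints: blastn -task blastn -query test_query.fa -db refseq_rna -dust no -outfmt 7 -num_alignments 2 -num_descriptions 2. Newer blast+ releases may warn that -num_descriptions and -num_alignments are ignored for tabular output and suggest -max_target_seqs instead.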

Example usage

Extract all human sequences from the nr database

Although one cannot select GIs by taxonomy directly from a database, a combination of Linux command-line tools will accomplish this:

$ blastdbcmd -db nr -entry all -outfmt "%g %T" | \
   awk ' { if ($2 == 9606) { print $1 } } ' | \
   blastdbcmd -db nr -entry_batch - -out human_sequences.txt

The first blastdbcmd invocation prints two fields per sequence (GI and taxonomy ID). The awk command selects from that output the sequences whose taxonomy ID is 9606 (i.e., human) and prints their GIs. Finally, the second blastdbcmd invocation reads those GIs from standard input and writes the sequence data for the human sequences in the nr database to human_sequences.txt.
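
The filtering step can be tried without a BLAST database. The sketch below feeds a mocked "GI taxid" listing through the same awk test, written as a pattern rather than an explicit if. The function mock_listing and the GI values in it are invented for illustration. Only the taxonomy ID 9606 comes from the example above.

```shell
# Stand-in for: blastdbcmd -db nr -entry all -outfmt "%g %T"
# One "GI taxid" pair per line; the GI values are made up.
mock_listing() {
    printf '%s\n' \
        '1001 9606' \
        '1002 10090' \
        '1003 9606' \
        '1004 0'
}

# Keep the first field (GI) of every line whose second field is 9606.
human_gis=$(mock_listing | awk '$2 == 9606 { print $1 }')
printf '%s\n' "$human_gis"
```

With this mock input the filter keeps GIs 1001 and 1003. In the real pipeline that list is what the second blastdbcmd call receives through -entry_batch.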

External links

Official website
BLAST executables — free source downloads
Standalone BLAST Setup for Unix
