Difference between revisions of "BLAST+"

From Christoph's Personal Wiki

Revision as of 22:13, 9 July 2012

In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.

This article focuses on the NCBI "new" BLAST, or blast+, starting from version 2.2.26+ (released on 3 March 2012).

The latest stable version is: 2.2.26+ (2012-03-03)

See: BLAST for legacy ("old") versions.

Utilities

  • Programs contained in blast+ package:
blastdbcheck 
Checks database integrity
blastdbcmd 
Retrieves sequences or other information from a BLAST database
blastdb_aliastool 
Creates database alias
blastn 
Searches a nucleotide query against a nucleotide database
blastp 
Searches a protein query against a protein database
blastx 
Searches a nucleotide query, dynamically translated in all six frames, against a protein database
blast_formatter 
Formats a web blast result using its assigned request ID (RID)
convert2blastmask 
Converts lowercase masking into makeblastdb readable data
dustmasker 
Masks the low complexity regions in the input nucleotide sequences
legacy_blast.pl 
Converts a legacy blast search command line into its blast+ counterpart and executes it
makeblastdb 
Formats input FASTA file(s) into a BLAST database
makembindex 
Indexes an existing nucleotide database for use with megablast
psiblast 
Finds members of a protein family, identifies proteins distantly related to the query, or builds position specific scoring matrix for the query
rpsblast 
Searches a protein against a conserved domain database (CDD) to identify functional domains present in the query
rpstblastn 
Searches a nucleotide query, dynamically translated in all six frames, against a conserved domain database (CDD)
segmasker 
Masks the low complexity regions in input protein sequences
tblastn 
Searches a protein query against a nucleotide database dynamically translated in all six frames
tblastx 
Searches a nucleotide query, dynamically translated in all six frames, against a nucleotide database similarly translated
update_blastdb.pl 
Downloads preformatted blast databases from NCBI
windowmasker 
Masks repeats found in input nucleotide sequences
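
As a rough illustration of how two of these programs fit together, the sketch below builds a nucleotide database with makeblastdb and then searches it with blastn. The file names (my_sequences.fa, query.fa, results.txt), the database name my_db, and the run helper are assumptions made for this example; they are not part of the blast+ package. The helper only prints each command unless RUN=1 is set, so the sketch can be read or tried without blast+ installed.

```shell
# Dry run by default: print each command instead of executing it.
# Set RUN=1 to execute for real (assumes the blast+ binaries are on PATH).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

example_workflow() {
    # Hypothetical input and output names, chosen only for illustration.
    db_fasta="my_sequences.fa"
    db_name="my_db"
    query="query.fa"

    # Format the FASTA file into a nucleotide BLAST database.
    run makeblastdb -in "$db_fasta" -dbtype nucl -out "$db_name"

    # Search the nucleotide query against that database and write
    # tabular output with comment lines (-outfmt 7) to results.txt.
    run blastn -query "$query" -db "$db_name" -outfmt 7 -out results.txt
}

example_workflow
```

Running the same script with RUN=1 would execute both commands, provided the input files exist.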

Legacy utilities

  • Programs contained in the legacy blast package:
bl2seq [1] 
Directly compares two FASTA sequences
blastall [1] 
Legacy blast program containing the subfunctions blastn, blastp, blastx, tblastn, and tblastx
blastclust [2] 
Clusters input FASTA sequences into related groups
blastpgp [1] 
Standalone PSI-BLAST for searching distantly related protein sequences and generating position-specific matrices
copymat [2] 
Copies blastpgp output for input to makemat
fastacmd [1] 
Retrieves specific sequences or dumps the sequences from a formatted blast database
formatdb [1] 
Converts a FASTA-formatted sequence file into a BLAST database
formatrpsdb [2] 
Formats scoremat files into an RPSBLAST database
impala [2] 
Protein profile search program, mostly replaced by rpsblast
makemat [2] 
Converts the copymat files into scoremat format; no longer needed with the new blastpgp output
megablast [1] 
Faster batch blastn program that uses a greedy algorithm. Works in contiguous or more sensitive discontiguous mode
rpsblast [1] 
Reverse PSI-BLAST program for searching against a conserved domain database
seedtop [2] 
Pattern search program

Note:

  1. These programs have been reorganized into blastn, blastp, blastx, tblastn, tblastx, rpsblast, rpstblastn, psiblast, blastdbcmd and makeblastdb.
  2. These programs have no blast+ counterpart at this time.

The commands for legacy blast, comparable to those given for blast+ in section 6, are:

blastall -
fastacmd -d refseq_rna -s nm_000249 -o test_query.fa
blastall -p blastn -i test_query.fa -d refseq_rna -F F -m 9 -b 2 -v 2
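
For comparison, the fastacmd and blastall -p blastn lines above might be written for blast+ roughly as follows. This is a sketch under stated assumptions, not a definitive mapping: the run helper is hypothetical and only prints each command unless RUN=1 is set, -max_target_seqs 2 stands in for -b 2 -v 2 on the assumption that tabular output (-outfmt 7, the counterpart of -m 9) is wanted, and the last line assumes that legacy_blast.pl accepts --print_only to show the translated command without running it.

```shell
# Dry run by default: print each command instead of executing it.
# Set RUN=1 to execute (assumes blast+ and a refseq_rna database are available).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

blastplus_equivalents() {
    # Legacy: fastacmd -d refseq_rna -s nm_000249 -o test_query.fa
    run blastdbcmd -db refseq_rna -entry nm_000249 -out test_query.fa

    # Legacy: blastall -p blastn -i test_query.fa -d refseq_rna -F F -m 9 -b 2 -v 2
    # -task blastn selects the traditional blastn algorithm, -dust no mirrors -F F.
    run blastn -task blastn -query test_query.fa -db refseq_rna \
        -dust no -outfmt 7 -max_target_seqs 2

    # Alternatively, let legacy_blast.pl do the translation itself
    # (--print_only is assumed to print the blast+ command without executing it).
    run legacy_blast.pl blastall -p blastn -i test_query.fa -d refseq_rna \
        -F F -m 9 -b 2 -v 2 --print_only
}

blastplus_equivalents
```

Comparing the printed blastn line with the output of legacy_blast.pl is a quick way to check the mapping on a given installation.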

External links