List of EMBOSS programs

From Christoph's Personal Wiki

ACD file utilities

acdc 
ACD compiler
acdpretty 
ACD pretty printing utility
acdtable 
Creates an HTML table from an ACD file
acdtrace 
ACD compiler on-screen trace
acdvalid 
ACD file validation

Merging sequences to make a consensus

cons 
Creates a consensus from multiple alignments
megamerger 
Merge two large overlapping nucleic acid sequences
merger 
Merge two overlapping sequences

Finding differences between sequences

diffseq 
Find differences between nearly identical sequences

Dot plot sequence comparisons

dotmatcher 
Displays a thresholded dotplot of two sequences
dotpath 
Non-overlapping wordmatch dotplot of two sequences
dottup 
Displays a wordmatch dotplot of two sequences
polydot 
Displays all-against-all dotplots of a set of sequences

Global sequence alignment

est2genome 
Align EST and genomic DNA sequences
needle 
Needleman-Wunsch global alignment
stretcher 
Finds the best global alignment between two sequences
esim4 
Align an mRNA to a genomic DNA sequence

Local sequence alignment

matcher 
Finds the best local alignments between two sequences
seqmatchall 
All-against-all comparison of a set of sequences
supermatcher 
Match large sequences against one or more other sequences
water 
Smith-Waterman local alignment
wordfinder 
Match large sequences against one or more other sequences
wordmatch 
Finds all exact matches of a given size between 2 sequences

Multiple sequence alignment

edialign 
Local multiple alignment of sequences
emma 
Multiple alignment program - interface to ClustalW program
infoalign 
Information on a multiple sequence alignment
plotcon 
Plot quality of conservation of a sequence alignment
prettyplot 
Displays aligned sequences, with colouring and boxing
showalign 
Displays a multiple sequence alignment
tranalign 
Align nucleic coding regions given the aligned proteins
mse 
Multiple Sequence Editor

Publication-quality display

abiview 
Reads an ABI file and displays the trace
cirdna 
Draws circular maps of DNA constructs
lindna 
Draws linear maps of DNA constructs
pepnet 
Displays proteins as a helical net
pepwheel 
Shows protein sequences as helices
prettyplot 
Displays aligned sequences, with colouring and boxing
prettyseq 
Output sequence with translated ranges
remap 
Display sequence with restriction sites, translation etc
seealso 
Finds programs sharing group names
showalign 
Displays a multiple sequence alignment
showdb 
Displays information on the currently available databases
showfeat 
Show features of a sequence
showseq 
Display a sequence with features, translation etc
sixpack 
Display a DNA sequence with 6-frame translation and ORFs
textsearch 
Search sequence documentation. Slow, use SRS and Entrez!

Enzyme kinetics calculations

findkm 
Find Km and Vmax for an enzyme reaction

Manipulation and display of sequence annotation

coderet 
Extract CDS, mRNA and translations from feature tables
extractfeat 
Extract features from a sequence
maskfeat 
Mask off features of a sequence
showfeat 
Show features of a sequence
twofeat 
Finds neighbouring pairs of features in sequences

Hidden Markov model analysis

oalistat 
Statistics for multiple alignment files
ohmmalign 
Align sequences with an HMM
ohmmbuild 
Build HMM
ohmmcalibrate 
Calibrate a hidden Markov model
ohmmconvert 
Convert between HMM formats
ohmmemit 
Extract HMM sequences
ohmmfetch 
Extract HMM from a database
ohmmindex 
Index an HMM database
ohmmpfam 
Align single sequence with an HMM
ohmmsearch 
Search sequence database with an HMM
ehmmalign 
Align sequences to an HMM profile
ehmmbuild 
Build a profile HMM from an alignment
ehmmcalibrate 
Calibrate HMM search statistics
ehmmconvert 
Convert between profile HMM file formats
ehmmemit 
Generate sequences from a profile HMM
ehmmfetch 
Retrieve an HMM from an HMM database
ehmmindex 
Create a binary SSI index for an HMM database
ehmmpfam 
Search one or more sequences against an HMM database
ehmmsearch 
Search a sequence database with a profile HMM

Information and general help for users

infoalign 
Information on a multiple sequence alignment
infoseq 
Displays some simple information about sequences
seealso 
Finds programs sharing group names
showdb 
Displays information on the currently available databases
textsearch 
Search sequence documentation. Slow, use SRS and Entrez!
tfm 
Displays a program's help documentation manual
whichdb 
Search all databases for an entry
wossname 
Finds programs by keywords in their one-line documentation

Menu interface(s)

emnu 
Simple menu of EMBOSS applications

Nucleic acid secondary structure

einverted 
Finds DNA inverted repeats
vrnaalifold 
RNA alignment folding
vrnaalifoldpf 
RNA alignment folding with partition
vrnacofold 
RNA cofolding
vrnacofoldconc 
RNA cofolding with concentrations
vrnacofoldpf 
RNA cofolding with partitioning
vrnadistance 
RNA distances
vrnaduplex 
RNA duplex calculation
vrnaeval 
RNA eval
vrnaevalpair 
RNA eval with cofold
vrnafold 
Calculate secondary structures of RNAs
vrnafoldpf 
Secondary structures of RNAs with partition
vrnaheat 
RNA melting
vrnainverse 
RNA sequences matching a structure
vrnalfold 
Calculate locally stable secondary structures of RNAs
vrnaplot 
Plot vrnafold output
vrnasubopt 
Calculate RNA suboptimals

Codon usage analysis

cai 
CAI codon adaptation index
chips 
Codon usage statistics
codcmp 
Codon usage table comparison
cusp 
Create a codon usage table
syco 
Synonymous codon usage Gribskov statistic plot

Composition of nucleotide sequences

banana 
Bending and curvature plot in B-DNA
btwisted 
Calculates the twisting in a B-DNA sequence
chaos 
Create a chaos game representation plot for a sequence
compseq 
Count composition of dimer/trimer/etc words in a sequence
dan 
Calculates DNA or RNA/DNA melting temperature
freak 
Residue/base frequency table or plot
isochore 
Plots isochores in large DNA sequences
sirna 
Finds siRNA duplexes in mRNA
wordcount 
Counts words of a specified size in a DNA sequence

CpG island detection and analysis

cpgplot 
Plot CpG rich areas
cpgreport 
Reports all CpG rich regions
geecee 
Calculates fractional GC content of nucleic acid sequences
newcpgreport 
Report CpG rich areas
newcpgseek 
Reports CpG rich regions

Predictions of genes and other genomic features

getorf 
Finds and extracts open reading frames (ORFs)
marscan 
Finds MAR/SAR sites in nucleic sequences
plotorf 
Plot potential open reading frames
showorf 
Pretty output of DNA translations
sixpack 
Display a DNA sequence with 6-frame translation and ORFs
syco 
Synonymous codon usage Gribskov statistic plot
tcode 
Fickett TESTCODE statistic to identify protein-coding DNA
wobble 
Wobble base plot

Nucleic acid motif searches

dreg 
Regular expression search of a nucleotide sequence
fuzznuc 
Nucleic acid pattern search
fuzztran 
Protein pattern search after translation
marscan 
Finds MAR/SAR sites in nucleic sequences

Nucleic acid sequence mutation

msbar 
Mutate sequence beyond all recognition
shuffleseq 
Shuffles a set of sequences maintaining composition

Primer prediction

eprimer3 
Picks PCR primers and hybridization oligos
primersearch 
Searches DNA sequences for matches with primer pairs
stssearch 
Search a DNA database for matches with a set of STS primers

Nucleic acid profile generation and searching

profit 
Scan a sequence or database with a matrix or profile
prophecy 
Creates matrices/profiles from multiple alignments
prophet 
Gapped alignment for profiles

Nucleic acid repeat detection

einverted 
Finds DNA inverted repeats
equicktandem 
Finds tandem repeats
etandem 
Looks for tandem repeats in a nucleotide sequence
palindrome 
Looks for inverted repeats in a nucleotide sequence

Restriction enzyme sites in nucleotide sequences

recoder 
Remove restriction sites but maintain same translation
redata 
Search REBASE for enzyme name, references, suppliers etc
remap 
Display sequence with restriction sites, translation etc
restover 
Find restriction enzymes producing specific overhang
restrict 
Finds restriction enzyme cleavage sites
showseq 
Display a sequence with features, translation etc
silent 
Silent mutation restriction enzyme scan

Transcription factors, promoters and terminator prediction

tfscan 
Scans DNA sequences for transcription factors

Translation of nucleotide sequence to protein sequence

backtranambig 
Back translate a protein sequence to ambiguous codons
backtranseq 
Back translate a protein sequence
coderet 
Extract CDS, mRNA and translations from feature tables
plotorf 
Plot potential open reading frames
prettyseq 
Output sequence with translated ranges
remap 
Display sequence with restriction sites, translation etc
showorf 
Pretty output of DNA translations
showseq 
Display a sequence with features, translation etc
sixpack 
Display a DNA sequence with 6-frame translation and ORFs
transeq 
Translate nucleic acid sequences

Phylogenetic consensus methods

econsense 
Majority-rule and strict consensus tree
fconsense 
Majority-rule and strict consensus tree
ftreedist 
Distances between trees
ftreedistpair 
Distances between two sets of trees

Phylogenetic continuous character methods

econtml 
Continuous character Maximum Likelihood method
econtrast 
Continuous character Contrasts
fcontrast 
Continuous character Contrasts

Phylogenetic discrete character methods

eclique 
Largest clique program
edollop 
Dollo and polymorphism parsimony algorithm
edolpenny 
Penny algorithm Dollo or polymorphism
efactor 
Multistate to binary recoding program
emix 
Mixed parsimony algorithm
epenny 
Penny algorithm, branch-and-bound
fclique 
Largest clique program
fdollop 
Dollo and polymorphism parsimony algorithm
fdolpenny 
Penny algorithm Dollo or polymorphism
ffactor 
Multistate to binary recoding program
fmix 
Mixed parsimony algorithm
fmove 
Interactive mixed method parsimony
fpars 
Discrete character parsimony
fpenny 
Penny algorithm, branch-and-bound

Phylogenetic distance matrix methods

efitch 
Fitch-Margoliash and Least-Squares Distance Methods
ekitsch 
Fitch-Margoliash method with contemporary tips
eneighbor 
Phylogenies from distance matrix by N-J or UPGMA method
ffitch 
Fitch-Margoliash and Least-Squares Distance Methods
fkitsch 
Fitch-Margoliash method with contemporary tips
fneighbor 
Phylogenies from distance matrix by N-J or UPGMA method

Phylogenetic gene frequency methods

egendist 
Genetic Distance Matrix program
fcontml 
Gene frequency and continuous character Maximum Likelihood
fgendist 
Compute genetic distances from gene frequencies

Phylogenetic molecular sequence methods

distmat 
Creates a distance matrix from multiple alignments
ednacomp 
DNA compatibility algorithm
ednadist 
Nucleic acid sequence Distance Matrix program
ednainvar 
Nucleic acid sequence Invariants method
ednaml 
Phylogenies from nucleic acid Maximum Likelihood
ednamlk 
Phylogenies from nucleic acid Maximum Likelihood with clock
ednapars 
DNA parsimony algorithm
ednapenny 
Penny algorithm for DNA
eprotdist 
Protein distance algorithm
eprotpars 
Protein parsimony algorithm
erestml 
Restriction site Maximum Likelihood method
eseqboot 
Bootstrapped sequences algorithm
fdiscboot 
Bootstrapped discrete sites algorithm
fdnacomp 
DNA compatibility algorithm
fdnadist 
Nucleic acid sequence Distance Matrix program
fdnainvar 
Nucleic acid sequence Invariants method
fdnaml 
Estimates nucleotide phylogeny by maximum likelihood
fdnamlk 
Estimates nucleotide phylogeny by maximum likelihood with molecular clock
fdnamove 
Interactive DNA parsimony
fdnapars 
DNA parsimony algorithm
fdnapenny 
Penny algorithm for DNA
fdolmove 
Interactive Dollo or Polymorphism Parsimony
ffreqboot 
Bootstrapped genetic frequencies algorithm
fproml 
Protein phylogeny by maximum likelihood
fpromlk 
Protein phylogeny by maximum likelihood with molecular clock
fprotdist 
Protein distance algorithm
fprotpars 
Protein parsimony algorithm
frestboot 
Bootstrapped restriction sites algorithm
frestdist 
Distance matrix from restriction sites or fragments
frestml 
Restriction site Maximum Likelihood method
fseqboot 
Bootstrapped sequences algorithm
fseqbootall 
Bootstrapped sequences algorithm

Phylogenetic tree drawing methods

fdrawgram 
Plots a cladogram- or phenogram-like rooted tree diagram
fdrawtree 
Plots an unrooted tree diagram
fretree 
Interactive tree rearrangement

Protein secondary structure

garnier 
Predicts protein secondary structure
helixturnhelix 
Report nucleic acid binding motifs
hmoment 
Hydrophobic moment calculation
pepcoil 
Predicts coiled coil regions
pepnet 
Displays proteins as a helical net
pepwheel 
Shows protein sequences as helices
tmap 
Displays membrane spanning regions
topo 
Draws an image of a transmembrane protein

Protein tertiary structure

psiphi 
Phi and psi torsion angles from protein coordinates
domainreso 
Remove low resolution domains from a DCF file
domainalign 
Generate alignments (DAF file) for nodes in a DCF file
domainrep 
Reorder DCF file to identify representative structures
seqalign 
Extend alignments (DAF file) with sequences (DHF file)
seqfraggle 
Removes fragment sequences from DHF files
seqsearch 
Generate PSI-BLAST hits (DHF file) from a DAF file
seqsort 
Remove ambiguous classified sequences from DHF files
seqwords 
Generates DHF files from keyword search of UniProt
libgen 
Generate discriminating elements from alignments
matgen3d 
Generate a 3D-1D scoring matrix from CCF files
rocon 
Generates a hits file from comparing two DHF files
rocplot 
Performs ROC analysis on hits files
siggen 
Generates a sparse protein signature from an alignment
siggenlig 
Generate ligand-binding signatures from a CON file
sigscan 
Generate hits (DHF file) from a signature search
sigscanlig 
Search ligand-signature library & write hits (LHF file)
contacts 
Generate intra-chain CON files from CCF files
interface 
Generate inter-chain CON files from CCF files

Composition of protein sequences

backtranambig 
Back translate a protein sequence to ambiguous codons
backtranseq 
Back translate a protein sequence
charge 
Protein charge plot
checktrans 
Reports STOP codons and ORF statistics of a protein
compseq 
Count composition of dimer/trimer/etc words in a sequence
emowse 
Protein identification by mass spectrometry
freak 
Residue/base frequency table or plot
iep 
Calculates the isoelectric point of a protein
mwcontam 
Shows molwts that match across a set of files
mwfilter 
Filter noisy molwts from mass spec output
octanol 
Displays protein hydropathy
pepinfo 
Plots simple amino acid properties in parallel
pepstats 
Protein statistics
pepwindow 
Displays protein hydropathy
pepwindowall 
Displays protein hydropathy of a set of sequences

Protein motif searches

antigenic 
Finds antigenic sites in proteins
digest 
Protein proteolytic enzyme or reagent cleavage digest
epestfind 
Finds PEST motifs as potential proteolytic cleavage sites
fuzzpro 
Protein pattern search
fuzztran 
Protein pattern search after translation
helixturnhelix 
Report nucleic acid binding motifs
oddcomp 
Find protein sequence regions with a biased composition
patmatdb 
Search a protein sequence with a motif
patmatmotifs 
Search a PROSITE motif database with a protein sequence
pepcoil 
Predicts coiled coil regions
preg 
Regular expression search of a protein sequence
pscan 
Scans proteins using PRINTS
sigcleave 
Reports protein signal cleavage sites
omeme 
Motif detection
emast 
Motif detection
ememe 
Motif detection

Protein sequence mutation

msbar 
Mutate sequence beyond all recognition
shuffleseq 
Shuffles a set of sequences maintaining composition

Protein profile generation and searching

profit 
Scan a sequence or database with a matrix or profile
prophecy 
Creates matrices/profiles from multiple alignments
prophet 
Gapped alignment for profiles

Testing tools, not for general use

crystalball 
Answers every drug discovery question about a sequence
myseq 
Demonstration of sequence reading
mytest 
Demonstration of sequence reading

Database installation

aaindexextract 
Extract data from AAINDEX
cutgextract 
Extract data from CUTG
printsextract 
Extract data from PRINTS
prosextract 
Build the PROSITE motif database for use by patmatmotifs
rebaseextract 
Extract data from REBASE
tfextract 
Extract data from TRANSFAC
cathparse 
Generates DCF file from raw CATH files
domainnr 
Removes redundant domains from a DCF file
domainseqs 
Adds sequence records to a DCF file
domainsse 
Add secondary structure records to a DCF file
scopparse 
Generate DCF file from raw SCOP files
ssematch 
Search a DCF file for secondary structure matches
allversusall 
Sequence similarity data from all-versus-all comparison
seqnr 
Removes redundancy from DHF files
domainer 
Generates domain CCF files from protein CCF files
hetparse 
Converts heterogen group dictionary to EMBL-like format
pdbparse 
Parses PDB files and writes protein CCF files
pdbplus 
Add accessibility & secondary structure to a CCF file
pdbtosp 
Convert swissprot:PDB codes file to EMBL-like format
sites 
Generate residue-ligand CON files from CCF files

Database indexing

dbiblast 
Index a BLAST database
dbifasta 
Database indexing for fasta file databases
dbiflat 
Index a flat file database
dbigcg 
Index a GCG formatted database
dbxfasta 
Database b+tree indexing for fasta file databases
dbxflat 
Database b+tree indexing for flat file databases
dbxgcg 
Database b+tree indexing for GCG formatted databases

Utility tools

embossdata 
Finds or fetches data files read by EMBOSS programs
embossversion 
Writes the current EMBOSS version number

External / contributed packages

CBSTOOLS
The CBSTOOLS package is a set of wrappers to selected applications from the CBS group in Denmark.
CLUSTALOMEGA
A wrapper for the Clustal Omega application.
DOMAINATRIX
The DOMAINATRIX programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
DOMALIGN
The DOMALIGN programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
DOMSEARCH
The DOMSEARCH programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
EMBOSS
The latest stable version of EMBOSS (excluding bugfix patches from the 'fixes' directory hierarchy).
EMNU
The EMNU package is a simple EMBOSS menu system written by Gary Williams at HGMP.
ESIM4
The ESIM4 package is an EMBOSS conversion of the SIM4 package from Liliana Florea.
HMMER
A suite of application wrappers for the original HMMER v2.3.2 applications written by Sean Eddy. HMMER v2.3.2 must be installed on the same system as EMBOSS, and the directory containing the HMMER executables must be on your PATH for EMBASSY HMMER to work.
IPRSCAN
The IPRSCAN package is a wrapper for the InterProScan program.
MEME
The EMBASSY MEME package contains 'wrapper' applications providing an EMBOSS-style interface to the applications in the original MEME package version 4.0.0 developed by Timothy L. Bailey.
MIRA
The MIRA package is a fragment assembly program from Bastien Chevreux. The program was converted to EMBOSS by Alan Bleasby as two applications, one for EST assembly and one for shotgun fragment assembly.
MSE
The MSE package is a multiple sequence editor. The program was contributed to the EMBOSS package by the author, Will Gilbert, as one of the first EMBASSY programs.
MYEMBOSS
A package for your own software developments.
MYEMBOSSDEMO
The MYEMBOSSDEMO package contains example applications using EMBOSS data types.
PHYLIPNEW
The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.69.
SIGNATURE
The SIGNATURE programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
STRUCTURE
The STRUCTURE programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
TOPO
The TOPO package is a graphics program to display membrane protein topology by Susan Jean Johns.
VIENNA
These programs are adapted from the VIENNA RNA package.
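
The HMMER entry above notes that the original HMMER v2.3.2 executables must be findable on your path before the EMBASSY wrappers will work. A minimal Python sketch of such a check follows. The executable names are an assumption: they are guessed by dropping the leading "e" from the ehmm* wrappers listed on this page, and the helper name missing_executables is made up for illustration. Adjust the list to match your installation.

```python
import shutil

# Assumption: the HMMER 2.3.2 executables are named like the EMBASSY
# wrappers listed on this page minus the leading "e"
# (for example ehmmbuild -> hmmbuild).
HMMER2_EXECUTABLES = [
    "hmmalign", "hmmbuild", "hmmcalibrate", "hmmconvert", "hmmemit",
    "hmmfetch", "hmmindex", "hmmpfam", "hmmsearch",
]


def missing_executables(names, path=None):
    """Return the names that shutil.which cannot find.

    With path=None the current PATH environment variable is searched;
    otherwise the given search path string is used instead.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_executables(HMMER2_EXECUTABLES)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed HMMER executables were found on PATH.")
```

If anything is reported as missing, add the directory holding the HMMER binaries to PATH before running the ehmm* programs.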