List of EMBOSS programs
From Christoph's Personal Wiki
Contents
- 1 Acd file utilities
- 2 Merging sequences to make a consensus
- 3 Finding differences between sequences
- 4 Dot plot sequence comparisons
- 5 Global sequence alignment
- 6 Local sequence alignment
- 7 Multiple sequence alignment
- 8 Publication-quality display
- 9 Enzyme kinetics calculations
- 10 Manipulation and display of sequence annotation
- 11 Hidden Markov model analysis
- 12 Information and general help for users
- 13 Menu interface(s)
- 14 Nucleic acid secondary structure
- 15 Codon usage analysis
- 16 Composition of nucleotide sequences
- 17 CpG island detection and analysis
- 18 Predictions of genes and other genomic features
- 19 Nucleic acid motif searches
- 20 Nucleic acid sequence mutation
- 21 Primer prediction
- 22 Nucleic acid profile generation and searching
- 23 Nucleic acid repeat detection
- 24 Restriction enzyme sites in nucleotide sequences
- 25 Transcription factors, promoters and terminator prediction
- 26 Translation of nucleotide sequence to protein sequence
- 27 Phylogenetic consensus methods
- 28 Phylogenetic continuous character methods
- 29 Phylogenetic discrete character methods
- 30 Phylogenetic distance matrix methods
- 31 Phylogenetic gene frequency methods
- 32 Phylogenetic molecular sequence methods
- 33 Phylogenetic tree drawing methods
- 34 Protein secondary structure
- 35 Protein tertiary structure
- 36 Composition of protein sequences
- 37 Protein motif searches
- 38 Protein sequence mutation
- 39 Protein profile generation and searching
- 40 Testing tools, not for general use
- 41 Database installation
- 42 Database indexing
- 43 Utility tools
- 44 External / contributed packages
Acd file utilities
- acdc
- ACD compiler
- acdpretty
- ACD pretty printing utility
- acdtable
- Creates an HTML table from an ACD file
- acdtrace
- ACD compiler on-screen trace
- acdvalid
- ACD file validation
Merging sequences to make a consensus
- cons
- Creates a consensus from multiple alignments
- megamerger
- Merge two large overlapping nucleic acid sequences
- merger
- Merge two overlapping sequences
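As a rough illustration of what deriving a consensus from an alignment involves, the sketch below takes the most common residue in each column. It is a toy under stated assumptions, not the method cons implements: the function name `majority_consensus`, the `n` placeholder for ties and the gap handling are all hypothetical choices made for this example.

```python
from collections import Counter

def majority_consensus(aligned, gap="-", unknown="n"):
    """Return the most common non-gap residue per column of an alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            out.append(gap)  # the column contains only gaps
            continue
        (top, n), *rest = counts.most_common(2)
        # A tie between the two most common residues is reported as unknown.
        out.append(unknown if rest and rest[0][1] == n else top)
    return "".join(out)
```

For example, `majority_consensus(["ACGT", "ACGA", "AC-A"])` gives `"ACGA"`.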
Finding differences between sequences
- diffseq
- Find differences between nearly identical sequences
Dot plot sequence comparisons
- dotmatcher
- Displays a thresholded dotplot of two sequences
- dotpath
- Non-overlapping wordmatch dotplot of two sequences
- dottup
- Displays a wordmatch dotplot of two sequences
- polydot
- Displays all-against-all dotplots of a set of sequences
Global sequence alignment
- est2genome
- Align EST and genomic DNA sequences
- needle
- Needleman-Wunsch global alignment
- stretcher
- Finds the best global alignment between two sequences
- esim4
- Align an mRNA to a genomic DNA sequence
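The Needleman-Wunsch idea behind global alignment can be sketched in a few lines. The version below only computes the optimal score and uses a simple match/mismatch scheme with a linear gap penalty; needle itself works with a substitution matrix and separate gap-open and gap-extend penalties, so treat this as a simplified illustration rather than a reimplementation. The function name and default scores are assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Only two rows of the dynamic-programming matrix are kept, which is enough for the score; recovering the alignment itself would need the full matrix or a traceback structure.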
Local sequence alignment
- matcher
- Finds the best local alignments between two sequences
- seqmatchall
- All-against-all comparison of a set of sequences
- supermatcher
- Match large sequences against one or more other sequences
- water
- Smith-Waterman local alignment
- wordfinder
- Match large sequences against one or more other sequences
- wordmatch
- Finds all exact matches of a given size between 2 sequences
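Local alignment in the Smith-Waterman style differs from the global case mainly in that cell scores are never allowed to drop below zero and the best cell anywhere in the matrix is reported. The sketch below shows this with a linear gap penalty; it is a minimal illustration, not the code used by water, and the function name and default scores are assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Clamping at zero lets an alignment restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```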
Multiple sequence alignment
- edialign
- Local multiple alignment of sequences
- emma
- Multiple alignment program - interface to ClustalW program
- infoalign
- Information on a multiple sequence alignment
- plotcon
- Plot quality of conservation of a sequence alignment
- prettyplot
- Displays aligned sequences, with colouring and boxing
- showalign
- Displays a multiple sequence alignment
- tranalign
- Align nucleic coding regions given the aligned proteins
- mse
- Multiple Sequence Editor
Publication-quality display
- abiview
- Reads an ABI file and displays the trace
- cirdna
- Draws circular maps of DNA constructs
- lindna
- Draws linear maps of DNA constructs
- pepnet
- Displays proteins as a helical net
- pepwheel
- Shows protein sequences as helices
- prettyplot
- Displays aligned sequences, with colouring and boxing
- prettyseq
- Output sequence with translated ranges
- remap
- Display sequence with restriction sites, translation etc
- seealso
- Finds programs sharing group names
- showalign
- Displays a multiple sequence alignment
- showdb
- Displays information on the currently available databases
- showfeat
- Show features of a sequence
- showseq
- Display a sequence with features, translation etc
- sixpack
- Display a DNA sequence with 6-frame translation and ORFs
- textsearch
- Search sequence documentation. Slow, use SRS and Entrez!
Enzyme kinetics calculations
- findkm
- Find Km and Vmax for an enzyme reaction
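One common way to estimate Km and Vmax from paired substrate concentration and velocity measurements is the Hanes-Woolf linearisation, S/v = S/Vmax + Km/Vmax, fitted by ordinary least squares. The sketch below follows that approach; it is not taken from findkm's source, and the function name `fit_michaelis_menten` is hypothetical.

```python
def fit_michaelis_menten(substrate, velocity):
    """Estimate (Km, Vmax) by a Hanes-Woolf fit: S/v = S/Vmax + Km/Vmax."""
    if len(substrate) != len(velocity) or len(substrate) < 2:
        raise ValueError("need at least two paired (S, v) measurements")
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("substrate concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax
```

With noise-free data generated from Km = 2 and Vmax = 10, the fit recovers those values to within floating-point error.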
Manipulation and display of sequence annotation
- coderet
- Extract CDS, mRNA and translations from feature tables
- extractfeat
- Extract features from a sequence
- maskfeat
- Mask off features of a sequence
- showfeat
- Show features of a sequence
- twofeat
- Finds neighbouring pairs of features in sequences
Hidden Markov model analysis
- oalistat
- Statistics for multiple alignment files
- ohmmalign
- Align sequences with an HMM
- ohmmbuild
- Build HMM
- ohmmcalibrate
- Calibrate a hidden Markov model
- ohmmconvert
- Convert between HMM formats
- ohmmemit
- Extract HMM sequences
- ohmmfetch
- Extract HMM from a database
- ohmmindex
- Index an HMM database
- ohmmpfam
- Align single sequence with an HMM
- ohmmsearch
- Search sequence database with an HMM
- ehmmalign
- Align sequences to an HMM profile
- ehmmbuild
- Build a profile HMM from an alignment
- ehmmcalibrate
- Calibrate HMM search statistics
- ehmmconvert
- Convert between profile HMM file formats
- ehmmemit
- Generate sequences from a profile HMM
- ehmmfetch
- Retrieve an HMM from an HMM database
- ehmmindex
- Create a binary SSI index for an HMM database
- ehmmpfam
- Search one or more sequences against an HMM database
- ehmmsearch
- Search a sequence database with a profile HMM
Information and general help for users
- infoalign
- Information on a multiple sequence alignment
- infoseq
- Displays some simple information about sequences
- seealso
- Finds programs sharing group names
- showdb
- Displays information on the currently available databases
- textsearch
- Search sequence documentation. Slow, use SRS and Entrez!
- tfm
- Displays a program's help documentation manual
- whichdb
- Search all databases for an entry
- wossname
- Finds programs by keywords in their one-line documentation
Menu interface(s)
- emnu
- Simple menu of EMBOSS applications
Nucleic acid secondary structure
- einverted
- Finds DNA inverted repeats
- vrnaalifold
- RNA alignment folding
- vrnaalifoldpf
- RNA alignment folding with partition
- vrnacofold
- RNA cofolding
- vrnacofoldconc
- RNA cofolding with concentrations
- vrnacofoldpf
- RNA cofolding with partitioning
- vrnadistance
- RNA distances
- vrnaduplex
- RNA duplex calculation
- vrnaeval
- RNA eval
- vrnaevalpair
- RNA eval with cofold
- vrnafold
- Calculate secondary structures of RNAs
- vrnafoldpf
- Secondary structures of RNAs with partition
- vrnaheat
- RNA melting
- vrnainverse
- RNA sequences matching a structure
- vrnalfold
- Calculate locally stable secondary structures of RNAs
- vrnaplot
- Plot vrnafold output
- vrnasubopt
- Calculate RNA suboptimals
Codon usage analysis
- cai
- CAI codon adaptation index
- chips
- Codon usage statistics
- codcmp
- Codon usage table comparison
- cusp
- Create a codon usage table
- syco
- Synonymous codon usage Gribskov statistic plot
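The core of a codon usage table is a count of in-frame codons in a coding sequence. The sketch below shows only that counting step plus a simple fraction of the total; the tables written by cusp carry more columns than this, and the function names here are assumptions made for illustration.

```python
from collections import Counter

def codon_counts(cds):
    """Count in-frame codons, ignoring a trailing partial codon."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

def codon_fractions(cds):
    """Fraction of all counted codons contributed by each codon."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}
```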
Composition of nucleotide sequences
- banana
- Bending and curvature plot in B-DNA
- btwisted
- Calculates the twisting in a B-DNA sequence
- chaos
- Create a chaos game representation plot for a sequence
- compseq
- Count composition of dimer/trimer/etc words in a sequence
- dan
- Calculates DNA or RNA/DNA melting temperature
- freak
- Residue/base frequency table or plot
- isochore
- Plots isochores in large DNA sequences
- sirna
- Finds siRNA duplexes in mRNA
- wordcount
- Counts words of a specified size in a DNA sequence
CpG island detection and analysis
- cpgplot
- Plot CpG rich areas
- cpgreport
- Reports all CpG rich regions
- geecee
- Calculates fractional GC content of nucleic acid sequences
- newcpgreport
- Report CpG rich areas
- newcpgseek
- Reports CpG rich regions
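Two quantities sit behind most CpG island analyses: the fractional GC content and the observed/expected CpG ratio, commonly defined as (number of CG dinucleotides × sequence length) / (number of C × number of G). The sketch below computes both for a whole sequence. The window sizes and thresholds that the programs above apply are not reproduced here, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C among the A, C, G, T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined; report 0.0 by convention here
    return seq.count("CG") * len(seq) / (c * g)
```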
Predictions of genes and other genomic features
- getorf
- Finds and extracts open reading frames (ORFs)
- marscan
- Finds MAR/SAR sites in nucleic sequences
- plotorf
- Plot potential open reading frames
- showorf
- Pretty output of DNA translations
- sixpack
- Display a DNA sequence with 6-frame translation and ORFs
- syco
- Synonymous codon usage Gribskov statistic plot
- tcode
- Fickett TESTCODE statistic to identify protein-coding DNA
- wobble
- Wobble base plot
Nucleic acid motif searches
- dreg
- Regular expression search of a nucleotide sequence
- fuzznuc
- Nucleic acid pattern search
- fuzztran
- Protein pattern search after translation
- marscan
- Finds MAR/SAR sites in nucleic sequences
Nucleic acid sequence mutation
- msbar
- Mutate sequence beyond all recognition
- shuffleseq
- Shuffles a set of sequences maintaining composition
Primer prediction
- eprimer3
- Picks PCR primers and hybridization oligos
- primersearch
- Searches DNA sequences for matches with primer pairs
- stssearch
- Search a DNA database for matches with a set of STS primers
Nucleic acid profile generation and searching
- profit
- Scan a sequence or database with a matrix or profile
- prophecy
- Creates matrices/profiles from multiple alignments
- prophet
- Gapped alignment for profiles
Nucleic acid repeat detection
- einverted
- Finds DNA inverted repeats
- equicktandem
- Finds tandem repeats
- etandem
- Looks for tandem repeats in a nucleotide sequence
- palindrome
- Looks for inverted repeats in a nucleotide sequence
Restriction enzyme sites in nucleotide sequences
- recoder
- Remove restriction sites but maintain same translation
- redata
- Search REBASE for enzyme name, references, suppliers etc
- remap
- Display sequence with restriction sites, translation etc
- restover
- Find restriction enzymes producing specific overhang
- restrict
- Finds restriction enzyme cleavage sites
- showseq
- Display a sequence with features, translation etc
- silent
- Silent mutation restriction enzyme scan
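At its simplest, locating restriction sites is an exact search for a recognition sequence. The sketch below reports 1-based start positions of a site on the forward strand, which is sufficient for palindromic sites such as GAATTC (EcoRI). It ignores ambiguity codes, cut positions and REBASE data, all of which the programs above handle; the function name `find_sites` is an assumption.

```python
def find_sites(seq, site):
    """Return 1-based start positions of `site` on the forward strand,
    including overlapping occurrences."""
    if not site:
        raise ValueError("site must not be empty")
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        # Advance by one base so overlapping matches are also found.
        start = seq.find(site, start + 1)
    return positions
```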
Transcription factors, promoters and terminator prediction
- tfscan
- Scans DNA sequences for transcription factors
Translation of nucleotide sequence to protein sequence
- backtranambig
- Back translate a protein sequence to ambiguous codons
- backtranseq
- Back translate a protein sequence
- coderet
- Extract CDS, mRNA and translations from feature tables
- plotorf
- Plot potential open reading frames
- prettyseq
- Output sequence with translated ranges
- remap
- Display sequence with restriction sites, translation etc
- showorf
- Pretty output of DNA translations
- showseq
- Display a sequence with features, translation etc
- sixpack
- Display a DNA sequence with 6-frame translation and ORFs
- transeq
- Translate nucleic acid sequences
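Translation in several frames reduces to a codon lookup plus a reverse complement. The sketch below uses the standard genetic code only and labels the reverse frames -1 to -3 as frames 1 to 3 of the reverse complement; that numbering is an assumption for this example and may not match the convention used by transeq or sixpack. All names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO_ACIDS.
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq, frame=1):
    """Translate one forward frame (1, 2 or 3); unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")[frame - 1:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))

def six_frames(seq):
    """Translations keyed by frame: 1..3 forward, -1..-3 reverse complement."""
    rc = reverse_complement(seq.upper().replace("U", "T"))
    frames = {f: translate(seq, f) for f in (1, 2, 3)}
    frames.update({-f: translate(rc, f) for f in (1, 2, 3)})
    return frames
```

For example, `translate("ATGGCCTAA")` gives `"MA*"`, with `*` marking the stop codon.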
Phylogenetic consensus methods
- econsense
- Majority-rule and strict consensus tree
- fconsense
- Majority-rule and strict consensus tree
- ftreedist
- Distances between trees
- ftreedistpair
- Distances between two sets of trees
Phylogenetic continuous character methods
- econtml
- Continuous character Maximum Likelihood method
- econtrast
- Continuous character Contrasts
- fcontrast
- Continuous character Contrasts
Phylogenetic discrete character methods
- eclique
- Largest clique program
- edollop
- Dollo and polymorphism parsimony algorithm
- edolpenny
- Penny algorithm Dollo or polymorphism
- efactor
- Multistate to binary recoding program
- emix
- Mixed parsimony algorithm
- epenny
- Penny algorithm, branch-and-bound
- fclique
- Largest clique program
- fdollop
- Dollo and polymorphism parsimony algorithm
- fdolpenny
- Penny algorithm Dollo or polymorphism
- ffactor
- Multistate to binary recoding program
- fmix
- Mixed parsimony algorithm
- fmove
- Interactive mixed method parsimony
- fpars
- Discrete character parsimony
- fpenny
- Penny algorithm, branch-and-bound
Phylogenetic distance matrix methods
- efitch
- Fitch-Margoliash and Least-Squares Distance Methods
- ekitsch
- Fitch-Margoliash method with contemporary tips
- eneighbor
- Phylogenies from distance matrix by N-J or UPGMA method
- ffitch
- Fitch-Margoliash and Least-Squares Distance Methods
- fkitsch
- Fitch-Margoliash method with contemporary tips
- fneighbor
- Phylogenies from distance matrix by N-J or UPGMA method
Phylogenetic gene frequency methods
- egendist
- Genetic Distance Matrix program
- fcontml
- Gene frequency and continuous character Maximum Likelihood
- fgendist
- Compute genetic distances from gene frequencies
Phylogenetic molecular sequence methods
- distmat
- Creates a distance matrix from multiple alignments
- ednacomp
- DNA compatibility algorithm
- ednadist
- Nucleic acid sequence Distance Matrix program
- ednainvar
- Nucleic acid sequence Invariants method
- ednaml
- Phylogenies from nucleic acid Maximum Likelihood
- ednamlk
- Phylogenies from nucleic acid Maximum Likelihood with clock
- ednapars
- DNA parsimony algorithm
- ednapenny
- Penny algorithm for DNA
- eprotdist
- Protein distance algorithm
- eprotpars
- Protein parsimony algorithm
- erestml
- Restriction site Maximum Likelihood method
- eseqboot
- Bootstrapped sequences algorithm
- fdiscboot
- Bootstrapped discrete sites algorithm
- fdnacomp
- DNA compatibility algorithm
- fdnadist
- Nucleic acid sequence Distance Matrix program
- fdnainvar
- Nucleic acid sequence Invariants method
- fdnaml
- Estimates nucleotide phylogeny by maximum likelihood
- fdnamlk
- Estimates nucleotide phylogeny by maximum likelihood with molecular clock
- fdnamove
- Interactive DNA parsimony
- fdnapars
- DNA parsimony algorithm
- fdnapenny
- Penny algorithm for DNA
- fdolmove
- Interactive Dollo or Polymorphism Parsimony
- ffreqboot
- Bootstrapped genetic frequencies algorithm
- fproml
- Protein phylogeny by maximum likelihood
- fpromlk
- Protein phylogeny by maximum likelihood with molecular clock
- fprotdist
- Protein distance algorithm
- fprotpars
- Protein parsimony algorithm
- frestboot
- Bootstrapped restriction sites algorithm
- frestdist
- Distance matrix from restriction sites or fragments
- frestml
- Restriction site Maximum Likelihood method
- fseqboot
- Bootstrapped sequences algorithm
- fseqbootall
- Bootstrapped sequences algorithm
Phylogenetic tree drawing methods
- fdrawgram
- Plots a cladogram- or phenogram-like rooted tree diagram
- fdrawtree
- Plots an unrooted tree diagram
- fretree
- Interactive tree rearrangement
Protein secondary structure
- garnier
- Predicts protein secondary structure
- helixturnhelix
- Report nucleic acid binding motifs
- hmoment
- Hydrophobic moment calculation
- pepcoil
- Predicts coiled coil regions
- pepnet
- Displays proteins as a helical net
- pepwheel
- Shows protein sequences as helices
- tmap
- Displays membrane spanning regions
- topo
- Draws an image of a transmembrane protein
Protein tertiary structure
- psiphi
- Phi and psi torsion angles from protein coordinates
- domainreso
- Remove low resolution domains from a DCF file
- domainalign
- Generate alignments (DAF file) for nodes in a DCF file
- domainrep
- Reorder DCF file to identify representative structures
- seqalign
- Extend alignments (DAF file) with sequences (DHF file)
- seqfraggle
- Removes fragment sequences from DHF files
- seqsearch
- Generate PSI-BLAST hits (DHF file) from a DAF file
- seqsort
- Remove ambiguous classified sequences from DHF files
- seqwords
- Generates DHF files from keyword search of UniProt
- libgen
- Generate discriminating elements from alignments
- matgen3d
- Generate a 3D-1D scoring matrix from CCF files
- rocon
- Generates a hits file from comparing two DHF files
- rocplot
- Performs ROC analysis on hits files
- siggen
- Generates a sparse protein signature from an alignment
- siggenlig
- Generate ligand-binding signatures from a CON file
- sigscan
- Generate hits (DHF file) from a signature search
- sigscanlig
- Search ligand-signature library & write hits (LHF file)
- contacts
- Generate intra-chain CON files from CCF files
- interface
- Generate inter-chain CON files from CCF files
Composition of protein sequences
- backtranambig
- Back translate a protein sequence to ambiguous codons
- backtranseq
- Back translate a protein sequence
- charge
- Protein charge plot
- checktrans
- Reports STOP codons and ORF statistics of a protein
- compseq
- Count composition of dimer/trimer/etc words in a sequence
- emowse
- Protein identification by mass spectrometry
- freak
- Residue/base frequency table or plot
- iep
- Calculates the isoelectric point of a protein
- mwcontam
- Shows molwts that match across a set of files
- mwfilter
- Filter noisy molwts from mass spec output
- octanol
- Displays protein hydropathy
- pepinfo
- Plots simple amino acid properties in parallel
- pepstats
- Protein statistics
- pepwindow
- Displays protein hydropathy
- pepwindowall
- Displays protein hydropathy of a set of sequences
Protein motif searches
- antigenic
- Finds antigenic sites in proteins
- digest
- Protein proteolytic enzyme or reagent cleavage digest
- epestfind
- Finds PEST motifs as potential proteolytic cleavage sites
- fuzzpro
- Protein pattern search
- fuzztran
- Protein pattern search after translation
- helixturnhelix
- Report nucleic acid binding motifs
- oddcomp
- Find protein sequence regions with a biased composition
- patmatdb
- Search a protein sequence with a motif
- patmatmotifs
- Search a PROSITE motif database with a protein sequence
- pepcoil
- Predicts coiled coil regions
- preg
- Regular expression search of a protein sequence
- pscan
- Scans proteins using PRINTS
- sigcleave
- Reports protein signal cleavage sites
- omeme
- Motif detection
- emast
- Motif detection
- ememe
- Motif detection
Protein sequence mutation
- msbar
- Mutate sequence beyond all recognition
- shuffleseq
- Shuffles a set of sequences maintaining composition
Protein profile generation and searching
- profit
- Scan a sequence or database with a matrix or profile
- prophecy
- Creates matrices/profiles from multiple alignments
- prophet
- Gapped alignment for profiles
Testing tools, not for general use
- crystalball
- Answers every drug discovery question about a sequence
- myseq
- Demonstration of sequence reading
- mytest
- Demonstration of sequence reading
Database installation
- aaindexextract
- Extract data from AAINDEX
- cutgextract
- Extract data from CUTG
- printsextract
- Extract data from PRINTS
- prosextract
- Build the PROSITE motif database for use by patmatmotifs
- rebaseextract
- Extract data from REBASE
- tfextract
- Extract data from TRANSFAC
- cathparse
- Generates DCF file from raw CATH files
- domainnr
- Removes redundant domains from a DCF file
- domainseqs
- Adds sequence records to a DCF file
- domainsse
- Add secondary structure records to a DCF file
- scopparse
- Generate DCF file from raw SCOP files
- ssematch
- Search a DCF file for secondary structure matches
- allversusall
- Sequence similarity data from all-versus-all comparison
- seqnr
- Removes redundancy from DHF files
- domainer
- Generates domain CCF files from protein CCF files
- hetparse
- Converts heterogen group dictionary to EMBL-like format
- pdbparse
- Parses PDB files and writes protein CCF files
- pdbplus
- Add accessibility & secondary structure to a CCF file
- pdbtosp
- Convert swissprot:PDB codes file to EMBL-like format
- sites
- Generate residue-ligand CON files from CCF files
Database indexing
- dbiblast
- Index a BLAST database
- dbifasta
- Database indexing for fasta file databases
- dbiflat
- Index a flat file database
- dbigcg
- Index a GCG formatted database
- dbxfasta
- Database b+tree indexing for fasta file databases
- dbxflat
- Database b+tree indexing for flat file databases
- dbxgcg
- Database b+tree indexing for GCG formatted databases
Utility tools
- embossdata
- Finds or fetches data files read by EMBOSS programs
- embossversion
- Writes the current EMBOSS version number
External / contributed packages
- CBSTOOLS
- The CBSTOOLS package is a set of wrappers to selected applications from the CBS group in Denmark.
- CLUSTALOMEGA
- A wrapper for the Clustal Omega application.
- DOMAINATRIX
- The DOMAINATRIX programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
- DOMALIGN
- The DOMALIGN programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
- DOMSEARCH
- The DOMSEARCH programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
- EMBOSS
- The latest stable version of EMBOSS (excluding bugfix patches from the 'fixes' directory hierarchy).
- EMNU
- The EMNU package is a simple EMBOSS menu system written by Gary Williams at HGMP.
- ESIM4
- The ESIM4 package is an EMBOSS conversion of the SIM4 package from Liliana Florea.
- HMMER
- A suite of wrappers for the original HMMER v2.3.2 applications written by Sean Eddy. HMMER v2.3.2 must be installed on the same system as EMBOSS, and the directory containing the HMMER executables must be on your PATH for EMBASSY HMMER to work.
- IPRSCAN
- The IPRSCAN package is a wrapper for the InterProScan program.
- MEME
- The EMBASSY MEME package contains 'wrapper' applications providing an EMBOSS-style interface to the applications in the original MEME package version 4.0.0 developed by Timothy L. Bailey.
- MIRA
- The MIRA package is a fragment assembly program from Bastien Chevreux. The program was converted to EMBOSS by Alan Bleasby as two applications: one for EST assembly and one for shotgun fragment assembly.
- MSE
- The MSE package is a multiple sequence editor. The program was contributed to the EMBOSS package by the author, Will Gilbert, as one of the first EMBASSY programs.
- MYEMBOSS
- A package for your own software developments.
- MYEMBOSSDEMO
- The MYEMBOSSDEMO package contains example applications using EMBOSS data types.
- PHYLIPNEW
- The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.69.
- SIGNATURE
- The SIGNATURE programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
- STRUCTURE
- The STRUCTURE programs were developed by Jon Ison and colleagues at MRC HGMP for their protein domain research. They are included as an EMBASSY package as a work in progress.
- TOPO
- The TOPO package is a graphics program to display membrane protein topology by Susan Jean Johns.
- VIENNA
- These programs are adapted from the VIENNA RNA package.