Jinjiang: June 2012

Tools for Sequence Analysis

Ref: http://bioinformatics.igc.gulbenkian.pt/resources/tools/sequenceanalysis

Exhaustive list of sequence analysis tools, with links

Sequence Manipulation

tool	description
SMS	Sequence Manipulation Suite - Here you can find a collection of programs for generating, formatting, and analyzing DNA and protein sequences.
MERGER	Tool from the EMBOSS package that joins two overlapping nucleic acid sequences into one merged sequence.
Reverse	Tool to convert a DNA sequence into its reverse, complement, or reverse-complement.
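
The three conversions offered by the Reverse tool can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function names are made up here, and the complement table assumes unambiguous bases (A, C, G, T) only.

```python
# Complement table for unambiguous DNA bases (assumption: no IUPAC ambiguity codes).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq: str) -> str:
    """Return the sequence read backwards."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Return the base-by-base complement, in the same orientation."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCG"))  # CGGCAT
```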

Restriction Analysis

tool	description
REBASE	Restriction Enzyme Database - search for restriction enzymes by name, species, recognition sequence, companies that sell restriction enzymes or by authors and citations associated with each enzyme.
NEBcutter	A site for generating restriction maps and identifying non-overlapping ORFs.
Mapper	Generate several types of graphics and text-based maps for restriction enzymes.
WebCut	An on-line tool for restriction analysis, silent mutation scanning, and SNP-RFLP analysis.
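
At its core, a restriction map is the list of positions where a recognition sequence occurs. The sketch below locates sites for one enzyme and derives the fragment lengths. It assumes a linear molecule, an unambiguous recognition sequence, and a single strand; `find_sites` and `fragment_lengths` are hypothetical helper names, not part of any tool listed above. EcoRI (G^AATTC) is used as the example.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) occurrence."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after a complete digest of a linear molecule.

    cut_offset is the number of bases of the recognition site that lie
    before the cut (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAGAATTCTTTTGAATTCGG"
    print(find_sites(dna, "GAATTC"))           # [2, 12]
    print(fragment_lengths(dna, "GAATTC", 1))  # [3, 10, 7]
```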

Primer Design

Rules to design primers

tool	description
Primer3	Primer3 is a widely used program for designing PCR primers.
Primaclade	Application that accepts a multiple species nucleotide alignment file as input and identifies a set of PCR primers that will bind across the alignment. The program iteratively runs the Primer3 application for each alignment sequence and collates the results.
ProbeFinder	Design intron-spanning assays for your target gene. You can select the organism of interest and enter the target-gene name, gene ID or nucleotide sequence.
RT-Primer Design	Real Time PCR primer design.
CODEHOP	The Consensus-Degenerate Hybrid Oligonucleotide Primers program designs PCR primers from protein multiple-sequence alignments and is intended for cases where the protein sequences are distant from each other and degenerate primers are needed.
MEME for primer design	Method for designing degenerate primers based on multiple local alignments employing the MEME algorithm supported with electronic PCR.
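
Two quick checks that are often applied to a candidate primer are its GC content and a rough melting temperature. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is a rule of thumb for short oligonucleotides only. It is not the model used by the programs in the table, which rely on more detailed thermodynamic calculations. The function names are illustrative.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule.

    Only a first approximation, intended for short oligos.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"
    print(gc_content(primer))  # 0.5
    print(wallace_tm(primer))  # 60
```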

Finding Genes

tool	description
GENSCAN	Gene identification program which analyzes genomic DNA sequences from a variety of organisms including human, other vertebrates, invertebrates and plants.
GeneMark	Package of programs for gene prediction in Bacteria, Archaea and Metagenomes; Eukaryotes; Viruses, Phages and Plasmids and EST.
Softberry	Gene finding in Eukaryote, Bacteria and Virus.
GrailEXP	Software that predicts exons, genes, promoters, polyAs, CpG islands, EST similarities, and repetitive elements within DNA sequence.
Generation	Software that performs gene predictions on microbial and model organisms and produces a set of data which can be used by GrailEXP v3.0 to recognize genes in these organisms.
DragonGSF	Prediction of gene start location in mammalian genomes, by combining information about CpG islands, transcription start sites (TSSs), and signals downstream of the predicted TSSs.
GeneWise	Software that compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors.
Link	A list of gene prediction programs for both eukaryotic and prokaryotic organisms.

Finding Promoters and Regulatory Elements

tool	description
TFSEARCH	Searching DNA for eukaryotic transcription Factor Binding Sites and DNA-binding profiles (searches TransFAC).
ConSite	Tool for finding cis-regulatory elements in genomic sequences. Predictions are based on the integration of binding site prediction generated with high-quality transcription factor models and cross-species comparison filtering (phylogenetic footprinting).
TESS	Web tool for predicting transcription factor binding sites in DNA sequences. It can identify binding sites using site or consensus strings and positional weight matrices from the TRANSFAC, JASPAR, IMD, and CBIL-GibbsMat databases. You can use TESS to search a few of your own sequences or for user-defined CRMs genome-wide near genes throughout genomes of interest.
Softberry	Gene finding in Eukaryote, Bacteria and Virus. Go to Test on Line on the left side, and search in the Search Motifs menu.
NNPP	Neural Network Promoter Prediction - Promoter Prediction by Neural Network for prokaryotes or eukaryotes.
PromoterScan	Predicts Promoter regions based on scoring homologies with putative eukaryotic Pol II promoter sequences.
Promoter	Predicts transcription start sites of vertebrate PolII promoters in DNA sequences.

Identify Splice Junctions

tool	description
NNSPLICE	Splice Site Prediction by Neural Network for Drosophila and human/other.
NetGene2	The NetGene2 server is a service producing neural network predictions of splice sites in human, C. elegans and A. thaliana DNA.
MaxEntScan	MaxEntScan scores the splice site signals of exon-intron junctions. It is based on an approach for modeling short sequence motifs, such as those involved in RNA splicing, that simultaneously accounts for non-adjacent as well as adjacent dependencies between positions.
SROOGLE	The Splicing RegulatiOn Online Graphical Engine combines: 1) Availability of data: access to large sets of published data; 2) Integration of data: an integrative overview of the signals characterizing exons of interest; 3) Intuitive statistical measures: many algorithms provide output that is not directly interpretable (e.g. delta-G scores, PSSM log-odds scores); 4) User friendliness: an intuitive, interactive graphical user interface based on dynamic JavaScript programming, enabling users to interactively modify their input.
HSF	The Human Splicing Finder is an online bioinformatics tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence.
SplicePort	SplicePort is a splice-site analysis tool that makes splice-site predictions for submitted sequences, and allows browsing of predictive signals and motif exploration. This collection of signals is capable of achieving high classification accuracy on human splice sites.
ASSP	Prediction of putative alternative exon isoform, cryptic, and constitutive splice sites of internal (coding) exons.

Sequence repeat finders

tool	description
RepeatMasker	Program that screens DNA sequences for low-complexity DNA sequences and interspersed repeats. The masked sequence can be used, for example, in BLAST searches. Repeats are stored in the database Repbase Update.

Sequence Motif Finders

tool	description
Sequence Motif Finder	Scan Nucleotide or Protein Sequences for Matching Patterns.
ELPH	Estimated Locations of Pattern Hits - Find motifs in a set of DNA or protein sequences.
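
Scanning a nucleotide sequence for a known pattern, as the first tool above does, can be sketched by translating IUPAC ambiguity codes into a regular expression. This is a minimal illustration with a made-up function name (`scan`). It searches one strand only and does not discover new motifs the way ELPH does.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def scan(seq: str, pattern: str) -> list[int]:
    """Return 0-based start positions of all matches, overlapping ones included."""
    regex = "".join(IUPAC[c] for c in pattern.upper())
    # The lookahead lets the regex engine report overlapping matches.
    return [m.start() for m in re.finditer(f"(?=({regex}))", seq.upper())]


if __name__ == "__main__":
    print(scan("TTTATAAAGGTATATAC", "TATAWA"))  # [2, 10]
```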

Translation Tools

tool	description
Transeq	Tool from the EMBOSS package that translates nucleic acid sequences to the corresponding peptide sequence. It has an option to select which genetic code table to use.
Translate	This tool allows the translation of a nucleotide (DNA/RNA) sequence to a protein sequence using the standard genetic code.
ORF Finder	Open Reading Frame Finder is a graphical analysis tool which finds all open reading frames of a selectable minimum size in a user’s sequence or in a sequence already in the database using the standard or alternative genetic codes.
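
Translation with the standard genetic code and a simple open reading frame search can be sketched as follows. This is a simplified illustration, not the algorithm of any tool in the table: it scans the three forward frames only, requires an ATG start, and reports unknown codons as X. `translate` and `find_orfs` are illustrative names.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA (or RNA) sequence in the given forward frame (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, protein) for ATG...stop ORFs on the forward strand.

    start is 0-based and end is exclusive and includes the stop codon.
    min_codons counts coding codons, excluding the stop.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, translate(seq[start:i])))
                start = None
    return orfs


if __name__ == "__main__":
    print(translate("ATGGCTTGGTAA"))      # MAW*
    print(find_orfs("CCATGGCTTGGTAACC"))  # [(2, 14, 'MAW')]
```

A fuller ORF finder would also scan the reverse complement and allow alternative genetic codes, as the ORF Finder description mentions.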

Post-Translational Modifications

tool	description
PTM Tutorial	Post-translational modification tutorial.
PTM prediction tools	A survey of publicly available PTM web resources, databases and classification/prediction servers.
GlycoMod	Tool that can predict the possible oligosaccharide structures that occur on proteins from their experimentally determined masses. The program can be used for free or derivatized oligosaccharides and for glycopeptides.
Myristoylator	This tool predicts N-terminal myristoylation of proteins by neural networks. Only N-terminal glycines are myristoylated (leading methionines are cleaved prior to myristoylation).
GPS	Group-based Phosphorylation Scoring method is a tool for in silico prediction of phosphorylation sites with their specific kinases.
NetPhos	Tool that produces neural network predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins.
SUMOsp	Tool for in silico sumoylation sites prediction. SUMOylation, a reversible post-translational modification of proteins by the small ubiquitin-related modifiers (SUMO), is crucial in a variety of biological processes.

Align Two Sequences

tool	description
bl2seq	This tool produces the alignment of two given sequences using the NCBI BLAST engine for local alignment. The output shows the similar region.
Needle	EMBOSS Pairwise Alignment Algorithms tool used to compare 2 sequences when you want an alignment that covers the whole length of both sequences.
Water	EMBOSS Pairwise Alignment Algorithms tool used when you are trying to find the best region of similarity between two sequences.
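
The difference between Needle (global alignment, Needleman-Wunsch) and Water (local alignment, Smith-Waterman) can be illustrated with one dynamic-programming routine that computes only the optimal score. The scoring scheme here is an assumption chosen for brevity: +1 match, -1 mismatch, and a linear gap penalty of -2. The EMBOSS programs use substitution matrices and affine gap penalties.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score: global by default, local if local=True."""
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never go below 0.
                score = max(score, 0)
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    print(align_score("ACGT", "AGT"))                               # 1
    print(align_score("TTTGATTACATTT", "CCGATTACACC", local=True))  # 7
```

The global score penalizes the unmatched flanks, while the local score reports only the best shared region, which is why Water suits sequences that share a single similar segment.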

Multiple Sequence Alignment

tool	description
Multalin	Multiple sequence alignment for DNA or proteins with hierarchical clustering.
ClustalW	Multiple sequence alignment program for DNA or protein sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be viewed as cladograms or phylograms.
Tcoffee	Computes a multiple sequence alignment and the associated phylogenetic tree for a set of sequences (proteins or DNA). T-Coffee allows the combination of a collection of multiple/pairwise, global or local alignments into a single model. It can also estimate the level of consistency of each position within the new alignment with the rest of the alignments.

BLAST


tool	description
BLAST	The Basic Local Alignment Search Tool (NCBI) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
Blast@EBI	Here you can find a list of all the BLAST services available at the EBI, including the Ensembl Multi BlastView for the annotated genomes.

UCSC Genome Browser Utilities



The contribution of the UCSC Genome Bioinformatics Group is acknowledged.

. Batch Coordinate Conversion (liftOver) - converts genome coordinates and genome annotation files between assemblies. The current version supports both forward and reverse conversions, as well as conversions between selected species.

. DNA Duster - removes formatting characters and other non-sequence-related characters from an input sequence. Offers several configuration options for the output format, including translated protein.
. Protein Duster - removes formatting characters and other non-sequence-related characters from an input sequence. Offers several configuration options for the output format.
. Phylogenetic Tree Gif Maker - creates a gif image from the phylogenetic tree specification given. Offers several configuration options for branch lengths, normalized lengths, branch labels, legend etc.
. Source Code Downloads - The Genome Browser, Blat and liftOver source code is freely downloadable for academic, noncommercial, and personal use.
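
The kind of clean-up that DNA Duster performs can be approximated as below. This is a rough sketch and not the UCSC implementation: it assumes that lines starting with ">" are FASTA headers to drop and that only the letters A, C, G, T, U and N count as sequence. The function name `dust_dna` is made up for the example.

```python
import re


def dust_dna(text: str) -> str:
    """Strip FASTA headers, digits, whitespace and other non-nucleotide characters."""
    body = "\n".join(line for line in text.splitlines() if not line.startswith(">"))
    return re.sub(r"[^ACGTUNacgtun]", "", body)


if __name__ == "__main__":
    raw = ">seq1\n  1 acgt acgt\n  9 nnac\n"
    print(dust_dna(raw))  # acgtacgtnnac
```

Nucleotide letters that occur inside annotation text would survive this filter, so it is only suitable for input that is mostly sequence.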

Gene prediction software

Ab initio approaches

Name	Description	Links	References
ATGpr	identifying translational initiation sites in cDNA sequences
AUGUSTUS	predicts genes in eukaryotic genomic sequences	webserver	^[1]
BGF	hidden Markov model (HMM) and dynamic programming based ab initio gene prediction program
DIOGENES	a system for fast detection of coding regions in short genomic sequences
Dragon Promoter Finder	software for recognition of vertebrate RNA Polymerase II promoters
EUGENE	gene finding for Arabidopsis thaliana
FRAMED	find genes and frameshift in G+C rich prokaryotic sequences	webserver	^[2]
GENIUS	linking ORFs in complete genomes to protein 3D structures
GENEID	program to predict genes, exons, splice sites and other signals along a DNA sequence
GENEPARSER	Parse a DNA sequence into introns and exons
GeneMark	family of gene prediction programs	webserver
GeneMark.hmm	gene prediction program for prokaryotes and eukaryotes	webserver	^[3]
GeneTack	prediction of genes with frameshifts in prokaryotic genomes	webserver	^[4]
NIX	web tool for combining results from different programs (GRAIL, FEX, HEXON, MZEF, GENEMARK, GENEFINDER, FGENE, BLAST, POLYAH, REPEATMASKER, TRNASCAN)
GLIMMER	finding genes in microbial DNA	sourcecode webserver
VEIL	hidden Markov model for finding genes in vertebrate DNA	Server
MORGAN	a decision tree system for finding genes in vertebrate DNA
SPLICEPREDICTOR	a method to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models
GENESCAN	finding genes using Fourier transform	webserver	^[5]
NNPP	promoter prediction by neural network
NNSPLICE	splice site prediction by neural network
Regulatory Sequence Analysis Tools	provides a series of modular computer programs specifically designed for the detection of regulatory signals in non-coding sequences.
GENOMESCAN	predicts locations and exon-intron structures of genes in genomic sequences from a variety of organisms.	webserver
ORF FINDER	a graphical analysis tool which finds all open reading frames
GrailEXP	predicts exons, genes, promoters, polyAs, CpG islands, EST similarities, and repetitive elements within DNA sequence

RNA structure prediction software

Single sequence secondary structure prediction

Name	Description	Knots	Links	References
CentroidFold	Secondary structure prediction based on generalized centroid estimator	no	sourcecode webserver	^[1]
CentroidHomfold	Secondary structure prediction by using homologous sequence information	no	sourcecode webserver	^[2]
CONTRAfold	Secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring.	no	sourcecode webserver	^[3]
CyloFold	Secondary structure prediction method based on placement of helices allowing complex pseudoknots.	yes	webserver	^[4]
KineFold	Folding kinetics of RNA sequences including pseudoknots by including an implementation of the partition function for knots.	yes	linuxbinary, webserver	^[5][6]
Mfold	MFE (Minimum Free Energy) RNA structure prediction algorithm.	no	sourcecode, webserver	^[7]
Pknots	A dynamic programming algorithm for optimal RNA pseudoknot prediction using the nearest neighbour energy model.	yes	sourcecode	^[8]
PknotsRG	A dynamic programming algorithm for the prediction of a restricted class of RNA pseudoknots.	yes	sourcecode, webserver	^[9]
RNA123	Secondary structure prediction via thermodynamic-based folding algorithms and novel structure-based sequence alignment specific for RNA.	yes	webserver
RNAfold	MFE RNA structure prediction algorithm. Includes an implementation of the partition function for computing basepair probabilities and circular RNA folding.	no	sourcecode, webserver	^[7][10][11][12][13][14]
RNAshapes	MFE RNA structure prediction based on abstract shapes. Shape abstraction retains adjacency and nesting of structural features, but disregards helix lengths, thus reduces the number of suboptimal solutions without losing significant information. Furthermore, shapes represent classes of structures for which probabilities based on Boltzmann-weighted energies can be computed.	no	source & binaries, webserver	^[15][16]
RNAstructure	A program to predict lowest free energy structures and base pair probabilities for RNA or DNA sequences. Programs are also available to predict Maximum Expected Accuracy structures and these can include pseudoknots. Structure prediction can be constrained using experimental data, including SHAPE, enzymatic cleavage, and chemical modification accessibility. Graphical user interfaces are available for Windows and for Mac OS-X/Linux. Programs are also available for use with Unix-style text interfaces. Additionally, a C++ class library is available.	yes	source & binaries	^[17][18]
Sfold	Statistical sampling of all possible structures. The sampling is weighted by partition function probabilities.	no	webserver	^[19][20][21][22]
UNAFold	The UNAFold software package is an integrated collection of programs that simulate folding, hybridization, and melting pathways for one or two single-stranded nucleic acid sequences.	no	sourcecode	^[23]
Crumple	Crumple is simple, cleanly written software for producing the full set of possible secondary structures for a single sequence, given optional constraints.	no	sourcecode	^[24]
Sliding Windows & Assembly	Sliding windows and assembly is a tool chain for folding long series of similar hairpins.	no	sourcecode	^[24]
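
The dynamic-programming idea behind single-sequence folding can be shown with the Nussinov algorithm, which maximizes the number of nested base pairs. This is a deliberately simplified stand-in: the MFE programs in the table minimize free energy with a nearest-neighbour energy model, which is not reproduced here. The pairing rules (A-U, G-C, G-U) and the minimum hairpin loop of 3 unpaired bases are assumptions of the sketch.

```python
# Allowed base pairs (Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested (pseudoknot-free) base pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_pairs("GGGAAAUCC"))  # 3
```

Because the recursion only combines nested subproblems, it cannot represent pseudoknots, which matches the "no" entries in the Knots column for the classical MFE tools.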

Single sequence tertiary structure prediction

Name	Description	Knots	Links	References
BARNACLE	A Python library for the probabilistic sampling of RNA structures that are compatible with a given nucleotide sequence and that are RNA-like on a local length scale.	yes	sourcecode	^[25]
FARNA	Automated de novo prediction of native-like RNA tertiary structures.	yes	sourcecode	^[26]
iFoldRNA	three-dimensional RNA structure prediction and folding	yes	webserver	^[27]
MC-Fold MC-Sym Pipeline	Thermodynamics and Nucleotide cyclic motifs for RNA structure prediction algorithm. 2D and 3D structures.	yes	sourcecode, webserver	^[28]
NAST	Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters	?	sourcecode	^[29]
RNA123	An integrated platform for de novo and homology modeling of RNA 3D structures, where coordinate file input, sequence editing, sequence alignment, structure prediction and analysis features are all accessed from a single intuitive graphical user interface.	yes	webserver
* Knots: pseudoknot prediction, <yes|no>.

Comparative methods

The single sequence methods mentioned above have a difficult job detecting a small sample of reasonable secondary structures from a large space of possible structures. A good way to reduce the size of the space is to use evolutionary approaches. Structures that have been conserved by evolution are far more likely to be the functional form. The methods below use this approach.

Name	Description	Number of sequences	Alignment	Structure	Knots	Link	References
Carnac	Comparative analysis combined with MFE folding.	any	no	yes	no	sourcecode, webserver	^[30][31]
CentroidAlifold	Common secondary structure prediction based on generalized centroid estimator	any	yes	no	no	sourcecode webserver	^[32]
CentroidAlign	Fast and accurate multiple aligner for RNA sequences	any	yes	no	no	sourcecode	^[33]
CMfinder	an expectation maximization algorithm using covariance models for motif description. Uses heuristics for effective motif search, and a Bayesian framework for structure prediction combining folding energy and sequence covariation.		yes	yes	no	sourcecode, webserver, website	^[34]
CONSAN	implements a pinned Sankoff algorithm for simultaneous pairwise RNA alignment and consensus structure prediction.	2	yes	yes	no	sourcecode	^[35]
Dynalign	an algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity.	2	yes	yes	no	sourcecode	^[36][37][38]
FoldalignM	A multiple RNA structural alignment method, to a large extent based on the PMcomp program.	any	yes	yes	no	sourcecode	^[39]
KNetFold	Computes a consensus RNA secondary structure from an RNA sequence alignment based on machine learning.	any	input	yes	yes	linuxbinary, webserver	^[40]
LARA	Produce a global fold and alignment of ncRNA families using integer linear programming and Lagrangian relaxation.	any	yes	yes	no	sourcecode	^[41]
LocaRNA	LocaRNA is the successor of PMcomp with an improved time complexity. It is a variant of Sankoff's algorithm for simultaneous folding and alignment, which takes as input pre-computed base pair probability matrices from McCaskill's algorithm as produced by RNAfold -p. Thus the method can also be viewed as a way to compare base pair probability matrices.	any	yes	yes	no	sourcecode	^[42]
MASTR	A sampling approach using Markov chain Monte Carlo in a simulated annealing framework, where both structure and alignment are optimized by making small local changes. The score combines the log-likelihood of the alignment, a covariation term and the basepair probabilities.	any	yes	yes	no	sourcecode	^[43][44]
Murlet	a multiple alignment tool for RNA sequences using iterative alignment based on Sankoff's algorithm with sharply reduced computational time and memory.	any	yes	yes	no	webserver	^[45]
MXSCARNA	a multiple alignment tool for RNA sequences using progressive alignment based on pairwise structural alignment algorithm of SCARNA.	any	yes	yes	no	webserver sourcecode	^[46]
PARTS	A method for joint prediction of alignment and common secondary structures of two RNA sequences using a probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities.	2	yes	yes	no	sourcecode	^[47]
Pfold	Folds alignments using a SCFG trained on rRNA alignments.		input	yes	no	webserver	^[48][49]
PETfold	Formally integrates both the energy-based and evolution-based approaches in one model to predict the folding of multiple aligned RNA sequences by a maximum expected accuracy scoring. The structural probabilities are calculated by RNAfold and Pfold.	any	input	yes	no	sourcecode	^[50]
PMcomp/PMmulti	PMcomp is a variant of Sankoff's algorithm for simultaneous folding and alignment, which takes as input pre-computed base pair probability matrices from McCaskill's algorithm as produced by RNAfold -p. Thus the method can also be viewed as a way to compare base pair probability matrices. PMmulti is a wrapper program that does progressive multiple alignments by repeatedly calling PMcomp.		yes	yes	no	sourcecode, webserver	^[51]
R-COFFEE	uses RNAlpfold to compute the secondary structure of the provided sequences. A modified version of T-Coffee is then used to compute the multiple sequence alignment having the best agreement with the sequences and the structures. R-Coffee can be combined with any existing sequence alignment method.	any	yes	yes	no	sourcecode, webserver	^[52][53]
RNA123	The structure based sequence alignment (SBSA) algorithm within RNA123 utilizes a novel suboptimal version of the Needleman-Wunsch global sequence alignment method that fully accounts for secondary structure in the template and query. It also utilizes two separate substitution matrices that are optimized for RNA helices and single stranded regions. The SBSA algorithm provides >90% accurate sequence alignments even for structures as large as bacterial 23S rRNA (~2800 nts).	any	yes	yes	yes	webserver
RNAalifold	Folds precomputed alignments using a combination of free-energy and covariation measures. Ships with the Vienna package.	any	input	yes	no	homepage	^[10][54]
RNAcast	enumerates the near-optimal abstract shape space, and predicts as the consensus an abstract shape common to all sequences, and for each sequence, the thermodynamically best structure which has this abstract shape.	any	no	yes	no	sourcecode, webserver	^[55]
RNAforester	Compare and align RNA secondary structures via a "forest alignment" approach.	any	yes	input	no	sourcecode, webserver	^[56][57]
RNAmine	Frequent stem pattern miner from unaligned RNA sequences is a software tool to extract the structural motifs from a set of RNA sequences.	any	no	yes	no	webserver	^[58]
RNASampler	A probabilistic sampling approach that combines intrasequence base pairing probabilities with intersequence base alignment probabilities. This is used to sample possible stems for each sequence and compare these stems between all pairs of sequences to predict a consensus structure for two sequences. The method is extended to predict the common structure conserved among multiple sequences by using a consistency-based score that incorporates information from all the pairwise structural alignments.	any	yes	yes	yes	sourcecode	^[59]
SCARNA	Stem Candidate Aligner for RNA (Scarna) is a fast, convenient tool for structural alignment of a pair of RNA sequences. It aligns two RNA sequences and calculates the similarities of them, based on the estimated common secondary structures. It works even for pseudoknotted secondary structures.	2	yes	yes	no	webserver	^[60]
SimulFold	simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework.	any	yes	yes	yes	sourcecode	^[61]
Stemloc	a program for pairwise RNA structural alignment based on probabilistic models of RNA structure known as Pair stochastic context-free grammars.	any	yes	yes	no	sourcecode	^[62]
StrAl	an alignment tool designed to provide multiple alignments of non-coding RNAs following a fast progressive strategy. It combines the thermodynamic base pairing information derived from RNAfold calculations in the form of base pairing probability vectors with the information of the primary sequence.		yes	no	no	sourcecode, webserver	^[63]
TFold	A tool for predicting non-coding RNA secondary structures including pseudoknots. It takes as input an alignment of RNA sequences and returns the predicted secondary structure(s). It combines criteria of stability, conservation and covariation in order to search for stems and pseudoknots. Users can change parameter values, specify known stems (if any) for the system to take into account, choose to get several possible structures or only one, search for pseudoknots or not, etc.	any	yes	yes	yes	webserver	^[64]
WAR	a webserver that makes it possible to simultaneously use a number of state of the art methods for performing multiple alignment and secondary structure prediction for noncoding RNA sequences.		yes	yes	no	webserver	^[65]
Xrate	a program for analysis of multiple sequence alignments using phylogenetic grammars, that may be viewed as a flexible generalization of the "Pfold" program.	any	yes	yes	no	sourcecode	^[66]
* Number of sequences: <any|num>.
* Alignment: predicts an alignment, <input|yes|no>.
* Structure: predicts structure, <input|yes|no>.
* Knots: pseudoknot prediction, <yes|no>.
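
The evolutionary signal exploited by comparative methods is covariation: two alignment columns that change together, so that base pairing is preserved, are likely to be paired. One simple covariation statistic is the mutual information between two columns, sketched below. This is a generic illustration and not the scoring function of any particular tool in the table. Gap characters are treated as ordinary symbols, which is an assumption of the sketch.

```python
from collections import Counter
from math import log2


def mutual_information(alignment: list[str], i: int, j: int) -> float:
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment is a list of equal-length aligned sequences.
    """
    n = len(alignment)
    col_i = Counter(s[i] for s in alignment)
    col_j = Counter(s[j] for s in alignment)
    joint = Counter((s[i], s[j]) for s in alignment)
    return sum(
        (c / n) * log2((c / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    aln = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
    print(mutual_information(aln, 0, 4))  # 2.0: columns covary perfectly
    print(mutual_information(aln, 1, 2))  # 0.0: both columns are invariant
```

Perfectly conserved columns carry no covariation signal, which is one reason several of the methods above combine covariation with folding energy.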

Inter molecular interactions: RNA-RNA

Many ncRNAs function by binding to other RNAs. For example, miRNAs regulate protein-coding gene expression by binding to 3' UTRs; small nucleolar RNAs guide post-transcriptional modifications by binding to rRNA; U4 and U6 spliceosomal RNAs bind to each other, forming part of the spliceosome; and many small bacterial RNAs, e.g. GcvB, OxyS and RyhB, regulate gene expression by antisense interactions.

Name	Description	Intra-molecular structure	Comparative	Link	References
GUUGle	A utility for fast determination of RNA-RNA matches with perfect hybridization via A-U, C-G, and G-U base pairing.	no	no	webserver	^[67]
IntaRNA	Efficient target prediction incorporating the accessibility of target sites	yes	no	sourcecode webserver	^[68][69][70]
NUPACK	Computes the full unpseudoknotted partition function of interacting strands in dilute solution. Calculates the concentrations, mfes, and base-pairing probabilities of the ordered complexes below a certain complexity. Also computes the partition function and basepairing of single strands including a class of pseudoknotted structures. Also enables design of ordered complexes.	yes	no	NUPACK	^[71]
OligoWalk/RNAstructure	Predicts bimolecular secondary structures with and without intramolecular structure. Also predicts the hybridization affinity of a short nucleic acid to an RNA target.	yes	no	[1]	^[72]
piRNA	calculates the partition function and thermodynamics of RNA-RNA interactions. It considers all possible joint secondary structures of two interacting nucleic acids that do not contain pseudoknots, interaction pseudoknots, or zigzags.	yes	no	linuxbinary	^[73]
RNAaliduplex	Based upon RNAduplex with bonuses for covarying sites	no	yes	sourcecode	^[10]
RNAcofold	works much like RNAfold, but allows two RNA sequences to be specified, which may then form a dimer structure.	yes	no	sourcecode	^[10][74]
RNAduplex	computes optimal and suboptimal secondary structures for hybridization. The calculation is simplified by allowing only inter-molecular base pairs.	no	no	sourcecode	^[10]
RNAhybrid	a tool for finding the minimum free energy hybridisation of a long and a short RNA.	no	no	sourcecode, webserver	^[75][76]
RNAup	calculates the thermodynamics of RNA-RNA interactions. RNA-RNA binding is decomposed into two stages. (1) First the probability that a sequence interval (e.g. a binding site) remains unpaired is computed. (2) Then the binding energy given that the binding site is unpaired is calculated as the optimum over all possible types of bindings.	yes	no	sourcecode	^[10][77]
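
The simplest model in the table, perfect hybridization via A-U, C-G and G-U pairs with no intramolecular structure, can be sketched with a brute-force scan. This only illustrates the pairing rule described for GUUGle. It is not that program's algorithm, and `perfect_duplex_sites` is a hypothetical name. Both sequences are written 5' to 3', so the query pairs with each target window in antiparallel orientation.

```python
# Base pairs allowed in the duplex: Watson-Crick plus G-U wobble.
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}


def perfect_duplex_sites(target: str, query: str) -> list[int]:
    """0-based starts of target windows that pair perfectly with the whole query."""
    m = len(query)
    sites = []
    for start in range(len(target) - m + 1):
        window = target[start:start + m]
        # Antiparallel pairing: query position k faces window position m-1-k.
        if all(query[k] + window[m - 1 - k] in CAN_PAIR for k in range(m)):
            sites.append(start)
    return sites


if __name__ == "__main__":
    print(perfect_duplex_sites("CCAUCCAAGUUUCC", "GGAU"))  # [2, 8]
```

In the example, the site at position 2 is a pure Watson-Crick match, while the site at position 8 relies on G-U wobble pairs.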

Inter molecular interactions: MicroRNA:UTR

MicroRNAs regulate protein-coding gene expression by binding to 3' UTRs, and there are tools specifically designed for predicting these interactions. For an evaluation of target prediction methods on high-throughput experimental data, see Selbach et al., Nature 2008 ^[78] and Alexiou et al., Bioinformatics 2009 ^[79].

Name	Description	Species Specific	Intra-molecular structure	Comparative	Link	References
Diana-microT	DIANA-microT 3.0 is an algorithm based on several parameters calculated individually for each microRNA and it combines conserved and non-conserved microRNA recognition elements into a final prediction score.	human, mouse	no	yes	webserver	^[80]
MicroTar	An animal miRNA target prediction tool based on miRNA-target complementarity and thermodynamic data.	no	no	no	sourcecode	^[81]
miTarget	microRNA target gene prediction using a support vector machine.	no	no	no	webserver	^[82]
PicTar	Combinatorial microRNA target predictions.	8 vertebrates	no	yes	predictions	^[83]
PITA	Incorporates the role of target-site accessibility, as determined by base-pairing interactions within the mRNA, in microRNA target recognition.	no	yes	no	executable, webserver, predictions	^[84]
RNA22	The first link (predictions) provides RNA22 predictions for all protein-coding transcripts in human, mouse, roundworm, and fruit fly. It allows you to visualize the predictions within a cDNA map and also to find transcripts targeted by multiple miRNAs of interest. The second link (custom) first finds putative microRNA binding sites in the sequence of interest, then identifies the targeting microRNA.	no	no	no	predictions custom	^[85]
RNAhybrid	a tool for finding the minimum free energy hybridisation of a long and a short RNA.	no	no	no	sourcecode, webserver	^[75][76]
Sylamer	Sylamer is a method for finding significantly over or under-represented words in sequences according to a sorted gene list. Typically it is used to find significant enrichment or depletion of microRNA or siRNA seed sequences from microarray expression data.	no	no	no	sourcecode webserver	^[86][87]
TAREF	TAREF stands for TARget REFiner. It predicts microRNA targets from multiple features derived from the flanking regions of the predicted target sites, where traditional structure prediction approaches may fail to assess openness. It also provides an option to use encoded patterns to refine filtering.	Yes	no	no	server/sourcecode	^[88]
p-TAREF	p-TAREF stands for plant TARget REFiner. It identifies plant microRNA targets from multiple features derived from the flanking regions of the predicted target sites, where traditional structure prediction approaches may fail to assess openness. It also provides an option to use encoded patterns to refine filtering. It applies machine learning with a scoring scheme based on Support Vector Regression (SVR), considering structural and alignment aspects of targeting in plants with plant-specific models. p-TAREF is implemented with a concurrent architecture in both server and standalone form, so it can run concurrently on simple desktops while performing transcriptome-level analysis. It also provides an option to check predicted targets against expression data integrated in its back-end, to add confidence to the prediction alongside the SVR score. In benchmarking against other plant miRNA target identification tools, p-TAREF was reported to perform better.	Yes	no	no	server/standalone
TargetScan	Predicts biological targets of miRNAs by searching for the presence of conserved 8mer and 7mer sites that match the seed region of each miRNA. Predictions are ranked using site number, site type, and site context, which includes factors that influence target-site accessibility.	vertebrates, flies, nematodes	evaluated indirectly	yes	sourcecode, webserver	^{[89][90][91][92]}
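The seed-matching idea in the TargetScan row can be illustrated with a short Python sketch. This is not TargetScan's code and it ignores conservation, site context, and ranking. It only classifies candidate sites using the commonly described site types: 7mer-m8 (pairing to miRNA positions 2-8), 7mer-A1 (pairing to positions 2-7 followed by an A), and 8mer (both). The function names and the test sequences are made up for illustration.

```python
# Minimal sketch of canonical seed-site detection (illustrative only).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string over the alphabet A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr):
    """Return (core_start, site_type) tuples for candidate seed sites.

    core_start is the 0-based index in `utr` of the 6 nt that pair with
    miRNA positions 2-7. Plain 6mer matches are not reported.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp(mirna[1:7])   # pairs with miRNA positions 2-7
    pos8 = revcomp(mirna[7])     # base that pairs with miRNA position 8
    sites = []
    start = utr.find(core)
    while start != -1:
        has_m8 = start > 0 and utr[start - 1] == pos8
        has_a1 = utr[start + 6:start + 7] == "A"
        if has_m8 and has_a1:
            sites.append((start, "8mer"))
        elif has_m8:
            sites.append((start, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        start = utr.find(core, start + 1)
    return sites


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # example miRNA sequence
    toy_utr = "AAACUACCUCAGGG"         # made-up UTR fragment
    print(seed_sites(let7a, toy_utr))
```

Real predictors add much more on top of this scan (conservation across species, site accessibility, context scores), which is what the columns of the table above summarize.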

ncRNA gene prediction software

Name	Description	Number of sequences	Alignment	Structure	Link	References
Alifoldz	Assessing a multiple sequence alignment for the existence of an unusual stable and conserved RNA secondary structure.	any	input	yes	sourcecode	^[93]
EvoFold	a comparative method for identifying functional RNA structures in multiple-sequence alignments. It is based on a probabilistic model-construction called a phylo-SCFG and exploits the characteristic differences of the substitution process in stem-pairing and unpaired regions to make its predictions.	any	input	yes	linuxbinary	^[94]
MSARi	heuristic search for statistically significant conservation of RNA secondary structure in deep multiple sequence alignments.	any	input	yes	sourcecode	^[95]
QRNA	This is the code from Elena Rivas that accompanies a submitted manuscript "Noncoding RNA gene detection using comparative sequence analysis". QRNA uses comparative genome sequence analysis to detect conserved RNA secondary structures, including both ncRNA genes and cis-regulatory RNA structures.	2	input	yes	sourcecode	^[96][97]
RNAz	program for predicting structurally conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments. It can be used in genome-wide screens to detect functional RNA structures, as found in noncoding RNAs and cis-acting regulatory elements of mRNAs.	any	input	yes	sourcecode, webserver RNAz 2	^{[98][99][100]}
Xrate	a program for analysis of multiple sequence alignments using phylogenetic grammars, that may be viewed as a flexible generalization of the "Evofold" program.	any	yes	yes	sourcecode	^[66]
* Number of sequences: <any|num>. * Alignment: predicts an alignment, <input|yes|no>. * Structure: predicts structure, <input|yes|no>.

Family specific gene prediction software

Name	Description	Family	Link	References
ARAGORN	ARAGORN detects tRNA and tmRNA in nucleotide sequences.	tRNA tmRNA	webserver source	^[101]
miRNAminer	Given a search query, candidate homologs are identified using BLAST search and then tested for their known miRNA properties, such as secondary structure, energy, alignment and conservation, in order to assess their fidelity.	MicroRNA	webserver	^[102]
RISCbinder	Prediction of guide strand of microRNAs.	Mature miRNA	webserver	^[103]
RNAmicro	A SVM-based approach that, in conjunction with a non-stringent filter for consensus secondary structures, is capable of recognizing microRNA precursors in multiple sequence alignments.	MicroRNA	homepage	^[104]
RNAmmer	RNAmmer uses HMMER to annotate rRNA genes in genome sequences. Profiles were built using alignments from the European ribosomal RNA database^[105] and the 5S Ribosomal RNA Database.^[106]	rRNA	webserver source	^[107]
SnoReport	Uses a combination of RNA secondary structure prediction and machine learning that is designed to recognize the two major classes of snoRNAs, box C/D and box H/ACA snoRNAs, among ncRNA candidate sequences.	snoRNA	sourcecode	^[108]
SnoScan	Search for C/D box methylation guide snoRNA genes in a genomic sequence.	C/D box snoRNA	sourcecode, webserver	^[109][110]
snoSeeker	snoSeeker includes two snoRNA-searching programs, CDseeker and ACAseeker, specific to the detection of C/D snoRNAs and H/ACA snoRNAs, respectively. snoSeeker has been used to scan four human–mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes.	snoRNA	webserver,stand-alone	^[111]
tRNAscan-SE	a program for the detection of transfer RNA genes in genomic sequence.	tRNA	sourcecode, webserver	^[110][112]

RNA homology search software

Name	Description	Link	References
ERPIN	"Easy RNA Profile IdentificatioN" is an RNA motif search program that reads a sequence alignment and secondary structure, and automatically infers a statistical "secondary structure profile" (SSP). An original dynamic programming algorithm then matches this SSP onto any target database, finding solutions and their associated scores.	sourcecode webserver	^{[113][114][115]}
Infernal	"INFERence of RNA ALignment" is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs).	sourcecode	^{[116][117][118]}
PHMMTS	"pair hidden Markov models on tree structures" is an extension of pair hidden Markov models defined on alignments of trees.	sourcecode, webserver	^[119]
RaveNnA	A slow and rigorous or fast and heuristic sequence-based filter for covariance models.	sourcecode	^[120][121]
RSEARCH	Takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs.	sourcecode	^[122]

Benchmarks


Name	Description	Structure	Alignment	Phylogeny	Links	References
BRalibase I	A comprehensive comparison of comparative RNA structure prediction approaches	yes	no	no	data	^[123]
BRalibase II	A benchmark of multiple sequence alignment programs upon structural RNAs	no	yes	no	data	^[124]
BRalibase 2.1	A benchmark of multiple sequence alignment programs upon structural RNAs	no	yes	no	data	^[125]
BRalibase III	A critical assessment of the performance of homology search methods on noncoding RNA	no	yes	no	data	^[126]
CompaRNA	An independent comparison of single-sequence RNA secondary structure prediction programs	yes	no	no	CompaRNA
* Alignment: benchmarks alignment tools <yes|no>. * Structure: benchmarks structure prediction tools <yes|no>.

Alignment viewers/editors

Name	Description	Alignment	Structure	Link	References
4sale	A tool for Synchronous RNA Sequence and Secondary Structure Alignment and Editing	yes	yes	sourcecode	^[127]
Colorstock, SScolor, Raton	Colorstock, a command-line script using ANSI terminal color; SScolor, a Perl script that generates static HTML pages; and Raton, an AJAX web application generating dynamic HTML. Each tool can be used to color RNA alignments by secondary structure and to visually highlight compensatory mutations in stems.	yes	yes	sourcecode	^[128]
Integrated Genome Browser (IGB)	a multiple alignment viewer written in Java.	yes	no	sourcecode	^[129]
Jalview	a multiple alignment editor written in Java.	yes	no	sourcecode	^[130][131]
RALEE	a major mode for the Emacs text editor. It provides functionality to aid the viewing and editing of multiple sequence alignments of structured RNAs.	yes	yes	sourcecode	^[132]
SARSE	A graphical sequence editor for working with structural alignments of RNA.	yes	yes	sourcecode	^[133]
* Alignment: view and edit an alignment, <yes|no>. * Structure: view and edit structure, <yes|no>.

Inverse Folding/RNA design

Name	Description	Link	References
ETeRNA	An RNA folding game that challenges players to come up with sequences that fold into a target RNA structure. The best sequences for a given puzzle are synthesized and their structures are probed through chemical mapping. The sequences are then scored by the data's agreement to the target structure and feedback is provided to the players.	home page	--
NUPACK	Although NUPACK can be used to get useful statistics and properties of an RNA's structure as mentioned above, its main goal is the design of new sequences that fold into a desired structure.	home page	^[71]
RNAInverse	The ViennaRNA package provides RNAInverse, an algorithm for designing sequences with desired structure.	help page	^[134]

Secondary structure viewers/editors

Name	Description	Link	References
PseudoViewer	Automatically visualizing RNA pseudoknot structures as planar graphs.	webapp/binary	^{[135][136][137][138]}
RNA Movies	browse sequential paths through RNA secondary structure landscapes	sourcecode	^[139][140]
RNA2D3D	a program for generating, viewing, and comparing 3-dimensional models of RNA	binary	^[141]
RNAView/RnamlView	Use RNAView to automatically identify and classify the types of base pairs that are formed in nucleic acid structures. Use RnamlView to arrange RNA structures.	sourcecode	^[142]
VARNA	a tool for the automated drawing, visualization and annotation of the secondary structure of RNA, designed as a companion software for web servers and databases	sourcecode	^[143]

A list of computer programs that are used for nucleic acid simulations

Min - Optimization. MD - Molecular dynamics. MC - Monte Carlo. REM - Replica exchange method.

Crt - Cartesian coordinates. Int - Internal coordinates. Exp - Explicit water. Imp - Implicit water.

Lig - Ligand interactions. HA - Hardware accelerated.

Name	View 3D	Model Build	Min	MD	MC	REM	Crt	Int	Exp	Imp	Lig	HA	Comments	License	Homepage
Abalone	+	+	+	+	+	+	+		+	+	+	+	DNA, proteins, ligands	Free	Agile Molecule
AMBER ^[1]		+	+	+		+	+		+	+	+		AMBER Force Field	Commercial	ambermd.org
Ascalaph Designer	+	+	+	+			+		+	+	+		AMBER	GPL	biomolecular-modeling.com
CHARMM		+	+	+	+		+		+	+	+		CHARMM Force Field	Commercial	charmm.org
ICM^[2]	+	+	+		+			+		+			Global optimization	Commercial	Molsoft
JUMNA ^[3]		+	+					+		+				Commercial
MDynaMix ^[4]	+	+		+			+		+		+		Common MD	GPL	Stockholm University
MOE	+	+	+	+			+		+		+		Molecular Operating Environment	Commercial	Chemical Computing Group
NAB ^[5]		+											Nucleic Acid Builder	GPL	New Jersey University
NAMD	+		+	+			+		+		+	+	NAnoscale Molecular Dynamics	Free	University of Illinois

Jinjiang

Wednesday, 20 June 2012

Jinjiang's WebWatcher on Biology (8) - Sequence Analysis
