Wednesday, 20 June 2012

Jinjiang's WebWatcher on Biology (8) - Sequence Analysis

Tools for Sequence Analysis

Ref: http://bioinformatics.igc.gulbenkian.pt/resources/tools/sequenceanalysis

Exhaustive list of sequence analysis tools, with links

Sequence Manipulation

tool
description
SMS
Sequence Manipulation Suite - Here you can find a collection of programs for generating, formatting, and analyzing DNA and protein sequences.
Tool from the EMBOSS package joins two overlapping nucleic acid sequences into one merged sequence.
Tool to convert a DNA sequence into its reverse, complement, or reverse-complement.
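As a rough illustration of what a reverse/complement tool does, here is a minimal Python sketch. The function names are made up for this example, and it assumes only the four unambiguous bases (A, C, G, T); IUPAC ambiguity codes are passed through unchanged.

```python
# Complement table for unambiguous DNA bases; letter case is preserved.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]


def complement(seq):
    """Return the base-by-base complement, in the same orientation."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    s = "ATGC"
    print(reverse(s), complement(s), reverse_complement(s))
```

A palindromic recognition site such as GAATTC is its own reverse complement, which makes a handy sanity check.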
Restriction Analysis

tool
description
Restriction Enzyme Database - search for restriction enzymes by name, species, recognition sequence, companies that sell restriction enzymes or by authors and citations associated with each enzyme.
A nice site for generating restriction maps and identifying non-overlapping ORFs.
Generate several types of graphics and text-based maps for restriction enzymes.
An on-line tool for restriction analysis, silent mutation scanning, and SNP-RFLP analysis.
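The core of a restriction analysis can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a linear molecule, a single enzyme, and an exact recognition sequence with no ambiguity codes. The function names and the `cut_offset` parameter are hypothetical; the example uses EcoRI, which recognizes GAATTC and cuts after the first G.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step by one so that overlapping occurrences are also reported.
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site, cut_offset):
    """Return fragment lengths after digesting a linear sequence.

    `cut_offset` is the number of bases from the start of the site to the cut.
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAGAATTCGGGAATTCTT"
    print(find_sites(dna, "GAATTC"))           # site positions
    print(fragment_lengths(dna, "GAATTC", 1))  # EcoRI: G^AATTC
```

Only the given strand is searched, which is sufficient for palindromic sites; non-palindromic sites would also need a search on the reverse complement.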
Primer Design


tool
description
Primer3 is a widely used program for designing PCR primers.
Application that accepts a multiple species nucleotide alignment file as input and identifies a set of PCR primers that will bind across the alignment. The program iteratively runs the Primer3 application for each alignment sequence and collates the results.
Design intron-spanning assays for your target gene. You can select the organism of interest and enter the target-gene name, gene ID or nucleotide sequence.
Real Time PCR primer design.
The Consensus-degenerate hybrid oligonucleotide primers program designs PCR primers from protein multiple-sequence alignments and is intended for cases where the protein sequences are distant from each other and degenerate primers are needed.
Method for designing degenerate primers based on multiple local alignments employing the MEME algorithm supported with electronic PCR.
Finding Genes

tool
description
Gene identification program which analyzes genomic DNA sequences from a variety of organisms including human, other vertebrates, invertebrates and plants.
Package of programs for gene prediction in Bacteria, Archaea and Metagenomes; Eukaryotes; Viruses, Phages and Plasmids and EST.
Gene finding in eukaryotes, bacteria and viruses.
Software that predicts exons, genes, promoters, polyAs, CpG islands, EST similarities, and repetitive elements within DNA sequence.
Software that performs gene predictions on microbial and model organisms and produces a set of data which can be used by GrailEXP v3.0 to recognize genes in these organisms.
Prediction of gene start location in mammalian genomes, by combining information about CpG islands, transcription start sites (TSSs), and signals downstream of the predicted TSSs.
Software that compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors.
A list of gene prediction programs for both eukaryotic and prokaryotic organisms.
Finding Promoters and Regulatory Elements

tool
description
Searching DNA for eukaryotic transcription Factor Binding Sites and DNA-binding profiles (searches TransFAC).
Tool for finding cis-regulatory elements in genomic sequences. Predictions are based on the integration of binding site prediction generated with high-quality transcription factor models and cross-species comparison filtering (phylogenetic footprinting).
Web tool for predicting transcription factor binding sites in DNA sequences. It can identify binding sites using site or consensus strings and positional weight matrices from the TRANSFAC, JASPAR, IMD and CBIL-GibbsMat databases. You can use TESS to search a few of your own sequences or to search for user-defined CRMs genome-wide near genes throughout genomes of interest.
Gene finding in eukaryotes, bacteria and viruses. Go to "Test on Line" on the left side and look in the "Search Motifs" menu.
Neural Network Promoter Prediction - Promoter Prediction by Neural Network for prokaryotes or eukaryotes.
Predicts Promoter regions based on scoring homologies with putative eukaryotic Pol II promoter sequences.
Predicts transcription start sites of vertebrate PolII promoters in DNA sequences.
Identify Splice Junctions

tool
description
Splice Site Prediction by Neural Network for Drosophila and human/other.
The NetGene2 server is a service producing neural network predictions of splice sites in human, C. elegans and A. thaliana DNA.
MaxEntScan scores the splice site signals of exon-intron junctions. It models short sequence motifs, such as those involved in RNA splicing, in a way that simultaneously accounts for dependencies between non-adjacent as well as adjacent positions.
The Splicing RegulatiOn Online Graphical Engine combines: 1) Availability of data - accessibility to large sets of published data; 2) Integration of data - an integrative overview of the signals characterizing exons of interest; 3) Intuitive statistical measures - many algorithms provide output that is not directly interpretable (e.g. delta-G scores, PSSM log-odds scores); 4) User friendliness - an intuitive, interactive graphical user interface built on dynamic JavaScript programming, enabling users to interactively modify their input.
HSF
The Human Splicing Finder is an online bioinformatics tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence.
SplicePort is a splice-site analysis tool that makes splice-site predictions for submitted sequences, and allows browsing of predictive signals and motif exploration. This collection of signals is capable of achieving high classification accuracy on human splice sites.
Prediction of putative alternative exon isoform, cryptic, and constitutive splice sites of internal (coding) exons.
Sequence repeat finders

tool
description
Program that screens DNA sequences for low-complexity DNA sequences and interspersed repeats. The masked sequence can then be used, for example, in BLAST searches. Repeats are stored in the database Repbase Update.
Sequence Motif Finders

tool
description
Scan Nucleotide or Protein Sequences for Matching Patterns.
Estimated Locations of Pattern Hits - Find motifs in a set of DNA or protein sequences.
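Scanning a sequence for a pattern can be approximated with a regular expression. The sketch below is an assumption-laden illustration, not how the listed tools work internally: it expands IUPAC nucleotide ambiguity codes into character classes and reports every match, including overlapping ones. The names `iupac_to_regex` and `scan` are invented for this example.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC DNA pattern into a regular expression string."""
    return "".join(IUPAC[ch] for ch in pattern.upper())


def scan(seq, pattern):
    """Return (start, matched_text) for every hit, overlapping hits included."""
    # A lookahead keeps each match zero-width so overlapping hits are found.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    for start, hit in scan("TTTATAAAAGGTATATAT", "TATAWA"):
        print(start, hit)
```

Only the forward strand is scanned here; a fuller scan would repeat the search on the reverse complement.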
Translation Tools

tool
description
Tool from the EMBOSS package that translates nucleic acid sequences to the corresponding peptide sequence. It has an option to select which genetic code table to use.
This tool allows the translation of a nucleotide (DNA/RNA) sequence to a protein sequence using the standard genetic code.
Open Reading Frame Finder is a graphical analysis tool which finds all open reading frames of a selectable minimum size in a user’s sequence or in a sequence already in the database using the standard or alternative genetic codes.
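To make the translation and ORF-finding steps concrete, here is a small Python sketch using the standard genetic code. It is a simplified illustration and not the implementation of any tool listed above: it searches only the three forward frames, requires an ATG start, and the function names and the `min_codons` parameter are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA/RNA sequence in frame 0; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def find_orfs(seq, min_codons=2):
    """Return (frame, start, end, protein) for ATG...stop ORFs, forward strand."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])  # stop codon excluded
                if len(protein) >= min_codons:
                    orfs.append((frame, start, i + 3, protein))
                start = None
    return orfs


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(find_orfs("CCATGGCCAAATAGCC"))
```

Running the same search on the reverse complement would cover the remaining three frames, and swapping `AMINO` for another table would give an alternative genetic code.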
Post-Translational Modifications

tool
description
Post-translational modification tutorial.
A survey of publicly available PTM web resources, databases and classification/prediction servers.
Tool that can predict the possible oligosaccharide structures that occur on proteins from their experimentally determined masses. The program can be used for free or derivatized oligosaccharides and for glycopeptides.
This tool predicts N-terminal myristoylation of proteins by neural networks. Only N-terminal glycines are myristoylated (leading methionines are cleaved prior to myristoylation).
Group-based Phosphorylation Scoring method is a tool for in silico prediction of phosphorylation sites with their specific kinases.
Tool that produces neural network predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins.
Tool for in silico sumoylation sites prediction. SUMOylation, a reversible post-translational modification of proteins by the small ubiquitin-related modifiers (SUMO), is crucial in a variety of biological processes.
Align Two Sequences

tool
description
This tool produces the alignment of two given sequences using the NCBI BLAST engine for local alignment. The output shows the similar region.
EMBOSS Pairwise Alignment Algorithms tool used to compare 2 sequences when you want an alignment that covers the whole length of both sequences.
EMBOSS Pairwise Alignment Algorithms tool used when you are trying to find the best region of similarity between two sequences.
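The difference between the two alignment modes above (whole-length versus best local region) comes down to a small change in the dynamic programming recurrence. The sketch below computes alignment scores only, with a simple match/mismatch/linear-gap scheme chosen for illustration; real tools typically use substitution matrices and affine gap penalties, and the function names and default scores here are assumptions.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style score: both sequences aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # leading gaps are penalized in global mode
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman style score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets an alignment restart anywhere.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    x, y = "GGGACGTCCC", "TTACGTAA"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

For two sequences that share only a short common region, the local score stays high while the global score is dragged down by the unrelated flanks.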
Multiple Sequence Alignment

tool
description
Multiple sequence alignment for DNA or proteins with hierarchical clustering.
Multiple sequence alignment program for DNA or protein sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen by viewing cladograms or phylograms.
Computes a multiple sequence alignment and the associated phylogenetic tree for a set of sequences (proteins or DNA). T-Coffee allows the combination of a collection of multiple/pairwise, global or local alignments into a single model. It can also estimate the level of consistency of each position within the new alignment with the rest of the alignments.
BLAST

tool

description


The Basic Local Alignment Search Tool (NCBI) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.


Here you can find a list of all the BLAST services available at the EBI, including the Ensembl Multi BlastView for the annotated genomes.

UCSC Genome Browser Utilities
With acknowledgement of the contribution of the UCSC Genome Bioinformatics Group.
. Batch Coordinate Conversion (liftOver) - converts genome coordinates and genome annotation files between assemblies. The current version supports both forward and reverse conversions, as well as conversions between selected species.
. DNA Duster - removes formatting characters and other non-sequence-related characters from an input sequence. Offers several configuration options for the output format, including translated protein.
. Protein Duster - removes formatting characters and other non-sequence-related characters from an input sequence. Offers several configuration options for the output format.
. Phylogenetic Tree Gif Maker - creates a gif image from the phylogenetic tree specification given. Offers several configuration options for branch lengths, normalized lengths, branch labels, legend etc.
. Source Code Downloads - The Genome Browser, Blat and liftOver source code is freely downloadable for academic, noncommercial, and personal use.

Gene prediction software
Ab initio approaches

Name
Description
Links
References
identifying translational initiation sites in cDNA sequences
predicts genes in eukaryotic genomic sequences
hidden Markov model (HMM) and dynamic programming based ab initio gene prediction program
a system for fast detection of coding regions in short genomic sequences
software for recognition of vertebrate RNA Polymerase II promoters
gene finding for Arabidopsis thaliana
finds genes and frameshifts in G+C rich prokaryotic sequences
linking ORFs in complete genomes to protein 3D structures
program to predict genes, exons, splice sites and other signals along a DNA sequence
Parse a DNA sequence into introns and exons
family of gene prediction programs
gene prediction program for prokaryotes and eukaryotes
prediction of genes with frameshifts in prokaryotic genomes
web tool for combining results from different programs (GRAIL, FEX, HEXON, MZEF, GENEMARK, GENEFINDER, FGENE, BLAST, POLYAH, REPEATMASKER, TRNASCAN)
finding genes in microbial DNA
hidden Markov model for finding genes in vertebrate DNA
a decision tree system for finding genes in vertebrate DNA
a method to identify potential splice sites in (plant) pre-mRNA by sequence inspection using Bayesian statistical models
finding genes using Fourier transform
promoter prediction by neural network
splice site prediction by neural network
provides a series of modular computer programs specifically designed for the detection of regulatory signals in non-coding sequences.
predicts locations and exon-intron structures of genes in genomic sequences from a variety of organisms.
a graphical analysis tool which finds all open reading frames
predicts exons, genes, promoters, polyAs, CpG islands, EST similarities, and repetitive elements within DNA sequence

RNA structure prediction software
Single sequence secondary structure prediction

Name
Description
Knots
Links
References
Secondary structure prediction based on generalized centroid estimator
no
Secondary structure prediction by using homologous sequence information
no
Secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring.
no
Secondary structure prediction method based on placement of helices allowing complex pseudoknots.
yes
Folding kinetics of RNA sequences including pseudoknots by including an implementation of the partition function for knots.
yes
MFE (Minimum Free Energy) RNA structure prediction algorithm.
no
A dynamic programming algorithm for optimal RNA pseudoknot prediction using the nearest neighbour energy model.
yes
A dynamic programming algorithm for the prediction of a restricted class of RNA pseudoknots.
yes
Secondary structure prediction via thermodynamic-based folding algorithms and novel structure-based sequence alignment specific for RNA.
yes
MFE RNA structure prediction algorithm. Includes an implementation of the partition function for computing basepair probabilities and circular RNA folding.
no
[7][10][11][12][13][14]
MFE RNA structure prediction based on abstract shapes. Shape abstraction retains adjacency and nesting of structural features, but disregards helix lengths, thus reduces the number of suboptimal solutions without losing significant information. Furthermore, shapes represent classes of structures for which probabilities based on Boltzmann-weighted energies can be computed.
no
A program to predict lowest free energy structures and base pair probabilities for RNA or DNA sequences. Programs are also available to predict Maximum Expected Accuracy structures and these can include pseudoknots. Structure prediction can be constrained using experimental data, including SHAPE, enzymatic cleavage, and chemical modification accessibility. Graphical user interfaces are available for Windows and for Mac OS-X/Linux. Programs are also available for use with Unix-style text interfaces. Additionally, a C++ class library is available.
yes
[17][18]
Statistical sampling of all possible structures. The sampling is weighted by partition function probabilities.
no
The UNAFold software package is an integrated collection of programs that simulate folding, hybridization, and melting pathways for one or two single-stranded nucleic acid sequences.
no
Crumple is simple, cleanly written software for producing the full set of possible secondary structures for a single sequence, given optional constraints.
no
Sliding windows and assembly is a tool chain for folding long series of similar hairpins.
no
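The tools in this table rely on thermodynamic energy models, which are too large to reproduce here. As a stand-in that shows the same dynamic programming shape, the sketch below implements the much simpler Nussinov base-pair maximization: it counts the maximum number of nested (pseudoknot-free) base pairs rather than minimizing free energy. It is not the algorithm of any tool listed above, and the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) is an assumption of this example.

```python
# Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAAUCCC"))
```

Because the recurrence only combines nested sub-intervals, it cannot produce pseudoknots, which is why the Knots column matters for the tools above.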
Single sequence tertiary structure prediction

Name
Description
Knots
Links
References
A Python library for the probabilistic sampling of RNA structures that are compatible with a given nucleotide sequence and that are RNA-like on a local length scale.
yes
Automated de novo prediction of native-like RNA tertiary structures.
yes
three-dimensional RNA structure prediction and folding
yes
Thermodynamics and Nucleotide cyclic motifs for RNA structure prediction algorithm. 2D and 3D structures.
yes
Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters
?
An integrated platform for de novo and homology modeling of RNA 3D structures, where coordinate file input, sequence editing, sequence alignment, structure prediction and analysis features are all accessed from a single intuitive graphical user interface.
yes
*Knots: Pseudoknot prediction, <yes|no>.
Comparative methods

The single sequence methods mentioned above have a difficult job detecting a small sample of reasonable secondary structures from a large space of possible structures. A good way to reduce the size of the space is to use evolutionary approaches. Structures that have been conserved by evolution are far more likely to be the functional form. The methods below use this approach.

Name
Description
Number of sequences
Alignment
Structure
Knots
Link
References

Comparative analysis combined with MFE folding.
any
no
yes
no

Common secondary structure prediction based on generalized centroid estimator
any
yes
no
no

Fast and accurate multiple aligner for RNA sequences
any
yes
no
no

an expectation maximization algorithm using covariance models for motif description. Uses heuristics for effective motif search, and a Bayesian framework for structure prediction combining folding energy and sequence covariation.
yes
yes
no

implements a pinned Sankoff algorithm for simultaneous pairwise RNA alignment and consensus structure prediction.
2
yes
yes
no

an algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity.
2
yes
yes
no

A multiple structural RNA alignment method, to a large extent based on the PMcomp program.
any
yes
yes
no

Computes a consensus RNA secondary structure from an RNA sequence alignment based on machine learning.
any
input
yes
yes

Produce a global fold and alignment of ncRNA families using integer linear programming and Lagrangian relaxation.
any
yes
yes
no

LocaRNA is the successor of PMcomp with an improved time complexity. It is a variant of Sankoff's algorithm for simultaneous folding and alignment, which takes as input pre-computed base pair probability matrices from McCaskill's algorithm as produced by RNAfold -p. Thus the method can also be viewed as a way to compare base pair probability matrices.
any
yes
yes
no

A sampling approach using Markov chain Monte Carlo in a simulated annealing framework, where both structure and alignment are optimized by making small local changes. The score combines the log-likelihood of the alignment, a covariation term and the basepair probabilities.
any
yes
yes
no

a multiple alignment tool for RNA sequences using iterative alignment based on Sankoff's algorithm with sharply reduced computational time and memory.
any
yes
yes
no

a multiple alignment tool for RNA sequences using progressive alignment based on pairwise structural alignment algorithm of SCARNA.
any
yes
yes
no

A method for joint prediction of alignment and common secondary structures of two RNA sequences using a probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities.
2
yes
yes
no

Folds alignments using a SCFG trained on rRNA alignments.
input
yes
no

Formally integrates both the energy-based and evolution-based approaches in one model to predict the folding of multiple aligned RNA sequences by a maximum expected accuracy scoring. The structural probabilities are calculated by RNAfold and Pfold.
any
input
yes
no

PMcomp is a variant of Sankoff's algorithm for simultaneous folding and alignment, which takes as input pre-computed base pair probability matrices from McCaskill's algorithm as produced by RNAfold -p. Thus the method can also be viewed as a way to compare base pair probability matrices. PMmulti is a wrapper program that does progressive multiple alignments by repeatedly calling pmcomp.
yes
yes
no

uses RNAlpfold to compute the secondary structure of the provided sequences. A modified version of T-Coffee is then used to compute the multiple sequence alignment having the best agreement with the sequences and the structures. R-Coffee can be combined with any existing sequence alignment method.
any
yes
yes
no

The structure based sequence alignment (SBSA) algorithm within RNA123 utilizes a novel suboptimal version of the Needleman-Wunsch global sequence alignment method that fully accounts for secondary structure in the template and query. It also utilizes two separate substitution matrices that are optimized for RNA helices and single stranded regions. The SBSA algorithm provides >90% accurate sequence alignments even for structures as large as bacterial 23S rRNA (~2800 nts).
any
yes
yes
yes

Folds precomputed alignments using a combination of free-energy and covariation measures. Ships with the Vienna package.
any
input
yes
no

enumerates the near-optimal abstract shape space, and predicts as the consensus an abstract shape common to all sequences, and for each sequence, the thermodynamically best structure which has this abstract shape.
any
no
yes
no

Compare and align RNA secondary structures via a "forest alignment" approach.
any
yes
input
no

Frequent stem pattern miner from unaligned RNA sequences is a software tool to extract the structural motifs from a set of RNA sequences.
any
no
yes
no

A probabilistic sampling approach that combines intrasequence base pairing probabilities with intersequence base alignment probabilities. This is used to sample possible stems for each sequence and compare these stems between all pairs of sequences to predict a consensus structure for two sequences. The method is extended to predict the common structure conserved among multiple sequences by using a consistency-based score that incorporates information from all the pairwise structural alignments.
any
yes
yes
yes

Stem Candidate Aligner for RNA (Scarna) is a fast, convenient tool for structural alignment of a pair of RNA sequences. It aligns two RNA sequences and calculates the similarities of them, based on the estimated common secondary structures. It works even for pseudoknotted secondary structures.
2
yes
yes
no

simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework.
any
yes
yes
yes

a program for pairwise RNA structural alignment based on probabilistic models of RNA structure known as Pair stochastic context-free grammars.
any
yes
yes
no

an alignment tool designed to provide multiple alignments of non-coding RNAs following a fast progressive strategy. It combines the thermodynamic base pairing information derived from RNAfold calculations in the form of base pairing probability vectors with the information of the primary sequence.
yes
no
no

A tool for predicting non-coding RNA secondary structures including pseudoknots. It takes as input an alignment of RNA sequences and returns the predicted secondary structure(s). It combines criteria of stability, conservation and covariation in order to search for stems and pseudoknots. Users can change parameter values, specify known stems (if any) for the system to take into account, choose to get several possible structures or only one, search for pseudoknots or not, etc.
any
yes
yes
yes

a webserver that makes it possible to simultaneously use a number of state of the art methods for performing multiple alignment and secondary structure prediction for noncoding RNA sequences.
yes
yes
no

a program for analysis of multiple sequence alignments using phylogenetic grammars, that may be viewed as a flexible generalization of the "Pfold" program.
any
yes
yes
no

* Number of sequences: <any|num>. * Alignment: predicts an alignment, <input|yes|no>. * Structure: predicts structure, <input|yes|no>. * Knots: pseudoknot prediction, <yes|no>.
Inter molecular interactions: RNA-RNA

Many ncRNAs function by binding to other RNAs. For example, miRNAs regulate protein-coding gene expression by binding to 3' UTRs; small nucleolar RNAs guide post-transcriptional modifications by binding to rRNA; U4 and U6 spliceosomal RNAs bind to each other, forming part of the spliceosome; and many small bacterial RNAs, e.g. GcvB, OxyS and RyhB, regulate gene expression by antisense interactions.

Name
Description
Intra-molecular structure
Comparative
Link
References
GUUGle
A utility for fast determination of RNA-RNA matches with perfect hybridization via A-U, C-G, and G-U base pairing.
no
no
IntaRNA
Efficient target prediction incorporating the accessibility of target sites
yes
no
Computes the full unpseudoknotted partition function of interacting strands in dilute solution. Calculates the concentrations, mfes, and base-pairing probabilities of the ordered complexes below a certain complexity. Also computes the partition function and basepairing of single strands including a class of pseudoknotted structures. Also enables design of ordered complexes.
yes
no
Predicts bimolecular secondary structures with and without intramolecular structure. Also predicts the hybridization affinity of a short nucleic acid to an RNA target.
yes
no
calculates the partition function and thermodynamics of RNA-RNA interactions. It considers all possible joint secondary structure of two interacting nucleic acids that do not contain pseudoknots, interaction pseudoknots, or zigzags.
yes
no
Based upon RNAduplex with bonuses for covarying sites
no
yes
works much like RNAfold, but lets the user specify two RNA sequences, which are then allowed to form a dimer structure.
yes
no
computes optimal and suboptimal secondary structures for hybridization. The calculation is simplified by allowing only inter-molecular base pairs.
no
no
a tool for finding the minimum free energy hybridisation of a long and a short RNA.
no
no
calculates the thermodynamics of RNA-RNA interactions. RNA-RNA binding is decomposed into two stages. (1) First the probability that a sequence interval (e.g. a binding site) remains unpaired is computed. (2) Then the binding energy given that the binding site is unpaired is calculated as the optimum over all possible types of bindings.
yes
no
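The simplest entry in this table, perfect hybridization via A-U, C-G and G-U pairing (as described for GUUGle), can be illustrated with a brute-force search. The sketch below is only meant to show what a "perfect match" means for two antiparallel RNA strands; it makes no attempt to reproduce GUUGle's actual algorithm or speed, and the function name and return format are invented for the example.

```python
# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def perfect_duplexes(query, target, length):
    """Return (query_start, target_start) of windows that pair perfectly.

    Strands pair antiparallel: the first base of the query window pairs with
    the last base of the target window, and so on inwards.
    """
    query, target = query.upper(), target.upper()
    hits = []
    for i in range(len(query) - length + 1):
        for j in range(len(target) - length + 1):
            if all(
                (query[i + m], target[j + length - 1 - m]) in PAIRS
                for m in range(length)
            ):
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(perfect_duplexes("GAGU", "CCGCUUAA", 4))
```

The running time is proportional to the product of the two sequence lengths times the window length, so this is suitable only for short inputs.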
Inter molecular interactions: MicroRNA:UTR

MicroRNAs regulate protein-coding gene expression by binding to 3' UTRs, and there are tools specifically designed to predict these interactions. For an evaluation of target prediction methods on high-throughput experimental data, see Selbach et al. (Nature, 2008) [78] and Alexiou et al. (Bioinformatics, 2009) [79].

Name
Description
Species Specific
Intra-molecular structure
Comparative
Link
References
DIANA-microT 3.0 is an algorithm based on several parameters calculated individually for each microRNA and it combines conserved and non-conserved microRNA recognition elements into a final prediction score.
human, mouse
no
yes
An animal miRNA target prediction tool based on miRNA-target complementarity and thermodynamic data.
no
no
no
microRNA target gene prediction using a support vector machine.
no
no
no
Combinatorial microRNA target predictions.
8 vertebrates
no
yes
Incorporates the role of target-site accessibility, as determined by base-pairing interactions within the mRNA, in microRNA target recognition.
no
yes
no
The first link (predictions) provides RNA22 predictions for all protein-coding transcripts in human, mouse, roundworm, and fruit fly. It lets you visualize the predictions within a cDNA map and find transcripts targeted by multiple miRNAs of interest. The second link (custom) first finds putative microRNA binding sites in the sequence of interest, then identifies the targeting microRNA.
no
no
no
a tool for finding the minimum free energy hybridisation of a long and a short RNA.
no
no
no
Sylamer is a method for finding significantly over or under-represented words in sequences according to a sorted gene list. Typically it is used to find significant enrichment or depletion of microRNA or siRNA seed sequences from microarray expression data.
no
no
no
TAREF stands for TARget REFiner. It predicts microRNA targets on the basis of multiple features derived from the flanking regions of the predicted target sites, where traditional structure prediction approaches may not succeed in assessing openness. It also provides an option to use encoded patterns to refine filtering.
Yes
no
no
p-TAREF stands for plant TARget REFiner. It identifies plant microRNA targets on the basis of multiple features derived from the flanking regions of the predicted target sites, where traditional structure prediction approaches may not succeed in assessing openness. It also provides an option to use encoded patterns to refine filtering. It was reportedly the first tool to apply machine learning with a scoring scheme based on Support Vector Regression (SVR) while considering structural and alignment aspects of targeting in plants with plant-specific models. p-TAREF is implemented with a concurrent architecture in both server and standalone forms, so it can run concurrently on ordinary desktops while performing transcriptome-level analyses. It also provides an option to check the predicted targets against expression data integrated in its back-end, to add confidence to a prediction alongside its SVR score. In benchmarking against other plant miRNA target identification tools, p-TAREF was reported to perform better.
Yes
no
no
Predicts biological targets of miRNAs by searching for the presence of conserved 8mer and 7mer sites that match the seed region of each miRNA. Predictions are ranked using site number, site type, and site context, which includes factors that influence target-site accessibility.
vertebrates, flies, nematodes
evaluated indirectly
yes
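The seed-match idea in the last entry (8mer and 7mer sites matching the miRNA seed region) can be sketched as a plain string search. The site definitions used below follow the commonly used convention: a 7mer-m8 site pairs with miRNA nucleotides 2-8, a 7mer-A1 site pairs with nucleotides 2-7 and is followed by an A, and an 8mer site has both. This is an illustration only; it ignores conservation, site context and ranking, and the function names and example sequences are assumptions.

```python
_COMP = str.maketrans("ACGU", "UGCA")


def revcomp_rna(seq):
    """Return the reverse complement of an RNA sequence."""
    return seq.translate(_COMP)[::-1]


def seed_sites(mirna, utr):
    """Return (start, site_type) for canonical seed-matched sites in a 3' UTR."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp_rna(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = revcomp_rna(mirna[7])      # pairs with miRNA nucleotide 8
    sites = []
    pos = utr.find(core)
    while pos != -1:
        has_m8 = pos > 0 and utr[pos - 1] == m8
        has_a1 = utr[pos + 6:pos + 7] == "A"  # A opposite miRNA nucleotide 1
        if has_m8 and has_a1:
            sites.append((pos - 1, "8mer"))
        elif has_m8:
            sites.append((pos - 1, "7mer-m8"))
        elif has_a1:
            sites.append((pos, "7mer-A1"))
        pos = utr.find(core, pos + 1)
    return sites


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative miRNA sequence
    utr = "AAACUACCUCAGGGUACCUCAUUUCUACCUCGG"  # made-up UTR fragment
    for start, kind in seed_sites(mirna, utr):
        print(start, kind)
```

Bare 6mer matches (the core with neither flanking feature) are deliberately not reported, since the description above refers only to 8mer and 7mer sites.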
ncRNA gene prediction software

Name
Description
Number of sequences
Alignment
Structure
Link
References
Assessing a multiple sequence alignment for the existence of an unusual stable and conserved RNA secondary structure.
any
input
yes
a comparative method for identifying functional RNA structures in multiple-sequence alignments. It is based on a probabilistic model-construction called a phylo-SCFG and exploits the characteristic differences of the substitution process in stem-pairing and unpaired regions to make its predictions.
any
input
yes
heuristic search for statistically significant conservation of RNA secondary structure in deep multiple sequence alignments.
any
input
yes
This is the code from Elena Rivas that accompanies a submitted manuscript "Noncoding RNA gene detection using comparative sequence analysis". QRNA uses comparative genome sequence analysis to detect conserved RNA secondary structures, including both ncRNA genes and cis-regulatory RNA structures.
2
input
yes
program for predicting structurally conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments. It can be used in genome-wide screens to detect functional RNA structures, as found in noncoding RNAs and cis-acting regulatory elements of mRNAs.
any
input
yes
a program for analysis of multiple sequence alignments using phylogenetic grammars, that may be viewed as a flexible generalization of the "Evofold" program.
any
yes
yes
* Number of sequences: <any|num>. * Alignment: predicts an alignment, <input|yes|no>. * Structure: predicts structure, <input|yes|no>.

Family specific gene prediction software


Name
Description
Family
Link
References
ARAGORN
ARAGORN detects tRNA and tmRNA in nucleotide sequences.
Given a search query, candidate homologs are identified using BLAST search and then tested for their known miRNA properties, such as secondary structure, energy, alignment and conservation, in order to assess their fidelity.
RISCbinder
Prediction of guide strand of microRNAs.
An SVM-based approach that, in conjunction with a non-stringent filter for consensus secondary structures, is capable of recognizing microRNA precursors in multiple sequence alignments.
RNAmmer
RNAmmer uses HMMER to annotate rRNA genes in genome sequences. Profiles were built using alignments from the European ribosomal RNA database[105] and the 5S Ribosomal RNA Database.[106]
Uses a combination of RNA secondary structure prediction and machine learning that is designed to recognize the two major classes of snoRNAs, box C/D and box H/ACA snoRNAs, among ncRNA candidate sequences.
Search for C/D box methylation guide snoRNA genes in a genomic sequence.
snoSeeker includes two snoRNA-searching programs, CDseeker and ACAseeker, specific to the detection of C/D snoRNAs and H/ACA snoRNAs, respectively. snoSeeker has been used to scan four human–mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes.
a program for the detection of transfer RNA genes in genomic sequence.
RNA homology search software

Name
Description
Link
References
"Easy RNA Profile IdentificatioN" is an RNA motif search program that reads a sequence alignment and secondary structure, and automatically infers a statistical "secondary structure profile" (SSP). An original dynamic programming algorithm then matches this SSP onto any target database, finding solutions and their associated scores.
"INFERence of RNA ALignment" is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs).
"pair hidden Markov models on tree structures" is an extension of pair hidden Markov models defined on alignments of trees.
A slow and rigorous or fast and heuristic sequence-based filter for covariance models.
Takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs.
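The last entry above relies on a local alignment algorithm to scan a database. As a rough sketch of what local alignment means, here is a sequence-only Smith-Waterman scorer with a linear gap penalty. It ignores secondary structure entirely, so it is a simplification and not the algorithm of the tool described above; the scoring values and the database entries are assumptions chosen for illustration.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (linear gap penalty)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical mini "database" of named sequences.
database = {
    "seq_with_hit": "AAGGCUAA",
    "seq_without_hit": "AAAAAAAA",
}
query = "GGCU"
ranked = sorted(
    database,
    key=lambda name: smith_waterman_score(query, database[name]),
    reverse=True,
)
print(ranked)
```

Clamping each cell at zero is what makes the alignment local: a poorly matching region cannot drag down the score of a good match elsewhere in the target.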
Benchmarks

Name
Description
Structure
Alignment
Phylogeny
Links
References
A comprehensive comparison of comparative RNA structure prediction approaches
yes
no
no
BRalibase II
A benchmark of multiple sequence alignment programs upon structural RNAs
no
yes
no
BRalibase 2.1
A benchmark of multiple sequence alignment programs upon structural RNAs
no
yes
no
BRalibase III
A critical assessment of the performance of homology search methods on noncoding RNA
no
yes
no
CompaRNA
An independent comparison of single-sequence RNA secondary structure prediction programs
yes
no
no
* Alignment: benchmarks alignment tools <yes|no>. * Structure: benchmarks structure prediction tools <yes|no>.

Alignment viewers/editors


Name
Description
Alignment
Structure
Link
References
A tool for Synchronous RNA Sequence and Secondary Structure Alignment and Editing
yes
yes
Colorstock, a command-line script using ANSI terminal color; SScolor, a Perl script that generates static HTML pages; and Raton, an AJAX web application generating dynamic HTML. Each tool can be used to color RNA alignments by secondary structure and to visually highlight compensatory mutations in stems.
yes
yes
a multiple alignment viewer written in Java.
yes
no
a multiple alignment editor written in Java.
yes
no
a major mode for the Emacs text editor. It provides functionality to aid the viewing and editing of multiple sequence alignments of structured RNAs.
yes
yes
A graphical sequence editor for working with structural alignments of RNA.
yes
yes
* Alignment: view and edit an alignment, <yes|no>. * Structure: view and edit structure, <yes|no>
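Some of the viewers above highlight compensatory mutations in stems. The sketch below shows one simple way such positions could be found, assuming a balanced dot-bracket consensus structure without pseudoknots and taking the first sequence as the reference. The helper names and the toy alignment are hypothetical and are not taken from any of the listed tools.

```python
# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_pairs(structure):
    """Return sorted (i, j) base pairs from a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def compensatory_pairs(alignment, structure):
    """Paired columns where some sequence changed both bases yet still pairs."""
    ref = alignment[0]
    hits = []
    for i, j in parse_pairs(structure):
        for seq in alignment[1:]:
            both_changed = seq[i] != ref[i] and seq[j] != ref[j]
            if both_changed and (seq[i], seq[j]) in CANONICAL:
                hits.append((i, j))
                break
    return hits


# Hypothetical alignment: the second sequence swaps C-G for G-C at columns 1 and 5.
alignment = ["GCAUAGC", "GGAUACC", "GCAUAGC"]
structure = "((...))"
print(compensatory_pairs(alignment, structure))
```

A viewer would then colour the returned column pairs differently from stems that are merely conserved.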

Inverse Folding/RNA design


Name
Description
Link
References
An RNA folding game that challenges players to come up with sequences that fold into a target RNA structure. The best sequences for a given puzzle are synthesized and their structures are probed through chemical mapping. The sequences are then scored by the data's agreement to the target structure and feedback is provided to the players.
--
Although NUPACK can be used to get useful statistics and properties of an RNA's structure as mentioned above, its main goal is the design of new sequences that fold into a desired structure.
The ViennaRNA package provides RNAinverse, an algorithm for designing sequences with a desired structure.
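Inverse folding starts from a target structure and searches for sequences that fold into it. Two small building blocks of such a search can be sketched without any folding library: checking that a candidate sequence can form every pair of the target at all, and measuring how far a predicted structure is from the target. The code below is an illustrative sketch only. It does not call NUPACK or ViennaRNA, it assumes balanced dot-bracket strings without pseudoknots, and the function names are made up for the example.

```python
# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_pairs(structure):
    """Return the set of (i, j) base pairs in a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def is_compatible(sequence, target):
    """True if every pair required by the target is a canonical pair."""
    if len(sequence) != len(target):
        return False
    return all(
        (sequence[i], sequence[j]) in CANONICAL
        for i, j in parse_pairs(target)
    )


def base_pair_distance(structure_a, structure_b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(parse_pairs(structure_a) ^ parse_pairs(structure_b))


target = "(((...)))"
print(is_compatible("GGGAAACCC", target))       # every target pair is G-C
print(is_compatible("GGGAAACCA", target))       # outermost pair would be G-A
print(base_pair_distance(target, "((.....))"))  # structures differ by one pair
```

Compatibility is only a necessary condition: a compatible sequence may still fold into a different structure, which is why design tools pair a check like this with an actual folding prediction and a distance measure.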

Secondary structure viewers/editors


Name
Description
Link
References
PseudoViewer
Automatically visualizing RNA pseudoknot structures as planar graphs.
RNA Movies
browse sequential paths through RNA secondary structure landscapes
RNA2D3D
a program for generating, viewing, and comparing 3-dimensional models of RNA
RNAView/RnamlView
Use RNAView to automatically identify and classify the types of base pairs that are formed in nucleic acid structures. Use RnamlView to arrange RNA structures.
VARNA
a tool for the automated drawing, visualization and annotation of the secondary structure of RNA, designed as a companion software for web servers and databases

A list of computer programs that are used for nucleic acids simulations

Min - Optimization. MD - Molecular dynamics. MC - Monte Carlo. REM - Replica exchange method.

Crt - Cartesian coordinates. Int - Internal coordinates. Exp - Explicit water. Imp - Implicit water.

Lig - Ligand interactions. HA - Hardware accelerated.


Name
View
3D
Model
Build
Min
MD
MC
REM
Crt
Int
Exp
Imp
Lig
HA
Comments
License
Homepage
+
+
+
+
+
+
+
+
+
+
+
Free
+
+
+
+
+
+
+
+
Commercial
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
CHARMM Force Field
Commercial
+
+
+
+
+
+
Commercial
+
+
+
+
Commercial
+
+
+
+
+
+
Common MD
+
+
+
+
+
+
+
Molecular Operating Environment
Commercial
+
Nucleic Acid Builder
+
+
+
+
+
+
+
NAnoscale Molecular Dynamics
Free