The next-generation sequencing (NGS) solutions:
Integrated solutions
* CLCbio
Genomics Workbench - de novo and reference assembly of Sanger, Roche
FLX, Illumina, Helicos, and SOLiD data. Next-gen-seq software that
extends the CLCbio Main Workbench software. Includes SNP detection, ChIP-seq,
browser and other features. Commercial. Windows, Mac OS X and Linux.
* Galaxy - Interactive and reproducible genomics. A web portal for running
analysis jobs.
* Genomatix
- Integrated Solutions for Next Generation Sequencing data analysis.
* JMP Genomics
- Next gen visualization and statistics tool from SAS. They are working with NCGR to refine this tool and produce others.
* NextGENe
- de novo and reference assembly of Illumina, SOLiD and Roche FLX data.
Uses a novel Condensation Assembly Tool approach where reads are joined via
"anchors" into mini-contigs before assembly. Includes SNP detection,
ChIP-seq, browser and other features. Commercial. Win or MacOS.
* SeqMan
Genome Analyser - Software for Next Generation sequence assembly of
Illumina, Roche FLX and Sanger data integrating with Lasergene Sequence
Analysis software for additional analysis and visualization capabilities. Can
use a hybrid templated/de novo approach. Commercial. Win or Mac OS X.
* SHORE
- SHORE, for Short Read, is a mapping and analysis pipeline for short DNA
sequences produced on an Illumina Genome Analyzer. A suite created by the 1001
Genomes project. Source for POSIX.
* SlimSearch -
Fledgling commercial product.
Align/Assemble to a reference
* BFAST
- Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson
and Barry Merriman at UCLA.
* Bowtie -
Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences
(reads) to the human genome at a rate of 25 million reads per hour on a typical
workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed
(BWT) index. Written by Ben Langmead and
Cole Trapnell. Linux, Windows, and Mac OS X.
* BWA - Heng Li's
BWT Alignment program - a progression from Maq. BWA is a fast, lightweight
tool that aligns short sequences to a sequence database, such as the human
reference genome. By default, BWA finds an alignment within edit distance 2 to
the query sequence. C source.
* ELAND
- Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome
alignments to a reference genome. Written by Illumina author Anthony J. Cox for
the Solexa 1G machine.
* Exonerate
- Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of
DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney
from EMBL. C for POSIX.
* GenomeMapper
- GenomeMapper is a short read mapping tool designed for accurate read
alignments. It quickly aligns millions of reads either with ungapped or gapped
alignments. A tool created by the 1001 Genomes project. Source for POSIX.
* GMAP - GMAP
(Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed
by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.
* gnumap - The
Genomic Next-generation Universal MAPper (gnumap) is a program designed to
accurately map sequence data obtained from next-generation sequencing machines
(specifically that of Solexa/Illumina) back to a genome of any size. It seeks
to align reads from nonunique repeats using statistics. From authors at Brigham
Young University. C source/Unix.
* MAQ -
Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly
designed for Illumina with preliminary functions to handle ABI SOLiD data.
Written by Heng Li from the Sanger Centre. Features extensive supporting tools
for DIP/SNP detection, etc. C++ source
* MOSAIK
- MOSAIK produces gapped alignments using the Smith-Waterman algorithm.
Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and
Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX
* MrFAST and MrsFAST
- mrFAST and mrsFAST are designed to map short reads generated with the
Illumina platform to reference genome assemblies in a fast and
memory-efficient manner. Robust to INDELs; mrsFAST also has a bisulphite mode.
Authors are from the University of Washington. C as source.
* MUMmer - MUMmer
is a modular system for the rapid whole genome alignment of finished or draft
sequence. Released as a package providing an efficient suffix tree library,
seed-and-extend alignment, SNP detection, repeat detection, and visualization
tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L
Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg
- most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX
OS required.
* Novocraft -
Tools for reference alignment of paired-end and single-end Illumina reads. Uses
a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free
for evaluation, educational use and for use on open not-for-profit projects.
Requires Linux or Mac OS X.
* PASS
- It supports Illumina, SOLiD and Roche-FLX data formats and allows the user to
modulate very finely the sensitivity of the alignments. Spaced seed initial
filter, then an NW dynamic programming step leading to an SW-like local alignment. Authors are
from CRIBI in Italy. Win/Linux.
* RMAP - Assembles 20-64 bp
Illumina reads to a FASTA reference genome. By Andrew D. Smith and
Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.
* SeqMap
- Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui
Jiang from the Wong lab at Stanford. Builds available for most OS's.
* SHRiMP - Assembles
to a reference sequence. Developed with Applied Biosystems' colourspace genomic
representation in mind. Authors are Michael Brudno and Stephen Rumble at the
University of Toronto. POSIX.
* Slider -
An application for the Illumina Sequence Analyzer output that uses the
probability files instead of the sequence files as an input for alignment to a
reference sequence or a set of reference sequences. Authors are from BCGSC.
* SOAP - SOAP (Short
Oligonucleotide Alignment Program). A program for efficient gapped and ungapped
alignment of short oligonucleotides onto reference sequences. The updated
version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the
Beijing Genomics Institute. C++, POSIX.
* SSAHA
- SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for
rapidly finding near exact matches in DNA or protein databases using a hash
table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James
Mullikin. C++ for Linux/Alpha.
* SOCS - Aligns
SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string
search algorithm, which uses hashing to reduce the set of possible matches,
drastically increasing search speed. Authors are Ondov B, Varadarajan A,
Passalacqua KD and Bergman NH.
* SWIFT - The SWIFT suite is a software collection for fast
index-based sequence comparison. It contains: SWIFT — fast local alignment
search, guaranteeing to find epsilon-matches between two sequences. SWIFT
BALSAM — a very fast program to find semiglobal non-gapped alignments based on
k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT
BALSAM)
* SXOligoSearch - SXOligoSearch is a commercial platform
offered by the Malaysia-based Synamatix. Will align Illumina reads against a range of
Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS
independent.
* Vmatch - A versatile
software tool for efficiently solving large scale sequence matching tasks.
Vmatch subsumes the software tool REPuter, but is much more general, with a
very flexible user interface, and improved space and time requirements.
Essentially a large string matching toolbox. POSIX.
* Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map
millions of short reads, generated by next-generation sequencing technology, back
to the reference genomes, and carry out post-analysis. ZOOM is developed to be
highly accurate, flexible, and user-friendly with speed being a critical
priority. Commercial. Supports Illumina and SOLiD data.
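Several of the aligners above (Bowtie, BWA, the updated SOAP) are described as using a Burrows-Wheeler-Transformed index. As a rough illustration of the idea only, here is a minimal Python sketch of exact matching by backward search over a BWT. It is not how any of these tools is implemented: the function names `bwt_index` and `find` are made up for this example, the index is built by naively sorting suffixes (fine for a toy string, hopeless for a genome), and mismatches, gaps and quality values are ignored.

```python
from collections import Counter

def bwt_index(text):
    """Build a toy BWT index: suffix array, first-column offsets, rank table."""
    text += "$"  # sentinel; assumed absent from text and sorting before A/C/G/T
    # Naive suffix sort, for demonstration only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    counts = Counter(bwt)
    # first[ch] = number of characters in the text strictly smaller than ch
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[ch][i] = occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ

def find(pattern, index):
    """Return sorted start positions of exact matches of pattern."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

index = bwt_index("ACGTACGTTACG")
print(find("ACG", index))  # [0, 4, 9]
```

The appeal of the approach is that each pattern character costs a constant number of table lookups, regardless of reference size; real aligners compress the rank table and sample the suffix array to keep memory small.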
De novo Align/Assemble
* ABySS
- Assembly By Short Sequences. ABySS is a de novo sequence assembler that is
designed for very short reads. The single-processor version is useful for
assembling genomes up to 40-50 Mbases in size. The parallel version is implemented
using MPI and is capable of assembling larger genomes. By Simpson JT and others
at Canada's Michael Smith Genome Sciences Centre. C++ as source.
* ALLPATHS - ALLPATHS: De novo assembly of whole-genome
shotgun microreads. ALLPATHS is a whole genome shotgun assembler that can
generate high quality assemblies from short reads. Assemblies are presented in
a graph form that retains ambiguities, such as those arising from polymorphism,
thereby providing information that has been absent from previous genome
assemblies. Broad Institute.
* Edena - Edena (Exact
DE Novo Assembler) is an assembler dedicated to processing the millions of very
short reads produced by the Illumina Genome Analyzer. Edena is based on the
traditional overlap layout paradigm. By D. Hernandez, P. François, L.
Farinelli, M. Osteras, and J. Schrenzel. Linux/Win.
* EULER-SR
- Short read de novo assembly. By Mark J. Chaisson and Pavel A. Pevzner
from UCSD (published in Genome Research). Uses a de Bruijn graph approach.
* MIRA2 -
MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid
de-novo assemblies using reads gathered through 454 sequencing technology (GS20
or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
* SEQAN
- A Consistency-based Consensus Algorithm for De Novo and Reference-guided
Sequence Assembly of Short Reads. By Tobias Rausch and others. C++, Linux/Win.
* SHARCGS - De novo
assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and
Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
* SSAKE
- The Short Sequence Assembly by K-mer search and 3' read Extension (SSAKE) is
a genomics application for aggressively assembling millions of short nucleotide
sequences by progressively searching for perfect 3'-most k-mers using a DNA
prefix tree. Authors are René Warren, Granger Sutton, Steven Jones and Robert
Holt from Canada's Michael Smith Genome Sciences Centre. Perl/Linux.
* SOAPdenovo - Part
of the SOAP suite. See above.
* VCAKE -
De novo assembly of short reads with robust error correction. An improvement on
early versions of SSAKE.
* Velvet
- Velvet is a de novo genomic assembler specially designed for short read
sequencing technologies, such as Solexa or 454. Needs about 20-25X coverage and
paired reads. Developed by Daniel Zerbino and Ewan Birney at the European
Bioinformatics Institute (EMBL-EBI).
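EULER-SR is listed above as using a de Bruijn graph approach. For readers unfamiliar with the idea, the following is a minimal Python sketch, assuming error-free reads and a hypothetical k-mer size. The helper names `de_bruijn` and `walk_unambiguous` are invented for this illustration; real assemblers add error correction, graph simplification, and paired-read information that are not shown here.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig from start while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop around a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

reads = ["ACGTC", "GTCAG", "TCAGT"]
graph = de_bruijn(reads, k=4)
print(walk_unambiguous(graph, "ACG"))  # ACGTCAGT
```

Overlaps between reads never have to be computed pairwise here: reads that share a k-mer simply land on the same edge. This simplified walk only checks out-degree, so it would happily run through a node where two paths merge; a fuller version would also check in-degree.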
SNP/Indel Discovery
* ssahaSNP
- ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and
indels by aligning shotgun reads to the finished genome sequence. Highly
repetitive elements are filtered out by ignoring those kmer words with high
occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo
and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32,
Solaris and Mac
* PolyBayesShort
- A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth
at Washington University. This version is specifically optimized for the
analysis of large numbers (millions) of high-throughput next-generation
sequencer reads, aligned to whole chromosomes of model organism or mammalian
genomes. Developers at Boston College. Linux-64 and Linux-32.
* PyroBayes
- PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences
sequencing machines. It was designed to assign more accurate base quality
estimates to the 454 pyrosequences. Developers at Boston College.
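The ssahaSNP entry above mentions filtering out highly repetitive elements by ignoring k-mer words with high occurrence numbers. The Python sketch below illustrates that general idea only; it is not ssahaSNP's code. The names `kmer_counts` and `usable_seeds`, and the `max_occ` threshold, are assumptions made for this example.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in a reference sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def usable_seeds(read, ref_counts, k, max_occ):
    """Return (offset, k-mer) pairs from read whose k-mer occurs in the
    reference at least once but no more than max_occ times."""
    seeds = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        n = ref_counts.get(kmer, 0)
        if 0 < n <= max_occ:  # skip absent and over-represented words
            seeds.append((i, kmer))
    return seeds

ref_counts = kmer_counts("ATATATATGCCGTA", k=3)
print(usable_seeds("TATGCC", ref_counts, k=3, max_occ=2))
# [(1, 'ATG'), (2, 'TGC'), (3, 'GCC')]
```

In the example the word TAT occurs three times in the reference, so it is dropped as a seed, while the three single-copy words are kept.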
Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
* EagleView
- An information-rich genome assembly viewer. EagleView can display a dozen
different types of information including base quality and flowgram signal.
Developers at Boston College.
* LookSeq
- LookSeq is a web-based application for alignment visualization, browsing and
analysis of genome sequence data. LookSeq supports multiple sequencing
technologies, alignment sources, and viewing modes; low or high-depth read
pileups; and easy visualization of putative single nucleotide and structural
variation. From the Sanger Centre.
* MapView -
MapView: visualization of short read alignments on a desktop computer. From the Evolutionary
Genomics Lab at Sun Yat-sen University, China. Linux.
* SAM
- Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and
Visualization Tool. It provides a generic platform for manipulating, analyzing
and viewing WGA data, regardless of input type. Developers are Rene Warren,
Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada's Michael Smith
Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.
* STADEN -
Includes GAP4. GAP5, once completed, will handle next-gen sequencing data. A
partially implemented test version is available.
* XMatchView - A visual tool for analyzing cross_match
alignments. Developed by Rene Warren and Steven Jones at Canada's Michael Smith
Genome Sciences Centre. Python/Win or Linux.
Counting, e.g. ChIP-Seq, Bis-Seq, CNV-Seq
* BS-Seq - The source code and data for the "Shotgun
Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation
Patterning" Nature paper by Cokus et al. (Steve Jacobsen's lab at UCLA). POSIX.
* ChIPSeq -
Program used by Johnson et al. (2007) in their Science publication.
* CNV-Seq -
CNV-seq, a new method to detect copy number variation using high-throughput
sequencing. Chao Xie and Martti T Tammi at the National University of
Singapore. Perl/R.
* FindPeaks - Performs analysis of ChIP-Seq experiments. It
uses a naive algorithm for identifying regions of high coverage, which
represent Chromatin Immunoprecipitation enrichment of sequence fragments,
indicating the location of a bound protein of interest. Original algorithm by
Matthew Bainbridge, in collaboration with Gordon Robertson. Current code and
implementation by Anthony Fejes. Authors are from Canada's Michael Smith
Genome Sciences Centre. Java/OS independent. Latest versions available as part
of the Vancouver Short Read Analysis Package.
* MACS -
Model-based Analysis for ChIP-Seq. MACS empirically models the length of the
sequenced ChIP fragments, which tends to be shorter than sonication or library
construction size estimates, and uses it to improve the spatial resolution of
predicted binding sites. MACS also uses a dynamic Poisson distribution to
effectively capture local biases in the genome sequence, allowing for more
sensitive and robust prediction. Written by Yong Zhang and Tao Liu from Xiaole
Shirley Liu's Lab.
* PeakSeq
- PeakSeq: Systematic Scoring of ChIP-Seq Experiments Relative to Controls. A
two-pass approach for scoring ChIP-Seq data relative to controls. The first
pass identifies putative binding sites and compensates for variation in the
mappability of sequences across the genome. The second pass filters out sites that
are not significantly enriched compared to the normalized input DNA and
computes a precise enrichment and significance. By Rozowsky J et al. C/Perl.
* QuEST
- Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at Stanford.
From the 2008 publication Genome-wide analysis of transcription factor binding sites
based on ChIP-Seq data. (C++)
* SISSRs - Site Identification from Short Sequence Reads. BED
file input. Raja Jothi @ NIH. Perl.
**See also this thread for ChIP-Seq, until I get time to update this
list.
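The FindPeaks entry above describes identifying regions of high coverage as a way to locate ChIP enrichment. A minimal Python sketch of that kind of naive coverage thresholding follows. It is an illustration under stated assumptions, not the FindPeaks implementation: reads are assumed to have a fixed length and start within the genome, strand is ignored, and the names `coverage`, `peaks` and the `min_depth` threshold are hypothetical.

```python
def coverage(read_starts, read_len, genome_len):
    """Per-base depth from read start positions, assuming fixed-length reads."""
    diff = [0] * (genome_len + 1)  # difference array: +1 at start, -1 past end
    for s in read_starts:
        diff[s] += 1
        diff[min(s + read_len, genome_len)] -= 1
    depth, running = [], 0
    for d in diff[:genome_len]:
        running += d
        depth.append(running)
    return depth

def peaks(depth, min_depth):
    """Return half-open (start, end) intervals where depth >= min_depth."""
    regions, start = [], None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos  # entering a high-coverage region
        elif d < min_depth and start is not None:
            regions.append((start, pos))  # leaving it
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions

depth = coverage([2, 3, 4, 20], read_len=5, genome_len=30)
print(peaks(depth, min_depth=2))  # [(3, 8)]
```

A fixed threshold like this takes no account of local background or control samples, which is the gap that the model-based tools in this section (MACS, PeakSeq) are described as addressing.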
Alternate Base Calling
* Rolexa - R-based framework for base calling of Solexa data.
Project publication
* Alta-cyclic
- "a novel Illumina Genome-Analyzer (Solexa) base caller"
Transcriptomics
* ERANGE -
Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq. Supports Bowtie,
BLAT and ELAND. From the Wold lab.
* G-Mo.R-Se
- G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo
gene models. First, candidate exons are built directly from the positions of
the reads mapped on the genome (without any ab initio assembly of the reads),
and all the possible splice junctions between those exons are tested against
unmapped reads. From CNS in France.
* MapNext - MapNext: A software tool for spliced and
unspliced alignments and SNP detection of short sequence reads. From the
Evolutionary Genomics Lab at Sun Yat-sen University, China.
* QPalma
- Optimal Spliced Alignments of Short Sequence Reads. Authors are Fabio De
Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar Rätsch. A paper is available.
* RSAT
- RSAT: RNA-Seq Analysis Tools. RSAT is developed and maintained by Hui Jiang
at Stanford University.
* TopHat - TopHat is
a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to
mammalian-sized genomes using the ultra high-throughput short read aligner
Bowtie, and then analyzes the mapping results to identify splice junctions between
exons. TopHat is a collaborative effort between the University of Maryland and
the University of California, Berkeley.
More:
1. In a commercial package, NCGR uses GMAP (http://www.gene.com/share/gmap/)
to align Solexa reads. GMAP is free, though.
2. Synamatix has SXOligoSearch (http://synasite.mgrc.com.my:8080/sxo...ligoSearch.php). It
is commercial and from the online description it looks very promising.
3. SOAP (http://soap.genomics.org.cn)
by Ruiqiang Li, as has been pointed out by ECO.
4. Maq is also able to find SNPs with its own alignment. It has a graphical
viewer, but again for its own alignment format.
5. Illumina has a software list: http://www.illumina.com/pagesnrn.ilmn?ID=245.
But most of the listed software has been quoted here. :-)
6. Anthony Fejes discussed some software in his blog (http://www.fejes.ca/labels/DNA.html).
It may be helpful to someone, too.
7. SSAHA has been optimized for short-reads, too. But yes, SSAHASNP appears in
your "SNP/INDEL discovery" category.
8. Ladeana from Gabor's group has recently published a paper in Nature Methods,
using their MOSAIK and PolyBayesShort.
9. Exonerate (http://www.ebi.ac.uk/~guy/exonerate/) and
MUMmer (http://mummer.sourceforge.net/)
10. Galaxy --
https://main.g2.bx.psu.edu
11. The pipeline Bowtie, TopHat and Cufflinks at the CBCB:
http://www.cbcb.umd.edu/software;
http://tophat.cbcb.umd.edu;
http://cufflinks.cbcb.umd.edu
-- Align reads with Bowtie, map splice junctions with TopHat and estimate transcript abundances with Cufflinks
12. T-MeV @ TIGR or MapMan
13. Web-based EPCLUST --
http://www.bioinf.ebc.ee/EP/EP/EPCLUST; or Cyber-T --
http://molgen51.biol.rug.nl/cybert/index.shtml
14. HomerTools --
http://biowhat.ucsd.edu/homer/ngs/homerTools.html
15. Blast2GO --
http://www.blast2go.com/b2ghome
16. T-ACE: Transcriptome Analysis and Comparison Explorer --
http://www.ikmb.uni-kiel.de/tace
17. Expander --
http://acgt.cs.tau.ac.il/expander/expander.html
ref: http://seqanswers.com/forums/showthread.php?t=43
18. T-ACE: Transcriptome Analysis and Comparison Explorer -- http://www.ikmb.uni-kiel.de/tace/
19. FunNet for co-expression and functional network analyses -- http://funnet.ws
20. MIRA 3 - Whole Genome Shotgun and EST Sequence Assembler -- http://sourceforge.net/projects/mira-assembler
21. TranscriptomeBrowser -- http://tagc.univ-mrs.fr/tbrowser
22. T-profiler -- http://www.t-profiler.org/
23. RNA-seq: http://cufflinks.cbcb.umd.edu/
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html
http://www.bioconductor.org/
http://www.bioconductor.org/packages/release/bioc/html/edgeR.html
TopHat -- http://tophat.cbcb.umd.edu/
Cufflinks -- http://cufflinks.cbcb.umd.edu/
§ Scripture
– is a method for transcriptome reconstruction that relies solely on RNA-Seq
reads and an assembled genome to build a transcriptome ab initio.
§ Cufflinks – assembles
transcripts, estimates their abundances, and tests for differential expression
and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and
assembles the alignments into a parsimonious set of transcripts. Cufflinks then
estimates the relative abundances of these transcripts based on how many reads
support each one.
§ SpliceMap
– SpliceMap is a de novo splice junction discovery tool. It offers high
sensitivity and support for arbitrarily long RNA-seq read lengths.
§ TopHat – is a fast
splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to
mammalian-sized genomes using the ultra high-throughput short read aligner
Bowtie, and then analyzes the mapping results to identify splice junctions
between exons.
§ PALMapper – a combination of the spliced alignment method
QPALMA with the short read alignment tool GenomeMapper. The resulting method,
called PALMapper, efficiently computes both spliced and unspliced alignments at
high accuracy while taking advantage of base quality information and splice site
predictions.
§ RNA-MATE – A
recursive mapping strategy for high-throughput RNA-sequencing data.
§ ERANGE – Mapping
and Quantifying Mammalian Transcriptomes by RNA-Seq
§ SeqMap
– A Tool For Mapping Millions Of Short Sequences To The Genome.
§ Bioconductor –
Bioconductor is an open source and open development software project for the
analysis and comprehension of genomic data.
§ BWA – BWA
is a fast, lightweight tool that aligns relatively short sequences (queries)
to a sequence database (target), such as the human reference genome.
§ CisGenome
– An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory
element analysis.
§ GenePattern – is a powerful genomic analysis platform that
provides access to more than 100 tools for gene expression analysis,
proteomics, SNP analysis and common data processing tasks. A web-based
interface provides easy access to these tools and allows the creation of
multi-step analysis pipelines that enable reproducible in silico research.
§ Galaxy – Mapping pipeline
for Illumina, 454, and SOLiD sequencing data.
§ MAQ – stands for Mapping
and Assembly with Qualities. It builds assemblies by mapping short reads to
reference sequences.
§ UCSC Genome Browser – This
site contains the reference sequence and working draft assemblies for a large
collection of genomes. It also provides portals to the ENCODE and Neandertal
projects.