Friday, 24 February 2012

Jinjiang's WebWatcher on Biology (4) - Protein-DNA databases


Protein-DNA binding databases (TF-DNA & ligand-DNA; excluding histone-DNA):
This is just the paper; the web interface for the database is not available yet (28.12.2011).
FlyTF currently contains 129 proteins for which PWMs are available.
TRANSFAC consists of free and paid sections. The binding sites it provides are experimentally verified. Human TF weight matrices can be viewed through the web interface of the UCSC Genome Browser.
The JASPAR CORE database contains a curated, non-redundant set of profiles, derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. The main difference from TRANSFAC is that the data are open access.
KDBI is a collection of experimentally determined kinetic data for protein-protein, protein-RNA, protein-DNA, protein-ligand, RNA-ligand and DNA-ligand binding events described in the literature.
ProNIT currently contains more than 4900 entries. Each entry has the protein and nucleic acid information, experimental conditions and the following binding thermodynamic data: dissociation constant Kd, energies, stoichiometry of binding and activity (Km and kcat).
UniPROBE contains data on the preferences of proteins for all possible sequence variants ('words') of length k ('k-mers'), as well as position weight matrix (PWM) and graphical sequence logo representations of the k-mer data. In total, the database currently hosts DNA binding data for 391 nonredundant proteins (individual proteins or in some cases heterodimers) from a diverse collection of organisms.
This is a personal collection. It currently contains ~50 matrices (last checked: 06.10.2010).
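
Several of the databases above distribute PWMs or k-mer preference data. As a rough illustration of how the two representations relate, the sketch below turns a matrix of base counts into log-odds weights and then scores every possible k-mer against it. The counts, the pseudocount, the uniform background and all function names are assumptions made for this example; none of them come from any of the databases listed.

```python
import itertools
import math

BASES = "ACGT"

def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    Assumes a uniform background frequency; real analyses often use
    genome-specific background frequencies instead.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def score_kmer(pwm, kmer):
    """Sum the per-position weights of one word of length len(pwm)."""
    return sum(column[base] for column, base in zip(pwm, kmer))

def rank_kmers(pwm):
    """Score all 4**k words of length k and sort them, best first."""
    k = len(pwm)
    kmers = ("".join(p) for p in itertools.product(BASES, repeat=k))
    return sorted(((score_kmer(pwm, w), w) for w in kmers), reverse=True)

# Hypothetical counts for a 4-bp binding site (illustration only).
toy_counts = [
    {"A": 18, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 19, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 18, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 19},
]

toy_pwm = counts_to_pwm(toy_counts)
ranked = rank_kmers(toy_pwm)
best_score, best_kmer = ranked[0]
print(best_kmer, round(best_score, 2))
```

Enumerating all k-mers is only practical for short motifs, since the number of words grows as 4^k.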


Calculating TF affinity (binding constant) from weight matrices and directly from experiments:*
TRAP calculates binding affinity based on the matrix description of a given TF and a set of DNA sequences to be annotated (input). It requires the specification of two biophysically motivated parameters. The freely available program code is written in C. Further details are available in the paper by Roider et al., 2007.
STAP uses a biophysical model to analyze transcription factor (TF)-DNA binding data, such as ChIP-chip or ChIP-seq data. The program assumes that the measured affinity of a sequence to a TF (TF_exp) in a ChIP-chip or ChIP-seq experiment is determined by: 1) the number and strength of binding sites of TF_exp in this sequence; 2) the presence of other sites in the neighborhood that may interact cooperatively with the sites of TF_exp. Specifically, it takes as input a set of DNA sequences, their experimentally measured binding affinities to TF_exp, and the position weight matrices (PWMs) of a set of TFs, including TF_exp. It then learns the relevant parameters of the biophysical model, including those of TF-DNA interaction and those of TF-TF cooperative interactions. **To be tested.
The input to MatrixREDUCE is a sequence file in FASTA format and an expression data file in tab-delimited text format (missing values are allowed). Output data include PSAMs in numeric and graphical format, parameters of the fitted model, and an HTML summary page.
BayesPI integrates Bayesian model regularization with biophysical modeling of protein-DNA interactions and nucleosome positioning to study protein-DNA interactions, using a high-throughput dataset. **To be tested.
The scoring function calibrated against crystallographic data on protein-DNA contacts can recover PWMs, sometimes outperforming experimental PWMs. **To be tested.
*Section under construction. Check again later and feel free to submit your links and comments.
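
To make the idea of computing an affinity from a weight matrix more concrete, here is a minimal sketch of a generic thermodynamic occupancy model: each window of the sequence gets a mismatch energy relative to the best base at each position, the energy is converted into a binding probability, and the probabilities are summed along the sequence. This is not the code or the parameterization of any tool listed above. The two parameters `r0` and `lam`, their default values, the pseudocount, the toy counts and the function names are all assumptions chosen for illustration, and only the forward strand of an uppercase A/C/G/T sequence is scanned.

```python
import math

BASES = "ACGT"

def mismatch_energy(counts, site, lam, pseudocount=1.0):
    """Energy of a site relative to the consensus, in units scaled by lam.

    Each position contributes log(best_count / observed_count) / lam,
    so the consensus site has energy 0 (illustrative definition).
    """
    energy = 0.0
    for column, base in zip(counts, site):
        best = max(column[b] for b in BASES) + pseudocount
        energy += math.log(best / (column[base] + pseudocount)) / lam
    return energy

def site_occupancy(energy, r0):
    """Probability that a site is bound, given its energy and scale r0."""
    weight = r0 * math.exp(-energy)
    return weight / (1.0 + weight)

def sequence_affinity(counts, sequence, r0=1.0, lam=1.0):
    """Sum of site occupancies over all forward-strand windows.

    r0 and lam are arbitrary illustrative defaults, not fitted values.
    """
    width = len(counts)
    total = 0.0
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        total += site_occupancy(mismatch_energy(counts, site, lam), r0)
    return total

# Hypothetical counts for a 4-bp binding site (illustration only).
toy_counts = [
    {"A": 18, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 19, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 18, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 19},
]

with_site = sequence_affinity(toy_counts, "TTACGTTT")
without_site = sequence_affinity(toy_counts, "TTTTTTTT")
print(with_site > without_site)
```

Because the score is a sum over all windows, many weak sites can contribute as much as one strong site, which is the main difference from reporting only the best PWM hit.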


General-purpose numbers relevant for gene regulation:
