Joint project JCB (FSU), Jenaer Centrum für Bioinformatik (Jena Centre for Bioinformatics), FSU/core,
Project D.5 (Stochastic, constraint-based approaches for describing regulatory sequences), research project 0312704K
Description of the project:
The identification of transcription factor binding sites (TFBS) in promoter sequences is an important problem, since it reveals information about the transcriptional regulation of genes. To analyse transcriptional regulation, computational approaches are applied to predict putative binding sites. The commonly used stochastic models for binding sites are position-specific score matrices, which have weak predictive power. The objective of subproject D5 was to develop modelling approaches for the description and recognition of regulatory DNA sequences that improve prediction performance, especially by taking additional structural properties into account.
In a first step, we focused on single TFBS. The traditional sequence-position-centered view of TFBS, which assumes independence between the motif positions, is not characteristic enough. We therefore abstracted from these motif models and represented TFBS as sets of characteristic properties, the majority of which can be derived from sequence information (sequence-dependent structure contribution, coarse base profiles in the neighbourhood of the site, matches to small consensus sequences). Bayesian networks have recently attracted considerable attention for data modelling and classification, since they provide a sophisticated stochastic framework for modelling features with various value ranges together with their statistical dependencies. A subtask of learning these Bayesian networks from sets of known TFBS samples is the automatic selection of a highly predictive property subset, for which we applied adaptations of sequential search algorithms. Most recently we released a web application, BioBayesNet, which makes Bayesian networks accessible to external scientists. It allows users to calculate properties from uploaded sequences, to search for discriminating property subsets, and to learn and use Bayesian networks. This application is suitable not only for TFBS but for any sequence analysis task.
In a second phase of the project, we investigated how to integrate the TFBS models into a model for regulatory modules, which consist of arrangements of TFBS on promoter sequences. The resulting model is based on hidden Markov models that run on sequences of feature value vectors and use the TFBS Bayesian networks as state output distributions.
Another important situation in the analysis of regulatory sequences is being given a set of unaligned sequences about which the only information is that they contain a common motif. This problem is called motif discovery, and a standard approach uses EM-based algorithms (MEME). Again we were interested in including additional structural information. For that purpose, we developed an extension to MEME that guides the EM algorithm to promising motif start positions using structural features. These features determine an informative prior on possible start positions.
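The idea of steering EM with a prior on motif start positions can be illustrated with a minimal sketch. It is neither MEME nor the extension developed in the project: it assumes the simplest motif model (exactly one site per sequence, uniform background) and initialises the site probabilities from the prior. All names (`peaked_prior`, `em_with_position_prior`), the toy sequences and the hint positions are hypothetical; in particular, `peaked_prior` is only a stand-in for a prior derived from structural features.

```python
import math

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def peaked_prior(n_starts, hint, weight=50.0):
    """Hypothetical stand-in for a structure-derived prior:
    flat weights, with extra weight on one hinted start position."""
    prior = [1.0] * n_starts
    if hint is not None:
        prior[hint] = weight
    return normalize(prior)


def m_step(seqs, resp, width, pseudo):
    """Re-estimate the position weight matrix from the current
    start-position probabilities (plus a small pseudocount)."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for seq, z in zip(seqs, resp):
        for start, weight in enumerate(z):
            for k in range(width):
                counts[k][IDX[seq[start + k]]] += weight
    return [normalize(row) for row in counts]


def e_step(seqs, pwm, priors, width, background):
    """Posterior over start positions in each sequence:
    prior times motif/background likelihood ratio, normalized."""
    resp = []
    for seq, prior in zip(seqs, priors):
        scores = []
        for start in range(len(seq) - width + 1):
            ratio = math.prod(
                pwm[k][IDX[seq[start + k]]] / background[IDX[seq[start + k]]]
                for k in range(width)
            )
            scores.append(prior[start] * ratio)
        resp.append(normalize(scores))
    return resp


def em_with_position_prior(seqs, width, priors, n_iter=30, pseudo=0.1):
    background = [0.25] * 4                    # assumed uniform background
    resp = [normalize(p) for p in priors]      # start EM from the prior
    for _ in range(n_iter):
        pwm = m_step(seqs, resp, width, pseudo)
        resp = e_step(seqs, pwm, priors, width, background)
    starts = [max(range(len(z)), key=z.__getitem__) for z in resp]
    return pwm, starts


# Toy data: the motif TATAAT planted in G/C-only flanks.
seqs = ["GCCGTATAATCGGC", "CGTATAATGCGGCC", "GGCCGCGTATAATC", "CCGGCTATAATGCG"]
width = 6
hints = [4, 2, 7, None]                        # last sequence: flat prior
priors = [peaked_prior(len(s) - width + 1, h) for s, h in zip(seqs, hints)]

pwm, starts = em_with_position_prior(seqs, width, priors)
print("predicted start positions:", starts)
```

In this toy data the fourth sequence has a flat prior, so its site is located only through the motif model learned from the other three; a real prior would come from structural feature scores rather than hand-set hints.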
Additional information: http://www.imbjena.de/jcb
Contact person: Prof. Dr. Rolf Backofen
Email: backofen@informatik.unifreiburg.de
Duration:
Start of project: 01.02.2003
End of project: 31.12.2007
Project Management:
Albert-Ludwigs-University Freiburg
Prof. Dr. Rolf Backofen
Bioinformatik
Georges-Köhler-Allee 106
79110 Freiburg
Germany
Phone: +49 (0) 7612037461
Fax: +49 (0) 7612037462
Email: backofen@informatik.unifreiburg.de
http://www.bioinf.unifreiburg.de
