Joint project JCB (FSU), Jenaer Centrum für Bioinformatik, FSU/Kern, Project D.5 (Stochastic, constraint-based approaches for the description of regulatory sequences), research project 0312704K

Description of the project:
The identification of transcription factor binding sites in promoter sequences is an important problem, since it reveals information about the transcriptional regulation of genes. To analyse transcriptional regulation, computational approaches for predicting putative binding sites are applied. The commonly used stochastic models for binding sites are position-specific score matrices, which have weak predictive power. The objective of subproject D5 was to develop modelling approaches for the description and recognition of regulatory DNA sequences in order to improve prediction performance, especially by taking additional structural properties into account.

In a first step, we focused on single transcription factor binding sites (TFBS). Because the traditional position-centred view of TFBS, with its assumption of independence between the motif positions, is not characteristic enough, we abstracted from these motif models and represented TFBS as sets of characteristic properties, most of which can be derived from sequence information (sequence-dependent structural contributions, coarse base profiles in the neighbourhood of the site, matches to short consensus sequences). Bayesian networks have recently attracted considerable attention for data modelling and classification because they provide a sophisticated stochastic framework for modelling features with various value ranges together with their statistical dependencies. One subtask of learning these Bayesian networks from sets of known TFBS samples is the automatic selection of a highly predictive property subset, for which we applied adaptations of sequential search algorithms.

Most recently we released a web application, BioBayesNet, which makes Bayesian networks accessible to external scientists. It allows users to calculate properties from uploaded sequences, to search for discriminating property subsets, and to learn and use Bayesian networks.
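The search for a discriminating property subset mentioned above can be illustrated with plain sequential forward selection. This is only a minimal sketch of the general technique, not the project's adapted algorithm or BioBayesNet's implementation. The property names and the `toy_score` function are hypothetical; in practice the score would be something like the cross-validated classification performance of a Bayesian network learned on the candidate subset.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical per-property gains, used only to make the example runnable.
TOY_GAINS = {"consensus_match": 0.30, "helix_twist": 0.20, "gc_content_flank": 0.05}


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for a real evaluation: each property adds its gain,
    and every selected property costs a fixed complexity penalty."""
    return 0.5 + sum(TOY_GAINS[f] for f in subset) - 0.06 * len(subset)


def sequential_forward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    max_size: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Greedily add the property that improves the score most.

    Stops when no remaining property improves the score, when all
    properties are selected, or when max_size is reached.
    """
    remaining = list(features)
    selected: List[str] = []
    best_score = score(frozenset())
    while remaining and (max_size is None or len(selected) < max_size):
        # Evaluate every one-property extension of the current subset.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no extension helps, keep the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


if __name__ == "__main__":
    subset, value = sequential_forward_selection(TOY_GAINS, toy_score)
    print(subset, round(value, 2))
```

With the toy score, the weakest property is left out because its gain does not cover the penalty. Forward selection is greedy and can miss subsets whose properties are only informative in combination, which is one reason adapted or floating variants of sequential search are often preferred.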
The BioBayesNet application is suitable not only for TFBS but for any sequence analysis task.

In a second phase of the project we investigated how to integrate the TFBS models into a model for regulatory modules, which consist of arrangements of TFBS on promoter sequences. The resulting model is based on hidden Markov models that run on sequences of feature value vectors and use the TFBS Bayesian networks as state output distributions.

Another important situation in the analysis of regulatory sequences is that only a set of unaligned sequences is given. In this case, the only information available is that the sequences contain a common motif. This problem is called motif discovery, and a standard approach uses EM-based algorithms (MEME). Here, too, we were interested in including additional structural information. For that purpose, we developed an extension to MEME that guides the EM algorithm to promising motif start positions using structural features. These features determine an informative prior on possible start positions.
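The role of a prior on motif start positions in an EM-based motif search can be sketched as follows. This is a simplified illustration assuming exactly one motif occurrence per sequence and a fixed background model; it is not MEME itself and not the project's extension. All function names are hypothetical, and how structural features are converted into per-position scores is not specified in the description above, so `prior_from_scores` simply normalises whatever non-negative scores it is given.

```python
import math
from typing import Dict, List, Sequence, Tuple

ALPHABET = "ACGT"
Column = Dict[str, float]  # base -> probability for one motif column


def prior_from_scores(scores: Sequence[float]) -> List[float]:
    """Normalise non-negative per-position scores into a start-position prior.
    (Assumption: the scores come from some structural feature.)"""
    total = sum(scores)
    return [s / total for s in scores]


def seed_pwm(seed: str, strength: float = 0.7) -> List[Column]:
    """Initial motif model: the seed base gets `strength`, the rest share the remainder."""
    rest = (1.0 - strength) / (len(ALPHABET) - 1)
    return [{b: (strength if b == base else rest) for b in ALPHABET} for base in seed]


def site_posteriors(seq: str, pwm: List[Column], background: Column,
                    prior: Sequence[float]) -> List[float]:
    """E-step for one sequence: posterior over motif start positions.
    The posterior is proportional to prior * (motif likelihood / background likelihood)."""
    w = len(pwm)
    weights = []
    for start in range(len(seq) - w + 1):
        log_ratio = sum(
            math.log(pwm[j][seq[start + j]] / background[seq[start + j]])
            for j in range(w)
        )
        weights.append(prior[start] * math.exp(log_ratio))
    total = sum(weights)
    return [x / total for x in weights]


def update_pwm(seqs: Sequence[str], posteriors: Sequence[Sequence[float]],
               w: int, pseudocount: float = 0.1) -> List[Column]:
    """M-step: re-estimate motif columns from posterior-weighted base counts."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(w)]
    for seq, post in zip(seqs, posteriors):
        for start, z in enumerate(post):
            for j in range(w):
                counts[j][seq[start + j]] += z
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]


def run_em(seqs: Sequence[str], pwm: List[Column], background: Column,
           priors: Sequence[Sequence[float]],
           iterations: int = 20) -> Tuple[List[Column], List[List[float]]]:
    """Alternate E- and M-steps; returns the final motif model and the
    posteriors computed in the last E-step."""
    w = len(pwm)
    posteriors: List[List[float]] = []
    for _ in range(iterations):
        posteriors = [site_posteriors(s, pwm, background, p)
                      for s, p in zip(seqs, priors)]
        pwm = update_pwm(seqs, posteriors, w)
    return pwm, posteriors
```

A uniform prior reproduces the usual uninformed behaviour, while a prior concentrated on a few positions steers the posteriors, and therefore the re-estimated motif, towards those positions. A prior of zero at a position excludes it entirely, so in practice a small floor value would usually be preferable.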

Additional information:
Contact person: Prof. Dr. Rolf Backofen
Start of project: 01.02.2003
End of project: 31.12.2007
Project Management:
Albert-Ludwigs-University Freiburg
Prof. Dr. Rolf Backofen
Georges-Köhler-Allee 106
79110 Freiburg

Phone: +49 (0) 761-203-7461
Fax: +49 (0) 761-203-7462
Current Research Report
  • BMBF