%0 Journal Article
%T GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge
%A Alexandra M Carvalho
%A Arlindo L Oliveira
%J Algorithms for Molecular Biology
%D 2011
%I BioMed Central
%R 10.1186/1748-7188-6-13
%X We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. Position-specific priors (PSPs) from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, when priors are combined, GRISOTTO is considerably more accurate than those approaches. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote. The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately. An important part of gene regulation is mediated by specific proteins, called transcription factors (TFs), which influence the transcription of a particular gene by binding to specific sites on DNA sequences, called transcription factor binding sites (TFBSs). Such binding sites are relatively short segments of DNA, normally 5 to 25 nucleotides long. Discovering TFBSs is a challenging task, mainly because they exhibit a high degree of degeneracy, which makes them difficult to distinguish from random artifacts. For this reason, motif discovery algorithms often suffer from impractically high false-positive rates and return noisy models that are not useful for characterizing TFBSs.
Some extra knowledge, carefully selected from the literature, has been incorporated in motif dis
%U http://www.almob.org/content/6/1/13