%0 Journal Article
%T TRII: A Probabilistic Scoring of Drosophila melanogaster Translation Initiation Sites
%A Michael P Weir
%A Michael D Rice
%J EURASIP Journal on Bioinformatics and Systems Biology
%D 2010
%I BioMed Central
%R 10.1155/2010/814127
%X Understanding how biological machines work in the context of genomes, transcriptomes, and proteomes requires appropriate languages and representations for successful modeling of their biological processes. Information theory provides one of the foundations for this goal and underlies sequence motif-finding algorithms such as MEME [1]. For example, information theory gives us powerful ways to analyze and score sequence motifs in RNAs that are targeted by biological machines such as the spliceosome or ribosome [2–4]. The approach reveals, for each nucleotide position in the motif, which nucleotide choices are preferred and which are avoided. For any single RNA sequence, the collective deviations from the preferred nucleotides must be sufficiently small for the machine to function successfully on that RNA. In this study, several analytical approaches are integrated to increase the power of these scoring methods, using Drosophila translation initiation sites as a model setting. As an introduction, we first describe the information-theoretic basis for these scoring methods. Motifs of functional importance can be quantitatively assessed through their sequence conservation, measured as information content in sets of aligned sequences [2, 5, 6]. The information at each nucleotide position $l$ for a set of $n$ aligned RNA sequences is defined by the expression $I_l = 2 - \left( -\sum_{b \in \{A,C,G,U\}} f_{b,l} \log_2 f_{b,l} + e(n) \right)$. The summation represents the uncertainty based on the frequencies of occurrence $f_{b,l}$ of the nucleotides $b$ at position $l$. The sampling correction factor $e(n)$ depends on $n$ and decreases toward 0 as the value of $n$ increases [3]. It is sometimes important to take into account nonrandom background nucleotide frequencies. For example, the mean frequencies of each nucleotide in Drosophila cDNAs deviate significantly from 0.25 [3], and this fact may influence how spliceosomes or ribosomes perceive RNA molecules.
The relative information (often called relative entropy) at each nucleotide position is defined by the expression $I_l^{\mathrm{rel}} = \sum_{b \in \{A,C,G,U\}} f_{b,l} \log_2 \left( f_{b,l} / p_b \right)$, where $p_b$ is the background frequenc
%U http://bsb.eurasipjournals.com/content/2010/1/814127
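
The per-position information and relative information described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy alignment, and the uniform background are hypothetical, and the sampling correction uses the common large-sample approximation (s - 1) / (2 ln 2 * n) for an alphabet of s = 4 nucleotides, which is an assumption here since the abstract does not state the form of the correction.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # RNA nucleotides


def position_frequencies(seqs, pos):
    """Frequency of each nucleotide at column `pos` of an alignment."""
    counts = Counter(s[pos] for s in seqs)
    n = len(seqs)
    return {b: counts.get(b, 0) / n for b in ALPHABET}


def sampling_correction(n):
    """Assumed approximation of the small-sample correction.

    It shrinks toward 0 as the number of aligned sequences n grows.
    """
    return (len(ALPHABET) - 1) / (2 * math.log(2) * n)


def information(seqs, pos):
    """Information (bits) at one position: 2 - (uncertainty + correction)."""
    freqs = position_frequencies(seqs, pos)
    # Terms with zero frequency contribute nothing to the uncertainty.
    uncertainty = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return 2 - (uncertainty + sampling_correction(len(seqs)))


def relative_information(seqs, pos, background):
    """Relative entropy (bits) of one position against background frequencies."""
    freqs = position_frequencies(seqs, pos)
    return sum(
        f * math.log2(f / background[b]) for b, f in freqs.items() if f > 0
    )


if __name__ == "__main__":
    # Hypothetical toy alignment of start-codon contexts.
    alignment = ["AAUGG", "CAUGG", "AAUGC", "GAUGA"]
    uniform = {b: 0.25 for b in ALPHABET}  # assumed background
    for i in range(len(alignment[0])):
        print(i, round(information(alignment, i), 3),
              round(relative_information(alignment, i, uniform), 3))
```

With a uniform background of 0.25 per nucleotide, the relative information equals 2 minus the uncertainty, so the two measures differ only by the sampling correction. A non-uniform background, such as the Drosophila cDNA frequencies the abstract mentions, changes the relative score.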