%0 Journal Article %T Estimation of alternative splicing isoform frequencies from RNA-Seq data %A Marius Nicolae %A Serghei Mangul %A Ion I Măndoiu %A Alex Zelikovsky %J Algorithms for Molecular Biology %D 2011 %I BioMed Central %R 10.1186/1748-7188-6-9 %X In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes. Ubiquitous regulatory mechanisms such as the use of alternative transcription start and polyadenylation sites, alternative splicing, and RNA editing result in multiple messenger RNA (mRNA) isoforms being generated from a single genomic locus. Most prevalently, alternative splicing is estimated to take place for over 90% of the multi-exon human genes across diverse cell types [1], with as much as 68% of multi-exon genes expressing multiple isoforms in a clonal cell line of colorectal cancer origin [2]. 
Not surprisingly, the ability to reconstruct full length isoform sequences and accurately estimate their expression levels is widely believed to be critical for unraveling gene functions and transcription regulation mechanisms [3]. Three key interrelated computational problems arise in the context of transcriptome analysis: gene expression level estimation (GE), isoform expression level estimation (IE), and novel isoform discovery (ID). Targeted GE using methods such as quantitative PCR has long been a staple of genetic studies. The co %U http://www.almob.org/content/6/1/9