%0 Journal Article %T A new approach for detecting low-level mutations in next-generation sequence data %A Mingkun Li %A Mark Stoneking %J Genome Biology %D 2012 %I BioMed Central %R 10.1186/gb-2012-13-5-r34 %X Next-generation sequencing (NGS) is now widely used in biological and medical studies. Most re-sequencing studies have the goal of identifying homozygous or heterozygous mutations in diploid genomes (that is, mutations present at 50% or 100% frequency in sequence reads), and use this information to study genome evolution, infer population history, or identify causal genes/mutations in disease-association studies [1,2]. However, some applications require the identification of low-level mutations (LLMs) that are present at frequencies well below 50% within the population of molecules that is typically sequenced in an NGS study; examples include heteroplasmic mutations in mitochondrial DNA (mtDNA) genomes [3], somatic mutations in tumors [4], or mutations in pooled DNA samples [5]. Challenges in detecting true LLMs come from sequencing error, library contamination, PCR artifacts, and so on. Sequencing error is the most common problem; for instance, the Illumina Genome Analyzer, which is one of the most popular NGS platforms, has an average error rate of 0.01 [6]. Moreover, sequencing error is unevenly distributed along the genome and may be influenced by the sequence context, position on the read, and molecule structure, resulting in sequencing error 'hot spots' where the error rate can be ten-fold greater (or more) than the genome average [3,7-10]. Unfortunately, those features resulting in sequencing error hot spots have not been fully characterized, thus making it difficult to distinguish sequencing errors from true LLMs [10]. Detecting 'true' mutations involves genotype estimation (that is, the mutation frequency is expected to be 0%, 50%, or 100% for diploid data), and methods exist to provide accurate inference at a coverage of around 20× [2,11]. 
By contrast, even though much higher sequencing depth is typically obtained for NGS studies designed to detect LLMs (often ≥1,000×), the challenge remains to distinguish LLMs from sequencing errors [12]. Recently, several %U http://genomebiology.com/2012/13/5/R34