%0 Journal Article
%T PANDAseq: paired-end assembler for illumina sequences
%A Andre P Masella
%A Andrea K Bartram
%A Jakub M Truszkowski
%A Daniel G Brown
%A Josh D Neufeld
%J BMC Bioinformatics
%D 2012
%I BioMed Central
%R 10.1186/1471-2105-13-31
%X PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence. Single-gene sequencing has become the benchmark for studying microbial taxonomic composition of environmental samples, by amplification of hypervariable regions of the 16S rRNA gene. Next-generation sequencing platforms, such as Illumina, are now adapted for the generation of multi-million-member sequence libraries for sample comparisons [1-4]. The PCR amplicons used for sequencing typically encompass one or more 16S rRNA gene hypervariable regions and amplicon lengths typically extend beyond the sequencing limit of the Illumina single-read method, which is typically less than 150 bases. Because the Illumina platform can generate amplicon sequences in a paired-end format, based on each template's position on the flow cell, paired reads can be directly matched and assembled. The prefiltering step of the genome assembly software PHRAP can be used to assemble reads [3].
Although the Needleman-Wunsch algorithm [5] embedded in Merger (http://emboss.sourceforge.net/apps/release/6.2/emboss/apps/merger.html) has been used to assemble Illumina paired-end reads [6], PANDAseq makes use of Illumina-specific properties, including the low probability of gap inclusion. Assembly of the Illumina paired-end sequences can be done naïvely, by requiring a perfect match in the region of overlap, to produce large numbers of
%U http://www.biomedcentral.com/1471-2105/13/31
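
The naïve assembly mentioned in the abstract (requiring a perfect match in the region of overlap) can be illustrated with a minimal Python sketch. This is not PANDAseq's algorithm and is not taken from the article. The function names `reverse_complement` and `naive_assemble`, the `min_overlap` parameter and its default of 10 bases, and the choice to prefer the longest matching overlap are all assumptions made for illustration.

```python
# Illustrative sketch only: naive paired-end assembly by exact overlap.
# Names, the min_overlap default, and the longest-overlap-first policy
# are hypothetical choices, not part of PANDAseq.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def naive_assemble(forward, reverse, min_overlap=10):
    """Join a read pair if the end of the forward read exactly matches
    the start of the reverse-complemented reverse read.

    Returns the assembled sequence, or None if no exact overlap of at
    least min_overlap bases exists.
    """
    rc = reverse_complement(reverse)
    max_overlap = min(len(forward), len(rc))
    # Try the longest candidate overlap first, then shorter ones.
    for overlap in range(max_overlap, min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


# Usage: simulate a 38-base amplicon read from both ends with 25-base reads.
template = "ACGTTGCAAGGCTTACGATCGGATCCTAGGCATGCAAT"
forward_read = template[:25]
reverse_read = reverse_complement(template[-25:])
assembled = naive_assemble(forward_read, reverse_read)
```

Because a single mismatched base in the overlap makes the comparison fail, a pair with any sequencing error in that region is discarded under this scheme, which is the limitation that error-aware assemblers aim to address.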