The use of next-generation DNA sequencing technologies has greatly facilitated reference-guided variant detection in complex plant genomes. However, complications may arise when regions adjacent to a read of interest are used for marker assay development, or when reference sequences are incomplete, as short reads alone may not be long enough to ascertain their uniqueness. Here, the possibility of generating longer sequences in discrete regions of the large and complex genome of maize is demonstrated, using a modified version of a paired-end RAD library construction strategy. Reads are generated from DNA fragments first digested with a methylation-sensitive restriction endonuclease, sheared, enriched with biotin and a selective PCR amplification step, and then sequenced at both ends. Sequences are locally assembled into contigs by subgrouping pairs based on the identity of the read anchored by the restriction site. This strategy, applied to two maize inbred lines (B14 and B73), generated 183,609 and 129,018 contigs, respectively, of which at least 76% were >200 bp in length. A subset of putative single nucleotide polymorphisms from contigs aligning to the B73 reference genome with at least one mismatch was resequenced, and 90% of those in B14 were confirmed, indicating that this method is a potent approach for variant detection and marker development in species with complex genomes or lacking extensive reference sequences.
1. Introduction
DNA-based genetic markers are pivotal tools for applications as diverse as QTL mapping, marker-assisted selection, association mapping, and fine mapping for the detection of genes linked to a particular phenotype [1].
Among the variety of genetic markers that have been developed, those derived from single nucleotide polymorphisms (SNPs) have become the markers of choice for many mapping applications because of their abundance and the availability of high-throughput and cost-effective technologies for detection and diagnostics [2–4]. One popular tool for SNP identification and detection has been the construction of reduced-representation libraries (RRLs) and their sequencing with massively parallel sequencing platforms, in species as varied as cattle, worm, soybean, rice, maize, and common bean [5–10]. However, one major limitation of such platforms is the relatively short length of individual sequencing reads. While the availability of a high-quality reference sequence may render short reads sufficient for alignment and subsequent SNP detection, this limitation may be further compounded in crop species due to (1) the
References
[1] S. R. Eathington, T. M. Crosbie, M. D. Edwards, R. S. Reiter, and J. K. Bull, “Molecular markers in a commercial breeding program,” Crop Science, vol. 47, pp. S154–S163, 2007.
[2] J. Shendure and H. Ji, “Next-generation DNA sequencing,” Nature Biotechnology, vol. 26, no. 10, pp. 1135–1145, 2008.
[3] S. Deschamps and M. A. Campbell, “Utilization of next-generation sequencing platforms in plant genomics and genetic variant discovery,” Molecular Breeding, vol. 25, no. 4, pp. 553–570, 2010.
[4] R. K. Varshney, S. N. Nayak, G. D. May, and S. A. Jackson, “Next-generation sequencing technologies and their implications for crop genetics and breeding,” Trends in Biotechnology, vol. 27, no. 9, pp. 522–530, 2009.
[5] L. W. Hillier, G. T. Marth, A. R. Quinlan et al., “Whole-genome sequencing and variant discovery in C. elegans,” Nature Methods, vol. 5, no. 2, pp. 183–188, 2008.
[6] C. P. Van Tassell, T. P. L. Smith, L. K. Matukumalli et al., “SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries,” Nature Methods, vol. 5, no. 3, pp. 247–252, 2008.
[7] M. A. Gore, M. H. Wright, E. S. Ersoz et al., “Large-scale discovery of gene-enriched SNPs,” The Plant Genome, vol. 2, pp. 121–133, 2009.
[8] S. Deschamps, M. la Rota, J. P. Ratashak et al., “Rapid genome-wide single nucleotide polymorphism discovery in soybean and rice via deep resequencing of reduced representation libraries with the Illumina Genome Analyzer,” The Plant Genome, vol. 3, pp. 53–68, 2010.
[9] D. L. Hyten, Q. Song, E. W. Fickus et al., “High-throughput SNP discovery and assay development in common bean,” BMC Genomics, vol. 11, no. 1, article 475, 2010.
[10] X. Wu, C. Ren, T. Joshi, T. Vuong, D. Xu, and H. T. Nguyen, “SNP discovery by high-throughput sequencing in soybean,” BMC Genomics, vol. 11, no. 1, article 469, 2010.
[11] M. Margulies, M. Egholm, W. E. Altman et al., “Genome sequencing in microfabricated high-density picolitre reactors,” Nature, vol. 437, pp. 376–380, 2005.
[12] J. Eid, A. Fehr, J. Gray et al., “Real-time DNA sequencing from single polymerase molecules,” Science, vol. 323, pp. 133–138, 2009.
[13] J. B. Hiatt, R. P. Patwardhan, E. H. Turner, C. Lee, and J. Shendure, “Parallel, tag-directed assembly of locally derived short sequence reads,” Nature Methods, vol. 7, no. 2, pp. 119–122, 2010.
[14] P. D. Etter, J. L. Preston, S. Bassham, W. A. Cresko, and E. A. Johnson, “Local de novo assembly of RAD paired-end contigs using short sequencing reads,” PLoS ONE, vol. 6, no. 4, article e18561, 2011.
[15] E. M. Willing, M. Hoffmann, J. D. Klein, D. Weigel, and C. Dreyer, “Paired-end RAD-seq for de novo assembly and marker design without available reference,” Bioinformatics, vol. 27, no. 16, pp. 2187–2193, 2011.
[16] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25, 2009.
[17] D. R. Zerbino and E. Birney, “Velvet: algorithms for de novo short read assembly using de Bruijn graphs,” Genome Research, vol. 18, no. 5, pp. 821–829, 2008.
[18] P. D. Rabinowicz, K. Schutz, N. Dedhia et al., “Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome,” Nature Genetics, vol. 23, no. 3, pp. 305–308, 1999.
[19] P. D. Rabinowicz, R. Citek, M. A. Budiman et al., “Differential methylation of genes and repeats in land plants,” Genome Research, vol. 15, no. 10, pp. 1431–1440, 2005.