Correction of Spatial Bias in Oligonucleotide Array Data

DOI: 10.1155/2013/167915


Abstract:

Background. Oligonucleotide microarrays allow for high-throughput gene expression profiling assays. The technology relies on the fundamental assumption that observed hybridization signal intensities (HSIs) for each intended target, on average, correlate with their target’s true concentration in the sample. However, systematic, nonbiological variation from several sources undermines this hypothesis. Background hybridization signal has been previously identified as one such important source, one manifestation of which appears in the form of spatial autocorrelation.

Results. We propose an algorithm, pyn, for the elimination of spatial autocorrelation in HSIs, exploiting the duality of desirable mutual information shared by probes in a common probe set and undesirable mutual information shared by spatially proximate probes. We show that this correction procedure reduces spatial autocorrelation in HSIs; increases HSI reproducibility across replicate arrays; increases differentially expressed gene detection power; and performs better than previously published methods.

Conclusions. The proposed algorithm increases both precision and accuracy, while requiring virtually no changes to users’ current analysis pipelines: the correction consists merely of a transformation of raw HSIs (e.g., CEL files for Affymetrix arrays). A free, open-source implementation is provided as an R package, compatible with standard Bioconductor tools. The approach may also be tailored to other platform types and other sources of bias.

1. Background

Microarray technology, a fairly recent yet already well-established and extensively dissected method, allows for the simultaneous quantification of expression levels of entire genomes or subsets thereof [1]. In situ oligonucleotide arrays are by far the most popular type, representing at the time of writing 70% of all arrays deposited in the Gene Expression Omnibus (GEO), a public microarray database, in the last year; of these, 58% are Affymetrix GeneChips [2]. These are designed such that each gene is targeted by multiple perfectly complementary oligonucleotide probes at various locations along its sequence (forming a probe set); copies of each of these probes are covalently linked to a solid surface at a predetermined location on a grid; a labelled RNA sample is allowed to hybridize to each of these probes; and finally a hybridization signal intensity (HSI) is obtained for each probe [3]. The technology relies on the assumption that, on average, HSIs observed in a given probe set correlate with the true concentration of the given mRNA
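To make the notion of spatial autocorrelation on the array grid concrete, the following minimal sketch computes Moran's I (the statistic of reference [51]) for a small rectangular grid of intensities. It is an illustration only and is not the pyn algorithm described in the abstract: the function name `morans_i`, the use of rook (4-neighbour) adjacency with unit weights, and the toy grids are all assumptions made for this example.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid, using rook (4-neighbour) adjacency.

    Illustrative assumption: every pair of horizontally or vertically
    adjacent cells has weight 1, all other pairs have weight 0.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    # Deviation of each cell from the grid-wide mean.
    dev = [[value - mean for value in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    cross = 0.0      # sum of w_ij * dev_i * dev_j over ordered neighbour pairs
    weight_sum = 0   # sum of all weights w_ij
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_sum += 1
    return (n / weight_sum) * (cross / denom)


# Toy grids (hypothetical data, not real HSIs).
# A checkerboard: every cell differs from all of its neighbours.
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
# Two smooth halves: neighbouring cells mostly share the same value.
halves = [[0, 0, 1, 1] for _ in range(4)]

print("checkerboard:", morans_i(checker))
print("two halves:  ", morans_i(halves))
```

Values near zero indicate no spatial structure, positive values indicate that neighbouring cells tend to carry similar intensities (the kind of spatial bias the abstract describes), and negative values indicate that neighbours tend to differ.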

References

[1]  M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science, vol. 270, no. 5235, pp. 467–470, 1995.
[2]  R. Edgar, M. Domrachev, and A. E. Lash, “Gene Expression Omnibus: NCBI gene expression and hybridization array data repository,” Nucleic Acids Research, vol. 30, no. 1, pp. 207–210, 2002.
[3]  D. J. Lockhart, H. Dong, M. C. Byrne et al., “Expression monitoring by hybridization to high-density oligonucleotide arrays,” Nature Biotechnology, vol. 14, no. 13, pp. 1675–1680, 1996.
[4]  D. W. Selinger, K. J. Cheung, R. Mei et al., “RNA expression analysis using a 30 base pair resolution Escherichia coli genome array,” Nature Biotechnology, vol. 18, no. 12, pp. 1262–1268, 2000.
[5]  A. J. Hartemink, D. K. Gifford, T. S. Jaakkola, and R. A. Young, “Maximum likelihood estimation of optimal scaling factors for expression array normalization,” Microarrays: Optical Technologies and Informatics, vol. 2, no. 23, pp. 132–140, 2001.
[6]  L. M. Cope, R. A. Irizarry, H. A. Jaffee, Z. Wu, and T. P. Speed, “A benchmark for Affymetrix GeneChip expression measures,” Bioinformatics, vol. 20, no. 3, pp. 323–331, 2004.
[7]  R. Gentleman, Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Springer Science and Business Media, New York, NY, USA, 2005.
[8]  F. Naef and M. O. Magnasco, “Solving the riddle of the bright mismatches: labeling and effective binding in oligonucleotide arrays,” Physical Review E, vol. 68, no. 1, part 1, Article ID 011906, 2003.
[9]  Z. Wu and R. A. Irizarry, “Stochastic models inspired by hybridization theory for short oligonucleotide arrays,” Journal of Computational Biology, vol. 12, no. 6, pp. 882–893, 2005.
[10]  M. Reimers and J. N. Weinstein, “Quality assessment of microarrays: visualization of spatial artifacts and quantitation of regional biases,” BMC Bioinformatics, vol. 6, article 166, 2005.
[11]  M. Suárez-Fariñas, A. Haider, and K. M. Wittkowski, “"Harshlighting" small blemishes on microarrays,” BMC Bioinformatics, vol. 6, article 65, 2005.
[12]  G. J. G. Upton and J. C. Lloyd, “Oligonucleotide arrays: information from replication and spatial structure,” Bioinformatics, vol. 21, no. 22, pp. 4162–4168, 2005.
[13]  J. M. Arteaga-Salas, A. P. Harrison, and G. J. G. Upton, “Reducing spatial flaws in oligonucleotide arrays by using neighborhood information,” Statistical Applications in Genetics and Molecular Biology, vol. 7, no. 1, article 29, 2008.
[14]  W. B. Langdon, G. J. Upton, R. da Silva Camargo, and A. P. Harrison, “A survey of spatial defects in Homo Sapiens Affymetrix GeneChips,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 4, pp. 647–653, 2010.
[15]  R. Z. Gharaibeh, A. A. Fodor, and C. J. Gibas, “Software note: using probe secondary structure information to enhance Affymetrix GeneChip background estimates,” Computational Biology and Chemistry, vol. 31, no. 2, pp. 92–98, 2007.
[16]  V. G. Ratushna, J. W. Weller, and C. J. Gibas, “Secondary structure in the target as a confounding factor in synthetic oligomer microarray design,” BMC Genomics, vol. 6, article 31, 2005.
[17]  H. Wei, P. F. Kuan, S. Tian et al., “A study of the relationships between oligonucleotide properties and hybridization signal intensities from NimbleGen microarray datasets,” Nucleic Acids Research, vol. 36, no. 9, pp. 2926–2938, 2008.
[18]  M. P. Samanta, W. Tongprasit, H. Sethi, C. Chin, and V. Stolc, “Global identification of noncoding RNAs in Saccharomyces cerevisiae by modulating an essential RNA processing pathway,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 11, pp. 4192–4197, 2006.
[19]  G. J. G. Upton, O. Sanchez-Graillet, J. Rowsell et al., “On the causes of outliers in Affymetrix GeneChip data,” Briefings in Functional Genomics and Proteomics, vol. 8, no. 3, pp. 199–212, 2009.
[20]  A. A. Ahmed, M. Vias, N. G. Iyer, C. Caldas, and J. D. Brenton, “Microarray segmentation methods significantly influence data precision,” Nucleic Acids Research, vol. 32, no. 5, article e50, 2004.
[21]  J. T. Leek and J. D. Storey, “Capturing heterogeneity in gene expression studies by surrogate variable analysis,” PLoS Genetics, vol. 3, no. 9, pp. 1724–1735, 2007.
[22]  R. A. Irizarry, B. Hobbs, F. Collin et al., “Exploration, normalization, and summaries of high density oligonucleotide array probe level data,” Biostatistics, vol. 4, no. 2, pp. 249–264, 2003.
[23]  F. Naef, D. A. Lim, N. Patil, and M. Magnasco, “DNA hybridization to mismatched templates: a chip study,” Physical Review E, vol. 65, no. 4, part 1, Article ID 040902, 2002.
[24]  Affymetrix, “GeneChip Gene 1.0 ST Array System,” Santa Clara, Calif, USA, 2007.
[25]  Y. H. Yang, S. Dudoit, P. Luu, and T. P. Speed, “Normalization for cDNA microarray data,” Microarrays: Optical Technologies and Informatics, vol. 2, no. 23, pp. 141–152, 2001.
[26]  S. Dudoit, Y. H. Yang, M. J. Callow, and T. P. Speed, “Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments,” Statistica Sinica, vol. 12, no. 1, pp. 111–139, 2002.
[27]  B. M. Bolstad, R. A. Irizarry, M. Astrand, and T. P. Speed, “A comparison of normalization methods for high density oligonucleotide array data based on variance and bias,” Bioinformatics, vol. 19, no. 2, pp. 185–193, 2003.
[28]  J. A. Berger, S. Hautaniemi, A. Järvinen, H. Edgren, S. K. Mitra, and J. Astola, “Optimized LOWESS normalization parameter selection for DNA microarray data,” BMC Bioinformatics, vol. 5, article 194, 2004.
[29]  M. E. Ritchie, J. Silver, A. Oshlack et al., “A comparison of background correction methods for two-colour microarrays,” Bioinformatics, vol. 23, no. 20, pp. 2700–2707, 2007.
[30]  S. L. Carter, A. C. Eklund, B. H. Mecham, I. S. Kohane, and Z. Szallasi, “Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements,” BMC Bioinformatics, vol. 6, article 107, 2005.
[31]  C. Wu, R. Carta, and L. Zhang, “Sequence dependence of cross-hybridization on short oligo microarrays,” Nucleic Acids Research, vol. 33, no. 9, p. e84, 2005.
[32]  H. Binder, J. Brücker, and C. J. Burden, “Nonspecific hybridization scaling of microarray expression estimates: a physicochemical approach for chip-to-chip normalization,” Journal of Physical Chemistry B, vol. 113, no. 9, pp. 2874–2895, 2009.
[33]  Y. H. Yang, S. Dudoit, P. Luu et al., “Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation,” Nucleic Acids Research, vol. 30, no. 4, p. e15, 2002.
[34]  C. Workman, L. J. Jensen, H. Jarmer et al., “A new non-linear normalization method for reducing variability in DNA microarray experiments,” Genome Biology, vol. 3, no. 9, research0048, 2002.
[35]  C. Colantuoni, G. Henry, S. Zeger, and J. Pevsner, “Local mean normalization of microarray element signal intensities across an array surface: quality control and correction of spatially systematic artifacts,” BioTechniques, vol. 32, no. 6, pp. 1316–1320, 2002.
[36]  D. L. Wilson, M. J. Buckley, C. A. Helliwell, and I. W. Wilson, “New normalization methods for cDNA microarray data,” Bioinformatics, vol. 19, no. 11, pp. 1325–1332, 2003.
[37]  D. Baird, P. Johnstone, and T. Wilson, “Normalization of microarray data using a spatial mixed model analysis which includes splines,” Bioinformatics, vol. 20, no. 17, pp. 3196–3205, 2004.
[38]  A. L. Tarca, J. E. Cooke, and J. Mackay, “A robust neural networks approach for spatial and intensity-dependent normalization of cDNA microarray data,” Bioinformatics, vol. 21, no. 11, pp. 2674–2683, 2005.
[39]  P. Neuvial, P. Hupé, I. Brito et al., “Spatial normalization of array-CGH data,” BMC Bioinformatics, vol. 7, article 264, 2006.
[40]  H. S. Chai, T. M. Therneau, K. R. Bailey, and J. A. Kocher, “Spatial normalization improves the quality of genotype calling for Affymetrix SNP 6.0 arrays,” BMC Bioinformatics, vol. 11, article 356, 2010.
[41]  J. M. Arteaga-Salas, H. Zuzan, W. B. Langdon, G. J. G. Upton, and A. P. Harrison, “An overview of image-processing methods for Affymetrix GeneChips,” Briefings in Bioinformatics, vol. 9, no. 1, pp. 25–33, 2008.
[42]  T. H. Stokes, R. A. Moffitt, J. H. Phan, and M. D. Wang, “Chip artifact CORRECTion (caCORRECT): a bioinformatics system for quality assurance of genomics and proteomics array data,” Annals of Biomedical Engineering, vol. 35, no. 6, pp. 1068–1080, 2007.
[43]  R. A. Irizarry, Z. Wu, and H. A. Jaffee, “Comparison of Affymetrix GeneChip expression measures,” Bioinformatics, vol. 22, no. 7, pp. 789–794, 2006.
[44]  N. O. Stitziel, B. G. Mar, J. Liang, and C. A. Westbrook, “Membrane-associated and secreted genes in breast cancer,” Cancer Research, vol. 64, no. 23, pp. 8682–8687, 2004.
[45]  D. Magda, P. Lecane, R. A. Miller et al., “Motexafin gadolinium disrupts zinc metabolism in human cancer cell lines,” Cancer Research, vol. 65, no. 9, pp. 3837–3845, 2005.
[46]  “Latin Square Data for Expression Algorithm Assessment,” http://www.affymetrix.com/support/technical/sample_data/datasets.affx.
[47]  C. Cheng and L. M. Li, “Sub-array normalization subject to differentiation,” Nucleic Acids Research, vol. 33, no. 17, pp. 5565–5573, 2005.
[48]  C. Li and W. H. Wong, “Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 1, pp. 31–36, 2001.
[49]  B. Bolstad, J. Brettschneider, K. Simpson, L. Cope, R. Irizarry, and T. P. Speed, “Quality assessment of Affymetrix GeneChip data,” in Bioinformatics and Computational Biology Using R and Bioconductor, R. Gentleman, V. Carey, W. Huber, R. Irizarry, and S. Dudoit, Eds., Springer, 2005.
[50]  R. C. Geary, “The contiguity ratio and statistical mapping,” The Incorporated Statistician, vol. 5, no. 3, pp. 115–146, 1954.
[51]  P. A. Moran, “Notes on continuous stochastic phenomena,” Biometrika, vol. 37, no. 1-2, pp. 17–23, 1950.
[52]  Z. Wu, R. A. Irizarry, R. Gentleman, F. Martinez-Murillo, and F. Spencer, “A model-based background adjustment for oligonucleotide expression arrays,” Journal of the American Statistical Association, vol. 99, no. 468, pp. 909–917, 2004.
[53]  B. M. Bolstad, Low-level analysis of high-density oligonucleotide array data: background, normalization and summarization [Ph.D. thesis in biostatistics], University of California, Berkeley, 2004.
