A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)

DOI: 10.3390/microorganisms1010137

Keywords: metagenome, oligonucleotide composition, influenza virus, big data, peptide composition, bioinformatics, SOM, genome signature, microbial community

Abstract:

With the remarkable increase in genomic sequence data from microorganisms, novel tools are needed for comprehensive analysis of the big sequence data now available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map. By modifying the conventional SOM, we developed the batch-learning SOM (BLSOM), which classifies sequence fragments (e.g., 1 kb) according to phylotype based solely on oligonucleotide composition. Metagenomic studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in the life sciences. BLSOM is particularly suitable for phylogenetic assignment of metagenomic sequences, because this composition-based clustering works on fragmentary sequences. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species. By mapping metagenomic sequences onto these large-scale BLSOMs, we can predict the phylotype of each metagenomic sequence and thereby reveal the community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes in virus sequences and for surveillance of strains that are potentially hazardous when introduced into humans from non-human sources.
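
The idea of clustering sequence fragments by oligonucleotide composition can be illustrated with a minimal Python sketch. It is not the authors' BLSOM implementation: the abstract does not specify the oligonucleotide length, the initialization, or the neighborhood function, so the defaults here (k = 4, seeding node weights from the input vectors, a Gaussian neighborhood with a linearly shrinking radius) are assumptions, and the function names `kmer_composition`, `split_fragments`, `best_node`, and `batch_som` are hypothetical. The sketch shows only the two ingredients the abstract names: a composition vector per fragment, and a batch update in which all map nodes are recomputed together once per pass.

```python
from itertools import product
import math


def kmer_composition(seq, k=4):
    """Relative frequency of every length-k oligonucleotide, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def split_fragments(seq, size=1000):
    """Cut a sequence into non-overlapping fragments; a short tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def best_node(weights, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def batch_som(data, rows, cols, epochs=20):
    """Generic batch SOM: all nodes are updated together once per epoch."""
    dim = len(data[0])
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Simplified deterministic initialisation (an assumption of this sketch):
    # node weights are seeded by cycling through the input vectors.
    weights = [list(data[i % len(data)]) for i in range(len(nodes))]
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        radius = max(radius0 * (1 - epoch / epochs), 0.5)
        # Step 1: assign every input to its best-matching node, weights fixed.
        bmus = [best_node(weights, x) for x in data]
        # Step 2: replace each weight by a neighbourhood-weighted mean of inputs.
        new_weights = []
        for rj, cj in nodes:
            num, den = [0.0] * dim, 0.0
            for x, b in zip(data, bmus):
                rb, cb = nodes[b]
                h = math.exp(-((rj - rb) ** 2 + (cj - cb) ** 2) / (2 * radius ** 2))
                den += h
                for d in range(dim):
                    num[d] += h * x[d]
            new_weights.append([v / den for v in num])
        weights = new_weights
    return nodes, weights


# Toy demonstration with two made-up "genomes" and short fragments.
genome_a = "ATATTAAT" * 10
genome_b = "GCGCCGGC" * 10
fragments = split_fragments(genome_a, size=20) + split_fragments(genome_b, size=20)
vectors = [kmer_composition(f, k=2) for f in fragments]
nodes, weights = batch_som(vectors, rows=3, cols=3, epochs=10)
for frag, vec in zip(fragments, vectors):
    print(frag, "->", nodes[best_node(weights, vec)])
```

The toy sequences and the 3 x 3 map are purely illustrative. In a realistic setting the fragments would be on the order of 1 kb, as in the abstract, and the map would be far larger. Mapping a new metagenomic fragment then amounts to computing its composition vector and calling `best_node` on a map trained from sequences of known species.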
