Exploring Biomolecular Literature with EVEX: Connecting Genes through Events, Homology, and Indirect Associations

DOI: 10.1155/2012/582765

Abstract:

Technological advancements in the field of genetics have led not only to an abundance of experimental data but also to an exponential increase in the number of published biomolecular studies. Text mining is widely accepted as a promising technique to help researchers in the life sciences deal with the amount of available literature. This paper presents a freely available web application built on top of 21.3 million detailed biomolecular events extracted from all PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations, accounting for lexical variants and synonymy. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events. The search function accepts official gene/protein symbols as well as common names from all species. Finally, the web application is a powerful tool for generating homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators.

1. Introduction

The field of natural language processing for biomolecular texts (BioNLP) aims at large-scale text mining in support of life science research. Its primary motivation is the enormous amount of available scientific literature, which makes it essentially impossible to rapidly gain an overview of prior research results other than in a very narrow domain of interest. Among the typical use cases for BioNLP applications are support for database curation, linking experimental data with relevant literature, content visualization, and hypothesis generation. All of these tasks require processing and summarizing large numbers of individual research articles.
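As a rough illustration of the kind of indirect association the abstract mentions (coregulators), the sketch below pairs up genes that regulate a common target, using typed events that carry a confidence value. It is not EVEX's implementation: the `Event` schema, the `coregulator_pairs` function, the gene names, and the confidence scale are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """A typed, directed event with a confidence score (hypothetical schema)."""
    event_type: str    # e.g. "Regulation", "Binding", "Phosphorylation"
    cause: str         # gene/protein symbol acting on the theme
    theme: str         # gene/protein symbol being acted upon
    confidence: float  # higher means more reliable; the scale is illustrative


def coregulator_pairs(events, min_confidence=0.0):
    """Map each pair of genes to the set of targets they both regulate."""
    # Collect, per target, every gene reported to regulate it.
    regulators_of = defaultdict(set)
    for ev in events:
        if ev.event_type == "Regulation" and ev.confidence >= min_confidence:
            regulators_of[ev.theme].add(ev.cause)

    # Two genes sharing a target form an indirect association.
    pairs = defaultdict(set)
    for target, regulators in regulators_of.items():
        for a, b in combinations(sorted(regulators), 2):
            pairs[(a, b)].add(target)
    return dict(pairs)


# Made-up toy events, not taken from EVEX.
events = [
    Event("Regulation", "geneA", "geneX", 0.9),
    Event("Regulation", "geneB", "geneX", 0.7),
    Event("Regulation", "geneC", "geneY", 0.8),
    Event("Binding", "geneA", "geneC", 0.6),
    Event("Regulation", "geneB", "geneY", -0.2),
]

print(coregulator_pairs(events))
```

Raising `min_confidence` drops low-confidence events before pairing, so the resulting associations rest only on the more reliable extractions. In this toy data, the `geneB`/`geneC` pair appears only when the threshold is lowered below -0.2.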
Among the most heavily studied tasks in BioNLP is the extraction of information about known associations between biomolecular entities, primarily genes and gene products. This task has recently seen much progress in two general directions. First, relationships between biomolecular entities are now being extracted in much greater detail. Until recently, the focus was on extracting untyped and undirected binary relations which, while stating that there is some relationship between two objects, gave little additional information about the nature of that relationship. Recognizing that extracting such relations may not provide sufficient detail for wider adoption of text mining in the biomedical community, the focus is currently shifting towards a more

