Do Peers See More in a Paper Than Its Authors?

DOI: 10.1155/2012/750214

Abstract:

Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances (sentences containing citations to that article). We contrast the important points of an article as judged by its authors versus as seen by peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects other citations and time have on the content of citances.

1. Introduction

Text mining research in biosciences is concerned with how to extract biologically interesting information from journal articles and other written documents. To date, much of biomedical text processing has been performed on titles, abstracts, and other metadata available for journal articles in PubMed¹, as opposed to using full text. While the advantages of full text compared to abstracts have been widely recognized [1–5], until relatively recently, full text was rarely available online, and intellectual property constraints remain even to the present day. These latter constraints are loosening as open access (OA) publications are gaining popularity and online full text is gradually becoming the norm.
This trend started in October 2006, when the Wellcome Trust², a major UK funding body, changed the conditions of its grants, requiring that “research papers partly or wholly funded by the Wellcome Trust must be made freely accessible via PubMed Central³ (PMC) (or UK PubMed Central once established) as soon as possible, and in any event no later than six months after publication” [6]. The Canadian Institutes of Health Research followed, as did the National Institutes of Health (NIH) in the USA in April 2008.⁴ Moreover, many publishers founded and promoted OA initiatives, namely, BioMed Central⁵ (BMC) and the Public Library of Science⁶
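
The comparison summarized in the abstract, how much of an abstract's content the pooled citances cover and how many additional concepts they contribute, can be illustrated with plain set arithmetic. The sketch below is not the authors' pipeline. It assumes that each text has already been annotated with a set of concept labels, and the function name `compare_concepts`, the concept strings, and the reading of "more concepts" as a ratio relative to the abstract's concept count are all hypothetical choices made for illustration.

```python
from typing import Iterable, Set, Tuple


def compare_concepts(
    abstract_concepts: Iterable[str],
    citance_concepts: Iterable[Iterable[str]],
) -> Tuple[float, float, Set[str]]:
    """Compare an abstract's concepts with those pooled from its citances.

    Returns (coverage, extra_ratio, extra), where
      coverage    = fraction of abstract concepts that appear in any citance,
      extra_ratio = number of citance-only concepts / number of abstract concepts,
      extra       = the citance-only concepts themselves.
    """
    abstract = set(abstract_concepts)
    if not abstract:
        raise ValueError("abstract has no annotated concepts")

    # Pool the concepts of all citances that point to the target article.
    pooled: Set[str] = set()
    for concepts in citance_concepts:
        pooled.update(concepts)

    coverage = len(abstract & pooled) / len(abstract)
    extra = pooled - abstract
    extra_ratio = len(extra) / len(abstract)
    return coverage, extra_ratio, extra


# Toy annotations invented for this example; they are not data from the paper.
abstract = {"BRCA1", "RAD51", "DNA repair", "co-immunoprecipitation", "protein binding"}
citances = [
    {"BRCA1", "RAD51", "protein binding"},
    {"BRCA1", "DNA repair", "homologous recombination"},
]

coverage, extra_ratio, extra = compare_concepts(abstract, citances)
print(f"abstract concepts covered by citances: {coverage:.0%}")
print(f"citance-only concepts relative to abstract: {extra_ratio:.0%}")
print(f"citance-only concepts: {sorted(extra)}")
```

Pooling the citances before comparing mirrors the abstract's phrasing, "the set of all citances to a target article"; a per-citance comparison would answer a different question.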

Footnotes

1. http://www.ncbi.nlm.nih.gov/pubmed/
2. http://www.wellcome.ac.uk/
3. http://www.ncbi.nlm.nih.gov/pmc/
4. http://grants.nih.gov/grants/guide/notice-files/NOT-OD-08-033.html
5. http://www.biomedcentral.com/
6. http://www.plos.org/
7. Our study also helps answer the question: what abstract claims are not (strongly) supported by the full text? We hypothesize that these would be those claims that are cited very infrequently or not cited at all, but a separate study is required to answer this question.
8. Note that here we assume that peers base their citations on full text and not only on the abstract. While this is a strong assumption, we believe that it generally holds in the research community. Our previous studies have shown that biomedical researchers like to verify reported results, for example, by looking at the methods that were used and by exploring the images and the tables in the full text. This has also motivated us to create a specialized search engine, the BioText Search Engine (http://biosearch.berkeley.edu/), for searching the figures and tables contained in open access journals, which is described in [54, 55].
9. CiteSeerX: http://citeseer.ist.psu.edu/
10. DBLP: http://www.informatik.uni-trier.de/~ley/db/
11. Google Scholar: http://scholar.google.com/
12. Microsoft Academic Search: http://academic.research.microsoft.com/
13. ACM Digital Library: http://dl.acm.org/
14. IEEE Xplore Digital Library: http://ieeexplore.ieee.org/Xplore/
15. ACL Anthology: http://aclweb.org/anthology-new/
16. ArnetMiner: http://arnetminer.org/
17. EMNLP 2009: http://conferences.inf.ed.ac.uk/emnlp09/
18. http://www.nlm.nih.gov/mesh/
19. http://discover.nci.nih.gov/mim/index.jsp
20. http://isiknowledge.com/
21. The data on the analysis considering the extended tree IDs can be found in the supplementary material available online at http://dx.doi.org/10.1155/2012/750214. The majority of results discussed in this paper refer to higher MeSH level annotation representing broader entities and concepts.

References

[1]  I. Mani and M. Maybury, Advances in Automatic Text Summarization, MIT Press, 1999.
[2]  H. Yu, V. Hatzivassiloglou, C. Friedman, A. Rzhetsky, and W. Wilbur, “Automatic extraction of gene and protein synonyms from MEDLINE and journal articles,” in Proceedings of the AMIA Symposium (AMIA '02), pp. 919–923, 2002.
[3]  P. K. Shah, C. Perez-Iratxeta, P. Bork, and M. A. Andrade, “Information extraction from full text scientific articles: where are the keywords?” BMC Bioinformatics, vol. 4, article 20, 2003.
[4]  M. J. Schuemie, M. Weeber, B. J. A. Schijvenaars et al., “Distribution of information in biomedical abstracts and full-text publications,” Bioinformatics, vol. 20, no. 16, pp. 2597–2604, 2004.
[5]  H. T. Dang, “Overview of DUC 2005,” in Proceedings of the HLT/EMNLP Workshop on Text Summarization DUC, 2005.
[6]  M. Walport and R. Kiley, “Open access, UK PubMed Central and the Wellcome Trust,” Journal of the Royal Society of Medicine, vol. 99, no. 9, pp. 438–439, 2006.
[7]  K. B. Cohen, H. L. Johnson, K. Verspoor, C. Roeder, and L. E. Hunter, “The structural and content aspects of abstracts versus bodies of full text journal articles are different,” BMC Bioinformatics, vol. 11, article 492, 2010.
[8]  E. Garfield, “Can citation indexing be automated,” National Bureau of Standards Miscellaneous Publication, vol. 269, pp. 189–192, 1965.
[9]  M. Liu, “Progress in documentation. The complexities of citation practice: a review of citation studies,” Journal of Documentation, vol. 49, no. 4, pp. 370–408, 1993.
[10]  M. Moravcsik and P. Murugesan, “Some results on the function and quality of citations,” Social Studies of Science, vol. 5, pp. 86–92, 1975.
[11]  E. Garfield, “Citation indexes for science,” Science, vol. 122, no. 3159, pp. 108–111, 1955.
[12]  C. L. Giles, K. D. Bollacker, and S. Lawrence, “CiteSeer: an automatic citation indexing system,” in Proceedings of the 3rd ACM Conference on Digital Libraries, pp. 89–98, ACM Press, June 1998.
[13]  F. Menczer, “Correlated topologies in citation networks and the Web,” European Physical Journal B, vol. 38, no. 2, pp. 211–221, 2004.
[14]  M. E. J. Newman, “The structure of scientific collaboration networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.
[15]  C. Duy, V. Hoang, and M.-Y. Kan, “Towards automated related work summarization,” in Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), pp. 427–435, Posters, 2010.
[16]  H. D. White, “Citation analysis and discourse analysis revisited,” Applied Linguistics, vol. 25, no. 1, pp. 89–116, 2004.
[17]  P. Nakov, A. Schwartz, and M. Hearst, “Citances: citation sentences for semantic analysis of bioscience text,” in Proceedings of the Workshop on Search and Discovery in Bioinformatics (SIGIR '04), 2004.
[18]  A. Elkiss, S. Shen, A. Fader, G. Erkan, D. States, and D. Radev, “Blind men and elephants: what do citation summaries tell us about a research article?” Journal of the American Society for Information Science and Technology, vol. 59, no. 1, pp. 51–62, 2008.
[19]  S. Mohammad, B. Dorr, M. Egan et al., “Using citations to generate surveys of scientific paradigms,” in Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL '09), pp. 584–592, Boulder, Colo, USA, 2009.
[20]  H. D. White and B. C. Griffith, “Author cocitation: a literature measure of intellectual structure,” Journal of the American Society for Information Science, vol. 32, no. 3, pp. 163–171, 1981.
[21]  A. Aris, B. Shneiderman, V. Qazvinian, and D. Radev, “Visual overviews for discovering key papers and influences across research fronts,” Journal of the American Society for Information Science and Technology, vol. 60, no. 11, pp. 2219–2228, 2009.
[22]  S. Teufel and M. Moens, “Summarizing scientific articles: experiments with relevance and rhetorical status,” Computational Linguistics, vol. 28, no. 4, pp. 409–445, 2002.
[23]  H. Nanba, N. Kando, and M. Okumura, “Classification of research papers using citation links and citation types: towards automatic review article generation,” in Proceedings of the American Society for Information Science SIG Classification Research Workshop: Classification for User Support and Learning, pp. 117–134, 2000.
[24]  S. Bradshaw, “Reference directed indexing: redeeming relevance for subject search in citation indexes,” in Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries, 2003.
[25]  R. Mercer and C. Di Marco, “A design methodology for a biomedical literature indexing tool using the rhetoric of science,” in Proceedings of the BioLink Workshop in Conjunction with NAACL/HLT, pp. 77–84, 2004.
[26]  I. Tbahriti, C. Chichester, F. Lisacek, and P. Ruch, “Using argumentation to retrieve articles with similar citations: an inquiry into improving related articles search in the MEDLINE digital library,” International Journal of Medical Informatics, vol. 75, no. 6, pp. 488–495, 2006.
[27]  B. Rosario and M. Hearst, “Multi-way relation classification: application to protein-protein interactions,” in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT '05), 2005.
[28]  A. Kolchinsky, A. Abi-Haidar, J. Kaur, A. A. Hamed, and L. M. Rocha, “Classification of protein-protein interaction full-text documents using text and citation network features,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 3, pp. 400–411, 2010.
[29]  B. Aljaber, N. Stokes, J. Bailey, and J. Pei, “Document clustering of scientific texts using citation contexts,” Information Retrieval, vol. 13, no. 2, pp. 101–131, 2010.
[30]  B. Aljaber, D. Martinez, N. Stokes, and J. Bailey, “Improving MeSH classification of biomedical articles using citation contexts,” Journal of Biomedical Informatics, vol. 44, no. 5, pp. 881–896, 2011.
[31]  W. Lehnert, C. Cardie, and E. Riloff, “Analyzing research papers using citation sentences,” in Proceedings of the 12th Annual Conference of the Cognitive Science Society, pp. 511–518, Lawrence Erlbaum Associates, 1990.
[32]  S. Teufel, A. Siddharthan, and D. Tidhar, “An annotation scheme for citation function,” in Proceedings of Sigdial-06, Sydney, Australia, 2006.
[33]  S. Teufel, A. Siddharthan, and D. Tidhar, “Automatic classification of citation function,” in Proceedings of EMNLP-06, Sydney, Australia, 2006.
[34]  S. Teufel and M. Y. Kan, “Robust argumentative zoning for sensemaking in scholarly documents,” in Advanced Language Technologies for Digital Libraries, vol. 6699 of Lecture Notes in Computer Science, pp. 154–170, Springer, Berlin, Germany, 2011.
[35]  C. Schwartz, A. Divoli, and M. Hearst, “Multiple alignment of citation sentences with conditional random fields and posterior decoding,” in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '07), pp. 847–857, 2007.
[36]  E. Amitay and C. Paris, “Automatically summarising web sites: is there a way around it,” in Proceedings of the 9th International Conference on Information and Knowledge Management, pp. 173–179, ACM Press, 2000.
[37]  J. Y. Delort, B. Bouchon-Meunier, and M. Rifqi, “Enhanced web document summarization using hyperlinks,” in Proceedings of the 14th ACM Conference on Hypertext and Hypermedia, pp. 208–215, August 2003.
[38]  A. Schwartz and M. Hearst, “Summarizing key concepts using citation sentences,” in Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis (BioNLP '06), pp. 134–135, New York, NY, USA, 2006.
[39]  V. Qazvinian and D. Radev, “Scientific paper summarization using citation summary networks,” in Proceedings of the 22nd International Conference on Computational Linguistics (COLING '08), vol. 1, pp. 689–696, Manchester, UK, 2008.
[40]  Q. Mei and C. Zhai, “Generating impact-based summaries for scientific literature,” in Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL '08), pp. 816–824, Columbus, Ohio, USA, 2008.
[41]  V. Qazvinian and D. Radev, “Identifying non-explicit citing sentences for citation-based summarization,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10), pp. 555–564, 2010.
[42]  S. Wan, C. Paris, and R. Dale, “Whetting the appetite of scientists: producing summaries tailored to the citation context,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL '09), pp. 59–68, June 2009.
[43]  N. Craswell, D. Hawking, and S. Robertson, “Effective site finding using link anchor information,” in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 250–257, ACM Press, 2001.
[44]  S. Chakrabarti, B. Dom, P. Raghavan, S. Rajagopalan, D. Gibson, and J. Kleinberg, “Automatic resource compilation by analyzing hyperlink structure and associated text,” in Proceedings of the 7th International Conference on World Wide Web 7, pp. 65–74, Elsevier Science Publishers B.V., 1998.
[45]  J. Fürnkranz, “Exploiting structural information for text classification on the www,” in Proceedings of the 3rd International Symposium on Advances in Intelligent Data Analysis, pp. 487–498, Springer, 1999.
[46]  J. Rennie and A. McCallum, “Using reinforcement learning to spider the web efficiently,” in Proceedings of the 16th International Conference on Machine Learning, pp. 335–343, Morgan Kaufmann Publishers, 1999.
[47]  M. Richardson and P. Domingos, “The intelligent surfer: probabilistic combination of link and content information in pagerank,” in Proceedings of the Advances in Neural Information Processing Systems, vol. 14, MIT Press, 2002.
[48]  G. Bhalotia, P. Nakov, A. Schwartz, and M. Hearst, “BioText team report for the TREC, 2003 Genomics track,” in Proceedings of the 13th Text REtrieval Conference (TREC '04), Gaithersburg, Md, USA, 2004.
[49]  P. Nakov and A. Divoli, “BioText report for the second BioCreAtIvE challenge,” in Proceedings of BioCreAtIvE II Workshop, Madrid, Spain, April 2007.
[50]  A. Ritchie, S. Teufel, and S. Robertson, “How to find better index terms through citations,” in Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval? pp. 25–32, Sydney, Australia, 2006.
[51]  D. Bergmark, “Automatic extraction of reference linking information from online documents,” Technical Report CSTR 2000-1821, Cornell Digital Library Research Group, 2000.
[52]  D. Bergmark, P. Phempoonpanich, and S. Zhao, “Scraping the ACM digital library,” SIGIR Forum, vol. 35, no. 2, pp. 1–7, 2001.
[53]  B. Powley and R. Dale, “Evidence-based information extraction for high-accuracy citation extraction and author name recognition,” in Proceedings of the 8th RIAO International Conference on Large-Scale Semantic Access to Content, 2007.
[54]  M. A. Hearst, A. Divoli, H. H. Guturu et al., “BioText Search Engine: beyond abstract search,” Bioinformatics, vol. 23, no. 16, pp. 2196–2197, 2007.
[55]  A. Divoli, M. A. Wooldridge, and M. A. Hearst, “Full text and figure display improves bioscience literature search,” PLoS ONE, vol. 5, no. 4, Article ID e9619, 2010.