Interlinking Developer Identities within and across Open Source Projects: The Linked Data Approach

DOI: 10.1155/2013/584731


Abstract:

Software developers use various software repositories to interact with each other and to solve related problems. These repositories provide a rich source of information for a wide range of tasks. However, one issue to overcome in order to make this information useful is the identification and interlinking of the multiple identities of developers. In this paper, we propose a Linked Data-based methodology to interlink and integrate the multiple identities of a developer found in different software repositories of a project as well as across repositories of multiple projects. Such interlinking makes it possible to keep track of a developer's activity not only within a single project but also across multiple projects. The methodology is presented in general and applied to five Apache projects as a case study. Further, we show that the few methods suggested so far are not always appropriate to overcome the developer identification problem.

1. Introduction and Motivation

In software engineering, many tools with underlying repositories have been introduced to support collaboration and coordination in distributed software development. Research has shown that these software repositories contain a rich amount of information about software projects. By mining the information contained in these repositories, practitioners can depend less on their experience and more on historical data [1]. However, software repositories are commonly used only for record keeping and rarely to support design decision processes [2]. Examples of software repositories [3] are source control repositories, bug repositories, archived communication, and so forth. Developers (we use the term “developer” to cover the core developers, contributors, bug reporters, and users of an open source project) use these repositories to interact with each other or to solve software-related problems.
By extracting rich information from these repositories, one can guide decision processes in modern software development. For example, source code and bugs are quite often discussed in bug repositories and on project mailing lists. Data in these software repositories could be analyzed to extract bug- and source-code-related discussions, which could be linked to the actual bug description and source code. This would make it possible to keep track of developers' discussions related to a bug or a piece of source code across different software repositories. Developers are required to adopt an identity for each software repository they want to use. For example, they are required to adopt an email address in
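
The identity problem described above can be illustrated with a minimal sketch. It is not the authors' method, which this excerpt does not detail. It assumes a deliberately simple matching rule (identical email address after lowercasing) and expresses each match as an owl:sameAs link, a common Linked Data convention for stating that two URIs denote the same entity. The example.org URIs, the record fields, and the function names are hypothetical.

```python
from collections import defaultdict

# Standard OWL property commonly used to state that two URIs denote one entity.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical identity records, one per repository account (illustrative only).
identities = [
    {"uri": "http://example.org/svn/jdoe", "name": "John Doe", "email": "jdoe@example.org"},
    {"uri": "http://example.org/bugs/user/42", "name": "J. Doe", "email": "JDoe@Example.org"},
    {"uri": "http://example.org/mail/asmith", "name": "Alice Smith", "email": "alice@example.org"},
]


def normalize_email(email):
    """Trim whitespace and lowercase so trivially different spellings compare equal."""
    return email.strip().lower()


def same_as_links(records):
    """Group identity URIs by normalized email and link each group to its first URI.

    Assumption: an exact email match is treated as sufficient evidence. Real data
    would need more careful matching than this.
    """
    by_email = defaultdict(list)
    for rec in records:
        by_email[normalize_email(rec["email"])].append(rec["uri"])

    links = []
    for uris in by_email.values():
        first = uris[0]
        for other in uris[1:]:
            links.append((first, OWL_SAME_AS, other))
    return links


def to_ntriples(links):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in links]


if __name__ == "__main__":
    for line in to_ntriples(same_as_links(identities)):
        print(line)
```

In this sketch, the version-control account and the bug-tracker account share an email address and are therefore linked, while the mailing-list account stays separate. Exact email matching misses developers who use different addresses in different repositories, which is one reason more elaborate identification methods are needed.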

References

[1]  A. E. Hassan, “The road ahead for mining software repositories,” in Proceedings of the 16th Frontiers of Software Maintenance (FoSM '08), pp. 48–57, October 2008.
[2]  S. Diehl, H. C. Gall, and A. E. Hassan, “Guest editors introduction: special issue on mining software repositories,” Empirical Software Engineering, vol. 14, no. 3, pp. 257–261, 2009.
[3]  A. E. Hassan, A. Mockus, R. C. Holt, and P. M. Johnson, “Guest editors' introduction: special issue on mining software repositories,” IEEE Transactions on Software Engineering, vol. 31, no. 6, pp. 426–428, 2005.
[4]  G. Robles and J. M. Gonzalez-Barahona, “Developer identification methods for integrated data from various sources,” SIGSOFT Software Engineering Notes, vol. 30, no. 4, pp. 1–5, 2005.
[5]  S. F. De Sousa, M. A. Balieiro, J. M. Dos, and C. R. B. DeSouza, “Multiple social networks analysis of FLOSS projects using Sargas,” in Proceedings of the 42nd Annual Hawaii International Conference on System Sciences (HICSS '09), January 2009.
[6]  A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting failures with developer networks and social network analysis,” in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (SIGSOFT '08), pp. 13–23, ACM, New York, NY, USA, November 2008.
[7]  G. Madey, V. Freeh, and R. Tynan, “The open source software development phenomenon: an analysis based on social network theory,” in Proceedings of the Americas Conference on Information Systems (AMCIS '02), pp. 1806–1813, 2002.
[8]  S. Christley and G. Madey, “Analysis of activity in the Open Source Software development community,” in Proceedings of the 40th Annual Hawaii International Conference on System Sciences (HICSS '07), IEEE Computer Society, Washington, DC, USA, January 2007.
[9]  C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan, “Mining email social networks,” in Proceedings of the International Workshop on Mining Software Repositories (MSR '06), pp. 137–143, 2006.
[10]  M. Conklin, “Project entity matching across FLOSS repositories,” IFIP International Federation for Information Processing, vol. 234, pp. 45–57, 2007.
[11]  H. Kopcke and E. Rahm, “Frameworks for entity matching: a comparison,” Data & Knowledge Engineering, vol. 69, no. 2, pp. 197–210, 2010.
[12]  F. Naumann and M. Herschel, “An introduction to duplicate detection,” Synthesis Lectures on Data Management, vol. 2, no. 1, pp. 1–87, 2010.
[13]  J. Volz, C. Bizer, M. Gaedke, and G. Kobilarov, “Silk: a link discovery framework for the web of data,” in Proceedings of the 2nd Workshop about Linked Data on the Web (LDOW '09), Madrid, Spain, 2009.
[14]  A. Iqbal, O. Ureche, M. Hausenblas, and G. Tummarello, “LD2SD: linked data driven software development,” in Proceedings of the 21st International Conference on Software Engineering and Knowledge Engineering (SEKE '09), pp. 240–245, July 2009.
[15]  G. Klyne, J. J. Carroll, and B. McBride, “Resource Description Framework (RDF): Concepts and Abstract Syntax,” W3C Recommendation, 10 February 2004, RDF Core Working Group, 2004.
[16]  T. Heath and C. Bizer, “Linked data: evolving the web into a global data space,” Synthesis Lectures on the Semantic Web: Theory and Technology, vol. 1, no. 1, pp. 1–136, 2011.
[17]  http://ant.apache.org/.
[18]  http://hadoop.apache.org/.
[19]  http://logging.apache.org/.
[20]  http://lucene.apache.org/.
[21]  http://maven.apache.org/.
[22]  E. Ukkonen, “Algorithms for approximate string matching,” Information and Control, vol. 64, no. 1–3, pp. 100–118, 1985.
[23]  A. Iqbal and M. Hausenblas, “Integrating developer-related information across open source repositories,” in Proceedings of the IEEE 13th International Conference on Information Reuse and Integration (IRI '12), 2012.
