%0 Journal Article
%T A P2P Framework for Developing Bioinformatics Applications in Dynamic Cloud Environments
%A Chun-Hung Richard Lin
%A Chun-Hao Wen
%A Ying-Chih Lin
%A Kuang-Yuan Tung
%A Rung-Wei Lin
%A Chun-Yuan Lin
%J International Journal of Genomics
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/361327
%X Bioinformatics has advanced from in-house computing infrastructure to cloud computing to tackle the vast quantity of biological data. This shift enables a large number of collaborative research projects to share their work around the world. At the same time, retrieving biological data over the internet becomes increasingly difficult because of the data's explosive growth and frequent changes. Various efforts have been made to address the problems of data discovery and delivery in the cloud framework, but most of them are hindered by their reliance on a MapReduce master server to track all available data. In this paper, we propose an alternative approach, called PRKad, which exploits a Peer-to-Peer (P2P) model to achieve efficient data discovery and delivery. PRKad is a Kademlia-based implementation that uses Round-Trip Time (RTT) as the associated key, and it locates data according to a Distributed Hash Table (DHT) and the XOR metric. The simulation results show that PRKad retrieves data with low link latency. As an interdisciplinary application of P2P computing to bioinformatics, PRKad also scales well, serving a greater number of users in dynamic cloud environments.
1. Introduction
Today, new technologies in genomics and proteomics generate biological data at an exponential rate. Current Next Generation Sequencing (NGS) technologies can produce gigabase-scale DNA and RNA sequencing data within a day at a reasonable cost [1–3]. Cloud computing has been regarded as a key approach for processing such planet-scale data, and hence many bioinformatics applications have been migrated to cloud environments [4–7]. Bioinformatics clouds depend heavily on data, as data are fundamental to gaining biological insights. Analyses commonly rely on the extensive and repeated use of comparative parallel processing via Data-as-a-Service (DaaS) on the web [8–10], most notably in gene expression analysis.
The data are likely to be updated constantly, and the sources and users of the data may be connected through various devices over the internet. The effectiveness of locating this deluge of data in cloud computing is often overlooked, but it is a key problem. With the aim of retrieving up-to-date data with less complexity and delay, we address the existing problems in data discovery. To this end, a P2P framework, with its high aggregate computing capability, is adopted as a dynamic cloud infrastructure to meet the challenge posed by massive datasets [11–13]. Bioinformatics usually requires the collection, organization, and analysis of large
%U http://www.hindawi.com/journals/ijg/2013/361327/
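The abstract states that PRKad locates data through a Kademlia-based DHT using the XOR metric. The Python below is a minimal sketch of that standard Kademlia idea only. It is not PRKad itself, and it does not reproduce PRKad's RTT-based keying. It assumes 160-bit identifiers derived with SHA-1 and shows three things: XOR distance, k-bucket indexing, and selection of the k closest known peers. The function names, the default k = 20, and the example key are illustrative assumptions.

```python
from __future__ import annotations

import hashlib

ID_BITS = 160  # assumption: 160-bit identifiers, the size of a SHA-1 digest


def node_id(name: str) -> int:
    """Hypothetical helper: derive a 160-bit identifier by hashing a name with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of two identifiers, read as an integer."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """Index of the k-bucket for other_id: position of the highest bit in which the IDs differ."""
    d = xor_distance(self_id, other_id)
    if d == 0:
        raise ValueError("a node does not place itself in its own routing table")
    return d.bit_length() - 1


def k_closest(target: int, known_ids: list[int], k: int = 20) -> list[int]:
    """Return up to k known identifiers, ordered by XOR distance to target (closest first)."""
    return sorted(known_ids, key=lambda n: xor_distance(n, target))[:k]


if __name__ == "__main__":
    # Hypothetical peers and dataset key, for illustration only.
    peers = [node_id(f"peer-{i}") for i in range(50)]
    key = node_id("dataset:example.fasta")
    for peer in k_closest(key, peers, k=3):
        print(hex(peer)[:12], bucket_index(key, peer))
```

Because XOR distance is symmetric and each identifier has exactly one closest match, every node that queries for the same key converges on the same set of peers. A lookup can therefore proceed without a central master server tracking all data.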