The “Chimera”: An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform

DOI: 10.1155/2012/241439


Abstract:

The nature of modern astronomy means that a number of interesting problems exhibit a substantial computational bound, and this situation is gradually worsening. Scientists, who must compete for valuable resources on conventional high-performance computing (HPC) facilities (often with a limited customizable user environment), are increasingly looking to hardware acceleration solutions. We describe here a heterogeneous CPU/GPGPU/FPGA desktop computing system (the “Chimera”), built with commercial-off-the-shelf components. We show that this platform may be a viable alternative solution to many common computationally bound problems found in astronomy, although not without significant challenges. The most significant bottleneck in pipelines involving real data is most likely to be the interconnect (in this case the PCI Express bus residing on the CPU motherboard). Finally, we speculate on the merits of our Chimera system on the entire landscape of parallel computing, through the analysis of representative problems from UC Berkeley’s “Thirteen Dwarves.”

1. Computationally Bound Problems in Astronomical Data Analysis

Many of the great discoveries in astronomy from the last two decades resulted directly from breakthroughs in the processing of data from observatories. For example, the revelation that the expansion of the Universe is accelerating relied directly upon a newly automated supernova detection pipeline [1], and similar cases apply to the homogeneity of the microwave background [2] and strong evidence for the existence of dark matter and dark energy [3]. Most of these discoveries had a significant computational bound and would not have been possible without a breakthrough in data analysis techniques and/or technology. One is led to wonder what astounding discoveries could be made without such a computational bound. Many observatories currently have “underanalyzed” datasets that await reduction but languish under a prohibitive computational bound.
One solution to this issue is to make use of distributed computing, that is, the idle CPUs of networked participants, as in the SETI@HOME project [4]. A number of data analysis techniques are clearly common across disciplines. For example, LIGO’s Einstein@HOME distributed computing project, designed to search gravitational wave data for spinning neutron stars, recently discovered three very unusual binary pulsar systems in Arecibo radio telescope data [5]. These are far from the only “underanalyzed” datasets from existing observatories, and this situation is expected to only compound as we look forward to an

References

[1]  B. P. Schmidt, N. B. Suntzeff, M. M. Phillips et al., “The high-Z supernova search: measuring cosmic deceleration and global curvature of the universe using type Ia supernovae,” Astrophysical Journal, vol. 507, no. 1, pp. 46–63, 1998.
[2]  G. F. Smoot, C. L. Bennett, A. Kogut et al., “Structure in the COBE differential microwave radiometer first-year maps,” Astrophysical Journal, vol. 396, no. 1, pp. L1–L5, 1992.
[3]  D. N. Spergel, L. Verde, H. V. Peiris et al., “First-year Wilkinson Microwave Anisotropy Probe (WMAP) observations: determination of cosmological parameters,” Astrophysical Journal, Supplement Series, vol. 148, no. 1, pp. 175–194, 2003.
[4]  University of California, 2011, http://setiathome.berkeley.edu/.
[5]  B. Knispel, B. Allen, J. M. Cordes et al., “Pulsar discovery by global volunteer computing,” Science, vol. 329, no. 5997, p. 1305, 2010.
[6]  T. Cornwell and B. Humphreys, “Data processing for ASKAP and SKA,” 2010, http://www.atnf.csiro.au/people/tim.cornwell/presentations/nzpathwaysfeb2010.pdf.
[7]  M. Hilbert and P. López, “The world's technological capacity to store, communicate, and compute information,” Science, vol. 332, no. 6025, pp. 60–65, 2011.
[8]  H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon, “Top 500 supercomputers,” 2011, http://www.top500.org.
[9]  LSST Corporation, “Large synoptic survey telescope,” 2011, http://www.lsst.org/lsst/.
[10]  The Australian National University, “The SkyMapper survey telescope,” 2010, http://msowww.anu.edu.au/skymapper/.
[11]  Pico Computing, Inc., “Using FPGA clusters for fast password recovery,” 2010, http://www.scribd.com/doc/26191199/Using-FPGA-Clusters-for-Fast-Password-Recovery.
[12]  C. Duhigg, “Stock traders find speed pays, in milliseconds,” 2009, http://www.nytimes.com/2009/07/24/business/24trading.html.
[13]  J. Ericson, “Processing Avatar,” 2009, http://www.information-management.com/newsletters/avatar_data_processing-10016774-1.html.
[14]  J. Jackson, “Supercomputing top500 brews discontent,” 2010, http://www.pcworld.idg.com.au/article/368598/supercomputing_top500_brews_discontent/.
[15]  N. Hasasneh, I. Bell, C. Jesshope, W. Grass, B. Sick, and K. Waldschmidt, “Scalable and partitionable asynchronous arbiter for micro-threaded chip multiprocessors,” in Proceedings of the 19th International Conference on Architecture of Computing Systems (ARCS '06), W. Grass, B. Sick, and K. Waldschmidt, Eds., vol. 3894 of Lecture Notes in Computer Science, pp. 252–267, Frankfurt, Germany, 2006.
[16]  P. E. Ross, “A computer for the clouds,” 2008, http://spectrum.ieee.org/computing/hardware/a-computer-for-the-clouds.
[17]  M. Wehner, L. Oliker, and J. Shalf, “Low-power supercomputers,” IEEE Spectrum, 2009, http://spectrum.ieee.org/computing/embedded-systems/lowpower-supercomputers.
[18]  K. Underwood, “FPGAs vs. CPUs: trends in peak floating-point performance,” in Proceedings of the 12th International Symposium on Field-Programmable Gate Arrays (FPGA '04), pp. 171–180, New York, NY, USA, February 2004.
[19]  S. K. Chung, L. Wen, D. Blair, K. Cannon, and A. Datta, “Application of graphics processing units to search pipelines for gravitational waves from coalescing binaries of compact objects,” Classical and Quantum Gravity, vol. 27, no. 13, Article ID 135009, 2010.
[20]  B. R. Barsdell, D. G. Barnes, and C. J. Fluke, “Analysing astronomy algorithms for graphics processing units and beyond,” Monthly Notices of the Royal Astronomical Society, vol. 408, no. 3, pp. 1936–1944, 2010.
[21]  M. Bergano, F. Fernandes, L. Cupido et al., “Digital complex correlator for a C-band polarimetry survey,” Experimental Astronomy, vol. 30, no. 1, pp. 23–37, 2011.
[22]  L. De Souza, J. D. Bunton, D. Campbell-Wilson, R. J. Cappallo, and B. Kincaid, “A radio astronomy correlator optimized for the Xilinx Virtex-4 SX FPGA,” in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL '07), pp. 62–67, August 2007.
[23]  B. Klein, S. D. Philipp, I. Krämer, C. Kasemann, R. Güsten, and K. M. Menten, “The APEX digital fast Fourier transform spectrometer,” Astronomy and Astrophysics, vol. 454, no. 2, pp. L29–L32, 2006.
[24]  J. Frigo, D. Palmer, M. Gokhale, and M. Popkin Paine, “Gamma-ray pulsar detection using reconfigurable computing hardware,” in Proceedings of the 11th IEEE Symposium Field-Programmable Custom Computing Machines, pp. 155–161, Washington, DC, USA, 2003.
[25]  F. Belletti, M. Guidetti, A. Maiorano et al., “Janus: an FPGA-based system for high-performance scientific computing,” Computing in Science and Engineering, vol. 11, no. 1, Article ID 4720223, pp. 48–58, 2009.
[26]  T. Xiang and B. Khaled, “High-performance quasi-Monte Carlo financial simulation: FPGA vs. GPP vs. GPU,” in Proceedings of the ACM Transactions on Reconfigurable Technology and Systems, vol. 3, pp. 26:1–26:22, New York, NY, USA, 2010.
[27]  M. Awad, “FPGA supercomputing platforms: a survey,” in Proceedings of the 19th International Conference on Field Programmable Logic and Applications (FPL '09), pp. 564–568, Prague, Czech Republic, 2009.
[28]  D. B. Thomas, L. Howes, and W. Luk, “A comparison of CPUs, GPUs, FPGAs, and massively processor arrays for random number generation,” in Proceedings of the 7th ACM SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '09), pp. 63–72, Monterey, Calif, USA, 2009.
[29]  B. Cope, P. Y. K. Cheung, W. Luk, and S. Witt, “Have GPUs made FPGAs redundant in the field of video processing?” in Proceedings of the IEEE International Conference on Field Programmable Technology, vol. 1, pp. 111–118, December 2005.
[30]  B. Cope, “Implementation of 2D convolution on FPGA, GPU and CPU,” Tech. Rep., Imperial College, London, UK, 2006.
[31]  D. H. Jones, A. Powell, C.-S. Bouganis, and P. Y. K. Cheung, “GPU versus FPGA for high productivity computing,” in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL '10), pp. 119–124, 2010.
[32]  D. Yang, J. Sun, J. Lee et al., “Performance comparison of cholesky decomposition on GPUs and FPGAs,” in Proceedings of the Symposium Application Accelerators in High Performance Computing (SAAHPC '10), Knoxville, Tenn, USA, 2010.
[33]  S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach, “Accelerating compute-intensive applications with GPUs and FPGAs,” in Proceedings of the Symposium on Application Specific Processors (SASP '08), pp. 101–107, Anaheim, Calif, USA, 2008.
[34]  S. J. Park, D. R. Shires, and B. J. Henz, “Coprocessor computing with FPGA and GPU,” in Proceedings of the Department of Defense High Performance Computing Modernization Program: Users Group Conference—Solving the Hard Problems, pp. 366–370, Seattle, Wash, USA, 2008.
[35]  R. Inta and D. J. Bowman, “An FPGA/GPU/CPU hybrid platform for solving hard computational problems,” in Proceedings of the eResearch Australasia, Gold Coast, Australia, 2010.
[36]  M. Showerman, J. Enos, A. Pant et al., “QP: a heterogeneous multi-accelerator cluster,” in Proceedings of the 10th LCI International Conference on High-Performance Cluster Computing, vol. 7800, pp. 1–8, Boulder, Colo, USA, 2009.
[37]  K. H. Tsoi and W. Luk, “Axel: a heterogeneous cluster with FPGAs and GPUs,” in Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA '01), pp. 115–124, Monterey, Calif, USA, 2010.
[38]  E. J. Kelmelis, J. P. Durbano, J. R. Humphrey, F. E. Ortiz, and P. F. Curt, “Modeling and simulation of nanoscale devices with a desktop supercomputer,” in Proceedings of the Nanomodeling II, vol. 6328, p. 62270N, 2006.
[39]  W. Kastl and T. Loimayr, “A parallel computing system with specialized coprocessors for cryptanalytic algorithms,” in Proceedings of the Sicherheit, pp. 73–83, Berlin, Germany, 2010.
[40]  K. D. Underwood and K. S. Hemmert, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computing, chapter 31, Morgan Kaufmann Publishers, Burlington, Mass, USA, 2008.
[41]  K. Asanovic, “IEEE standard for binary floating-Point Arithmetic,” Tech. Rep. ANSI/IEEE Std., IEEE Standards Board, The Institute of Electrical and Electronics, 1985.
[42]  M. Langhammer, “Floating point datapath synthesis for FPGAs,” in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL '08), pp. 355–360, Heidelberg, Germany, 2008.
[43]  W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, NY, USA, 2nd edition, 1997.
[44]  P. J. Napier, D. S. Bagri, B. G. Clark et al., “Very long baseline array,” Proceedings of the IEEE, vol. 82, no. 5, pp. 658–672, 1994.
[45]  D. Bowman, M. Tahtali, and A. Lambert, “Rethinking image registration on customizable hardware,” in Image Reconstruction from Incomplete Data VI, vol. 7800 of Proceedings of SPIE, San Diego, Calif, USA, 2010.
[46]  LIGO Scientific Collaboration, 2010, http://www.ligo.caltech.edu/.
[47]  Virgo Scientific Collaboration, 2010, http://www.virgo.infn.it/.
[48]  P. K. Patel, Search for gravitational waves from a nearby neutron star using barycentric resampling, Ph.D. thesis, California Institute of Technology, Pasadena, Calif, USA, 2011.
[49]  D. Llamocca, M. Pattichis, and G. A. Vera, “Partial reconfigurable FIR filtering system using distributed arithmetic,” International Journal of Reconfigurable Computing, vol. 2010, Article ID 357978, 14 pages, 2010.
[50]  E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[51]  O. Maslennikow, V. Lepekha, A. Sergiyenko, A. Tomas, and R. Wyrzykowski, Parallel Implementation of Cholesky LLT—Algorithm in FPGA-Based Processor, Springer, Berlin, Germany, 2008.
[52]  D. Yang, H. Li, G. D. Peterson, and A. Fathy, “Compressed sensing based UWB receiver: hardware compressing and FPGA reconstruction,” in Proceedings of the 43rd Annual Conference on Information Sciences and Systems (CISS '09), pp. 198–201, Baltimore, Md, USA, 2009.
[53]  A. Septimus and R. Steinberg, “Compressive sampling hardware reconstruction,” in Proceedings of the IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems (ISCAS '10), pp. 3116–3119, Paris, France, 2010.
[54]  P. Colella, Defining Software Requirements for Scientific Computing, DARPA HPCS, 2004.
[55]  K. Asanovic and UC Berkeley Computer Science Department, “The landscape of parallel computing research: a view from Berkeley,” Tech. Rep. UCB/EECS-2006-183, UC Berkeley, 2005.
[56]  J. Barnes and P. Hut, “A hierarchical O(N log N) force-calculation algorithm,” Nature, vol. 324, no. 6096, pp. 446–449, 1986.
[57]  R. Palmer, et al., “Parallel dwarfs,” 2011, http://paralleldwarfs.codeplex.com/.
[58]  S. Che, M. Boyer, J. Meng et al., “Rodinia: a benchmark suite for heterogeneous computing,” in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC '09), pp. 44–54, Austin, Tex, USA, October 2009.
