
An Evaluation of an Integrated On-Chip/Off-Chip Network for High-Performance Reconfigurable Computing

DOI: 10.1155/2012/564704


Abstract:

As the number of cores per discrete integrated circuit (IC) device grows, the importance of the network-on-chip (NoC) increases. However, research in this area has focused on discrete IC devices alone, which may not serve the high-performance computing community, which needs to assemble many of these devices into very-large-scale parallel computing machines. This paper describes an integrated on-chip/off-chip network that has been implemented on an all-FPGA computing cluster. The system supports MPI-style point-to-point messages, collectives, and other novel communication. Results include resource utilization and performance (latency and bandwidth).

1. Introduction

In 2007 the Spirit cluster was constructed. It consists of 64 FPGAs (no discrete microprocessors) connected in a 3D torus. Although the first integrated on-chip/off-chip network for this machine was presented in 2009 [1], the design has evolved significantly: adjustments to the router and shifts to standard interfaces appeared as additional applications were developed. This paper describes the current implementation and the experience leading up to the present design. Since the network has been implemented in the FPGA's programmable logic, all of the data presented here have been directly measured; that is, this is neither a simulation nor an emulation of an integrated on-chip/off-chip network.

A fundamental question when this project began was whether network performance would continue to scale as the number of nodes increased. In particular, there were three concerns. First, would the relatively slow embedded processor cores limit the effective transmission speed of individual links? Second, were there enough resources? (Other research has focused on mesh-connected networks rather than crossbars due to limited resources [2–5].) Third, would the on-chip and off-chip network bandwidths be balanced so that one does not limit the other?
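To illustrate the 3D-torus arrangement mentioned above, the sketch below computes each node's six wrap-around neighbors for a 64-node torus. This is an illustrative model only, not the paper's implementation; the 4×4×4 dimensions and the linear node-id scheme are assumptions.

```python
# Illustrative sketch of 3D-torus addressing for a 64-node cluster.
# The 4x4x4 dimensions and linear node-id layout are assumptions,
# not taken from the paper.
DIMS = (4, 4, 4)  # assumed X, Y, Z extents (4 * 4 * 4 = 64 nodes)

def node_id(x, y, z, dims=DIMS):
    """Flatten (x, y, z) coordinates into a linear node id."""
    X, Y, _ = dims
    return (z * Y + y) * X + x

def coords(nid, dims=DIMS):
    """Invert node_id: recover (x, y, z) from a linear node id."""
    X, Y, _ = dims
    return (nid % X, (nid // X) % Y, nid // (X * Y))

def neighbors(nid, dims=DIMS):
    """Each node has six neighbors; torus links wrap in every dimension."""
    x, y, z = coords(nid, dims)
    X, Y, Z = dims
    return [
        node_id((x + 1) % X, y, z), node_id((x - 1) % X, y, z),
        node_id(x, (y + 1) % Y, z), node_id(x, (y - 1) % Y, z),
        node_id(x, y, (z + 1) % Z), node_id(x, y, (z - 1) % Z),
    ]
```

Because every dimension wraps, the node degree is a constant six regardless of cluster size, which is one reason torus topologies are attractive when per-FPGA link resources are fixed.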
Although some of the data presented here have appeared in publications related to different aspects of the project, the aim of this paper is to provide a comprehensive evaluation of the on-chip/off-chip network. The results are overwhelmingly positive, supporting the hypothesis that the current design is scalable. The rest of this paper is organized as follows. In the next section, related work on on-chip networks is presented. In Section 3 we describe the Reconfigurable Computing Cluster project and the small-scale cluster, Spirit. Following that, in Section 4, the specifics of the present on-chip/off-chip network design are detailed.

References

[1]  A. G. Schmidt, W. V. Kritikos, R. R. Sharma, and R. Sass, “AIREN: a novel integration of on-chip and off-chip FPGA networks,” in Proceedings of the 17th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '09), April 2009.
[2]  B. Sethuraman, P. Bhattacharya, J. Khan, and R. Vemuri, “LiPaR: a light-weight parallel router for FPGA-based networks-on-chip,” in Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI '05), pp. 452–457, April 2005.
[3]  S. Kumar, A. Jantsch, J. Soininen, et al., “A network on chip architecture and design methodology,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI '02), vol. 102, pp. 117–124, 2002.
[4]  R. Gindin, I. Cidon, and I. Keidar, “NoC-based FPGA: architecture and routing,” in Proceedings of the 1st International Symposium on Networks-on-Chip (NOCS '07), pp. 253–262, May 2007.
[5]  G. Schelle and D. Grunwald, “Exploring FPGA network on chip implementations across various application and network loads,” in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL '08), pp. 41–46, September 2008.
[6]  W. J. Dally and B. Towles, “Route packets, not wires: On-chip interconnection networks,” in Proceedings of the 38th Design Automation Conference, pp. 684–689, June 2001.
[7]  T. Bjerregaard and S. Mahadevan, “A survey of research and practices of network-on-chip,” ACM Computing Surveys, vol. 38, no. 1, pp. 71–121, 2006.
[8]  I. Walter, I. Cidon, A. Kolodny, and D. Sigalov, “The era of many-modules SoC: revisiting the NoC mapping problem,” in Proceedings of the 2nd International Workshop on Network on Chip Architectures, ACM, 2009.
[9]  G. Michelogiannakis, D. Sanchez, W. J. Dally, and C. Kozyrakis, “Evaluating bufferless flow control for on-chip networks,” in Proceedings of the 4th ACM/IEEE International Symposium on Networks on Chip (NOCS '10), pp. 9–16, May 2010.
[10]  R. Sass, W. V. Kritikos, A. G. Schmidt, et al., “Reconfigurable Computing Cluster (RCC) Project: investigating the feasibility of FPGA-based petascale computing,” in Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’07), IEEE Computer Society, April 2007.
[11]  P. Kogge, et al., “Exascale computing study: technology challenges in achieving exascale systems,” Tech. Rep. TR-2008-13, DARPA Information Processing Techniques Office (IPTO), 2008, http://www.cse.nd.edu/Reports/2008/TR-2008-13.pdf.
[12]  K. D. Underwood, W. B. Ligon III, and R. R. Sass, “An analysis of the cost effectiveness of an adaptable computing cluster,” Cluster Computing, vol. 7, pp. 357–371, 2004.
[13]  I. Kuon and J. Rose, “Measuring the gap between FPGAs and ASICs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, pp. 203–215, 2007.
[14]  D. A. Buell, J. M. Arnold, and W. J. Kleinfelder, Splash2: FPGAs in a Custom Computing Machine, Wiley-IEEE Computer Society Press, 1996.
[15]  D. Burke, J. Wawrzynek, K. Asanovic, et al., “RAMP Blue: implementation of a manycore 1008 processor system,” in Proceedings of the Reconfigurable Systems Summer Institute, 2008.
[16]  C. Pedraza, E. Castillo, J. Castillo, et al., “Cluster architecture based on low cost reconfigurable hardware,” in Proceedings of the International Conference on Field-Programmable Logic and Applications, pp. 595–598, IEEE Computer Society, 2008.
[17]  M. Saldana and P. Chow, “TMD-MPI: an MPI implementation for multiple processors across multiple FPGAs,” in Proceedings of the International Conference on Field-Programmable Logic and Applications, pp. 1–6, IEEE Computer Society, 2006.
[18]  M. Showerman, J. Enos, A. Pant, et al., “QP: a heterogeneous multi-accelerator cluster,” in Proceedings of the International Conference on High-Performance Cluster Computing, 2010.
[19]  K. H. Tsoi, A. Tse, P. Pietzuch, and W. Luk, “Programming framework for clusters with heterogeneous accelerators,” in Proceedings of the International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies, 2010.
[20]  NSF Center for High Performance Reconfigurable Computing (CHREC), “NOVO-G: adaptively custom research supercomputer,” April 2005, http://www.xilinx.com/support/documentation/sw manuals/edk92i ppc405 isaext guide.pdf.
[21]  RAMP, “Research accelerator for multiple processors,” August 2008, http://ramp.eecs.berkeley.edu.
[22]  R. Baxter, S. Booth, M. Bull et al., “Maxwell—a 64 FPGA supercomputer,” in Proceedings of the 2nd NASA/ESA Conference on Adaptive Hardware and Systems (AHS '07), pp. 287–294, August 2007.
[23]  Xilinx, “ML410 Development Platform—User’s Guide UG085,” 2007.
[24]  Y. Rajasekhar, W. V. Kritikos, A. G. Schmidt, and R. Sass, “Teaching FPGA system design via a remote laboratory facility,” in Proceedings of the 18th Annual Conference on Field Programmable Logic and Applications (FPL '08), pp. 687–690, IEEE Computer Society, September 2008.
[25]  Xilinx, “Local Link Interface Specification SP006 (v2.0),” July 2005.
[26]  Xilinx, “LogiCORE IP Aurora v3.0 Users Guide 61,” 2008.
[27]  R. G. Jaganathan, K. D. Underwood, and R. Sass, “A configurable network protocol for cluster based communications using modular hardware primitives on an intelligent NIC,” in Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '03), IEEE Computer Society, 2003.
[28]  K. D. Underwood, An evaluation of the integration of reconfigurable hardware with the network interface in cluster computer systems, Ph.D. thesis, Clemson University, August 2002.
[29]  S. Gao, A. G. Schmidt, and R. Sass, “Hardware implementation of MPI barrier on an FPGA cluster,” in Proceedings of the 19th International Conference on Field Programmable Logic and Applications (FPL '09), pp. 12–17, August 2009.
[30]  S. Gao, A. G. Schmidt, and R. Sass, “Impact of reconfigurable hardware on accelerating MPI_reduce,” in Proceedings of the International Conference on Field Programmable Technology (FPT '10), IEEE Computer Society, December 2010.
[31]  J. Liang, R. Tessier, and O. Mencer, “Floating point unit generation and evaluation for FPGAs,” in Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, pp. 185–194, April 2003.
[32]  S. Chen, R. Venkatesan, and P. Gillard, “Implementation of Vector Floating-point processing Unit on FPGAs for high performance computing,” in Proceedings of the Canadian Conference on Electrical and Computer Engineering (CCECE '08), pp. 881–885, May 2008.
[33]  W. Ligon III, S. McMillan, G. Monn, et al., “A re-evaluation of the practicality of floating-point operations on FPGAs,” in Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines (FCCM '98), p. 206, IEEE Computer Society, Washington, DC, USA, 1998.
[34]  M. J. Beauchamp, S. Hauck, K. D. Underwood, and K. S. Hemmert, “Embedded floating-point units in FPGAs,” in Proceedings of the 14th ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '06), pp. 12–20, February 2006.
[35]  K. S. Hemmert and K. D. Underwood, “An analysis of the double-precision floating-point FFT on FPGAs,” in Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '05), pp. 171–180, April 2005.
[36]  X. Wang, S. Braganza, and M. Leeser, “Advanced components in the variable precision floating-point library,” in Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '06), pp. 249–258, IEEE Computer Society, Los Alamitos, Calif, USA, 2006.
[37]  A. G. Schmidt and R. Sass, “Characterizing effective memory bandwidth of designs with concurrent High-Performance Computing cores,” in Proceedings of the 17th International Conference on Field Programmable Logic and Applications (FPL '07), pp. 601–604, Amsterdam, The Netherlands, August 2007.
[38]  B. Huang, A. G. Schmidt, A. A. Mendon, and R. Sass, “Investigating resilient high performance reconfigurable computing with minimally-invasive system monitoring,” in Proceedings of the 4th International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA '10), IEEE Computer Society, November 2010.
[39]  A. G. Schmidt, B. Huang, and R. Sass, “Checkpoint/restart and beyond: resilient high performance computing with FPGAs,” in Proceedings of the 19th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’11), IEEE Computer Society, May 2011.
