Performance Modeling for FPGAs: Extending the Roofline Model with High-Level Synthesis Tools

DOI: 10.1155/2013/428078

Abstract:

The potential of FPGAs as accelerators for high-performance computing applications is very large, but many factors affect their performance. Designing for FPGAs and selecting the proper optimizations when mapping computations onto them lead to prohibitively long development times. Two alternatives exist. The first is high-level synthesis (HLS) tools, which promise fast design space exploration because the design is expressed at a high level. The second is analytical performance models, which provide realistic performance expectations, potential impediments to performance, and optimization guidelines. In this paper we propose combining both to construct a performance model for FPGAs that visually condenses all the information useful to the designer. Our model extends the roofline model by considering resource consumption and the parameters used in the HLS tools, in order to maximize performance and resource utilization within the area of the FPGA. The model is applied to optimize the design exploration of a class of window-based image processing applications using two different HLS tools. The results show the accuracy of the model as well as its flexibility to be combined with any HLS tool.

1. Introduction

Field programmable gate arrays (FPGAs) are a programmable, massively parallel architecture offering great performance potential for computing-intensive applications. For applications whose performance on a multicore CPU is insufficient, FPGAs are a promising solution. However, designing for FPGAs requires detailed knowledge of hardware and a significant amount of time. A performance analysis is therefore needed to estimate the achievable level of performance for a particular application, even before starting the implementation. Moreover, such performance models identify potential bottlenecks, the most appropriate optimizations, and the peak achievable performance.
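To illustrate the kind of estimate described above, the following sketch computes a roofline bound for a hypothetical FPGA design. It is not the authors' formulation. It uses the classic roofline bound of Williams et al. [1], where attainable performance = min(peak compute, memory bandwidth × operational intensity). It then assumes that the FPGA compute roof comes from replicating a hypothetical processing element (PE) as many times as the scarcest device resource allows. The function names, resource counts, PE costs, 100 MHz clock, 1 GB/s bandwidth, and the 3×3 window-filter kernel are all illustrative assumptions.

```python
"""Roofline bound with a hypothetical resource-limited FPGA compute roof.

The classic roofline (Williams et al. [1]) bounds attainable performance by
min(peak compute, memory bandwidth * operational intensity). The FPGA compute
roof below is an illustrative assumption, not the paper's exact formulation.
"""


def roofline_gops(peak_gops: float, bandwidth_gbs: float, oi: float) -> float:
    """Attainable GOP/s for a kernel with operational intensity `oi` (ops/byte)."""
    return min(peak_gops, bandwidth_gbs * oi)


def fpga_peak_gops(available: dict, per_pe: dict,
                   ops_per_pe_cycle: int, clock_mhz: float) -> float:
    """Hypothetical compute roof: replicate one processing element (PE) as many
    times as the scarcest FPGA resource allows. Each PE is assumed to complete
    `ops_per_pe_cycle` operations every clock cycle."""
    max_pes = min(available[r] // per_pe[r] for r in per_pe)  # whole PEs only
    # MHz * ops/cycle = 1e6 ops/s; divide by 1000 to get GOP/s (1e9 ops/s)
    return max_pes * ops_per_pe_cycle * clock_mhz / 1000.0


# Hypothetical 3x3 window filter on 8-bit pixels: 9 multiplies + 8 additions
# per output pixel. Assuming full input reuse (e.g. line buffers), each pixel
# is read from memory once and written once, so 2 bytes move per output pixel.
OPS_PER_PIXEL = 9 + 8
OI = OPS_PER_PIXEL / 2  # 8.5 ops/byte

device = {"LUT": 53200, "FF": 106400, "DSP": 220, "BRAM": 140}  # hypothetical
pe_cost = {"LUT": 1200, "FF": 1500, "DSP": 9, "BRAM": 2}        # hypothetical

peak = fpga_peak_gops(device, pe_cost, OPS_PER_PIXEL, clock_mhz=100.0)
attainable = roofline_gops(peak, bandwidth_gbs=1.0, oi=OI)
print(f"peak {peak:.1f} GOP/s, attainable {attainable:.1f} GOP/s")
# prints: peak 40.8 GOP/s, attainable 8.5 GOP/s
```

With these assumed numbers, DSP blocks are the scarcest resource (220 // 9 = 24 PEs), which gives a compute roof of 40.8 GOP/s. The memory roof at 8.5 ops/byte is only 8.5 GOP/s, so this kernel would be memory-bound. Replicating more PEs, for example through further loop unrolling in an HLS tool, would not raise performance, whereas more bandwidth or better data reuse would. A model that also tracks resource consumption makes this trade-off visible before synthesis.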
On the other hand, the new generation of high-level synthesis (HLS) tools promises to reduce development time and to automate the compilation and synthesis flow for FPGAs. When designs are written at a high level, in C/C++ or even OpenCL, the compilers can generate parallel implementations of loops that contain a large number of operations with limited data dependencies. In addition, much of the debugging and verification can be performed at a high level rather than at the RTL code level, enabling faster design space exploration (DSE). The purpose of this work is to present an insightful model, in a similar fashion to the roofline model proposed by Williams et al. [1], where the

References

[1]  S. Williams, A. Waterman, and D. Patterson, “Roofline: an insightful visual performance model for multicore architectures,” Communications of the ACM, vol. 52, no. 4, pp. 65–76, 2009.
[2]  Y. Sato, R. Nagaoka, A. Musa et al., “Performance tuning and analysis of future vector processors based on the roofline model,” in Proceedings of the 10th Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture (MEDEA '09), pp. 7–14, ACM, September 2009.
[3]  M. Reichenbach, M. Schmidt, and D. Fey, “Analytical model for the optimization of self-organizing image processing systems utilizing cellular automata,” in Proceedings of the 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops (ISORCW '11), pp. 162–171, Newport Beach, Calif, USA, March 2011.
[4]  J. A. Lorenzo, J. C. Pichel, T. F. Pena, M. Suarez, and F. F. Rivera, “Study of Performance Issues on a SMP-NUMA System using the Roofline Model,” in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA '11), Las Vegas, Nev, USA, 2011.
[5]  H. Jia, Y. Zhang, G. Long, J. Xu, S. Yan, and Y. Li, “GPURoofline: a model for guiding performance optimizations on GPUs,” in Proceedings of the 18th International Conference on Parallel Processing (Euro-Par '12), pp. 920–932, 2012.
[6]  K. H. Kim, K. Kim, and Q. H. Park, “Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model,” Computer Physics Communications, vol. 182, no. 6, pp. 1201–1207, 2011.
[7]  C. Nugteren and H. Corporaal, “The boat hull model: adapting the roofline model to enable performance prediction for parallel computing,” in Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'12), pp. 291–292, February 2012.
[8]  M. Spierings and R. van de Voort, “Embedded platform selection based on the roofline model: applied to video content analysis,” 2012.
[9]  B. da Silva, A. Braeken, E. H. D'Hollander, A. Touhafi, J. G. Cornelis, and J. Lemeire, “Performance and toolchain of a combined GPU/FPGA desktop,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '13), p. 274, 2013.
[10]  J. Park, P. C. Diniz, and K. R. S. Shayee, “Performance and area modeling of complete FPGA designs in the presence of loop transformations,” IEEE Transactions on Computers, vol. 53, no. 11, pp. 1420–1435, 2004.
[11]  L. Deng, K. Sobti, Y. Zhang, and C. Chakrabarti, “Accurate area, time and power models for FPGA-based implementations,” Journal of Signal Processing Systems, vol. 63, no. 1, pp. 39–50, 2011.
[12]  B. Holland, K. Nagarajan, and A. D. George, “RAT: RC amenability test for rapid performance prediction,” ACM Transactions on Reconfigurable Technology and Systems, vol. 1, no. 4, article 22, 2009.
[13]  J. Curreri, S. Koehler, A. D. George, B. Holland, and R. Garcia, “Performance analysis framework for high-level language applications in reconfigurable computing,” ACM Transactions on Reconfigurable Technology and Systems, vol. 3, no. 1, article 5, 2010.
[14]  S. Skalicky, S. Lopez, M. Lukowiak, J. Letendre, and M. Ryan, “Performance modeling of pipelined linear algebra architectures on FPGAs,” in Reconfigurable Computing: Architectures, Tools and Applications, pp. 146–153, Springer, Berlin, Germany, 2013.
[15]  B. da Silva, A. Braeken, E. H. D'Hollander, and A. Touhafi, “Performance and resource modeling for FPGAs using high-level synthesis tools,” PARCO, 2013.
[16]  S. Williams, J. Carter, L. Oliker, J. Shalf, and K. Yelick, “Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms,” Journal of Parallel and Distributed Computing, vol. 69, no. 9, pp. 762–777, 2009.
[17]  J. Villarreal, A. Park, W. Najjar, and R. Halstead, “Designing modular hardware accelerators in C with ROCCC 2.0,” in Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM '10), pp. 127–134, Charlotte, NC, USA, May 2010.
[18]  Z. Guo, B. Buyukkurt, and W. A. Najjar, “Input data reuse in compiling window operations onto reconfigurable hardware,” in Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '04), pp. 249–256, June 2004.
