
Transparent Runtime Migration of Loop-Based Traces of Processor Instructions to Reconfigurable Processing Units

DOI: 10.1155/2013/340316


Abstract:

The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 to 3.69 were achieved for the best alternative using a MicroBlaze processor with local memory.

1. Introduction

The performance of an embedded application running on a general-purpose processor (GPP) can be enhanced by moving the computationally intensive parts to specialized hardware units and/or to Reconfigurable Processing Units (RPUs) acting as acceleration coprocessors of the GPP [1, 2]. This is a common practice in embedded systems. However, doing so, manually or automatically, usually implies a hardware/software partitioning step over the input source code [3]. This step is static, requires the source code of the application, and does not promote code and performance portability as the hardware/software components are obtained for a specific target architecture.
Dynamic partitioning and mapping of computations (hereafter simply referred to as dynamic partitioning) [4–6] is a promising technique that can move computations from a GPP to the coprocessor in a transparent and flexible way, and it may become an important contribution to future reconfigurable embedded computing systems. In this paper, we present a system that automatically maps loops, detected by running a MicroBlaze executable binary, to an RPU. We focus on a special kind of trace-based loop, named Megablock [7], and transform Megablocks into graph representations, which are then used to generate Megablock-tailored RPUs. Megablocks are repeating patterns of elementary units of the trace (e.g., basic blocks) in the instruction stream of the program being executed. The RPU
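To make the notion of a repeating pattern of trace units concrete, the following is a minimal sketch, assuming the instruction trace has already been reduced to a list of basic-block start addresses. The function name `find_repeating_pattern` and the parameters `max_len` and `min_reps` are hypothetical and chosen for illustration only; this is a simple brute-force search, not the detection method described in [7].

```python
from typing import List, Optional, Tuple


def find_repeating_pattern(
    trace: List[int], max_len: int = 8, min_reps: int = 3
) -> Optional[Tuple[List[int], int, int]]:
    """Return (pattern, start_index, repetitions) for the back-to-back
    repeating pattern that covers the most trace entries, or None.

    `trace` is assumed to be a sequence of basic-block start addresses.
    `max_len` and `min_reps` are illustrative thresholds (assumptions).
    """
    best = None
    best_cover = 0
    n = len(trace)
    for size in range(1, max_len + 1):
        start = 0
        # A candidate needs room for at least two consecutive copies.
        while start + 2 * size <= n:
            pattern = trace[start:start + size]
            reps = 1
            # Count how many times the candidate repeats consecutively.
            while trace[start + reps * size:start + (reps + 1) * size] == pattern:
                reps += 1
            if reps >= min_reps:
                if reps * size > best_cover:
                    best_cover = reps * size
                    best = (pattern, start, reps)
                start += reps * size  # skip past the run just found
            else:
                start += 1
    return best


# Hypothetical trace: one prologue block, a three-block loop body
# executed three times, then an exit block.
trace = [0x100, 0x200, 0x210, 0x230, 0x200, 0x210, 0x230,
         0x200, 0x210, 0x230, 0x300]
result = find_repeating_pattern(trace)
```

For the hypothetical trace above, `result` is the three-block pattern `[0x200, 0x210, 0x230]`, starting at index 1 and repeated 3 times. Ties are resolved in favour of the shortest pattern, since longer candidates replace the current best only when they cover strictly more of the trace.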

References

[1]  J. Henkel, “Low power hardware/software partitioning approach for core-based embedded systems,” in Proceedings of the 36th Annual Design Automation Conference (DAC '99), pp. 122–127, June 1999.
[2]  L. Jóźwiak, N. Nedjah, and M. Figueroa, “Modern development methods and tools for embedded reconfigurable systems: a survey,” Integration, the VLSI Journal, vol. 43, no. 1, pp. 1–33, 2010.
[3]  T. Wiangtong, P. Y. K. Cheung, and W. Luk, “Hardware/software codesign,” IEEE Signal Processing Magazine, vol. 22, no. 3, pp. 14–22, 2005.
[4]  R. Lysecky and F. Vahid, “Design and implementation of a MicroBlaze-based warp processor,” Transactions on Embedded Computing Systems, vol. 8, no. 3, article 22, 2009.
[5]  N. Clark, J. Blome, M. Chu, S. Mahlke, S. Biles, and K. Flautner, “An architecture framework for transparent instruction set customization in embedded processors,” in Proceedings of the 32nd International Symposium on Computer Architecture (ISCA '05), pp. 272–283, June 2005.
[6]  A. C. S. Beck, M. B. Rutzig, G. Gaydadjiev, and L. Carro, “Transparent reconfigurable acceleration for heterogeneous embedded applications,” in Proceedings of the Conference on Design, Automation and Test in Europe (DATE '08), pp. 1208–1213, Munich, Germany, March 2008.
[7]  J. Bispo and J. M. P. Cardoso, “On identifying and optimizing instruction sequences for dynamic compilation,” in Proceedings of the International Conference on Field-Programmable Technology (FPT '10), pp. 437–440, Beijing, China, December 2010.
[8]  J. Bispo, N. Paulino, J. M. P. Cardoso, and J. C. Ferreira, “From instruction traces to specialized reconfigurable arrays,” in Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig '11), pp. 386–391, Cancun, Mexico, 2011.
[9]  J. Bispo, N. Paulino, J. C. Ferreira, and J. M. P. Cardoso, “Transparent trace-based binary acceleration for reconfigurable HW/SW systems,” IEEE Transactions on Industrial Informatics. In Press.
[10]  R. Lysecky, G. Stitt, and F. Vahid, “Warp processors,” ACM Transactions on Design Automation of Electronic Systems, vol. 11, no. 3, pp. 659–681, 2006.
[11]  N. Clark, M. Kudlur, H. Park, S. Mahlke, and K. Flautner, “Application-specific processing on a general-purpose core via transparent instruction set customization,” in Proceedings of the 37th International Symposium on Microarchitecture (MICRO '04), pp. 30–40, Portland, Ore, USA, December 2004.
[12]  A. Mehdizadeh, B. Ghavami, M. S. Zamani, H. Pedram, and F. Mehdipour, “An efficient heterogeneous reconfigurable functional unit for an adaptive dynamic extensible processor,” in Proceedings of the IFIP International Conference on Very Large Scale Integration (VLSI-SoC '07), pp. 151–156, October 2007.
[13]  H. Noori, F. Mehdipour, K. Murakami, K. Inoue, and M. S. Zamani, “An architecture framework for an adaptive extensible processor,” Journal of Supercomputing, vol. 45, no. 3, pp. 313–340, 2008.
[14]  A. C. Beck, M. B. Rutzig, G. Gaydadjiev, and L. Carro, “Run-time adaptable architectures for heterogeneous behavior embedded systems,” in Proceedings of the 4th International Workshop Reconfigurable Computing: Architectures, Tools and Applications, pp. 111–124, 2008.
[15]  J. Bispo, Mapping runtime-detected loops from microprocessors to reconfigurable processing units [Ph.D. thesis], Instituto Superior Técnico, 2012.
[16]  P. Faes, P. Bertels, J. Van Campenhout, and D. Stroobandt, “Using method interception for hardware/software co-development,” Design Automation for Embedded Systems, vol. 13, no. 4, pp. 223–243, 2009.
[17]  S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, “Effective compiler support for predicated execution using the hyperblock,” in Proceedings of the 25th Annual International Symposium on Microarchitecture, pp. 45–54, IEEE Computer Society Press, December 1992.
[18]  J. V. Leeuwen, Handbook of Theoretical Computer Science: Algorithms and Complexity, MIT Press, 1990.
[19]  J. Bispo and J. M. P. Cardoso, “Synthesis of regular expressions for FPGAs,” International Journal of Electronics, vol. 95, no. 7, pp. 685–704, 2008.
[20]  H. P. Rosinger, “Connecting Customized IP to the MicroBlaze Soft Processor Using the Fast Simplex Link (FSL) Channel,” XAPP529 (v1.3), Xilinx, 2004.
[21]  Xilinx Inc., “MicroBlaze processor reference guide v13.4,” reference manual, 2011.
[22]  Xilinx Inc., “ChipScope Pro 11.1 software and cores user guide (v11.1),” 2009.
[23]  Xilinx Inc., “MicroBlaze software reference guide v2.2,” reference manual, 2002.
[24]  Y. Kim, J. Lee, A. Shrivastava, and Y. Paek, “Memory access optimization in compilation for coarse-grained reconfigurable architectures,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 16, p. 42, 2011.
[25]  S. J. Patel and S. S. Lumetta, “rePLay: a hardware framework for dynamic optimization,” IEEE Transactions on Computers, vol. 50, no. 6, pp. 590–608, 2001.
